From: Alan Somers
Date: Sun, 18 Sep 2016 11:41:54 -0600
Subject: Re: zfs resilver keeps restarting
To: Marc UBM Bocklet
Cc: freebsd-stable

On Sun, Sep 18, 2016 at 10:46 AM, Marc UBM Bocklet via freebsd-stable wrote:
> On Sun, 18 Sep 2016 10:05:52 -0600
> Alan Somers wrote:
>
>> On Sun, Sep 18, 2016 at 7:09 AM, Marc UBM Bocklet via freebsd-stable
>> wrote:
>> >
>> > Hi all,
>> >
>> > due to two bad cables, I had two drives drop from my striped raidz2
>> > pool (built on top of geli encrypted drives). I replaced one of the
>> > drives before I realized that the cabling was at fault - that's the
>> > drive which is being replaced in the output of zpool status below.
>> >
>> > I have just installed the new cables and all sata errors are gone.
>> > However, the resilver of the pool keeps restarting.
>> >
>> > I see no errors in /var/log/messages, but zpool history -i says:
>> >
>> > 2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3 maxtxg=1219391
>> > 2016-09-18.14:56:51 [txg:1219505] scan done complete=0
>> > 2016-09-18.14:56:51 [txg:1219505] scan setup func=2 mintxg=3 maxtxg=1219391
>> > 2016-09-18.14:57:20 [txg:1219509] scan done complete=0
>> > 2016-09-18.14:57:20 [txg:1219509] scan setup func=2 mintxg=3 maxtxg=1219391
>> > 2016-09-18.14:57:49 [txg:1219513] scan done complete=0
>> > 2016-09-18.14:57:49 [txg:1219513] scan setup func=2 mintxg=3 maxtxg=1219391
>> > 2016-09-18.14:58:19 [txg:1219517] scan done complete=0
>> > 2016-09-18.14:58:19 [txg:1219517] scan setup func=2 mintxg=3 maxtxg=1219391
>> > 2016-09-18.14:58:45 [txg:1219521] scan done complete=0
>> > 2016-09-18.14:58:45 [txg:1219521] scan setup func=2 mintxg=3 maxtxg=1219391
>> >
>> > I assume that "scan done complete=0" means that the resilver didn't
>> > finish?
>> >
>> > pool layout is the following:
>> >
>> > pool: pool
>> > state: DEGRADED
>> > status: One or more devices is currently being resilvered. The pool
>> >         will continue to function, possibly in a degraded state.
>> > action: Wait for the resilver to complete.
>> > scan: resilver in progress since Sun Sep 18 14:51:39 2016
>> >         235G scanned out of 9.81T at 830M/s, 3h21m to go
>> >         13.2M resilvered, 2.34% done
>> > config:
>> >
>> >   NAME                        STATE     READ WRITE CKSUM
>> >   pool                        DEGRADED     0     0     0
>> >     raidz2-0                  ONLINE       0     0     0
>> >       da6.eli                 ONLINE       0     0     0
>> >       da7.eli                 ONLINE       0     0     0
>> >       ada1.eli                ONLINE       0     0     0
>> >       ada2.eli                ONLINE       0     0     0
>> >       da10.eli                ONLINE       0     0     2
>> >       da11.eli                ONLINE       0     0     0
>> >       da12.eli                ONLINE       0     0     0
>> >       da13.eli                ONLINE       0     0     0
>> >     raidz2-1                  DEGRADED     0     0     0
>> >       da0.eli                 ONLINE       0     0     0
>> >       da1.eli                 ONLINE       0     0     0
>> >       da2.eli                 ONLINE       0     0     1  (resilvering)
>> >       replacing-3             DEGRADED     0     0     1
>> >         10699825708166646100  UNAVAIL      0     0     0  was /dev/da3.eli
>> >         da4.eli               ONLINE       0     0     0  (resilvering)
>> >       da3.eli                 ONLINE       0     0     0
>> >       da5.eli                 ONLINE       0     0     0
>> >       da8.eli                 ONLINE       0     0     0
>> >       da9.eli                 ONLINE       0     0     0
>> >
>> > errors: No known data errors
>> >
>> > system is
>> > FreeBSD xxx 10.1-BETA1 FreeBSD 10.1-BETA1 #27 r271633:
>> > Mon Sep 15 22:34:05 CEST 2014
>> > root@xxx:/usr/obj/usr/src/sys/xxx amd64
>> >
>> > controller is
>> > SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor]
>> >
>> > Drives are connected via four four-port sata cables.
>> >
>> > Should I upgrade to 10.3-release or did I make some sort of
>> > configuration error / overlook something?
>> >
>> > Thanks in advance!
>> >
>> > Cheers,
>> > Marc
>>
>> Resilver will start over anytime there's new damage. In your case,
>> with two failed drives, resilver should've begun after you replaced
>> the first drive, and restarted after you replaced the second. Have
>> you seen it restart more than that? If so, keep an eye on the error
>> counters in "zpool status"; they might give you a clue. You could
>> also raise the loglevel of devd to "info" in /etc/syslog.conf and see
>> what gets logged to /var/log/devd.log. That will tell you if drives are
>> dropping out and automatically rejoining the pool, for example.
>
> Thanks a lot for your fast reply, unfortunately (or not), devd is silent
> and the error count for the pool remains at zero. The resilver, however,
> just keeps restarting. The furthest it got was about 68% resilvered.
> Usually, it gets to 2 - 3%, then restarts.
>
> I plan on offlining the pool, upgrading to 10.3, and then reimporting
> the pool next. Does that make sense?
>
> Cheers,
> Marc

I suspect an upgrade won't make a difference, but it certainly won't
hurt. Did you remember to change devd's loglevel to "info" and
restart syslogd?
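
To put numbers on "keeps restarting", the "zpool history -i" lines quoted earlier in the thread can be tallied with a short script. This is only a rough sketch: it assumes the line format shown in Marc's excerpt (timestamp, "[txg:N]", then "scan setup ..." or "scan done complete=N"), and the function names and the SAMPLE text are illustrative, not part of any ZFS tool.

```python
import re
from datetime import datetime

# Assumed line format, taken from the excerpt quoted in the thread:
#   2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3 maxtxg=1219391
#   2016-09-18.14:56:51 [txg:1219505] scan done complete=0
EVENT_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}\.\d{2}:\d{2}:\d{2}) "
    r"\[txg:(?P<txg>\d+)\] "
    r"scan (?P<kind>setup|done)(?P<rest>.*)"
)


def parse_scan_events(text):
    """Return one dict per 'scan setup' / 'scan done' line found in text."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.search(line)
        if match is None:
            continue  # not a scan event; ignore
        # Turn trailing "key=value" tokens into a dict of strings.
        fields = dict(
            token.split("=", 1)
            for token in match.group("rest").split()
            if "=" in token
        )
        events.append({
            "time": datetime.strptime(match.group("ts"), "%Y-%m-%d.%H:%M:%S"),
            "txg": int(match.group("txg")),
            "kind": match.group("kind"),
            "fields": fields,
        })
    return events


def count_incomplete_scans(events):
    """Count 'scan done' events that reported complete=0."""
    return sum(
        1 for e in events
        if e["kind"] == "done" and e["fields"].get("complete") == "0"
    )


def seconds_between_setups(events):
    """Seconds elapsed between consecutive 'scan setup' events."""
    setups = [e["time"] for e in events if e["kind"] == "setup"]
    return [(b - a).total_seconds() for a, b in zip(setups, setups[1:])]


# Sample copied from the excerpt in the thread, one event per line.
SAMPLE = """\
2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:56:51 [txg:1219505] scan done complete=0
2016-09-18.14:56:51 [txg:1219505] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:57:20 [txg:1219509] scan done complete=0
2016-09-18.14:57:20 [txg:1219509] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:57:49 [txg:1219513] scan done complete=0
2016-09-18.14:57:49 [txg:1219513] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:58:19 [txg:1219517] scan done complete=0
2016-09-18.14:58:19 [txg:1219517] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:58:45 [txg:1219521] scan done complete=0
2016-09-18.14:58:45 [txg:1219521] scan setup func=2 mintxg=3 maxtxg=1219391
"""

events = parse_scan_events(SAMPLE)
print("incomplete scans:", count_incomplete_scans(events))
print("seconds between restarts:", seconds_between_setups(events))
```

For the excerpt in the thread this reports five incomplete scans, with a new scan starting every 26 to 30 seconds. An interval that regular may be worth comparing against timestamps in the devd log, once that is logging at "info".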