From owner-freebsd-stable@freebsd.org  Sun Sep 18 16:46:40 2016
From: Marc UBM Bocklet <ubm.freebsd@googlemail.com>
X-Google-Original-From: Marc "UBM" Bocklet <ubm.freebsd@gmail.com>
Date: Sun, 18 Sep 2016 18:46:36 +0200
To: freebsd-stable <freebsd-stable@freebsd.org>
Subject: Re: zfs resilver keeps restarting
Message-Id: <20160918184636.5861562661d4376e845ac75d@gmail.com>
In-Reply-To: <CAOtMX2g97kkTKs9jZhjeu-nnc4jLi_=YtacPEpAvSj1SnuTjJg@mail.gmail.com>
References: <20160918150917.09f9448464d84d4e50808707@gmail.com>
 <CAOtMX2g97kkTKs9jZhjeu-nnc4jLi_=YtacPEpAvSj1SnuTjJg@mail.gmail.com>
X-Mailer: Sylpheed 3.5.1 (GTK+ 2.24.29; amd64-portbld-freebsd11.0)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Sun, 18 Sep 2016 10:05:52 -0600
Alan Somers <asomers@freebsd.org> wrote:

> On Sun, Sep 18, 2016 at 7:09 AM, Marc UBM Bocklet via freebsd-stable
> <freebsd-stable@freebsd.org> wrote:
> >
> > Hi all,
> >
> > due to two bad cables, I had two drives drop from my striped raidz2
> > pool (built on top of geli encrypted drives). I replaced one of the
> > drives before I realized that the cabling was at fault - that's the
> > drive which is being replaced in the output of zpool status below.
> >
> > I have just installed the new cables and all SATA errors are gone.
> > However, the resilver of the pool keeps restarting.
> >
> > I see no errors in /var/log/messages, but zpool history -i says:
> >
> > 2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:56:51 [txg:1219505] scan done complete=0
> > 2016-09-18.14:56:51 [txg:1219505] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:57:20 [txg:1219509] scan done complete=0
> > 2016-09-18.14:57:20 [txg:1219509] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:57:49 [txg:1219513] scan done complete=0
> > 2016-09-18.14:57:49 [txg:1219513] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:58:19 [txg:1219517] scan done complete=0
> > 2016-09-18.14:58:19 [txg:1219517] scan setup func=2 mintxg=3 maxtxg=1219391
> > 2016-09-18.14:58:45 [txg:1219521] scan done complete=0
> > 2016-09-18.14:58:45 [txg:1219521] scan setup func=2 mintxg=3 maxtxg=1219391
> >
> > I assume that "scan done complete=0" means that the resilver didn't
> > finish?
> >
> > pool layout is the following:
> >
> >  pool: pool
> >  state: DEGRADED
> > status: One or more devices is currently being resilvered.  The pool
> > will continue to function, possibly in a degraded state.
> > action: Wait for the resilver to complete.
> >   scan: resilver in progress since Sun Sep 18 14:51:39 2016
> >         235G scanned out of 9.81T at 830M/s, 3h21m to go
> >         13.2M resilvered, 2.34% done
> > config:
> >
> >         NAME                        STATE     READ WRITE CKSUM
> >         pool                        DEGRADED     0     0     0
> >           raidz2-0                  ONLINE       0     0     0
> >             da6.eli                 ONLINE       0     0     0
> >             da7.eli                 ONLINE       0     0     0
> >             ada1.eli                ONLINE       0     0     0
> >             ada2.eli                ONLINE       0     0     0
> >             da10.eli                ONLINE       0     0     2
> >             da11.eli                ONLINE       0     0     0
> >             da12.eli                ONLINE       0     0     0
> >             da13.eli                ONLINE       0     0     0
> >           raidz2-1                  DEGRADED     0     0     0
> >             da0.eli                 ONLINE       0     0     0
> >             da1.eli                 ONLINE       0     0     0
> >             da2.eli                 ONLINE       0     0     1  (resilvering)
> >             replacing-3             DEGRADED     0     0     1
> >               10699825708166646100  UNAVAIL      0     0     0  was /dev/da3.eli
> >               da4.eli               ONLINE       0     0     0  (resilvering)
> >             da3.eli                 ONLINE       0     0     0
> >             da5.eli                 ONLINE       0     0     0
> >             da8.eli                 ONLINE       0     0     0
> >             da9.eli                 ONLINE       0     0     0
> >
> > errors: No known data errors
> >
> > system is
> > FreeBSD xxx 10.1-BETA1 FreeBSD 10.1-BETA1 #27 r271633:
> > Mon Sep 15 22:34:05 CEST 2014
> > root@xxx:/usr/obj/usr/src/sys/xxx  amd64
> >
> > controller is
> > SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor]
> >
> > Drives are connected via four four-port SATA cables.
> >
> > Should I upgrade to 10.3-release or did I make some sort of
> > configuration error / overlook something?
> >
> > Thanks in advance!
> >
> > Cheers,
> > Marc
> 
> Resilver will start over anytime there's new damage.  In your case,
> with two failed drives, resilver should've begun after you replaced
> the first drive, and restarted after you replaced the second.  Have
> you seen it restart more than that?  If so, keep an eye on the error
> counters in "zpool status"; they might give you a clue.  You could
> also raise the loglevel of devd to "info" in /etc/syslog.conf and see
> what gets logged to /var/log/devd.log.  That will tell you if drives are
> dropping out and automatically rejoining the pool, for example.

Thanks a lot for your fast reply. Unfortunately (or not), devd is silent
and the error count for the pool remains at zero. The resilver, however,
just keeps restarting. The furthest it got was about 68% resilvered;
usually it gets to 2-3%, then restarts.
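
To put a number on the restarts, here is a minimal Python sketch that
counts scan events in saved "zpool history -i" output like the excerpt
quoted above. It is only an illustration, not a ZFS tool: the function
name summarize_scans is made up for this example, and it assumes that
"scan done complete=0" marks a scan that ended without finishing and
"complete=1" one that finished. It also tolerates mail quote markers
and records that were wrapped or joined by the mail client.

```python
import re

# One internal history record, e.g.
#   2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3 maxtxg=1219391
#   2016-09-18.14:56:51 [txg:1219505] scan done complete=0
RECORD = re.compile(
    r"(?P<stamp>\d{4}-\d{2}-\d{2}\.\d{2}:\d{2}:\d{2}) "
    r"\[txg:(?P<txg>\d+)\] "
    r"scan (?P<event>setup|done)(?P<fields>(?: \w+=\d+)*)"
)


def summarize_scans(history_text):
    """Count scan setups and finished/unfinished scans in history text.

    Hypothetical helper: assumes complete=0 means the scan stopped early.
    """
    # Strip mail quote markers ("> > ") and undo line wrapping so that
    # every record becomes a run of single-space separated tokens.
    unquoted = (
        re.sub(r"^(?:>\s?)+", "", line) for line in history_text.splitlines()
    )
    cleaned = re.sub(r"\s+", " ", " ".join(unquoted)).strip()

    summary = {"setups": 0, "aborted": 0, "completed": 0}
    for match in RECORD.finditer(cleaned):
        # Turn " func=2 mintxg=3" into {"func": "2", "mintxg": "3"}.
        fields = dict(kv.split("=") for kv in match.group("fields").split())
        if match.group("event") == "setup":
            summary["setups"] += 1
        elif fields.get("complete") == "0":
            summary["aborted"] += 1
        elif fields.get("complete") == "1":
            summary["completed"] += 1
    return summary


if __name__ == "__main__":
    example = (
        "2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3\n"
        "maxtxg=1219391 2016-09-18.14:56:51 [txg:1219505] scan done complete=0\n"
    )
    print(summarize_scans(example))
```

A steadily growing "aborted" count with no "completed" entry would match
the restart loop described here.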

Next, I plan to take the pool offline (zpool export), upgrade to 10.3,
and then reimport the pool. Does that make sense?

Cheers,
Marc

-- 
Marc "UBM" Bocklet <eternal.ubm@gmail.com>