From owner-freebsd-stable@freebsd.org Sun Sep 18 16:05:53 2016
From: Alan Somers
Sender: asomers@gmail.com
Date: Sun, 18 Sep 2016 10:05:52 -0600
Subject: Re: zfs resilver keeps restarting
To: Marc UBM Bocklet
Cc: freebsd-stable
In-Reply-To: <20160918150917.09f9448464d84d4e50808707@gmail.com>
References: <20160918150917.09f9448464d84d4e50808707@gmail.com>
List-Id: Production branch of FreeBSD source code

On Sun, Sep 18, 2016 at 7:09 AM, Marc UBM Bocklet via freebsd-stable wrote:
>
> Hi all,
>
> due to two bad cables, I had two drives drop from my striped raidz2
> pool (built on top of geli encrypted drives). I replaced one of the
> drives before I realized that the cabling was at fault - that's the
> drive which is being replaced in the output of zpool status below.
>
> I have just installed the new cables and all sata errors are gone.
> However, the resilver of the pool keeps restarting.
>
> I see no errors in /var/log/messages, but zpool history -i says:
>
> 2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3 maxtxg=1219391
> 2016-09-18.14:56:51 [txg:1219505] scan done complete=0
> 2016-09-18.14:56:51 [txg:1219505] scan setup func=2 mintxg=3 maxtxg=1219391
> 2016-09-18.14:57:20 [txg:1219509] scan done complete=0
> 2016-09-18.14:57:20 [txg:1219509] scan setup func=2 mintxg=3 maxtxg=1219391
> 2016-09-18.14:57:49 [txg:1219513] scan done complete=0
> 2016-09-18.14:57:49 [txg:1219513] scan setup func=2 mintxg=3 maxtxg=1219391
> 2016-09-18.14:58:19 [txg:1219517] scan done complete=0
> 2016-09-18.14:58:19 [txg:1219517] scan setup func=2 mintxg=3 maxtxg=1219391
> 2016-09-18.14:58:45 [txg:1219521] scan done complete=0
> 2016-09-18.14:58:45 [txg:1219521] scan setup func=2 mintxg=3 maxtxg=1219391
>
> I assume that "scan done complete=0" means that the resilver didn't
> finish?
>
> pool layout is the following:
>
>   pool: pool
>  state: DEGRADED
> status: One or more devices is currently being resilvered. The pool
>         will continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>   scan: resilver in progress since Sun Sep 18 14:51:39 2016
>         235G scanned out of 9.81T at 830M/s, 3h21m to go
>         13.2M resilvered, 2.34% done
> config:
>
>         NAME                        STATE     READ WRITE CKSUM
>         pool                        DEGRADED     0     0     0
>           raidz2-0                  ONLINE       0     0     0
>             da6.eli                 ONLINE       0     0     0
>             da7.eli                 ONLINE       0     0     0
>             ada1.eli                ONLINE       0     0     0
>             ada2.eli                ONLINE       0     0     0
>             da10.eli                ONLINE       0     0     2
>             da11.eli                ONLINE       0     0     0
>             da12.eli                ONLINE       0     0     0
>             da13.eli                ONLINE       0     0     0
>           raidz2-1                  DEGRADED     0     0     0
>             da0.eli                 ONLINE       0     0     0
>             da1.eli                 ONLINE       0     0     0
>             da2.eli                 ONLINE       0     0     1  (resilvering)
>             replacing-3             DEGRADED     0     0     1
>               10699825708166646100  UNAVAIL      0     0     0  was /dev/da3.eli
>               da4.eli               ONLINE       0     0     0  (resilvering)
>             da3.eli                 ONLINE       0     0     0
>             da5.eli                 ONLINE       0     0     0
>             da8.eli                 ONLINE       0     0     0
>             da9.eli                 ONLINE       0     0     0
>
> errors: No known data errors
>
> system is
> FreeBSD xxx 10.1-BETA1 FreeBSD 10.1-BETA1 #27 r271633:
> Mon Sep 15 22:34:05 CEST 2014
> root@xxx:/usr/obj/usr/src/sys/xxx amd64
>
> controller is
> SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor]
>
> Drives are connected via four four-port sata cables.
>
> Should I upgrade to 10.3-release or did I make some sort of
> configuration error / overlook something?
>
> Thanks in advance!
>
> Cheers,
> Marc

Resilver will start over anytime there's new damage. In your case,
with two failed drives, resilver should've begun after you replaced
the first drive, and restarted after you replaced the second. Have
you seen it restart more than that? If so, keep an eye on the error
counters in "zpool status"; they might give you a clue. You could
also raise the loglevel of devd to "info" in /etc/syslog.conf and see
what gets logged to /var/log/devd.log. That will tell you if drives
are dropping out and automatically rejoining the pool, for example.

-Alan
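
To answer "have you seen it restart more than that?", the "zpool history -i"
lines quoted above can be tallied. The following is a minimal sketch in
Python, not part of the original exchange. It assumes, as the original poster
does, that "scan done complete=0" marks a scan that ended without finishing.
The function name summarize_scans and the counter keys are hypothetical,
chosen for illustration only.

```python
import re
from collections import Counter

# Lines copied from the `zpool history -i` output quoted in the message.
HISTORY = """\
2016-09-18.14:56:21 [txg:1219501] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:56:51 [txg:1219505] scan done complete=0
2016-09-18.14:56:51 [txg:1219505] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:57:20 [txg:1219509] scan done complete=0
2016-09-18.14:57:20 [txg:1219509] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:57:49 [txg:1219513] scan done complete=0
2016-09-18.14:57:49 [txg:1219513] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:58:19 [txg:1219517] scan done complete=0
2016-09-18.14:58:19 [txg:1219517] scan setup func=2 mintxg=3 maxtxg=1219391
2016-09-18.14:58:45 [txg:1219521] scan done complete=0
2016-09-18.14:58:45 [txg:1219521] scan setup func=2 mintxg=3 maxtxg=1219391
"""

# Matches "<timestamp> [txg:<n>] scan setup ..." or "... scan done ...".
EVENT_RE = re.compile(r"^\S+ \[txg:\d+\] scan (setup|done)\b(.*)$")
COMPLETE_RE = re.compile(r"\bcomplete=(\d+)")


def summarize_scans(text):
    """Count scan starts and scan endings in `zpool history -i` text.

    Hypothetical helper. Assumption: complete=0 means the scan stopped
    before finishing; any other value is counted as a finished scan.
    """
    counts = Counter()
    for line in text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match is None:
            continue  # not a scan event; ignore other history records
        kind, rest = match.group(1), match.group(2)
        if kind == "setup":
            counts["setup"] += 1
            continue
        flag = COMPLETE_RE.search(rest)
        if flag is not None and flag.group(1) == "0":
            counts["incomplete"] += 1
        else:
            counts["complete"] += 1
    return counts


if __name__ == "__main__":
    for key, value in sorted(summarize_scans(HISTORY).items()):
        print(key, value)
```

On a live system the same function could be fed the text captured from
"zpool history -i <pool>" instead of the embedded sample. Many more setups
than the number of drive replacements would suggest the restart loop
described in the original report.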