From owner-freebsd-fs@freebsd.org  Fri Oct 30 10:38:42 2015
Date: Fri, 30 Oct 2015 13:36:14 +0300
From: Dmitry Marakasov <amdmi3@amdmi3.ru>
To: freebsd-fs@freebsd.org
Subject: ZFS resilver from disk with bad sectors constantly restarts
Message-ID: <20151030103614.GL57666@hades.panopticon>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
User-Agent: Mutt/1.5.24 (2015-08-30)

Hi!

I've just hit a case where resilvering a replacement disk in a raidz1
pool never finished.

The problem: one disk in the raidz is failing, with a large number of
unreadable sectors. It is being replaced with a spare. The resilver,
however, constantly restarts, and the log is full of read errors from
the bad disk.

It looks like this:

---
  pool: spool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Oct 28 05:26:28 2015
        369G scanned out of 9,87T at 123M/s, 22h29m to go
        41,4G resilvered, 3,65% done
config:

	NAME                  STATE     READ WRITE CKSUM
	spool                 ONLINE       0     0     0
	  raidz1-0            ONLINE       0     0     0
	    ada0              ONLINE       0     0     0
	    ada1              ONLINE       0     0     0
	    spare-2           ONLINE       0     0   733
	      ada11           ONLINE       0     0     0
	      ada2            ONLINE       0     0     0  (resilvering)
	  raidz1-1            ONLINE       0     0     0
	    ada3              ONLINE       0     0     0
	    ada4              ONLINE       0     0     0
	    ada5              ONLINE       0     0     0
	  raidz1-2            ONLINE       0     0     0
	    ada6              ONLINE       0     0     0
	    ada7              ONLINE       0     0     0
	    ada10             ONLINE       0     0     0
	spares
	  588540573008830286  INUSE     was /dev/ada2

errors: No known data errors
---
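
For anyone who wants to watch the error counters in output like the
above from a script, here is a minimal Python sketch. It is not part of
any ZFS tooling: the function name devices_with_errors is hypothetical,
and it assumes device rows have the NAME STATE READ WRITE CKSUM layout
shown above with plain integer counters (abbreviated counts such as
"1.2K" would be skipped).

```python
def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for config rows with a nonzero counter.

    Assumes rows shaped like: NAME STATE READ WRITE CKSUM [note],
    with plain integer counters (an assumption, see the text above).
    """
    rows = []
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # too short to be a device row
        counters = fields[2:5]
        if not all(c.isdigit() for c in counters):
            continue  # header line, scan progress line, etc.
        read, write, cksum = (int(c) for c in counters)
        if read or write or cksum:
            rows.append((fields[0], read, write, cksum))
    return rows
```

Fed the status text above, this should report only the spare-2 row,
whose CKSUM counter is 733.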

The `resilver in progress since' date is constantly reset, so resilver
progress never gets beyond 5% or so. My guess is that this happens on
read errors on ada11. I think I've also seen (resilvering) on the ada11
line a couple of times.
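
One way to confirm that the start date really is being reset would be to
save `zpool status` output periodically and compare the timestamps. The
following is only a sketch under assumptions: the function names
resilver_start and count_restarts are made up for illustration, and the
date is assumed to be printed in the C-locale format seen in the scan
line ("Wed Oct 28 05:26:28 2015").

```python
import re
from datetime import datetime

# Matches the "scan:" line printed while a resilver is running.
SCAN_RE = re.compile(r"resilver in progress since (.+)$", re.MULTILINE)


def resilver_start(status_text):
    """Return the resilver start time found in `zpool status` text, or None."""
    match = SCAN_RE.search(status_text)
    if match is None:
        return None
    # Assumed format: "Wed Oct 28 05:26:28 2015"
    return datetime.strptime(match.group(1).strip(), "%a %b %d %H:%M:%S %Y")


def count_restarts(samples):
    """Count how often the start time changes between consecutive samples."""
    restarts = 0
    previous = None
    for text in samples:
        started = resilver_start(text)
        if started is None:
            continue  # no resilver reported in this sample
        if previous is not None and started != previous:
            restarts += 1
        previous = started
    return restarts
```

A nonzero result from count_restarts over a series of saved outputs
would indicate that the resilver was restarted between samples.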

In the end I had to offline ada11, and after that the resilver completed
in under 16 hours. However, the situation doesn't seem normal: I'd
prefer not to lose redundancy by offlining the dying disk, and to still
be able to use it for resilvering (imagine there were bad sectors on
ada0/1 as well, but not overlapping with the bad sectors on ada11). At
the very least, I'd expect some more verbose indication of why the
resilver is constantly restarted.

I should also note that this is an outdated FreeBSD 9.1 system, so the
problem may have been fixed already.

-- 
Dmitry Marakasov   .   55B5 0596 FF1E 8D84 5F56  9510 D35A 80DD F9D2 F77D
amdmi3@amdmi3.ru  ..:  jabber: amdmi3@jabber.ru      http://amdmi3.ru