Date: Fri, 30 Oct 2015 13:36:14 +0300
From: Dmitry Marakasov <amdmi3@amdmi3.ru>
To: freebsd-fs@freebsd.org
Subject: ZFS resilver from disk with bad sectors constantly restarts
Message-ID: <20151030103614.GL57666@hades.panopticon>
Hi!
I've just got a case where resilvering a new replacement disk in raidz2
never finished.
The problem: one disk in a raidz vdev is failing with a large number of
unreadable sectors. It has been replaced with a spare. The resilver,
though, is constantly restarted, and the log is full of read errors from
the bad disk.
It looks like this:
---
  pool: spool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Oct 28 05:26:28 2015
        369G scanned out of 9,87T at 123M/s, 22h29m to go
        41,4G resilvered, 3,65% done
config:
	NAME                  STATE     READ WRITE CKSUM
	spool                 ONLINE       0     0     0
	  raidz1-0            ONLINE       0     0     0
	    ada0              ONLINE       0     0     0
	    ada1              ONLINE       0     0     0
	    spare-2           ONLINE       0     0   733
	      ada11           ONLINE       0     0     0
	      ada2            ONLINE       0     0     0  (resilvering)
	  raidz1-1            ONLINE       0     0     0
	    ada3              ONLINE       0     0     0
	    ada4              ONLINE       0     0     0
	    ada5              ONLINE       0     0     0
	  raidz1-2            ONLINE       0     0     0
	    ada6              ONLINE       0     0     0
	    ada7              ONLINE       0     0     0
	    ada10             ONLINE       0     0     0
	spares
	  588540573008830286  INUSE     was /dev/ada2
errors: No known data errors
---
The `resilver in progress since' date is constantly reset, so resilver
progress cannot get past 5% or so. My guess is that this happens on
read errors on ada11. I think I've seen (resilvering) on the ada11 line
a couple of times.
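To confirm that the start date really is being reset, rather than eyeballing it, the status output can be sampled periodically and compared. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the scan line has the exact form shown above, with C-locale day and month names.

```python
import re
from datetime import datetime

# Matches the scan line printed by `zpool status` while a resilver runs.
_SCAN_RE = re.compile(r"resilver in progress since (.+)$", re.MULTILINE)


def resilver_start(status_text):
    """Return the resilver start time found in `zpool status` output, or None."""
    match = _SCAN_RE.search(status_text)
    if match is None:
        return None
    # Example timestamp: "Wed Oct 28 05:26:28 2015" (C locale assumed).
    return datetime.strptime(match.group(1).strip(), "%a %b %d %H:%M:%S %Y")


def count_restarts(samples):
    """Count how often the start time changes across successive samples."""
    restarts = 0
    previous = None
    for text in samples:
        current = resilver_start(text)
        if current is None:
            # No resilver reported in this sample; nothing to compare.
            continue
        if previous is not None and current != previous:
            restarts += 1
        previous = current
    return restarts
```

In practice the samples would be captured every few minutes from `zpool status spool`; a non-zero count means the resilver was restarted between samples.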
In the end I had to offline ada11, after which the resilver completed
in under 16 hours. However, the situation doesn't seem normal: I'd
prefer not to lose redundancy by offlining the dying disk, and to still
be able to use it for resilvering (imagine there were bad sectors on
ada0/1 as well, but not overlapping with the bad sectors on ada11), or
at least to get some more verbose indication of why the resilver is
constantly restarted.
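To make the redundancy concern concrete, here is a toy model and not how ZFS actually lays out blocks. It assumes single parity, treats each disk as a set of unreadable stripe numbers, and uses invented stripe numbers and a hypothetical helper name.

```python
def unrecoverable_stripes(bad, parity=1):
    """Return the stripes where more than `parity` disks are unreadable.

    `bad` maps a disk name to the set of stripe numbers it cannot read.
    """
    counts = {}
    for stripes in bad.values():
        for stripe in stripes:
            counts[stripe] = counts.get(stripe, 0) + 1
    return sorted(s for s, n in counts.items() if n > parity)


# Hypothetical 10-stripe vdev; the bad-sector positions are invented.
ALL_STRIPES = set(range(10))

# Dying disk kept online: its bad stripes do not overlap with ada0/ada1.
bad_online = {"ada0": {1, 2}, "ada1": {5}, "ada11": {3, 4, 7}}

# Dying disk offlined: every stripe on it counts as unreadable.
bad_offline = dict(bad_online, ada11=ALL_STRIPES)

print(unrecoverable_stripes(bad_online))   # []
print(unrecoverable_stripes(bad_offline))  # [1, 2, 5]
```

With the dying disk online, every stripe still has enough readable members. Once it is offlined, each bad sector on ada0 or ada1 becomes unrecoverable in this model.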
I should also note that this is an outdated FreeBSD 9.1, so maybe the
problem has already been fixed.
-- 
Dmitry Marakasov   .   55B5 0596 FF1E 8D84 5F56  9510 D35A 80DD F9D2 F77D
amdmi3@amdmi3.ru  ..:  jabber: amdmi3@jabber.ru      http://amdmi3.ru
