Date: Wed, 27 May 2009 19:53:42 +0400
From: Dmitry Marakasov <amdmi3@amdmi3.ru>
To: freebsd-fs@freebsd.org
Subject: ZFS scrub/selfheal not really working
Message-ID: <20090527155342.GA45258@hades.panopticon>

Hi!

I've recently moved my ZFS pool to 6x1TB Hitachi HDDs. However, those
turned out to be quite crappy and tend to grow unreadable sectors.
Those sectors are really nasty, because though they are not readable,
they won't be marked as bad and relocated until there's a write
failure. And a write failure actually never happens: if the sector is
rewritten, it's perfectly readable again.

I've tried to heal those with zpool scrub, but it does not seem to
work.

--- scrub 1
  pool: pool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 3h21m with 0 errors on Wed May 27 11:27:33 2009
config:

        NAME        STATE     READ WRITE CKSUM
        pool        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE      14     0     0  13K repaired
            ad14    ONLINE       5     0     0  6K repaired
            ad16    ONLINE       0     0     0
            ad18    ONLINE      35     0     0  26K repaired
            ad20    ONLINE       0     0     0

errors: No known data errors
--- /scrub 1

--- scrub 2
  pool: pool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 3h19m with 0 errors on Wed May 27 19:19:10 2009
config:

        NAME        STATE     READ WRITE CKSUM
        pool        ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE      16     0     0  16K repaired
            ad14    ONLINE       5     0     0  4,50K repaired
            ad16    ONLINE       0     0     0
            ad18    ONLINE      25     0     0  16,5K repaired
            ad20    ONLINE       0     0     0

errors: No known data errors
--- /scrub 2

As you can see, after the scrub the unreadable sectors are still there.
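
To make the comparison between the two scrubs explicit, the per-device
READ counters can be put side by side. The following is a minimal
Python sketch and not part of the original report: the helper names
`read_errors` and `still_failing` are made up for illustration, and the
input rows are copied from the two config tables above.

```python
# Device rows copied from the two 'zpool status' config tables above.
SCRUB_1 = """\
ad10 ONLINE 0 0 0
ad12 ONLINE 14 0 0 13K repaired
ad14 ONLINE 5 0 0 6K repaired
ad16 ONLINE 0 0 0
ad18 ONLINE 35 0 0 26K repaired
ad20 ONLINE 0 0 0
"""

SCRUB_2 = """\
ad10 ONLINE 0 0 0
ad12 ONLINE 16 0 0 16K repaired
ad14 ONLINE 5 0 0 4,50K repaired
ad16 ONLINE 0 0 0
ad18 ONLINE 25 0 0 16,5K repaired
ad20 ONLINE 0 0 0
"""


def read_errors(rows):
    """Map device name -> READ counter (third column of each row)."""
    counts = {}
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[2].isdigit():
            counts[fields[0]] = int(fields[2])
    return counts


def still_failing(before, after):
    """Devices that reported READ errors in both scrubs, with both counters."""
    return {
        dev: (before[dev], after[dev])
        for dev in before
        if before[dev] > 0 and after.get(dev, 0) > 0
    }


first = read_errors(SCRUB_1)
second = read_errors(SCRUB_2)
for dev, (a, b) in sorted(still_failing(first, second).items()):
    print(f"{dev}: {a} READ errors after scrub 1, {b} after scrub 2")
```

With these rows, ad12, ad14 and ad18 report READ errors in both passes,
while ad10, ad16 and ad20 stay at zero.
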

If I run my own utility that searches for unreadable sectors and
rewrites them with zeroes, the READ errors go away. Obviously there
will be some CKSUM errors afterwards, and I assume that after that the
data is recovered and safe.

So, my question is: why doesn't ZFS rewrite those sectors with READ
errors during scrub? My only guess is that it may read data from disk
in large chunks, and the nasty sectors are located where no data is
actually stored, but reading the whole chunk still fails. Data is then
recovered from the other drives and written over (which is a no-op, as
the data was intact), but the bad sector is not overwritten and the
next read will fail as well.

So, am I right in this guess? Is there a way to make ZFS wipe those
sectors (because my own program is too slow, as it reads the whole disk
and also needs the array to be brought offline)? And in a situation
where there's no parity available, will it narrow down the read block
size to read the data and not the unused sectors with corruption?

-- 
Dmitry Marakasov   .   55B5 0596 FF1E 8D84  5F56 9510 D35A 80DD  F9D2 F77D
amdmi3@amdmi3.ru  ..:  jabber: amdmi3@jabber.ru    http://www.amdmi3.ru
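
The guess above can be illustrated with a toy model. This is only a
sketch of the hypothesis as stated in the message, not a description of
how ZFS actually issues reads: `ToyDisk`, `scrub_pass`, the 8-sector
chunk size and the block layout are all invented for the example. A
sector marked bad fails on read until it is rewritten, and a "scrub"
reads each allocated block through its enclosing aligned chunk but
rewrites only the block's own sectors.

```python
SECTOR = 512  # bytes per sector in this toy model


class ToyDisk:
    """Toy disk: sectors listed in `bad` fail on read until rewritten."""

    def __init__(self, n_sectors, bad):
        self.sectors = [bytes(SECTOR)] * n_sectors
        self.bad = set(bad)

    def read(self, start, count):
        for s in range(start, start + count):
            if s in self.bad:
                raise OSError(f"unreadable sector {s}")
        return self.sectors[start:start + count]

    def write(self, start, payloads):
        for offset, payload in enumerate(payloads):
            self.sectors[start + offset] = payload
            self.bad.discard(start + offset)  # a rewrite makes it readable


def scrub_pass(disk, blocks, chunk, good_copy):
    """Read each allocated block via its enclosing aligned chunk.

    On a failed read, rewrite only the block's own sectors from
    `good_copy` (standing in for data rebuilt from the other drives).
    Returns the number of read errors seen. Assumes that blocks do not
    cross chunk boundaries.
    """
    errors = 0
    for start, count in blocks:
        chunk_start = (start // chunk) * chunk
        try:
            disk.read(chunk_start, chunk)
        except OSError:
            errors += 1
            disk.write(start, good_copy[start:start + count])
    return errors


good = [bytes(SECTOR)] * 16
blocks = [(0, 4)]  # one allocated block: sectors 0..3

# Case 1: bad sector 5 lies in unused space next to the block.
disk = ToyDisk(16, bad=[5])
unused_case = [scrub_pass(disk, blocks, 8, good) for _ in range(2)]
disk.write(5, [bytes(SECTOR)])  # wipe the bad sector by hand
after_wipe = scrub_pass(disk, blocks, 8, good)

# Case 2: bad sector 2 lies inside the allocated block.
disk2 = ToyDisk(16, bad=[2])
used_case = [scrub_pass(disk2, blocks, 8, good) for _ in range(2)]

print(unused_case, after_wipe, used_case)  # [1, 1] 0 [1, 0]
```

In this model a bad sector in unused space keeps producing one read
error per scrub until it is wiped by hand, whereas a bad sector inside
an allocated block is healed by the first scrub. That matches the
behaviour described in the message, but only under the assumptions
listed above.
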