Date: Mon, 7 Jan 2008 14:59:26 +0100
From: Bernd Walter <ticso@cicely12.cicely.de>
To: Tz-Huan Huang <tzhuan@csie.org>
Cc: freebsd-fs@freebsd.org, Brooks Davis <brooks@freebsd.org>
Subject: Re: ZFS i/o errors - which disk is the problem?
Message-ID: <20080107135925.GF65134@cicely12.cicely.de>
In-Reply-To: <6a7033710801061844m59f8c62dvdd3eea80f6c239c1@mail.gmail.com>
References: <477B16BB.8070104@freebsd.org>
            <20080102070146.GH49874@cicely12.cicely.de>
            <477B8440.1020501@freebsd.org>
            <200801031750.31035.peter.schuller@infidyne.com>
            <477D16EE.6070804@freebsd.org>
            <20080103171825.GA28361@lor.one-eyed-alien.net>
            <6a7033710801061844m59f8c62dvdd3eea80f6c239c1@mail.gmail.com>

On Mon, Jan 07, 2008 at 10:44:13AM +0800, Tz-Huan Huang wrote:
> 2008/1/4, Brooks Davis <brooks@freebsd.org>:
> >
> > We've definitely seen cases where hardware changes fixed ZFS checksum errors.
> > In one case, a firmware upgrade on the raid controller fixed it.  In another
> > case, we'd been connecting to an external array with a SCSI card that didn't
> > have a PCI bracket and the errors went away when the replacement one arrived
> > and was installed.  The fact that there were significant errors caught by ZFS
> > was quite disturbing since we wouldn't have found them with UFS.
>
> Hi,
>
> We have an NFS server using ZFS with a similar problem.
> The box is i386 7.0-PRERELEASE with 3G ram:
>
> # uname -a
> FreeBSD cml3 7.0-PRERELEASE FreeBSD 7.0-PRERELEASE #2:
> Sat Jan  5 14:42:41 CST 2008 root@cml3:/usr/obj/usr/src/sys/CML2 i386
>
> The zfs pool contains 3 raids now:
>
> 2007-11-20.11:49:17 zpool create pool /dev/label/proware263
> 2007-11-20.11:53:31 zfs create pool/project
> ... (zfs create other filesystems) ...
> 2007-11-20.11:54:32 zfs set atime=off pool
> 2007-12-08.22:59:15 zpool add pool /dev/da0
> 2008-01-05.21:20:03 zpool add pool /dev/label/proware262
>
> After a power loss yesterday, the zfs status shows
>
> # zpool status -v
>   pool: pool
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub completed with 231 errors on Mon Jan  7 08:05:35 2008
> config:
>
>         NAME                STATE     READ WRITE CKSUM
>         pool                ONLINE       0     0   516
>           label/proware263  ONLINE       0     0   231
>           da0               ONLINE       0     0   285
>           label/proware262  ONLINE       0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
>         /system/database/mysql/flickr_geo/flickr_raw_tag.MYI
>         pool/project:<0x0>
>         pool/home/master/96:<0xbf36>
>
> The main problem is that we cannot mount pool/project any more:
>
> # zfs mount pool/project
> cannot mount 'pool/project': Input/output error
> # grep ZFS /var/log/messages
> Jan  7 10:08:35 cml3 root: ZFS: zpool I/O failure, zpool=pool error=86
> (repeat many times)
>
> There is a lot of data in pool/project, probably 3.24T. zdb shows
>
> # zdb pool
> ...
> Dataset pool/project [ZPL], ID 33, cr_txg 57, 3.24T, 22267231 objects
> ...
>
> (zdb is still running now, we can provide the output if helpful)
>
> Is there any way to recover any data from pool/project?

The data was corrupted by the controller and/or the disk subsystem.
The pool is built from three single-device vdevs with no raidz or mirror,
so you have no other data source for the broken blocks, and they are lost.
The only guaranteed way to get the data back is from backup.
Maybe older snapshots/clones are still readable - I don't know.
Nevertheless the data is corrupted, and that is the purpose of alternative
data sources such as raidz/mirror and, as a last resort, backup.

You shouldn't have ignored those errors at first, because you are
running with faulty hardware.
Without ZFS checksumming the system would just process the broken
data with unpredictable results.

If all those errors are fresh, then you likely used a broken RAID
controller below ZFS, which silently corrupted synchronization and then
blew up when the disk state changed.
Unfortunately many RAID controllers are broken and therefore useless.

-- 
B.Walter                http://www.bwct.de      http://www.fizon.de
bernd@bwct.de           info@bwct.de            support@fizon.de
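
To make the point about alternative data sources concrete, here is a minimal
sketch in Python. It is not ZFS code: the names checksum() and read_block()
are made up for illustration, SHA-256 merely stands in for whatever checksum
the pool is configured with, and a "vdev" is modelled as a plain list of
block copies. It only shows why a checksum can detect a bad block on a
single-device vdev but needs a second copy (mirror, raidz, backup) to
return good data.

```python
import hashlib


def checksum(data):
    """Stand-in block checksum (assumption: SHA-256, for illustration only)."""
    return hashlib.sha256(data).hexdigest()


def read_block(copies, expected):
    """Return the first copy whose checksum matches the stored one.

    Raises IOError when every available copy fails verification, which is
    the situation of a pool without redundancy.
    """
    for data in copies:
        if checksum(data) == expected:
            return data
    raise IOError("checksum mismatch on every copy; block is unrecoverable")


good = b"important data"
expected = checksum(good)          # checksum stored when the block was written
corrupt = b"important dat\x00"     # what a faulty controller handed back

# Single-device vdev: the corruption is detected, but there is nothing
# to repair from, so the read fails.
try:
    read_block([corrupt], expected)
    single_result = "read ok"
except IOError:
    single_result = "unrecoverable"

# Mirror: the second copy still verifies, so the read succeeds.
mirror_result = read_block([corrupt, good], expected)
```

In this toy model the single-copy read ends in an I/O error, while the
mirrored read returns the intact copy.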