Date: Mon, 7 Jan 2008 10:44:13 +0800
From: "Tz-Huan Huang" <tzhuan@csie.org>
To: "Brooks Davis" <brooks@freebsd.org>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS i/o errors - which disk is the problem?
Message-ID: <6a7033710801061844m59f8c62dvdd3eea80f6c239c1@mail.gmail.com>
In-Reply-To: <20080103171825.GA28361@lor.one-eyed-alien.net>
References: <477B16BB.8070104@freebsd.org> <20080102070146.GH49874@cicely12.cicely.de> <477B8440.1020501@freebsd.org> <200801031750.31035.peter.schuller@infidyne.com> <477D16EE.6070804@freebsd.org> <20080103171825.GA28361@lor.one-eyed-alien.net>
2008/1/4, Brooks Davis <brooks@freebsd.org>:
> We've definitely seen cases where hardware changes fixed ZFS checksum
> errors. In one case, a firmware upgrade on the raid controller fixed it.
> In another case, we'd been connecting to an external array with a SCSI
> card that didn't have a PCI bracket, and the errors went away when the
> replacement one arrived and was installed. The fact that there were
> significant errors caught by ZFS was quite disturbing, since we wouldn't
> have found them with UFS.

Hi,

We have an NFS server using ZFS with a similar problem. The box is
i386 7.0-PRERELEASE with 3 GB of RAM:

# uname -a
FreeBSD cml3 7.0-PRERELEASE FreeBSD 7.0-PRERELEASE #2: Sat Jan  5 14:42:41 CST 2008 root@cml3:/usr/obj/usr/src/sys/CML2 i386

The ZFS pool currently contains three devices:

2007-11-20.11:49:17 zpool create pool /dev/label/proware263
2007-11-20.11:53:31 zfs create pool/project
... (zfs create other filesystems) ...
2007-11-20.11:54:32 zfs set atime=off pool
2007-12-08.22:59:15 zpool add pool /dev/da0
2008-01-05.21:20:03 zpool add pool /dev/label/proware262

After a power loss yesterday, zpool status shows:

# zpool status -v
  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed with 231 errors on Mon Jan  7 08:05:35 2008
config:

        NAME                  STATE     READ WRITE CKSUM
        pool                  ONLINE       0     0   516
          label/proware263    ONLINE       0     0   231
          da0                 ONLINE       0     0   285
          label/proware262    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /system/database/mysql/flickr_geo/flickr_raw_tag.MYI
        pool/project:<0x0>
        pool/home/master/96:<0xbf36>

The main problem is that we can no longer mount pool/project:

# zfs mount pool/project
cannot mount 'pool/project': Input/output error
# grep ZFS /var/log/messages
Jan  7 10:08:35 cml3 root: ZFS: zpool I/O failure, zpool=pool error=86
(repeated many times)

There is a lot of data in pool/project, about 3.24 TB. zdb shows:

# zdb pool
...
Dataset pool/project [ZPL], ID 33, cr_txg 57, 3.24T, 22267231 objects
...

(zdb is still running; we can provide the output if that would help.)

Is there any way to recover any data from pool/project?

Thank you very much.

Sincerely,
Tz-Huan
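[Editor's note: for context, the pool above is a plain concatenation of three top-level vdevs with no mirror or raidz, so ZFS can detect corruption via checksums but has no redundant copy to repair from. The commands below are a hedged sketch of first-aid steps one might try in this situation, using the device and dataset names from the output above; they assume the ZFS tools shipped with FreeBSD 7.0 and are not a verified recovery procedure.]

```shell
# Sketch only: these commands operate on the live pool from the report
# above and none of them can regenerate data lost on a non-redundant pool.

# Clear the per-device error counters, then re-scrub and re-check,
# to distinguish persistent damage from transient I/O errors:
zpool clear pool
zpool scrub pool
zpool status -v pool

# Try mounting the damaged dataset read-only, so the mount does not
# need to replay or write anything:
zfs mount -o ro pool/project

# Walk the dataset with zdb to see which objects are still readable
# (each extra -d increases the level of detail printed):
zdb -dd pool/project
```

If the read-only mount succeeds, the usual advice is to copy whatever is readable off the pool immediately, then recreate the pool with redundancy (e.g. `zpool create pool mirror dev1 dev2`) before restoring.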