Date: Mon, 25 May 2009 09:19:21 -0700
From: Freddie Cash <fjwcash@gmail.com>
To: freebsd-current@freebsd.org
Subject: Re: ZFS panic under extreme circumstances (2/3 disks corrupted)
Message-ID: <b269bc570905250919t5bf37b5cv6037f22eaf925154@mail.gmail.com>
In-Reply-To: <D817D098-9C36-4B72-9DCB-027CE8A7C564@exscape.org>
References: <4E6E325D-BB18-4478-BCFD-633D6F4CFD88@exscape.org> <D98FEABB-8B8A-48E6-B021-B05816B4C699@exscape.org> <b269bc570905250839r54a0f58fo5474e9e219a222ca@mail.gmail.com> <D817D098-9C36-4B72-9DCB-027CE8A7C564@exscape.org>
On Mon, May 25, 2009 at 9:12 AM, Thomas Backman <serenity@exscape.org> wrote:
> On May 25, 2009, at 05:39 PM, Freddie Cash wrote:
>> On Mon, May 25, 2009 at 2:13 AM, Thomas Backman <serenity@exscape.org>
>> wrote:
>>> On May 24, 2009, at 09:02 PM, Thomas Backman wrote:
>>>
>>>> So, I was playing around with RAID-Z and self-healing...
>>>
>>> Yet another follow-up to this.
>>> It appears that all traces of errors vanish after a reboot. So, say you
>>> have a dying disk; ZFS repairs the data for you, and you don't notice
>>> (unless you check zpool status). Then you reboot, and there's NO (easy?)
>>> way that I can tell to find out that something is wrong with your
>>> hardware!
>>
>> On our storage server, which was initially configured with one large
>> 24-drive raidz2 vdev (don't do that, by the way), we had one drive go
>> south. "zpool status" was full of errors, and the error counts survived
>> reboots. Either that, or the drive was so bad that the error counts
>> started increasing again right away after a boot. After a week of
>> fighting with it to get the new drive to resilver and be added back to
>> the vdev, we nuked the pool and re-created it using 3 raidz2 vdevs, each
>> comprised of 8 drives.
>>
>> (Un)fortunately, that was the only failure we've had so far, so we can't
>> really confirm or deny the "error counts reset after reboot" behaviour.
>
> Was this on FreeBSD?

64-bit FreeBSD 7.1 using ZFS v6. SATA drives connected to 3Ware RAID
controllers, but configured as "Single Drive" arrays, not using hardware
RAID in any way.

> I have another unfortunate thing to note regarding this: after a reboot,
> it's even impossible to tell *which disk* has gone bad, even if the pool is
> "uncleared" but otherwise "healed". It simply says that a device has failed,
> with no clue as to which one, since they're all "ONLINE"!

Even when using -v?

zpool status -v

-- 
Freddie Cash
fjwcash@gmail.com
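[Editor's note: a minimal sketch of the commands discussed above, for readers
following along. The pool name "tank" and the device names da0 through da23
are hypothetical placeholders, not taken from the thread; the layout simply
mirrors the 3 x 8-drive raidz2 configuration Freddie describes, and the -v
flag is the verbose status check mentioned at the end of the mail.]

    # Create a pool from three 8-disk raidz2 vdevs instead of one 24-disk vdev.
    # Device names da0..da23 and the pool name "tank" are placeholders.
    zpool create tank \
        raidz2 da0  da1  da2  da3  da4  da5  da6  da7  \
        raidz2 da8  da9  da10 da11 da12 da13 da14 da15 \
        raidz2 da16 da17 da18 da19 da20 da21 da22 da23

    # After a scrub or a suspected drive failure, check per-device error
    # counters and the list of files affected by unrecoverable errors.
    zpool scrub tank
    zpool status -v tank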