Date: Mon, 25 May 2009 18:12:05 +0200
From: Thomas Backman <serenity@exscape.org>
To: Freddie Cash <fjwcash@gmail.com>
Cc: freebsd-current@freebsd.org
Subject: Re: ZFS panic under extreme circumstances (2/3 disks corrupted)
Message-ID: <D817D098-9C36-4B72-9DCB-027CE8A7C564@exscape.org>
In-Reply-To: <b269bc570905250839r54a0f58fo5474e9e219a222ca@mail.gmail.com>
References: <4E6E325D-BB18-4478-BCFD-633D6F4CFD88@exscape.org> <D98FEABB-8B8A-48E6-B021-B05816B4C699@exscape.org> <b269bc570905250839r54a0f58fo5474e9e219a222ca@mail.gmail.com>

On May 25, 2009, at 05:39 PM, Freddie Cash wrote:

> On Mon, May 25, 2009 at 2:13 AM, Thomas Backman <serenity@exscape.org> wrote:
>> On May 24, 2009, at 09:02 PM, Thomas Backman wrote:
>>
>>> So, I was playing around with RAID-Z and self-healing...
>>
>> Yet another follow-up to this.
>> It appears that all traces of errors vanish after a reboot. So, say you have
>> a dying disk; ZFS repairs the data for you, and you don't notice (unless you
>> check zpool status). Then you reboot, and there's NO (easy?) way that I can
>> tell to find out that something is wrong with your hardware!
>
> On our storage server that was initially configured using 1 large
> 24-drive raidz2 vdev (don't do that, by the way), we had 1 drive go
> south. "zpool status" was full of errors. And the error counts
> survived reboots. Either that, or the drive was so bad that the error
> counts started increasing right away after a boot. After a week of
> fighting with it to get the new drive to resilver and get added to the
> vdev, we nuked it and re-created it using 3 raidz2 vdevs each
> comprised of 8 drives.
>
> (Un)fortunately, that was the only failure we've had so far, so can't
> really confirm/deny the "error counts reset after reboot".

Was this on FreeBSD?

I have another unfortunate thing to note regarding this: after a reboot,
it's even impossible to tell *which disk* has gone bad, even if the pool
is "uncleared" but otherwise "healed". It simply says that a device has
failed, with no clue as to which one, since they're all "ONLINE"!

Regards,
Thomas