Date: Mon, 16 Jun 2008 03:23:35 +1000
From: "Andrew Hill" <lists@thefrog.net>
To: "FreeBSD Mailing List" <freebsd-fs@freebsd.org>
Subject: raidz vdev marked faulted with only one faulted disk
Message-ID: <16a6ef710806151023y58bd905dr3671e76c71aa1f4d@mail.gmail.com>
I'm a bit lost here as to exactly what's gone wrong - it seems like it may be a bug in ZFS, but it's also entirely likely I'm assuming something I shouldn't or am just not using the ZFS tools properly (I rather hope it's the latter...).

Background: I had a system running with 4 zpools. The two that are relevant to this issue are the raidz volumes:

- 1x zpool (tank) consisting of a raidz vdev of 7x 250 GB slices (each slice on a separate disk)
- 1x zpool (tank2) consisting of a raidz vdev of 3x 70 GB slices (again on separate disks from each other, but these are slices on the same disks as the other raidz vdev)

(This is a cheap home system built out of parts lying around, basically intended to get a lot of storage space out of a bunch of disks with little concern for performance, so no need to point out those problems.)

The system was originally installed on a UFS partition and then migrated onto a raidz zpool (so I was still using the kernel and /boot from the UFS drive, but the system root was on raidz). Apart from the well-known deadlocks and panics here and there, it generally worked well enough (uptimes of a week or so if I wasn't actively trying to trigger a deadlock/panic).

Problem: a couple of weeks ago it completely stopped being able to mount root from ZFS, so I booted back into the old UFS partition (which still had whatever world was originally installed there from my 7.0-RELEASE amd64 CD, but with an up-to-date -STABLE kernel) and discovered that one of my disks (ad12) was now FAULTED. This is one of the disks that affects both raidz vdevs mentioned above (i.e. it has a 250 GB slice in tank and a 70 GB slice in tank2), so both raidz vdevs were effectively missing one disk device - but both should be able to handle this type of failure... right?

I've not yet looked too far into the cause of the failure, though my guess is it relates to the Silicon Image SiI3114 controller that disk was attached to (mainly due to the reputation those controllers have); for now, though, I'm trying to figure out the other major issue...

From 'zpool import' I can see that this disk (ad12) is marked "FAULTED corrupted data" in the list of 7 drives in tank (i.e. ad12s1d) and in the list of 3 drives in tank2 (ad12s1e). In both zpools the raidz vdev and the whole zpool are then also marked "FAULTED corrupted data", despite only one disk in the raidz being FAULTED - my understanding is that it should be DEGRADED... right? Example output showing tank2:

gutter# zpool import
  pool: tank2
    id: 8036862119610852708
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported
        using the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank2        FAULTED   corrupted data
          raidz1     FAULTED   corrupted data
            ad8s1e   ONLINE
            ad10s1e  ONLINE
            ad12s1e  FAULTED   corrupted data

tank2 is/was a zpool created as a single raidz vdev of 3 slices as shown above, so it seems like the failure/loss of one disk shouldn't be causing it to get marked FAULTED. tank is the same but with 7 drives (6 ONLINE, 1 FAULTED)...
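For comparison, this is roughly what I would have expected 'zpool import' to show for a raidz with a single bad member - reconstructed from memory of the drive-pull test I did when first setting the system up, so the exact wording and layout are almost certainly not verbatim:

  pool: tank2
    id: 8036862119610852708
 state: DEGRADED
 [...]
config:

        tank2        DEGRADED
          raidz1     DEGRADED
            ad8s1e   ONLINE
            ad10s1e  ONLINE
            ad12s1e  UNAVAIL   cannot open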
Since the zpools were never exported (I'm unable to mount the root file system on the zpool in order to export either of them...), they obviously show up with the errors above about being last accessed by another system, so attempting to override that (zpool import -f tank, or tank2) gives the following messages on the console:

ZFS: vdev failure, zpool=tank2 type=vdev.no_replicas
ZFS: failed to load zpool tank2
cannot import 'tank2': permission denied

When I first booted into the old UFS drive that the zpools were created from, they showed up on that system in 'zpool list' (since they'd not been exported after I set it up to use one as the root fs), and 'zpool status' told me to see http://www.sun.com/msg/ZFS-8000-5E, which is about zpools that have a faulted device and no redundancy - *very* odd to see on a raidz vdev.

I've also tried completely removing the faulted disk, with no better result, and removing two drives causes it to show up as UNAVAIL (as expected), or a "panic: dangling dbufs" when I try to 'zpool import' - though I suspect that might be memory related (I've also been trying all of this on a second motherboard, which I can only supply with 512 MB RAM).

I've tried various different combinations:

- hardware: two different motherboards (with different CPU and RAM; the only thing common to all setups is a new SATA controller - a Promise PDC20376 chip - replacing the Silicon Image SATA controller so that I can put all 7 drives into the system)
- software: a fresh FreeBSD install on a new hard drive (from a 7.0 i386 CD I downloaded about 3 months ago, and then again after updating to the latest -STABLE source), as well as the system I mentioned earlier on my boot/kernel drive, which had the latest amd64 kernel built on my ZFS system but hadn't had the userland updated since the 7.0 amd64 install

When I first set up the system I tested the behaviour of removing a drive from a raidz vdev and definitely saw it enter the DEGRADED state. I did not try exporting and re-importing in that state, but according to the Sun ZFS documentation this should be possible (I realise that doesn't mean it's in the BSD port, but I've not found anything to confirm whether this specifically is or isn't possible).

So my question is: is this a bug in ZFS that is causing the raidz to be faulted when one device is faulted/corrupted (it would have to be under specific conditions, since raidz vdevs can definitely go between DEGRADED and ONLINE states just fine in general), or am I misusing the ZFS utilities or making invalid assumptions - e.g. is there some other method of importing, or perhaps of scrubbing/resilvering prior to importing, that I'm missing?

Andrew
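P.S. In case it helps anyone trying to reproduce or diagnose this, the only other check I've thought of is dumping the vdev labels from the suspect slice with zdb and comparing them against a known-good member. I'm not certain this is the right way to go about it, so treat it as a rough sketch rather than something I know to be valid:

# dump the vdev labels from the faulted slice and from a healthy one
zdb -l /dev/ad12s1e
zdb -l /dev/ad10s1e

If that's not a sensible diagnostic (or there's a better one), I'd appreciate a pointer.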