Date: Tue, 19 Oct 2010 16:30:41 +0100
From: Karl Pielorz <kpielorz_lst@tdx.co.uk>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS 'read-only' device / pool scan / import?
Message-ID: <7BEF90D9F4D4CB985F3573C3@HexaDeca64.dmpriest.net.uk>
In-Reply-To: <20101019151602.GA61733@icarus.home.lan>
References: <AE519076FDEA1259C5DEA689@HexaDeca64.dmpriest.net.uk> <20101019151602.GA61733@icarus.home.lan>
--On 19 October 2010 08:16 -0700 Jeremy Chadwick <freebsd@jdc.parodius.com> wrote:

> Experts here might be able to help, but you're really going to need to
> provide every little detail, in chronological order. What commands were
> done, what output was seen, what physical actions took place, etc..
>
> 1) Restoring from backups is probably your best bet (IMHO; this is what I
> would do as well).

I didn't provide much detail, as there isn't much detail left to provide (the pool's been destroyed / rebuilt). How it got messed up is almost certainly a case of human error plus a controller 'oddity' with failed devices [which is now suitably noted for that machine!]...

It was more a 'for future reference' kind of question: does attempting to import a pool (or even running something as simple as a 'zpool status' when ZFS has not been 'loaded') actually write to the disks? i.e. could it cause a pool that is currently 'messed up' to become permanently 'messed up', because ZFS will change metadata on the pool if 'at the time' it deems devices to be faulted / corrupt etc.?

And, if it does, is there any way of doing a 'test mount/import' (i.e. with the underlying devices only being opened 'read only'), or does [as I suspect] ZFS *need* r/w access to those devices as part of the work to actually import/mount? (A sketch of both approaches is appended after the signature below.)

> There's a lot of other things I could add to the item list here
> (probably reach 9 or 10 if I tried), but in general the above sounds
> like it's what happened. raidz2 would have been able to save you in this
> situation, but would require at least 4 disks.

It was RAIDZ2 - it got totally screwed:

"
  vol           UNAVAIL      0     0     0  insufficient replicas
    raidz2      UNAVAIL      0     0     0  insufficient replicas
      da3       FAULTED      0     0     0  corrupted data
      da4       FAULTED      0     0     0  corrupted data
      da5       FAULTED      0     0     0  corrupted data
      da6       FAULTED      0     0     0  corrupted data
      da7       FAULTED      0     0     0  corrupted data
      da8       FAULTED      0     0     0  corrupted data
    raidz2      UNAVAIL      0     0     0  insufficient replicas
      da1       ONLINE       0     0     0
      da2       ONLINE       0     0     0
      da9       FAULTED      0     0     0  corrupted data
      da10      FAULTED      0     0     0  corrupted data
      da11      FAULTED      0     0     0  corrupted data
      da11      ONLINE       0     0     0
"

As there is such a large aspect of human error (and controller behaviour), I don't think it's worth digging into any deeper. It's the first pool we've ever "lost" under ZFS, and like I said, with a combination of the controller collapsing devices and humans replacing the wrong disks, 'twas doomed to fail from the start.

We've replaced failed drives on this system before - but never rebooted after a failure, before a replacement - and never replaced the wrong drive :)

Definitely a good advert for backups though :)

-Karl
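
For reference, a minimal sketch of what the two 'test import' options above could look like. This is an illustration only, not something run against the pool in this thread: the read-only import option exists in later ZFS releases (pool v28 / OpenZFS) rather than the FreeBSD ZFS of this era, the '/mnt/test' and '/images' paths are made up for the example, and the pool name 'vol' is taken from the status output above.

  # (a) Read-only import, on ZFS versions that support the 'readonly'
  #     import property (pool v28 / OpenZFS and later), so no labels or
  #     metadata are rewritten on the member devices:
  zpool import -o readonly=on -R /mnt/test vol

  # (b) On older ZFS, work on copies so the original disks are never
  #     opened read/write: image each member device to a file, then
  #     point 'zpool import' at the directory of images.
  dd if=/dev/da3 of=/images/da3.img bs=1m conv=noerror,sync
  # ...repeat for the remaining member disks...
  zpool import -d /images -R /mnt/test vol

Option (b) is slow and needs space for a full copy of every member, but it leaves the original devices untouched regardless of what the import attempt does to the copies.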