Date: Fri, 2 Feb 2018 17:12:01 +0100
From: Ben RUBSON <ben.rubson@gmail.com>
To: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: ZFS pool faulted (corrupt metadata) but the disk data appears ok...
Message-ID: <FAB7C3BA-057F-4AB4-96E1-5C3208BABBA7@gmail.com>
In-Reply-To: <73dd7026-534e-7212-a037-0cbf62a61acd@sorbs.net>
References: <54D3E9F6.20702@sorbs.net> <54D41608.50306@delphij.net>
 <54D41AAA.6070303@sorbs.net> <54D41C52.1020003@delphij.net>
 <54D424F0.9080301@sorbs.net> <54D47F94.9020404@freebsd.org>
 <54D4A552.7050502@sorbs.net> <54D4BB5A.30409@freebsd.org>
 <54D8B3D8.6000804@sorbs.net> <54D8CECE.60909@freebsd.org>
 <54D8D4A1.9090106@sorbs.net> <54D8D5DE.4040906@sentex.net>
 <54D8D92C.6030705@sorbs.net> <54D8E189.40201@sorbs.net>
 <54D924DD.4000205@sorbs.net> <54DCAC29.8000301@sorbs.net>
 <9c995251-45f1-cf27-c4c8-30a4bd0f163c@sorbs.net>
 <8282375D-5DDC-4294-A69C-03E9450D9575@gmail.com>
 <73dd7026-534e-7212-a037-0cbf62a61acd@sorbs.net>
On 02 Feb 2018 11:51, Michelle Sullivan wrote:
> Ben RUBSON wrote:
>> On 02 Feb 2018 11:26, Michelle Sullivan wrote:
>>
>> Hi Michelle,
>>
>>> Michelle Sullivan wrote:
>>>> Michelle Sullivan wrote:
>>>>> So far (few hours in) zfs import -fFX has not faulted with this
>>>>> image...
>>>>> it's running out of memory currently about 16G of 32G- however 9.2-P15
>>>>> kernel died within minutes... out of memory (all 32G and swap) so am
>>>>> more optimistic at the moment... Fingers Crossed.
>>>> And the answer:
>>>>
>>>> 11-STABLE on a USB stick.
>>>>
>>>> Remove the drive that was replacing the hotspare (ie the replacement
>>>> drive for the one that initially died)
>>>> zpool import -fFX storage
>>>> zpool export storage
>>>>
>>>> reboot back to 9.x
>>>> zpool import storage
>>>> re-insert drive replacement drive.
>>>> reboot
>>> Gotta thank people for this again, saved me again this time on a
>>> non-FreeBSD system this time (with a lot of using a modified
>>> recoverdisk for OSX - thanks PSK@)... Lost 3 disks out of a raidz2 and
>>> 2 more had read errors on some sectors.. don't know how much (if any)
>>> data I've lost but at least it's not a rebuild from back up of all
>>> 48TB..
>>
>> What about the root-cause ?
>
> 3 disks died whilst the server was in transit from Malta to Australia
> (and I'm surprised that was all considering the state of some of the
> stuff that came out of the container - have a 3kva UPS that is completely
> destroyed despite good packing.)
>> Sounds like you had 5 disks dying at the same time ?
>
> Turns out that one of the 3 that had 'red lights on' had bad sectors, the
> other 2 were just excluded by the BIOS... I did a byte copy onto new
> drives found no read errors so put them back in and forced them online.
> The other 1 had 78k of bytes unreadable so new disk went in and an
> convinced the controller that it was the same disk as the one it
> replaced, the export/import produced 2 more disks unrecoverable read
> errors that nothing had flagged previously, so byte copied them onto new
> drives and the import -fFX is currently working (5 hours so far)...
>
>> Do you periodically run long smart tests ?
>
> Yup (fully automated.)
>
>> Zpool scrubs ?
>
> Both servers took a zpool scrub before they were packed into the
> containers... the second one came out unscathed... but then most stuff in
> the second container came out unscathed unlike the first....

What a story !
Thanks for the details.

So disks died because of the carrier, as I assume the second unscathed
server was OK...
Heads must have scratched the platters, but they should have been parked,
so... Really strange.

Hope you'll recover your whole pool.

Ben
Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?FAB7C3BA-057F-4AB4-96E1-5C3208BABBA7>
