Date: Fri, 02 Feb 2018 21:51:01 +1100
From: Michelle Sullivan <michelle@sorbs.net>
To: Ben RUBSON <ben.rubson@gmail.com>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: ZFS pool faulted (corrupt metadata) but the disk data appears ok...
Message-ID: <73dd7026-534e-7212-a037-0cbf62a61acd@sorbs.net>
In-Reply-To: <8282375D-5DDC-4294-A69C-03E9450D9575@gmail.com>
References: <54D3E9F6.20702@sorbs.net> <54D41608.50306@delphij.net>
 <54D41AAA.6070303@sorbs.net> <54D41C52.1020003@delphij.net>
 <54D424F0.9080301@sorbs.net> <54D47F94.9020404@freebsd.org>
 <54D4A552.7050502@sorbs.net> <54D4BB5A.30409@freebsd.org>
 <54D8B3D8.6000804@sorbs.net> <54D8CECE.60909@freebsd.org>
 <54D8D4A1.9090106@sorbs.net> <54D8D5DE.4040906@sentex.net>
 <54D8D92C.6030705@sorbs.net> <54D8E189.40201@sorbs.net>
 <54D924DD.4000205@sorbs.net> <54DCAC29.8000301@sorbs.net>
 <9c995251-45f1-cf27-c4c8-30a4bd0f163c@sorbs.net>
 <8282375D-5DDC-4294-A69C-03E9450D9575@gmail.com>

Ben RUBSON wrote:
> On 02 Feb 2018 11:26, Michelle Sullivan wrote:
>
> Hi Michelle,
>
>> Michelle Sullivan wrote:
>>> Michelle Sullivan wrote:
>>>> So far (few hours in) zfs import -fFX has not faulted with this
>>>> image...
>>>> it's running out of memory currently about 16G of 32G- however 9.2-P15
>>>> kernel died within minutes... out of memory (all 32G and swap) so am
>>>> more optimistic at the moment... Fingers Crossed.
>>>>
>>> And the answer:
>>>
>>> 11-STABLE on a USB stick.
>>>
>>> Remove the drive that was replacing the hotspare (ie the replacement
>>> drive for the one that initially died)
>>> zpool import -fFX storage
>>> zpool export storage
>>>
>>> reboot back to 9.x
>>> zpool import storage
>>> re-insert drive replacement drive.
>>> reboot
>> Gotta thank people for this again, saved me again this time on a
>> non-FreeBSD system this time (with a lot of using a modified
>> recoverdisk for OSX - thanks PSK@)... Lost 3 disks out of a raidz2
>> and 2 more had read errors on some sectors.. don't know how much (if
>> any) data I've lost but at least it's not a rebuild from back up of
>> all 48TB..
>
> What about the root-cause ?

3 disks died whilst the server was in transit from Malta to Australia
(and I'm surprised that was all, considering the state of some of the
stuff that came out of the container - I have a 3kVA UPS that is
completely destroyed despite good packing.)

> Sounds like you had 5 disks dying at the same time ?

Turns out that only one of the 3 that had 'red lights on' had bad
sectors, the other 2 were just excluded by the BIOS... I did a byte copy
of those 2 onto new drives, found no read errors, so put them back in and
forced them online.

The other 1 had 78k bytes unreadable, so a new disk went in and I
convinced the controller that it was the same disk as the one it
replaced. The export/import then turned up 2 more disks with
unrecoverable read errors that nothing had flagged previously, so I byte
copied them onto new drives and the import -fFX is currently working
(5 hours so far)...

> Do you periodically run long smart tests ?

Yup (fully automated.)

> Zpool scrubs ?
>

Both servers took a zpool scrub before they were packed into the
containers... the second one came out unscathed... but then most stuff
in the second container came out unscathed, unlike the first....

Regards,

-- 
Michelle Sullivan
http://www.mhix.org/