FreeBSD Mail Archives

Date:      Fri, 02 Feb 2018 21:51:01 +1100
From:      Michelle Sullivan <michelle@sorbs.net>
To:        Ben RUBSON <ben.rubson@gmail.com>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: ZFS pool faulted (corrupt metadata) but the disk data appears ok...
Message-ID:  <73dd7026-534e-7212-a037-0cbf62a61acd@sorbs.net>
In-Reply-To: <8282375D-5DDC-4294-A69C-03E9450D9575@gmail.com>
References:  <54D3E9F6.20702@sorbs.net> <54D41608.50306@delphij.net> <54D41AAA.6070303@sorbs.net> <54D41C52.1020003@delphij.net> <54D424F0.9080301@sorbs.net> <54D47F94.9020404@freebsd.org> <54D4A552.7050502@sorbs.net> <54D4BB5A.30409@freebsd.org> <54D8B3D8.6000804@sorbs.net> <54D8CECE.60909@freebsd.org> <54D8D4A1.9090106@sorbs.net> <54D8D5DE.4040906@sentex.net> <54D8D92C.6030705@sorbs.net> <54D8E189.40201@sorbs.net> <54D924DD.4000205@sorbs.net> <54DCAC29.8000301@sorbs.net> <9c995251-45f1-cf27-c4c8-30a4bd0f163c@sorbs.net> <8282375D-5DDC-4294-A69C-03E9450D9575@gmail.com>

Ben RUBSON wrote:
> On 02 Feb 2018 11:26, Michelle Sullivan wrote:
>
> Hi Michelle,
>
>> Michelle Sullivan wrote:
>>> Michelle Sullivan wrote:
>>>> So far (few hours in) zpool import -fFX has not faulted with this
>>>> image...
>>>> it's running out of memory currently about 16G of 32G- however 9.2-P15
>>>> kernel died within minutes... out of memory (all 32G and swap) so am
>>>> more optimistic at the moment...  Fingers Crossed.
>>>>
>>> And the answer:
>>>
>>> 11-STABLE on a USB stick.
>>>
>>> Remove the drive that was replacing the hotspare (ie the replacement
>>> drive for the one that initially died)
>>> zpool import -fFX storage
>>> zpool export storage
>>>
>>> reboot back to 9.x
>>> zpool import storage
>>> re-insert the replacement drive.
>>> reboot
>> Gotta thank people for this again, saved me again, this time on a
>> non-FreeBSD system (with a lot of using a modified
>> recoverdisk for OSX - thanks PSK@)... Lost 3 disks out of a raidz2 
>> and 2 more had read errors on some sectors.. don't know how much (if 
>> any) data I've lost but at least it's not a rebuild from back up of 
>> all 48TB..
>
> What about the root-cause ?

3 disks died whilst the server was in transit from Malta to Australia 
(and I'm surprised that was all considering the state of some of the 
stuff that came out of the container - have a 3kVA UPS that is 
completely destroyed despite good packing.)

> Sounds like you had 5 disks dying at the same time ?

Turns out that only one of the 3 that had 'red lights on' had bad sectors; 
the other 2 were just excluded by the BIOS...  I did a byte copy onto 
new drives, found no read errors, so put them back in and forced them 
online.  The other 1 had 78k bytes unreadable, so a new disk went in and 
I convinced the controller that it was the same disk as the one it 
replaced.  The export/import then produced 2 more disks with unrecoverable 
read errors that nothing had flagged previously, so I byte copied them 
onto new drives and the import -fFX is currently working (5 hours so far)...
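
For anyone following along, the import/export sequence quoted above can be 
sketched roughly as below.  It is only a dry-run wrapper: it prints the 
commands unless DRY_RUN=0 is set.  The pool name "storage" is the one from 
earlier in this thread; the run and recover_pool helpers are made up for 
illustration and are not part of any tool.

```shell
#!/bin/sh
# Dry-run sketch of the recovery sequence described in this thread.
# Assumptions: pool name "storage" (from the thread); helper names are
# hypothetical. With DRY_RUN=1 (the default) nothing touches the pool.
POOL="${POOL:-storage}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command instead of executing it.
        echo "+ $*"
    else
        "$@"
    fi
}

recover_pool() {
    # On the newer system (11-STABLE booted from a USB stick in the thread):
    # -f forces the import, -F enables recovery mode, and -X asks for the
    # more extreme rewind described in zpool(8).
    run zpool import -fFX "$POOL"
    # Export cleanly so the pool can be imported on the original system.
    run zpool export "$POOL"
}

recover_pool
```

After that the pool was imported normally (zpool import storage) back on 
the original system.  Setting DRY_RUN=0 would run the commands for real, 
which should only be done once the failing disks have been copied.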

> Do you periodically run long smart tests ?

Yup (fully automated.)

> Zpool scrubs ?
>

Both servers took a zpool scrub before they were packed into the 
containers... the second one came out unscathed... but then most stuff 
in the second container came out unscathed unlike the first....

Regards,

-- 
Michelle Sullivan
http://www.mhix.org/

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?73dd7026-534e-7212-a037-0cbf62a61acd>
