Date:      Fri, 2 Feb 2018 17:12:01 +0100
From:      Ben RUBSON <ben.rubson@gmail.com>
To:        "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject:   Re: ZFS pool faulted (corrupt metadata) but the disk data appears ok...
Message-ID:  <FAB7C3BA-057F-4AB4-96E1-5C3208BABBA7@gmail.com>
In-Reply-To: <73dd7026-534e-7212-a037-0cbf62a61acd@sorbs.net>
References:  <54D3E9F6.20702@sorbs.net> <54D41608.50306@delphij.net> <54D41AAA.6070303@sorbs.net> <54D41C52.1020003@delphij.net> <54D424F0.9080301@sorbs.net> <54D47F94.9020404@freebsd.org> <54D4A552.7050502@sorbs.net> <54D4BB5A.30409@freebsd.org> <54D8B3D8.6000804@sorbs.net> <54D8CECE.60909@freebsd.org> <54D8D4A1.9090106@sorbs.net> <54D8D5DE.4040906@sentex.net> <54D8D92C.6030705@sorbs.net> <54D8E189.40201@sorbs.net> <54D924DD.4000205@sorbs.net> <54DCAC29.8000301@sorbs.net> <9c995251-45f1-cf27-c4c8-30a4bd0f163c@sorbs.net> <8282375D-5DDC-4294-A69C-03E9450D9575@gmail.com> <73dd7026-534e-7212-a037-0cbf62a61acd@sorbs.net>

On 02 Feb 2018 11:51, Michelle Sullivan wrote:

> Ben RUBSON wrote:
>> On 02 Feb 2018 11:26, Michelle Sullivan wrote:
>>
>> Hi Michelle,
>>
>>> Michelle Sullivan wrote:
>>>> Michelle Sullivan wrote:
>>>>> So far (a few hours in) zpool import -fFX has not faulted with this
>>>>> image...
>>>>> memory use is currently about 16G of 32G; however, the 9.2-P15
>>>>> kernel died within minutes... out of memory (all 32G and swap), so I am
>>>>> more optimistic at the moment...  Fingers crossed.
>>>> And the answer:
>>>>
>>>> 11-STABLE on a USB stick.
>>>>
>>>> Remove the drive that was replacing the hotspare (ie the replacement
>>>> drive for the one that initially died)
>>>> zpool import -fFX storage
>>>> zpool export storage
>>>>
>>>> reboot back to 9.x
>>>> zpool import storage
>>>> re-insert the replacement drive.
>>>> reboot
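
The sequence quoted above can be sketched as a small dry-run script. This is only an illustration, assuming the pool is named "storage" as in the quoted commands. The helper names `step` and `recovery_steps` and the `RUN` and `POOL` variables are hypothetical and not part of any ZFS tooling. By default it only prints the commands. Note that `zpool import -F` may discard the last few transactions, and `-X` (extreme rewind) goes further, so it is a last resort.

```shell
#!/bin/sh
# Dry-run sketch of the quoted recovery sequence (hypothetical helper names).
# Commands are only printed unless RUN=1 is set in the environment.

POOL="${POOL:-storage}"   # pool name as used in the quoted commands

# Print the command; execute it only when RUN=1.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# The two phases happen in different boots, so they are selected by argument.
recovery_steps() {
    case "$1" in
        rescue)
            # Booted from the 11-STABLE USB stick, with the drive that was
            # replacing the hot spare physically removed.
            step zpool import -fFX "$POOL"   # forced import, recovery + extreme rewind
            step zpool export "$POOL"
            ;;
        original)
            # After rebooting into the original 9.x system; the replacement
            # drive is re-inserted afterwards, followed by another reboot.
            step zpool import "$POOL"
            ;;
        *)
            echo "usage: recovery_steps rescue|original" >&2
            return 1
            ;;
    esac
}
```

Running `recovery_steps rescue` from the rescue system, and `recovery_steps original` after the reboot, prints the commands for each phase so they can be reviewed before anything touches the pool.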
>>> Gotta thank people for this again, it saved me again, this time on a
>>> non-FreeBSD system (with a lot of use of a modified
>>> recoverdisk for OSX - thanks PSK@)... Lost 3 disks out of a raidz2 and
>>> 2 more had read errors on some sectors.. don't know how much (if any)
>>> data I've lost but at least it's not a rebuild from backup of all
>>> 48TB..
>>
>> What about the root-cause ?
>
> 3 disks died whilst the server was in transit from Malta to Australia
> (and I'm surprised that was all, considering the state of some of the
> stuff that came out of the container - I have a 3 kVA UPS that is completely
> destroyed despite good packing.)
>> Sounds like you had 5 disks dying at the same time ?
>
> Turns out that one of the 3 that had 'red lights on' had bad sectors; the
> other 2 were just excluded by the BIOS...  I did a byte copy onto new
> drives, found no read errors, so put them back in and forced them online.
> The other 1 had 78k bytes unreadable, so a new disk went in and I
> convinced the controller that it was the same disk as the one it
> replaced. The export/import then revealed 2 more disks with unrecoverable
> read errors that nothing had flagged previously, so I byte copied them onto
> new drives and the import -fFX is currently working (5 hours so far)...
>
>> Do you periodically run long smart tests ?
>
> Yup (fully automated.)
>
>> Zpool scrubs ?
>
> Both servers took a zpool scrub before they were packed into the  
> containers... the second one came out unscathed... but then most stuff in  
> the second container came out unscathed unlike the first....

What a story ! Thanks for the details.

So disks died because of the carrier, as I assume the second unscathed  
server was OK...
Heads must have scratched the platters, but they should have been parked,  
so... Really strange.

Hope you'll recover your whole pool.

Ben



