From owner-freebsd-fs@freebsd.org Fri Feb 2 10:51:15 2018
Subject: Re: ZFS pool faulted (corrupt metadata) but the disk data appears ok...
To: Ben RUBSON, "freebsd-fs@freebsd.org"
From: Michelle Sullivan <michelle@sorbs.net>
Message-id: <73dd7026-534e-7212-a037-0cbf62a61acd@sorbs.net>
Date: Fri, 02 Feb 2018 21:51:01 +1100
In-reply-to: <8282375D-5DDC-4294-A69C-03E9450D9575@gmail.com>

Ben RUBSON wrote:
> On 02 Feb 2018 11:26, Michelle Sullivan wrote:
>
> Hi Michelle,
>
>> Michelle Sullivan wrote:
>>> Michelle Sullivan wrote:
>>>> So far (a few hours in) zpool import -fFX has not faulted with this
>>>> image... it is currently using about 16G of the 32G of memory,
>>>> whereas the 9.2-P15 kernel died within minutes after exhausting all
>>>> 32G and swap, so I am more optimistic this time... fingers crossed.
>>>>
>>> And the answer:
>>>
>>> 11-STABLE on a USB stick.
>>>
>>> Remove the drive that was replacing the hotspare (i.e. the
>>> replacement drive for the one that initially died).
>>> zpool import -fFX storage
>>> zpool export storage
>>>
>>> Reboot back to 9.x.
>>> zpool import storage
>>> Re-insert the replacement drive.
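The recovery recipe above can be sketched as a small shell function. This is a hedged sketch only: the pool name "storage" comes from the thread, and -X (extreme rewind) is a last-resort option that can discard recent transactions, so where possible run it against byte copies of the disks rather than the originals.

```shell
# Sketch of the rewind-import recovery described above.  Assumes the
# in-progress replacement drive has already been physically removed
# and that a newer (11-STABLE) kernel is booted.
recover_pool() {
    pool=${1:-storage}
    # -f force, -F rewind to a good txg, -X allow an extreme rewind
    zpool import -fFX "$pool" || return 1
    # Export cleanly so the pool can be imported by the old 9.x kernel.
    zpool export "$pool"
    # Then: reboot into the original system and run `zpool import $pool`,
    # and only afterwards re-insert the replacement drive.
}
```

The function is only a transcription of the steps in the thread; nothing here validates that a rewind import is actually safe for a given pool.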
>>> Reboot.
>>
>> Got to thank people for this again - it saved me again, this time on
>> a non-FreeBSD system (with a lot of use of a recoverdisk modified
>> for OS X - thanks phk@)... I lost 3 disks out of a raidz2 and 2 more
>> had read errors on some sectors. I don't know how much (if any) data
>> I've lost, but at least it's not a rebuild from backup of all 48TB.
>
> What about the root-cause ?

3 disks died while the server was in transit from Malta to Australia (and I'm surprised that was all, considering the state of some of the stuff that came out of the container - I have a 3kVA UPS that was completely destroyed despite good packing.)

> Sounds like you had 5 disks dying at the same time ?

It turns out that one of the 3 with 'red lights on' had bad sectors; the other 2 had simply been excluded by the BIOS. I did a byte copy of those 2 onto new drives, found no read errors, so I put them back in and forced them online. The one with bad sectors had 78k of unreadable bytes, so a new disk went in and I convinced the controller that it was the same disk as the one it replaced. The export/import then turned up unrecoverable read errors on 2 more disks that nothing had flagged previously, so I byte-copied those onto new drives as well, and the import -fFX is currently running (5 hours so far)...

> Do you periodically run long smart tests ?

Yup (fully automated.)

> Zpool scrubs ?

Both servers took a zpool scrub before they were packed into the containers... the second one came out unscathed - but then most stuff in the second container came out unscathed, unlike the first....

Regards,

-- 
Michelle Sullivan
http://www.mhix.org/
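The byte-level rescue copies described in the thread can be sketched with FreeBSD's recoverdisk(1), which retries unreadable ranges in progressively smaller blocks. A hedged sketch, with hypothetical device names; the -w worklist lets an interrupted copy be resumed later with -r:

```shell
# Sketch of cloning a failing pool member onto a fresh drive before
# forcing it back online.  Device paths are examples only; the
# destination must be at least as large as the source.
clone_failing_disk() {
    src=$1   # e.g. /dev/ada3 (failing original)
    dst=$2   # e.g. /dev/ada8 (new replacement)
    recoverdisk -w /var/tmp/rescue.worklist "$src" "$dst"
}
```

Blocks that remain unreadable are left behind in the worklist file, which gives a rough measure of how much data (like the 78k mentioned above) was actually lost.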