From owner-freebsd-fs@freebsd.org Fri Feb 2 16:12:06 2018
Subject: Re: ZFS pool faulted (corrupt metadata) but the disk data appears ok...
From: Ben RUBSON <ben.rubson@gmail.com>
Date: Fri, 2 Feb 2018 17:12:01 +0100
To: "freebsd-fs@freebsd.org"
In-Reply-To: <73dd7026-534e-7212-a037-0cbf62a61acd@sorbs.net>

On 02 Feb 2018 11:51, Michelle Sullivan wrote:

> Ben RUBSON wrote:
>> On 02
Feb 2018 11:26, Michelle Sullivan wrote:
>>
>> Hi Michelle,
>>
>>> Michelle Sullivan wrote:
>>>> Michelle Sullivan wrote:
>>>>> So far (a few hours in) zpool import -fFX has not faulted with
>>>>> this image... It's running out of memory, currently about 16G of
>>>>> 32G. However, the 9.2-P15 kernel died within minutes, out of
>>>>> memory (all 32G and swap), so I am more optimistic at the
>>>>> moment... Fingers crossed.
>>>> And the answer:
>>>>
>>>> 11-STABLE on a USB stick.
>>>>
>>>> Remove the drive that was replacing the hot spare (i.e. the
>>>> replacement drive for the one that initially died).
>>>> zpool import -fFX storage
>>>> zpool export storage
>>>>
>>>> Reboot back to 9.x.
>>>> zpool import storage
>>>> Re-insert the replacement drive.
>>>> Reboot.
>>> Gotta thank people for this again, it saved me again this time on a
>>> non-FreeBSD system (with a lot of use of a modified recoverdisk for
>>> OS X - thanks phk@)... Lost 3 disks out of a raidz2, and 2 more had
>>> read errors on some sectors.. I don't know how much (if any) data
>>> I've lost, but at least it's not a rebuild from backup of all
>>> 48TB..
>>
>> What about the root cause?
>
> 3 disks died whilst the server was in transit from Malta to Australia
> (and I'm surprised that was all, considering the state of some of the
> stuff that came out of the container - I have a 3kVA UPS that is
> completely destroyed despite good packing.)
>
>> Sounds like you had 5 disks dying at the same time?
>
> Turns out that one of the 3 that had red lights on had bad sectors;
> the other 2 were just excluded by the BIOS... I did a byte copy onto
> new drives, found no read errors, so put them back in and forced them
> online.
> The other 1 had 78k of bytes unreadable, so a new disk went in and I
> convinced the controller that it was the same disk as the one it
> replaced. The export/import produced 2 more disks with unrecoverable
> read errors that nothing had flagged previously, so I byte-copied them
> onto new drives, and the import -fFX is currently working (5 hours so
> far)...
>
>> Do you periodically run long smart tests?
>
> Yup (fully automated.)
>
>> Zpool scrubs?
>
> Both servers took a zpool scrub before they were packed into the
> containers... The second one came out unscathed... but then most stuff
> in the second container came out unscathed, unlike the first....

What a story! Thanks for the details.

So disks died because of the carrier, as I assume the second unscathed
server was OK... Heads must have scratched the platters, but they
should have been parked, so... Really strange.

Hope you'll recover your whole pool.

Ben
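[On the "fully automated" long SMART tests: the thread does not say how
Michelle's setup does it, but a common approach is smartmontools' smartd,
e.g. in /usr/local/etc/smartd.conf - device names here are illustrative:]

```
# Monitor all SMART attributes on each disk and schedule a long
# self-test every Saturday at 03:00 (the -s regex is T/MM/DD/d/HH,
# where T=L selects the long test).
/dev/da0 -a -s L/../../6/03
/dev/da1 -a -s L/../../6/03
```

Scrubs can likewise be automated on FreeBSD via periodic(8), by setting
daily_scrub_zfs_enable="YES" in /etc/periodic.conf.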
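[For readers landing here from a search: the recovery sequence Michelle
describes boils down to roughly the following. A sketch only - pool name
`storage` as in the thread; `-X` is an extreme-rewind option that may
discard recent transactions and should be treated as a last resort on a
pool you have already imaged:]

```shell
# Boot a newer FreeBSD (11-STABLE in this thread) from a USB stick,
# with the in-flight replacement drive physically removed.

# Last-resort import: -f forces the import, -F rewinds to the last
# usable txg, -X allows an extreme rewind (recent writes may be lost).
zpool import -fFX storage

# Cleanly export so the older (9.x) system can import it normally.
zpool export storage

# After rebooting into 9.x:
zpool import storage
zpool status storage   # check pool health before re-inserting the drive
```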