From owner-freebsd-fs@freebsd.org Fri Feb 2 16:12:06 2018
Subject: Re: ZFS pool faulted (corrupt metadata) but the disk data appears ok...
From: Ben RUBSON <ben.rubson@gmail.com>
Date: Fri, 2 Feb 2018 17:12:01 +0100
To: "freebsd-fs@freebsd.org"
In-Reply-To: <73dd7026-534e-7212-a037-0cbf62a61acd@sorbs.net>

On 02 Feb 2018 11:51, Michelle Sullivan wrote:

> Ben RUBSON wrote:
>> On 02
Feb 2018 11:26, Michelle Sullivan wrote:
>>
>> Hi Michelle,
>>
>>> Michelle Sullivan wrote:
>>>> Michelle Sullivan wrote:
>>>>> So far (a few hours in) zpool import -fFX has not faulted with
>>>>> this image... It's running out of memory, currently about 16G of
>>>>> 32G. However, the 9.2-P15 kernel died within minutes, out of
>>>>> memory (all 32G and swap), so I am more optimistic at the
>>>>> moment... Fingers crossed.
>>>> And the answer:
>>>>
>>>> 11-STABLE on a USB stick.
>>>>
>>>> Remove the drive that was replacing the hot spare (i.e. the
>>>> replacement drive for the one that initially died).
>>>> zpool import -fFX storage
>>>> zpool export storage
>>>>
>>>> Reboot back to 9.x.
>>>> zpool import storage
>>>> Re-insert the replacement drive.
>>>> Reboot.
>>> Gotta thank people for this again, it saved me again this time on a
>>> non-FreeBSD system (with a lot of use of a modified recoverdisk for
>>> OS X - thanks phk@)... Lost 3 disks out of a raidz2, and 2 more had
>>> read errors on some sectors.. I don't know how much (if any) data
>>> I've lost, but at least it's not a rebuild from backup of all
>>> 48TB..
>>
>> What about the root cause?
>
> 3 disks died whilst the server was in transit from Malta to Australia
> (and I'm surprised that was all, considering the state of some of the
> stuff that came out of the container - I have a 3kVA UPS that is
> completely destroyed despite good packing.)
>
>> Sounds like you had 5 disks dying at the same time?
>
> Turns out that one of the 3 that had red lights on had bad sectors;
> the other 2 were just excluded by the BIOS... I did a byte copy onto
> new drives, found no read errors, so put them back in and forced them
> online.
> The other 1 had 78k of bytes unreadable, so a new disk went in and I
> convinced the controller that it was the same disk as the one it
> replaced. The export/import produced 2 more disks with unrecoverable
> read errors that nothing had flagged previously, so I byte-copied them
> onto new drives, and the import -fFX is currently working (5 hours so
> far)...
>
>> Do you periodically run long smart tests?
>
> Yup (fully automated.)
>
>> Zpool scrubs?
>
> Both servers took a zpool scrub before they were packed into the
> containers... The second one came out unscathed... but then most stuff
> in the second container came out unscathed, unlike the first....

What a story! Thanks for the details.

So disks died because of the carrier, as I assume the second unscathed
server was OK... Heads must have scratched the platters, but they
should have been parked, so... Really strange.

Hope you'll recover your whole pool.

Ben
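[On the "fully automated" long SMART tests: the thread does not say how
Michelle's setup does it, but a common approach is smartmontools' smartd,
e.g. in /usr/local/etc/smartd.conf - device names here are illustrative:]

```
# Monitor all SMART attributes on each disk and schedule a long
# self-test every Saturday at 03:00 (the -s regex is T/MM/DD/d/HH,
# where T=L selects the long test).
/dev/da0 -a -s L/../../6/03
/dev/da1 -a -s L/../../6/03
```

Scrubs can likewise be automated on FreeBSD via periodic(8), by setting
daily_scrub_zfs_enable="YES" in /etc/periodic.conf.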
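[For readers landing here from a search: the recovery sequence Michelle
describes boils down to roughly the following. A sketch only - pool name
`storage` as in the thread; `-X` is an extreme-rewind option that may
discard recent transactions and should be treated as a last resort on a
pool you have already imaged:]

```shell
# Boot a newer FreeBSD (11-STABLE in this thread) from a USB stick,
# with the in-flight replacement drive physically removed.

# Last-resort import: -f forces the import, -F rewinds to the last
# usable txg, -X allows an extreme rewind (recent writes may be lost).
zpool import -fFX storage

# Cleanly export so the older (9.x) system can import it normally.
zpool export storage

# After rebooting into 9.x:
zpool import storage
zpool status storage   # check pool health before re-inserting the drive
```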