From owner-freebsd-fs@freebsd.org Fri Feb 2 10:51:15 2018
Subject: Re: ZFS pool faulted (corrupt metadata) but the disk data appears ok...
To: Ben RUBSON, "freebsd-fs@freebsd.org"
From: Michelle Sullivan <michelle@sorbs.net>
Message-id: <73dd7026-534e-7212-a037-0cbf62a61acd@sorbs.net>
Date: Fri, 02 Feb 2018 21:51:01 +1100
In-reply-to: <8282375D-5DDC-4294-A69C-03E9450D9575@gmail.com>

Ben RUBSON wrote:
> On 02 Feb 2018 11:26, Michelle Sullivan wrote:
>
> Hi Michelle,
>
>> Michelle Sullivan wrote:
>>> Michelle Sullivan wrote:
>>>> So far (a few hours in) zpool import -fFX has not faulted with this
>>>> image... it is currently using about 16G of the 32G of memory,
>>>> whereas the 9.2-P15 kernel died within minutes after exhausting all
>>>> 32G and swap, so I am more optimistic this time... fingers crossed.
>>>>
>>> And the answer:
>>>
>>> 11-STABLE on a USB stick.
>>>
>>> Remove the drive that was replacing the hotspare (i.e. the
>>> replacement drive for the one that initially died).
>>> zpool import -fFX storage
>>> zpool export storage
>>>
>>> Reboot back to 9.x.
>>> zpool import storage
>>> Re-insert the replacement drive.
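The recovery recipe above can be sketched as a small shell function. This is a hedged sketch only: the pool name "storage" comes from the thread, and -X (extreme rewind) is a last-resort option that can discard recent transactions, so where possible run it against byte copies of the disks rather than the originals.

```shell
# Sketch of the rewind-import recovery described above.  Assumes the
# in-progress replacement drive has already been physically removed
# and that a newer (11-STABLE) kernel is booted.
recover_pool() {
    pool=${1:-storage}
    # -f force, -F rewind to a good txg, -X allow an extreme rewind
    zpool import -fFX "$pool" || return 1
    # Export cleanly so the pool can be imported by the old 9.x kernel.
    zpool export "$pool"
    # Then: reboot into the original system and run `zpool import $pool`,
    # and only afterwards re-insert the replacement drive.
}
```

The function is only a transcription of the steps in the thread; nothing here validates that a rewind import is actually safe for a given pool.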
>>> Reboot.
>>
>> Got to thank people for this again - it saved me again, this time on
>> a non-FreeBSD system (with a lot of use of a recoverdisk modified
>> for OS X - thanks phk@)... I lost 3 disks out of a raidz2 and 2 more
>> had read errors on some sectors. I don't know how much (if any) data
>> I've lost, but at least it's not a rebuild from backup of all 48TB.
>
> What about the root-cause ?

3 disks died while the server was in transit from Malta to Australia (and I'm surprised that was all, considering the state of some of the stuff that came out of the container - I have a 3kVA UPS that was completely destroyed despite good packing.)

> Sounds like you had 5 disks dying at the same time ?

It turns out that one of the 3 with 'red lights on' had bad sectors; the other 2 had simply been excluded by the BIOS. I did a byte copy of those 2 onto new drives, found no read errors, so I put them back in and forced them online. The one with bad sectors had 78k of unreadable bytes, so a new disk went in and I convinced the controller that it was the same disk as the one it replaced. The export/import then turned up unrecoverable read errors on 2 more disks that nothing had flagged previously, so I byte-copied those onto new drives as well, and the import -fFX is currently running (5 hours so far)...

> Do you periodically run long smart tests ?

Yup (fully automated.)

> Zpool scrubs ?

Both servers took a zpool scrub before they were packed into the containers... the second one came out unscathed - but then most stuff in the second container came out unscathed, unlike the first....

Regards,

-- 
Michelle Sullivan
http://www.mhix.org/
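The byte-level rescue copies described in the thread can be sketched with FreeBSD's recoverdisk(1), which retries unreadable ranges in progressively smaller blocks. A hedged sketch, with hypothetical device names; the -w worklist lets an interrupted copy be resumed later with -r:

```shell
# Sketch of cloning a failing pool member onto a fresh drive before
# forcing it back online.  Device paths are examples only; the
# destination must be at least as large as the source.
clone_failing_disk() {
    src=$1   # e.g. /dev/ada3 (failing original)
    dst=$2   # e.g. /dev/ada8 (new replacement)
    recoverdisk -w /var/tmp/rescue.worklist "$src" "$dst"
}
```

Blocks that remain unreadable are left behind in the worklist file, which gives a rough measure of how much data (like the 78k mentioned above) was actually lost.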