From owner-freebsd-fs@freebsd.org Sun Mar 10 02:29:44 2019
Subject: Re: ZFS pool faulted (corrupt metadata) but the disk data appears ok...
From: Michelle Sullivan <michelle@sorbs.net>
To: Ben RUBSON, "freebsd-fs@freebsd.org", Stefan Esser
Date: Sun, 10 Mar 2019 12:29:26 +1100

Michelle Sullivan wrote:
> Ben RUBSON wrote:
>> On 02 Feb 2018 21:48, Michelle Sullivan wrote:
>>
>>> Ben RUBSON wrote:
>>>
>>>> So disks died because of the carrier, as I assume the second
>>>> unscathed server was OK...
>>>
>>> Pretty much.
>>>
>>>> Heads must have scratched the platters, but they should have been
>>>> parked, so... Really strange.
>>>
>>> You'd have thought... though 2 of the drives look like it was wear
>>> and tear issues (the 2 not showing red lights) just not picked up on
>>> the periodic scrub.... Could be that the recovery showed that one
>>> up... you know - how you can have an array working fine, but one
>>> disk dies then others fail during the rebuild because of the extra
>>> workload.
>>
>> Yes... To try to mitigate this, when I add a new vdev to a pool, I
>> spread the new disks I have among the existing vdevs, and construct
>> the new vdev with the remaining new disk(s) + other disks retrieved
>> from the other vdevs. Thus, when possible, avoiding vdevs with all
>> disks at the same runtime.
>> However I only use mirrors, applying this with raid-Z could be a
>> little bit more tricky...
>>
> Believe it or not...
>
> # zpool status -v
>   pool: VirtualDisks
>  state: ONLINE
> status: One or more devices are configured to use a non-native block size.
>         Expect reduced performance.
> action: Replace affected devices with devices that support the
>         configured block size, or migrate data to a properly
>         configured pool.
>   scan: none requested
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         VirtualDisks               ONLINE       0     0     0
>           zvol/sorbs/VirtualDisks  ONLINE       0     0     0  block size: 512B configured, 8192B native
>
> errors: No known data errors
>
>   pool: sorbs
>  state: ONLINE
>   scan: resilvered 2.38T in 307445734561816429h29m with 0 errors on
>         Sat Aug 26 09:26:53 2017
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         sorbs                     ONLINE       0     0     0
>           raidz2-0                ONLINE       0     0     0
>             mfid0                 ONLINE       0     0     0
>             mfid1                 ONLINE       0     0     0
>             mfid7                 ONLINE       0     0     0
>             mfid8                 ONLINE       0     0     0
>             mfid12                ONLINE       0     0     0
>             mfid10                ONLINE       0     0     0
>             mfid14                ONLINE       0     0     0
>             mfid11                ONLINE       0     0     0
>             mfid6                 ONLINE       0     0     0
>             mfid15                ONLINE       0     0     0
>             mfid2                 ONLINE       0     0     0
>             mfid3                 ONLINE       0     0     0
>             spare-12              ONLINE       0     0     3
>               mfid13              ONLINE       0     0     0
>               mfid9               ONLINE       0     0     0
>             mfid4                 ONLINE       0     0     0
>             mfid5                 ONLINE       0     0     0
>         spares
>           185579620420611382      INUSE     was /dev/mfid9
>
> errors: No known data errors
>
>
> It would appear that when I replaced the damaged drives it picked
> one of them up as being rebuilt from back in August (before it was
> packed up to go), and that was why it saw it as 'corrupted metadata'
> and spent the last 3 weeks importing it; it rebuilt it as it was
> importing it.. no data loss that I can determine.  (Literally just
> finished in the middle of the night here.)
>

And back to this little nutmeg...

We had a fire last night... and it (the same pool) was resilvering
again... Corrupted the metadata.. import -fFX worked and it started
rebuilding, then during the early hours when the pool was at 50%(ish)
rebuilt/resilvered (one vdev) there was at least one more issue on the
powerline... UPSs went out after multiple hits and now I can't get it
imported - the server was in single user mode - on a FBSD-12 USB stick
... so it was only resilvering...

"zdb -AAA -L -uhdi -FX -e storage" returns sanely... anyone any
thoughts on how I might get the data back / the pool to import?  (zpool
import -fFX storage spends a long time working and eventually comes
back with "unable to import as one or more of the vdevs are
unavailable" - however they are all there as far as I can tell.)
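
Is something along these lines a sane next step?  Only a rough sketch
of what I have in mind - a read-only, no-mount import with the rewind
options, and dumping the uberblocks off one of the members first - and
the device name below is just an example, not necessarily one of the
'storage' members:

# zpool import -Fn storage
   (dry-run the -F rewind first, to see what it would have to discard)
# zdb -ul /dev/mfid0
   (dump the labels and uberblocks from a member disk to see which
    txgs are still on it)
# zpool import -N -o readonly=on -f -FX storage
   (read-only, nothing mounted, extreme rewind)

The idea being to get at the data off an older uberblock read-only
before risking another read-write -X rewind on the pool.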

Thanks,

-- 
Michelle Sullivan
http://www.mhix.org/