Date: Sun, 31 Mar 2019 23:11:46 +1100
From: Michelle Sullivan <michelle@sorbs.net>
To: Stefan Esser <se@freebsd.org>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: zfs corruption (again) due to interupted resilver and power faults.
Message-ID: <bb0b900c-b83e-915f-5ff7-c0e6cb4e6e1b@sorbs.net>
In-Reply-To: <8a4047bd-d6fe-12a3-a659-bdaa387d90ae@freebsd.org>
References: <fdfbf579-db87-a173-5fc1-1364bf091ca2@sorbs.net>
 <CAGMYy3ufghESzidxz0ss%2BYtGmL0dvK6t6Sct%2BghX%2BjNCpJi4sw@mail.gmail.com>
 <21433606-416E-4BB9-9D17-01339F53E3B4@sorbs.net>
 <f4937be2-2026-d70d-d4cf-f315e3a8a9bd@sorbs.net>
 <8a4047bd-d6fe-12a3-a659-bdaa387d90ae@freebsd.org>
Stefan Esser wrote:
> On 20.03.19 at 08:15, Michelle Sullivan wrote:
>> Michelle Sullivan wrote:
>>> Trying now thanks (and no I hadn’t - wasn’t aware of the sysctl)
>> Failed with the same old...
>>
>> http://flashback.sorbs.net/packages/zfs/image6.jpeg
> Hi Michelle,
>
> when I was in a somewhat similar situation, I recovered my pool
> (at least to copy it to new disk drives) by patching the ZFS code
> to ignore certain error aborts.
>
> Testing is possible with zdb, since it uses the same source files
> as the kernel module for all ZFS accesses.
>
> I identified the test that failed and made it non-fatal (issue a
> warning but continue). This led to inconsistent checksums, since
> they were not correctly updated in the failure case. I had to make
> these checksum checks non-fatal, too.
>
> All testing can be done by issuing zdb commands, but I do not
> remember the exact options. Option -AAA is at least required, to
> make most checks non-fatal, but it was not sufficient.
>
> I cannot offer any more specific help, I'm afraid.
>
> Good luck in recovering your pool!
>
> Regards, Stefan

Finally made progress..

Booted 12-STABLE on a USB key, installed to a USB external drive and
booted that. Built a debug kernel, installed and booted it, then
installed mdb... after playing with it and getting "no symbol" errors
I finally worked it out...

This worked:

root@colossus:/usr/src # mdb -Mkwe "spa_load_verify_metadata/W 0"
Preloading module symbols: [ kernel uhid.ko ums.ko mac_ntpd.ko zfs.ko opensolaris.ko ]
zfs.ko`spa_load_verify_metadata:0x1 = 0x0
Segmentation fault (core dumped)
root@colossus:/usr/src #

(I had already worked out with mdb that spa_load_verify_metadata=0
causes a 'LOADED' state)...

Then I was able to run the following. I had already noted that
transaction 24628146 was the latest, but the latest that was
'complete' (committed/uncorrupted) is 24628138, so...

root@colossus:/usr/src # zpool import -fT 24628138 storage
cannot mount 'storage': Input/output error
Unsupported share protocol: 1.
root@colossus:/usr/src # zpool status -v
  pool: storage
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Mar 7 19:06:14 2019
        14.9T scanned at 2.06G/s, 13.4T issued at 615M/s, 28.8T total
        863G resilvered, 46.39% done, 0 days 07:19:25 to go
config:

        NAME                      STATE     READ WRITE CKSUM
        storage                   ONLINE       0     0     2
          raidz2-0                ONLINE       0     0     8
            mfid8                 ONLINE       0     0     0
            mfid7                 ONLINE       0     0     0
            mfid12                ONLINE       0     0     0
            mfid11                ONLINE       0     0     0
            mfid0                 ONLINE       0     0     0
            mfid5                 ONLINE       0     0     0
            mfid4                 ONLINE       0     0     0
            mfid3                 ONLINE       0     0     0
            mfid2                 ONLINE       0     0     0
            spare-9               ONLINE       0     0 4.38K
              mfid14              ONLINE       0     0     0
              mfid15              ONLINE       0     0     0
            mfid10                ONLINE       0     0     0
            mfid6                 ONLINE       0     0     0
            mfid13                ONLINE       0     0     0
            mfid9                 ONLINE       0     0     0
            mfid1                 ONLINE       0     0     0
        spares
          12144659313369122799    INUSE     was /dev/mfid15

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x5d>
        storage:<0x0>
root@colossus:/usr/src #

So currently it appears imported but not mounted (don't care) and it's
currently resilvering. When complete I intend to scrub, export and
reimport, which hopefully will have resolved the issues... will let you
all know... but for the forums and archives....

This is a God-send:
https://www.delphix.com/blog/openzfs-pool-import-recovery

To get mdb working you *must* currently use -M to preload the modules.

Regards,

Michelle

-- 
Michelle Sullivan
http://www.mhix.org/
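
For the archives, the sequence described above can be written down as a small dry-run script. This is only a sketch: plan_recovery is a hypothetical helper name, it prints the commands rather than running them, and the last three commands (scrub, export, reimport) are the intended follow-up from this message, not steps that have been confirmed to work. The pool name and transaction number are the ones from this thread and will differ on any other system.

```shell
#!/bin/sh
# Dry-run sketch of the recovery sequence from this thread.
# plan_recovery is a hypothetical helper: it only PRINTS the commands,
# in order, so they can be reviewed before anything touches the pool.
plan_recovery() {
    pool="$1"   # pool name, e.g. storage
    txg="$2"    # last complete (committed) transaction group

    # 1. On a debug kernel, clear spa_load_verify_metadata with mdb.
    #    -M preloads module symbols; without it mdb reported "no symbol".
    printf '%s\n' 'mdb -Mkwe "spa_load_verify_metadata/W 0"'

    # 2. Force the import, rewinding to the chosen transaction group.
    printf '%s\n' "zpool import -fT $txg $pool"

    # 3. Check pool state and resilver progress.
    printf '%s\n' "zpool status -v"

    # 4. Intended follow-up once the resilver completes (assumption:
    #    not yet verified in this thread).
    printf '%s\n' "zpool scrub $pool"
    printf '%s\n' "zpool export $pool"
    printf '%s\n' "zpool import $pool"
}

plan_recovery storage 24628138
```

Printing instead of executing is deliberate: a rewind import discards newer transactions, so each line is meant to be read and run by hand.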