From: James Phillips
Date: Sat, 17 Feb 2018 07:41:49 +0000 (UTC)
Subject: ZFS trashed by bad import
Message-ID: <366879496.203508.1518853309719@mail.yahoo.com>
X-BeenThere: freebsd-fs@freebsd.org

I was considering posting this on the forum, but the rules on topic selection suggested that really specific things belong on the mailing list.

Short version (reconstructed from notes):

On a fresh 11.1 install:

# zpool import
-> shows list of available pools, including degraded striped mirror.
# zpool import -f 8255478166520290766 granny
# zpool status
(any zpool command causes the same error):
internal error: failed to initialize ZFS library

Upon reboot, I was not able to switch VT consoles or log in. I tried telling the BIOS to boot from my old installation (granny), and it failed after kernel device detection.

*Background*:

Granny was originally a 160GB ZFS mirror running FreeBSD 10. I later expanded the pool with a mirrored pair of 80GB drives. I had successfully tested booting with a simulated controller failure (each mirror was on a different disk controller, and all drives had a boot partition set).

About a week ago, one of my drives appeared to fail:

(ada1:ata2:0:1:0): Error 5. retries exhausted
GEOM_ELI: g_eli_read_done() failed (error=5) label/granny3p1.eli[READ(offset=32083968, lengt...
swap_pager: I/O error - pagein failed; blkno 2367129, size 4096, erro 5
va_fault: pager read error, pid 91969 (xfdesktop)
(ada1:ata2:0:1:0): READ_DMA. ACB: c8 00 32 9b 00 40 00 00 00 00 18 00
(ada1:ata2:0:1:0): CAM status: ATA Status Error
(ada1:ata2:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
(ada1:ata2:0:1:0): RES: 51 40 42 9b 00 00 00 00 00 08
(ada1:ata2:0:1:0): Retrying command
(ada1:ata2:0:1:0): READ_DMA. ACB: c8 00 32 9b 00 40 00 00 00 00 18 00
...

I was able to (temporarily) use my computer again by pulling one of the IDE cables. (By luck I guessed the right side the first time; I did not notice the label above until I typed this.) I was a little surprised that it was not the drive that had been re-certified by the manufacturer's software after throwing errors years ago.
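
Referring back to the import in the short version: a more cautious way to bring in a foreign pool could be sketched roughly as below. This is only a sketch. It assumes the aim is to inspect the pool without mounting any of its datasets on the running system, and the alternate root /mnt/granny is a made-up path. The -N, -R and -o readonly=on options are documented in zpool(8); the run wrapper is an illustrative helper that only prints each command unless RUN=1 is set.

```sh
# Sketch only: prints the commands instead of running them unless RUN=1.
# POOL_ID and NEW_NAME come from the post; ALTROOT is a hypothetical path.
POOL_ID=8255478166520290766
NEW_NAME=granny
ALTROOT=/mnt/granny

run() {
    # Echo the command; execute it only when the caller sets RUN=1.
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# List importable pools without importing anything.
run zpool import

# Import read-only (-o readonly=on), without mounting datasets (-N),
# and under an alternate root (-R) so that any later mount lands below
# $ALTROOT rather than on the running system's own directories.
run zpool import -f -N -o readonly=on -R "$ALTROOT" "$POOL_ID" "$NEW_NAME"

# Check pool health before doing anything else.
run zpool status -v "$NEW_NAME"
```

Run as-is, it only prints the three planned commands, which makes it easy to review them before setting RUN=1.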

I decided to resolve the problem by moving to a ZFS mirror on a pair of 2TB drives. Incidentally, I accidentally deleted pkg while trying to update the ports collection, so I decided a fresh FreeBSD 11 install might be a good idea as well.

*Confounding variables*:

While pulling the defective half of the mirror, I tentatively ruled it a heat death caused by dust build-up on the air intake. However, I also noticed the Northbridge heatsink was loose due to a broken clip.

Because my "real" machine (with ECC RAM even) is going to be delayed at least a week, I decided to do a temporary board swap with an older machine I had lying around. This machine was overclocked by under-volting and pushing the thermal limits of the CPU (while under-clocking the RAM), then backing off a bit to tolerate summer heat.

I mention the overclocking because the system failed to boot properly after installation. I bumped the voltage a little, but the failure may instead have been caused by the BIOS booting from an unexpected drive (the 2TB disks were seen as ad2 and ad3). The overclock was stable when that machine went into storage around a year ago. However, it is now in a case with a different PSU (same wattage, more efficient) and more drives.

I tried all the ZFS options in the FreeBSD 11 install wizard:
- 2-disk mirror
- 4k sectors
- GPT partition
- Encrypted disks
- 50GB swap (large for the memory: 3200MB)
- Mirror swap
- Encrypt swap
  -> Note: granny only had encrypted (non-mirrored) swap; I could not get encrypted striping to work.

System hardening:
- clean /tmp on startup
- disable opening syslogd network socket

At the time of the failure, I was running mprime (prime95) in the background and periodically monitoring CPU temperature and fan speed. This implies that ZFS had only ~1600MB to work with (3200MB total minus 1600MB used by mprime).

*Next Steps*:

1. Image all 4 drives (one at a time) onto a third 2TB drive with the System Rescue CD and dd-rescue.
2. Try to import the degraded mirror with a BSD live DVD (and re-export if successful, I guess).

Depending on the results of step 2:
- find a machine with ECC RAM, put granny3 on a fresh drive, and tell ZFS to scrub?
- copy over boot partitions that may have been clobbered by the FreeBSD 11 install?

If all else fails, I did do a full export in the last 90 days.

Regards,

James Phillips

Note: not subscribed to the list.
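
For step 1 of the next steps, the imaging could be planned roughly as follows. This is a sketch under assumptions: it takes "dd-rescue" to mean GNU ddrescue (the post does not say which tool is meant), and the device names sda to sdd and the destination /mnt/backup are placeholders, not values from the post. The script only prints the commands so they can be checked against the real device names first.

```sh
# Hypothetical names throughout: DRIVES and DEST are placeholders and must
# be replaced with the real source devices and the mounted 2TB target.
DEST=/mnt/backup
DRIVES="sda sdb sdc sdd"

image_cmds() {
    # Print two GNU ddrescue passes for one drive.
    # Pass 1 copies the readable areas and skips the scraping phase (-n).
    # Pass 2 goes back over the bad areas with direct disc access (-d)
    # and up to 3 retry passes (-r3).
    # Both passes share one mapfile, so pass 2 resumes from pass 1.
    dev=$1
    echo "ddrescue -n /dev/$dev $DEST/$dev.img $DEST/$dev.map"
    echo "ddrescue -d -r3 /dev/$dev $DEST/$dev.img $DEST/$dev.map"
}

# One drive at a time, as in step 1.
for d in $DRIVES; do
    image_cmds "$d"
done
```

Keeping one mapfile per drive means an interrupted copy can be resumed, and the later import attempts can then be made against copies rather than the original disks.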