Date: Sat, 17 Feb 2018 07:41:49 +0000 (UTC)
From: James Phillips <anti_spam256@yahoo.ca>
To: <freebsd-fs@freebsd.org>
Subject: ZFS trashed by bad import
Message-ID: <366879496.203508.1518853309719@mail.yahoo.com>
References: <366879496.203508.1518853309719.ref@mail.yahoo.com>
Was considering posting this on the forum, but the rules on topic selection suggested really specific things should be on the mailing list.

Short version (reconstructed from notes):

On a fresh 11.1 install:

# zpool import
-> shows list of available pools, including degraded striped mirror.
# zpool import -f 8255478166520290766 granny
# zpool status
(any zpool command causes the same error):
internal error: failed to initialize ZFS library

Upon reboot, I was not able to switch VT consoles or log in. Tried telling the BIOS to boot from my old installation (granny), and it failed after kernel device detection.

*Background*:

Granny was originally a 160GB ZFS mirror under FreeBSD 10. I later expanded the pool with a mirrored pair of 80GB drives. I had successfully tested booting with a simulated controller failure (each mirror was on a different disk controller, and all drives had a boot partition set).

About a week ago, one of my drives appeared to fail:

(ada1:ata2:0:1:0): Error 5. retries exhausted
GEOM_ELI: g_eli_read_done() failed (error=5) label/granny3p1.eli[READ(offset=32083968, lengt...
swap_pager: I/O error - pagein failed; blkno 2367129, size 4096, erro 5
va_fault: pager read error, pid 91969 (xfdesktop)
(ada1:ata2:0:1:0): READ_DMA. ACB: c8 00 32 9b 00 40 00 00 00 00 18 00
(ada1:ata2:0:1:0): CAM status: ATA Status Error
(ada1:ata2:0:1:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
(ada1:ata2:0:1:0): RES: 51 40 42 9b 00 00 00 00 00 08
(ada1:ata2:0:1:0): Retrying command
(ada1:ata2:0:1:0): READ_DMA. ACB: c8 00 32 9b 00 40 00 00 00 00 18 00
...

I was able to (temporarily) use my computer again by pulling one of the IDE cables. (By luck I guessed the right side the first time -- I did not notice the label above until I typed this.) I was a little surprised that the failed drive was not the one re-certified by manufacturer software after throwing errors years ago.

I decided to resolve the problem by moving to a ZFS mirror on a pair of 2TB drives.
Incidentally, I accidentally deleted pkg while trying to update the ports collection, so I decided a fresh BSD 11 install might be a good idea as well.

*Confounding variables*:

While pulling the defective half of the mirror, I tentatively attributed the failure to heat death from dust build-up on the air intake. However, I also noticed the Northbridge heatsink was loose due to a broken clip.

Because my "real" machine (with ECC RAM even) is going to be delayed at least a week, I decided to do a temporary board swap with an older machine I had lying around. This machine was overclocked by under-volting and pushing the thermal limits of the CPU (while under-clocking the RAM), then backing off a bit to tolerate summer heat.

I mention the overclocking because the system failed to boot properly after installation. I bumped the voltage a little, but the failure may instead have been due to the BIOS booting from an unexpected drive (the 2TB disks were seen as ad2 and ad3). The overclock was stable when that machine went into storage around a year ago. However, it is now in a case with a different PSU (same wattage, more efficient) and more drives.

Tried all the ZFS options in the BSD 11 install wizard:
- 2 disk mirror
- 4k sectors
- GPT partition
- Encrypted disks
- 50GB swap (large for the memory: 3200MB)
- Mirror swap
- Encrypt swap
  -> Note: granny only had encrypted (non-mirrored) swap: could not get encrypted striping to work.

System hardening:
- clean /tmp on startup
- disable opening syslogd network socket

At the time of the failure, I was running mprime (prime95) in the background, and periodically monitoring CPU temperature and fan speed. This implies that ZFS had only ~1600MB to work with (3200MB total - 1600MB used by mprime).

*Next Steps*:

1. Image all 4 drives (one at a time) onto a third 2TB drive with the System Rescue CD and dd-rescue.
2. Try to import the degraded mirror with a BSD live DVD (and re-export if successful, I guess).

Depending on results of step 2:
- find a machine with ECC RAM, put granny3 on a fresh drive, and tell ZFS to scrub?
- copy over boot partitions that may have been clobbered by the BSD 11 install?

If all else fails, I did do a full export in the last 90 days.

Regards,

James Phillips

Note: not subscribed to the list.
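
For reference, next steps 1 and 2 above could be sketched roughly as follows. This is only a dry-run sketch that prints commands without running them. It assumes that "dd-rescue" means GNU ddrescue, that the rescue CD exposes the disks under Linux-style names such as /dev/sdb, and that images go to a directory like /mnt/rescue; those names, the image/map file naming, and the helper functions image_plan and import_plan are all hypothetical and not from the original message. The read-only import is likewise a cautious suggestion, not something the message itself proposes.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands, executes nothing.
# Assumptions (not from the original message): GNU ddrescue, Linux-style
# device names, and the image/map file naming used below.

# Print the imaging commands for one source disk (step 1).
#   $1 = source device (hypothetical, e.g. /dev/sdb)
#   $2 = label used to name the image (e.g. granny3)
#   $3 = directory on the third 2TB drive that receives the images
image_plan() {
    src=$1; label=$2; dest=$3
    # Pass 1: copy everything that reads cleanly, skipping the scraping phase.
    echo "ddrescue -n $src $dest/$label.img $dest/$label.map"
    # Pass 2: reuse the same map file and retry the bad areas up to 3 times.
    echo "ddrescue -r3 $src $dest/$label.img $dest/$label.map"
}

# Print a cautious import attempt for the live-DVD step (step 2).
#   $1 = pool name
#   $2 = alternate root directory
import_plan() {
    pool=$1; altroot=$2
    # readonly=on avoids writing to the pool; -N skips mounting datasets.
    echo "zpool import -f -N -o readonly=on -R $altroot $pool"
    echo "zpool status -v $pool"
    echo "zpool export $pool"
}

image_plan /dev/sdb granny3 /mnt/rescue
import_plan granny /mnt
```

Running image_plan once per drive (changing the device and label each time) keeps a separate map file per disk, so an interrupted copy can be resumed rather than restarted.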