Date: Tue, 07 Apr 2020 19:10:07 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 245430] ZFS boot failure following memory exhaustion
Message-ID: <bug-245430-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=245430

            Bug ID: 245430
           Summary: ZFS boot failure following memory exhaustion
           Product: Base System
           Version: 12.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: jwb@freebsd.org

The error message here is about the same as PR 221077, but I'm opening a new
PR due to differences in configuration and some new information. The exact
error is as follows:

    ZFS: i/o error - all block copies unavailable
    ZFS: can't read object set for dataset u
    ZFS: can't open root filesystem
    gptzfsboot: failed to mount default pool zroot

This occurred following a memory exhaustion event that flooded my messages
file with entries like the following:

    Apr  6 09:56:14 compute-001 kernel: swap_pager_getswapspace(11): failed

The exhaustion was caused by computational jobs consuming all available
memory. The system was still up and accepting logins, but it was no longer
reading or writing the ZFS pool, and every attempt to reboot produced the
error above.

The system was installed recently with the default ZFS configuration
provided by bsdinstall, on 4 disks configured as RAID-0 on a Dell PERC H700
(which does not support JBOD); i.e. ZFS is running a RAID-Z across mfid0
through mfid3.

I updated an old forum thread on the issue and included my successful fix:

https://forums.freebsd.org/threads/10-1-doesnt-boot-anymore-from-zroot-after-applying-p25.54422/

Unlike that thread, this did not appear to be triggered by an upgrade. The
gist is that some of the datasets (filesystems) appear to have been
corrupted, and the out-of-swap errors seem likely to be the cause. zpool
scrub did not find any errors, and all drives are reported as online and
optimal by the RAID controller.

My fix was as follows: boot from a USB drive, go to the live image, and log
in as root.
    # mount -u -o rw /                    # Allow creating directories on USB drive
    # zpool import -R /mnt -fF zroot
    # cd /mnt
    # mount zroot/ROOT/default            # Not mounted by default; canmount defaults to noauto
    # mv boot boot.orig
    # cp -Rp /boot .                      # Note -p to make sure permissions are correct in the new /boot
    # cp boot.orig/loader.conf boot/      # Restore customizations
    # cp boot.orig/zfs/zpool.cache boot/zfs/
    # cd
    # zfs get canmount
    # zfs set canmount=on var/log         # (and a couple of others that did not match defaults)
    # zpool export zroot
    # reboot

After a successful reboot, I ran "freebsd-update fetch install" and rebooted
again so my /boot would be up to date. Everything seems fine now. I've made a
backup of my /boot directory and plan to do so after every freebsd-update, so
I can hopefully correct this quickly if it happens again.

I am seeing the same error on a workstation using 4 vanilla SATA ports, but I
have not had physical access to it due to COVID-19. This is the first time
I've seen the error without an underlying hardware RAID. I'll update this
thread when I can gather more information.

-- 
You are receiving this mail because:
You are the assignee for the bug.
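The per-update /boot backup described above could be scripted along these
lines; this is only a sketch, and the function name, backup location, and
tarball naming are my own assumptions, not taken from the report:

```shell
#!/bin/sh
# Hypothetical helper: archive /boot to a dated tarball after each
# freebsd-update run, so a known-good copy is on hand if gptzfsboot
# fails again. All paths and names here are illustrative.
backup_boot() {
    src="${1:-/boot}"                # directory to archive
    dest="${2:-/root/boot-backups}"  # where the tarballs accumulate
    mkdir -p "$dest"
    # -C keeps the archive rooted at the parent, so it unpacks as "boot/..."
    tar -czf "$dest/boot-$(date +%Y%m%d).tgz" \
        -C "$(dirname "$src")" "$(basename "$src")"
}
```

Usage would be something like `freebsd-update fetch install && backup_boot`
before rebooting.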
