Date:      Thu, 27 Jan 2022 14:57:03 +0100
From:      Ulrich Spörlein <uqs@freebsd.org>
To:        freebsd-stable@freebsd.org
Subject:   gptzfsboot can't boot from 4TB SSD
Message-ID:  <YfKkr%2BmC6rMrRlF9@acme.spoerlein.net>

Hey folks, I'm stumped on what I assume is a BIOS bug...

Upgraded a system from 2013, replacing a 60GB SSD with UFS (plus GELI and 
ZFS) with a shiny new 4TB Samsung SSD, on ZFS and with some parts of the 
pool encrypted.

This was fine for half a year or so. I did notice a stream of
`zio_read error 5` messages (from boot0, I think?) on the serial console
during some boots, but the system came up fine anyway.

Did another installworld/installkernel dance yesterday and the system no
longer boots. First it showed streams of that zio_read error, then it
failed to load /tank/ROOT/default:/boot/kernel/kernel. Using the '?'
command I could see / just fine, but I found no combination that would
list the contents of subdirectories. The manpage makes me think I should
have tried /boot?, but maybe I should've added a space?

Typed in /boot/kernel.old/kernel (!) and it started the spinner, but 
died shortly afterwards.

Put the SSD into a different system (from ca. 2014) and it boots up just 
fine. Put the old 60GB SSD back in the old system, and it also boots 
just fine.

Ok, on the newer system, I re-wrote the bootcode with

    gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0

but that seems to have made things even worse on the old system. Now the
tank pool isn't even found anymore (due to the protective MBR maybe?),
all I get is some zio_read errors and (from memory, sorry):

ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS of pool tank

Issuing '?' finds nothing anymore.

My hypothesis is that this worked initially, as the loader was under 
some specific LBA threshold, but with more data on the disk, every 
update moved it back further and this triggers a BIOS bug.

There was even a BIOS update from 2018 that I flashed, but it didn't fix 
any of this, only some Intel ME bugs. Sigh.

This is what it looks like:

% gpart show
=>        40  7814037088  ada0  GPT  (3.6T)
           40        1024     1  freebsd-boot  (512K)
         1064         984        - free -  (492K)
         2048    33554432     2  freebsd-swap  (16G)
     33556480  7780478976     3  freebsd-zfs  (3.6T)
   7814035456        1672        - free -  (836K)

So can this be a shortcoming in the BIOS with large drives? I had 
thought that only applied to boot0, not the loader itself.
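
As a rough sanity check of the threshold hypothesis, here is a minimal Python sketch that tests the partitions from the `gpart show` output against a 32-bit LBA limit. The limit of 2^32 sectors (2 TiB with 512-byte sectors) is only an assumption about where such a BIOS bug might sit, not something confirmed for this board, and the helper names are made up for illustration.

```python
# Assumption: gpart show reports 512-byte sectors.
SECTOR_SIZE = 512
# Assumption: the suspected BIOS limit is a 32-bit LBA, i.e. 2**32 sectors.
LBA32_LIMIT = 2**32

# (start, size) in sectors, copied from the gpart show output.
partitions = {
    "freebsd-boot": (40, 1024),
    "freebsd-swap": (2048, 33554432),
    "freebsd-zfs": (33556480, 7780478976),
}


def crosses_limit(start, size, limit=LBA32_LIMIT):
    """True if any sector of the partition lies at or beyond the limit."""
    return start + size > limit


def fraction_above(start, size, limit=LBA32_LIMIT):
    """Fraction of the partition's sectors at or beyond the limit."""
    above = max(0, start + size - max(start, limit))
    return above / size


if __name__ == "__main__":
    print(f"limit: {LBA32_LIMIT * SECTOR_SIZE / 2**40:.1f} TiB")
    for name, (start, size) in partitions.items():
        print(f"{name:13s} crosses={crosses_limit(start, size)} "
              f"above={fraction_above(start, size):.1%}")
```

Under these assumptions only the freebsd-zfs partition reaches past the limit, and a bit under half of it lies beyond 2 TiB, so it is at least plausible that blocks the boot code needs ended up there as the pool filled.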

I thought I could maybe boot from a USB stick and have it find the root 
of the pool, but the CMOS battery is dead, so I can't switch the boot 
drive unattended. I can't even turn on UEFI boot as, due to the battery, 
it won't stick. And this being an "industrial" PC, the CMOS battery is 
actually rechargeable but soldered onto the board, so I would have to 
get that fixed as well.

Sigh. Should I try to switch to UEFI? Would I have to move all 4T around 
and re-partition, or could I steal 256M from the swap partition?
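
For the "steal 256M from swap" idea, the sector arithmetic can be sketched as below. This is only a back-of-the-envelope calculation, assuming 512-byte sectors and that the space is taken from the end of the swap partition; `carve_from_end` is a hypothetical helper, and nothing here says whether this firmware would actually boot from the result.

```python
# Assumption: 512-byte sectors, as in the gpart show output.
SECTOR_SIZE = 512
MIB = 1024 * 1024


def carve_from_end(start, size, want_bytes, sector_size=SECTOR_SIZE):
    """Shrink a partition from its end and return (shrunk, carved).

    Each result is a (start, size) tuple in sectors.
    """
    want = want_bytes // sector_size
    if want >= size:
        raise ValueError("requested space does not fit in the partition")
    new_size = size - want
    return (start, new_size), (start + new_size, want)


# Swap partition from gpart show: start 2048, size 33554432 sectors.
swap, carved = carve_from_end(2048, 33554432, 256 * MIB)

if __name__ == "__main__":
    print("shrunk swap (start, size):", swap)
    print("carved 256M (start, size):", carved)
    # The carved region ends exactly where freebsd-zfs starts (33556480).
    print("carved end:", carved[0] + carved[1])
```

With these numbers the carved region starts on a 1 MiB boundary and ends exactly at the start of the freebsd-zfs partition, so the ZFS partition itself would not have to move.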

Should I try with, gasp, GRUB2?

I'm kinda stuck on GPT and BIOS here for a while, I think.

Thanks for reading all of that,
Uli


