Date: Tue, 10 Dec 2019 10:35:26 -0500
From: Marc Branchaud <marcnarc@gmail.com>
To: Mark Martinec <Mark.Martinec+freebsd@ijs.si>, freebsd-stable@freebsd.org
Subject: Re: Boot loader stuck after first stage upgrading 11.2 to 12.0-RC2
Message-ID: <f599acfb-20f7-fc53-9753-fcd37a923e8e@gmail.com>
In-Reply-To: <4c4019102b63054f8de93324dba0e776@ijs.si>
References: <22f5b92a09ea4d62ac3feb74457067f7@ijs.si>
 <5EEBAFC0-4FA3-4219-A918-7376F4223656@me.com>
 <f2737ffb236d39761767aa10a603c084@ijs.si>
 <0F5FCC70-EADB-4F9E-A391-F1A73BE5608F@me.com>
 <dc762bdf408c92daae826425fdba98d9@ijs.si>
 <B3C7194D-93B8-406B-9E8E-BA55D49D657A@me.com>
 <1543954753.1860.243.camel@freebsd.org>
 <EC8DD049-8BBE-4E96-A68B-A2846CED00BA@me.com>
 <53ceda24-fa1b-8546-3511-bd500b440dfe@digiware.nl>
 <4c4019102b63054f8de93324dba0e776@ijs.si>

On 2019-12-10 9:18 a.m., Mark Martinec wrote:
> Commenting on a thread from 2018-12 and from 2019-09-20, with my solution
> to the boot problem at the end, in case anyone is still interested.

Thank you very much for this.

A couple of questions:

(1) Why do you say "raw devices for historical reasons"?  Glancing through
the zpool man page and the Handbook, I see nothing recommending or
requiring GPT partitions.

(2) Just to be 100% clear, my 11.3 non-root zpool looks like this:

	NAME        STATE     READ WRITE CKSUM
	storage     ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    ada2    ONLINE       0     0     0
	    ada3    ONLINE       0     0     0
	    ada4    ONLINE       0     0     0
	    ada5    ONLINE       0     0     0
	    ada6    ONLINE       0     0     0
	    ada7    ONLINE       0     0     0

So this is using raw devices.  Are you saying that if I upgrade this
machine to 12 that it won't be able to boot?

Thanks again!

		M.

> =======
>
> On 2018-11-29 myself wrote:
> (after upgrading from 11.2 to 12.0):
>> While booting, the 'BTX loader' comes up, lists the BIOS drives,
>> then the spinner below the list comes up and begins turning,
>> stuttering, and after a couple of seconds it grinds to a standstill
>> and nothing happens afterwards.
>> At this point the ZFS and the bootstrap loader is supposed to
>> come up, but it doesn't.
> [...] (on 2018-12-04):
>> The situation has not changed: the BTX loader lists all BIOS drives
>> C..J (disk0..disk7), then a spinner starts and gets stuck forever.
>> It never reaches the 'BIOS 635kB/3537856kB available memory' line.
>>
>> While trying to restore the old /boot from 11.2, I tried booting
>> a live image from a 12.0-RC3 memory stick - and the loader got
>> stuck again, same as when booting from a disk.
>> So I had to boot from an 11.2 memstick to be able to regain control.
>
> =======
>
> 2018-12-04, Ian Lepore writes:
>> Toomas Soome wrote:
>> | ok, if you could perform 2 tests:
>> | 1. from loader prompt enter 0x413 0xa000 - @w . cr
>> | 2. on first spinner, press space and type on boot: prompt:
>> | /boot/loader_4th and see if that will do better
>> | thanks, toomas
>> I don't think that will be an option. If it hasn't gotten to the point
>> of saying how much BIOS available memory there is, it's only halfway
>> through loader main() and has hung before getting to interact().
>>
>> In fact, if that line hasn't printed, but some disk drives have been
>> listed, it pretty much has to be hung in the "March through the device
>> switch probing for things" loop. If all the disks are listed, then it
>> got through that entry in the devsw, and is likely hanging in the
>> dv_init calls for either the pxedisk or zfsdev devices.
>
> =======
>
> 2018-12-07 19:08, Willem Jan Withagen wrote:
>> Ended up more or less in the same situation this afternoon with
>> freebsd-upgrade to [12.0]-RC3
>> Boot stops after listing all DOS disks, in a spinner.
>> So that is no fix.
>>
>> I booted from USB 11.2 and replaced the /boot/zfs{boot,loader} by the
>> 11.2 ones.
>> That makes my server again happy.
>
> =======
>
> 2019-09-19 16:02, Kurt Jaeger wrote:
> Subject: Re: Lockdown adaX numbers to allow booting ?
>> | Kurt Jaeger writes:
>> | The problem is that if all 10 disks are connected, the system
>> | loses track from where it should boot and fails to boot (serial
>> boot log):
>> |
>> | Consoles: internal video/keyboard serial port
>> | BTX loader 1.00 BTX version is 1.02
>> | Consoles: internal video/keyboard serial port
>> | BIOS drive C: is disk0
>> | BIOS drive D: is disk1
>> | BIOS drive E: is disk2
>> | BIOS drive F: is disk3
>> | BIOS drive G: is disk4
>> | BIOS drive H: is disk5
>> | BIOS drive I: is disk6
>> | BIOS drive J: is disk7
>> | BIOS drive K: is disk8
>> | BIOS drive L: is disk9
>> | //
>> | [...]
>> | The solution right now is this to unplug all disks of the 'bck'
>> pool,
>> | reboot, and re-insert the data disks after the boot is finished.
>> | [...]
>> | No gpart on the bck pool, raw drives.
>
> 2019-09-20 17:27, Mark Martinec wrote:
> Subject: Re: Lockdown adaX numbers to allow booting ?
>>
>> This sounds very much like my experience:
>>
>> 2018-11-29, Boot loader stuck after first stage upgrading 11.2 to
>> 12.0-RC2
>> https://lists.freebsd.org/pipermail/freebsd-stable/2018-November/090129.html
>>
>> https://lists.freebsd.org/pipermail/freebsd-stable/2018-December/090159.html
>>
>>
>> I now have three SuperMicro machines which are unable to boot after
>> upgrading 11.2 to 12.0. After unsuccessfully fiddling with boot loaders,
>> I have reverted two back to 11.2 (which boots and works fine again),
>> and the third one is now at 12.0 but needs the boot hack as described
>> by Kurt, i.e. pull out half the disks (of the 'data' pool), boot the
>> system, plug the disks back in and zfs mount the remaining pool.
>>
>> Considering that the 11.2 boots and works fine on these machines,
>> I consider it a btx loader failure and not a BIOS issue.
>>
>> What is common with these three machines is that they have one pool
>> on raw devices for historical reasons (not on gpt partitions).
>> My guess is that the new loader gets confused by these raw disks.
>
> =======
>
> Ok, now to my current situation and solution/workaround.
>
> What was common with these hosts (and similar) is that a machine
> has more than a couple of disks, with a zfs pool (non-root) on
> raw devices (for historical reasons), not on gpt partitions.
>
> Three workarounds seem possible:
>
> - replace a boot loader with the one from 11.2, or
>
> - using a default loader from 12, disconnect a sufficient number
>   of data disks, boot, then reconnect disks and zfs attach the pool,
>
> - or my current solution: zfs offline one disk at a time from
>   a data pool, wipe it, set up a gpt partition on it and
>   put it back to the pool by 'zfs replace', letting it resilver.
>   It was a painful and slightly risky procedure (9 hours of
>   resilvering each of the seven disks), but this procedure
>   has now salvaged our remaining hosts which could not be
>   upgraded from 11.2 to 12.
>
> Mark
>
>
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
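
For readers who want to see what the third workaround in the quoted message (offline a disk, wipe it, put a GPT partition on it, replace it in the pool, wait for the resilver) might look like per disk, here is a rough dry-run sketch. It only prints commands and executes nothing. Several things in it are assumptions and not taken from the thread: the pool name "storage" and disk "ada2" are borrowed from the zpool listing above purely as placeholders, the GPT label "data-ada2" is hypothetical, using "zpool labelclear" for the "wipe it" step is a guess at what was meant, and the gpart options are just one common choice. The message writes 'zfs offline' and 'zfs replace'; the subcommands with those names belong to zpool, so the sketch uses zpool. A GPT partition is slightly smaller than the whole disk, so whether the replace is accepted should be checked on the actual hardware.

```shell
#!/bin/sh
# Dry-run sketch: print (do NOT execute) the per-disk steps for moving
# one pool member from a raw disk to a GPT partition.
# All names passed in are placeholders chosen for illustration.

plan_disk() {
    pool=$1     # pool name, e.g. "storage" (placeholder)
    disk=$2     # raw member disk, e.g. "ada2" (placeholder)
    label=$3    # hypothetical GPT label for the new partition

    # Take the raw disk out of service; the pool runs degraded meanwhile.
    echo "zpool offline $pool $disk"
    # Assumed "wipe" step: remove the old ZFS label from the raw disk.
    echo "zpool labelclear -f /dev/$disk"
    # Create a GPT scheme and a single labelled freebsd-zfs partition.
    echo "gpart create -s gpt $disk"
    echo "gpart add -t freebsd-zfs -a 1m -l $label $disk"
    # Replace the old raw member with the new partition; this starts a resilver.
    echo "zpool replace $pool $disk gpt/$label"
    # Check resilver progress before touching the next disk.
    echo "zpool status $pool"
}

plan_disk storage ada2 data-ada2
```

As in the quoted message, the idea is to do one disk at a time and let each resilver finish before starting the next, since the pool has reduced redundancy while a member is offline.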