Date: Fri, 7 Mar 2008 19:32:42 -0800
From: "Peter Wemm" <peter@wemm.org>
To: "Bob Johnson" <bob89@eng.ufl.edu>
Cc: freebsd-amd64@freebsd.org
Subject: Re: amd64/111955: [install] Install CD boot panic due to missing BIOS smap on 5.5 through to 7.0-Current Snapshot 200704
Message-ID: <e7db6d980803071932h38c2b4a9pfaa9a97a5b495599@mail.gmail.com>
In-Reply-To: <200803072340.m27Ne4vf059878@freefall.freebsd.org>
References: <200803072340.m27Ne4vf059878@freefall.freebsd.org>
On Fri, Mar 7, 2008 at 3:40 PM, Bob Johnson <bob89@eng.ufl.edu> wrote:
> The following reply was made to PR amd64/111955; it has been noted by GNATS.
>
> From: Bob Johnson <bob89@eng.ufl.edu>
> To: bug-followup@freebsd.org,
>  Eamon Roque <Roque@itg.uni-muenchen.de>
> Cc:
> Subject: Re: amd64/111955: [install] Install CD boot panic due to missing BIOS smap on 5.5 through to 7.0-Current Snapshot 200704
> Date: Fri, 7 Mar 2008 17:48:41 -0500
>
> "FreeBSD only calls the BIOS SMAP call from virtual 86 mode both
> in the loader and in the i386 kernel. The fix is quite complicated and
> involves rewriting the boot code to invoke BIOS calls from real mode
> rather than virtual 86 mode."
>
> ?? but FreeBSD i386 boots and runs fine on an HP dc7700 that gives the "No
> BIOS SMAP" error when booting AMD64. I'm completely ignorant of the boot
> process for AMD64, but could code be lifted from i386 and moved to AMD64 to
> solve this?

Here's what actually happens, which explains the differences.

On the i386 kernel, we can make bios calls in vm86 mode during startup, and we have various code to find memory the "old" ways, using increasingly poor alternatives. It can fall back to bios calls and memory locations that have limits of 512MB or 64MB of ram, etc.

The amd64 kernel cannot make vm86 mode calls or bios calls. That is the nature of the cpu mode: long mode has no vm86 support. In theory, the kernel could have a mini 32-bit sub-kernel inside it and switch between 64 bit mode and 32 bit mode on the fly in order to make vm86 calls, but that is a lot of work.

The AMD64 certification specs explicitly listed certain minimum bios specs as part of the logo certification requirements. For example, systems must be PC2001 at a minimum. This means a system has to have USB, ACPI, etc etc. It has to have the 0xe820 memory map bios function, which completely specifies the memory layout in an ACPI-compliant fashion. It lists memory that is reserved for ACPI, etc.
Windows logo certifications also require PC2001 or later these days. For all intents and purposes, there is never going to be an amd64-compatible system that doesn't have at least this level of functionality.

When I was doing the amd64 kernel boot code, I was faced with all the VM86 nastiness in the kernel. I had to do it another way. I realized that since the loader was already getting the memory map itself, and since it was running purely in 32 bit mode, it made sense to simply pass the bios smap data that the loader already had through to the kernel.

But here's where it went horribly wrong. Over recent years, bios makers have put more and more hacks into the bios code. The bioses themselves sometimes switch from 16 bit real mode to 32 bit protected mode and then back again. They do this to emulate things like driver floppies, usb and cdrom boot, etc etc. The frequency of this is increasing rather than decreasing.

And here's the rub. If we call a bios function in vm86 mode, the bios code *CANNOT* switch to 32 bit protected mode. Usually the result is a BTX fault: vm86 mode trapped an illegal or privileged instruction, and BTX reports the problem. We've seen bios vendors start adding code that TESTS whether it is being called in vm86 mode and then either fails silently or returns an error, rather than causing btx crashes etc.

Some bios vendors decided that the 0xe820 call needed this treatment. This is the bios SMAP call. When the loader calls the memory map function via a vm86 bios call, the bios returns an error. The loader then falls back to the ancient bios calls and limps along. Of course, we can't pass the non-existent SMAP data to the kernel, so when the kernel starts, it panics.

There is work afoot that solves this. There should also be more seatbelts. First and foremost, John has done a non-vm86 version of btx. This completely and utterly solves the root cause of the problem.
int 15 function 0xe820 will get called in real mode, just like windows, linux, grub, netbsd, old freebsd bootblocks etc do. Our boot code will behave just like everybody else's, and we won't have these strange freebsd-specific problems anymore. (The downside of this change is that bad bios code won't cause BTX faults anymore. Bios crashes will reset the machine instead of reporting a btx fault that we can debug.)

Second, the loader should report missing SMAP data before starting an amd64 kernel. I've been meaning to do this for a while. If there is no SMAP data, it should explain the problem right there rather than letting the kernel blow up.

Third, we might be able to generate a fake SMAP table in a sort of limp-along mode. E.g.: if the bios doesn't have it, use the legacy memory sizing code in the loader to generate a fake table and pass that through to the kernel. The kernel might be stuck with only 512MB, but that might be better than nothing. It might be possible to use some getenv type calls to override the limited data.

I think the work John has done to change the bios call method in BTX is the right solution, though. I don't know what the MFC potential is. If it doesn't get backported, then the other hacks / workarounds / seatbelts might be in order for older branches.

--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete themselves upon execution." -- Robert Sewell