Date: Fri, 11 Apr 2008 19:08:53 GMT
From: Mike Hibler <mike@flux.utah.edu>
To: freebsd-gnats-submit@FreeBSD.org
Subject: i386/122668: FreeBSD boot loader doesn't work on Dell R900 (+workaround)
Message-ID: <200804111908.m3BJ8rEj079928@www.freebsd.org>
Resent-Message-ID: <200804111910.m3BJA1np076809@freefall.freebsd.org>
>Number: 122668
>Category: i386
>Synopsis: FreeBSD boot loader doesn't work on Dell R900 (+workaround)
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-i386
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Fri Apr 11 19:10:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator: Mike Hibler
>Release: 6.2-RELEASE
>Organization: University of Utah, Flux Research Group
>Environment: N/A
>Description:
As far as I can tell, this isn't a bug in the BSD bootloader; rather, it is a bug in the Dell BIOS. However, googling around I see that other people have hit this problem, and I have worked around it, so I thought I would report it.

Note also that I am seeing this bug in the Emulab bootloader, which is derived from the FreeBSD 6.2-RELEASE bootloader, but based on the posts I have seen I believe the problem would be the same in the stock FreeBSD loader.

The symptom is that when I try to boot over the network using PXE (currdev="pxe0:"), the loader complains that it "cannot load kernel".

The problem is that on this machine one of the BIOS calls (int 0x15, function 0xE820) in bios_getsmap (src/sys/boot/i386/libi386/biossmap.c) returns more than the 20 bytes of data it is supposed to: it appears to return the value 0x09 in the 21st byte (or 24th, I forget my little-endian lore). Because the data are read into a 20-byte static (BSS) buffer, the variable that follows it in memory gets clobbered.
In this case 'smap' is the buffer, and the variable allocated right after it in BSS is 'smapbase':

    static struct bios_smap smap;
    static struct bios_smap *smapbase;

smapbase points to the dynamically allocated area into which the individual smap entries are copied via:

    bcopy(&smap, &smapbase[smaplen], sizeof(struct bios_smap));

What I see is that the first couple of iterations of read-an-entry, copy-to-buffer work fine, but then one call returns the extra data and the low-order byte of smapbase changes from something like 0xb4 to 0x09. The result is still a legitimate address, so the bcopy proceeds without incident, but the smap entry data winds up being copied to an earlier address, overwriting other malloc()ed memory. In this case it overwrites some entries in the 'environ' environment linked list, corrupting the chain. As a result I no longer have a "currdev" environment variable, so the loader tries to load from the default device (the hard drive) rather than the network. Since there is nothing on the hard drive, it cannot read loader.rc or boot.conf or ..., and ultimately tries to load "kernel", which fails with an error.

Note that there are two read-data loops in this function, and the problem occurs in the first loop as well, but since smapbase has not yet been initialized there (i.e., no bcopy happens), it does not matter.

Note also that one post I read mentioned that another BSD boots fine on this machine. That could be because that BSD reads the data directly into the smapbase buffer rather than via a temporary smap buffer; there, what gets clobbered with 0x09 is just a not-yet-filled, later part of the smapbase buffer.
>How-To-Repeat:
Try booting a Dell R900 over the network via PXE.
>Fix:
The workaround is to (arbitrarily) pad the temporary smap buffer with another 4 bytes. I tried padding with up to an extra 32 bytes, but never saw more than the single overwrite, and it was always within bytes 0-3 past the end of the buffer.
>Release-Note:
>Audit-Trail:
>Unformatted: