Date: Fri, 30 Jul 2004 18:23:20 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Alexander Kabaev <kan@freebsd.org>
Cc: freebsd-current@freebsd.org
Subject: Re: boot2 -- Round 2
Message-ID: <200407310123.i6V1NKHf085934@apollo.backplane.com>
References: <20040730212843.GA33955@parodius.com> <20040731002713.GA6709@freefall.freebsd.org>

I had a similar problem with boot2 displaying 0:ad(0,<garbage>) [garbage].
The problem turned out to be boot1's fault. Boot1 is apparently
responsible for clearing boot2's BSS. The problem is that boot1 does not
really have all that clear an idea about where boot2's BSS is or,
especially, how large it is. boot1 only clears to the end of the segment,
but I'm not even sure that it is calculating the *start* address properly.

Now this might not be the same problem. I've completely reorganized most
of the boot code in DragonFly, and the cause of the mismatched BSS in our
boot1/boot2 might have been due to something I did. But the error was
that boot1 was not clearing enough of the BSS, which caused a number of
boot2's globals to be garbage on startup, which in turn caused boot2 to
believe that the partition and load path had already been set (to
garbage). Hence the displayed garbage.

The fix I made in DragonFly was to move the BSS clearing out of boot1 and
into i386/btx/lib/btxcsu.s, where the size of the BSS is known. This
required some surgery, however, because it bloated boot2 past its size
limit (but also made boot1 smaller).

Actually, if someone over in FreeBSD land is interested in cleaning up
your boot code, I would recommend starting with DragonFly's and then
making it work with FreeBSD again (which shouldn't be too difficult).
Among other things, I reorganized *ALL* the hardwired origins into a
single header file, and it is now possible to change most of them at will
and still get something that works out of it. The FreeBSD boot code has
some historical issues which could cause interference with certain
BIOSes, such as using 0x1000 as the top of the transfer stack and using
other similarly nasty addresses that it probably shouldn't be.

(But, that said, I still can't get either the FreeBSD or the DFly boot
code to boot my Shuttle AMD64 boxes if the mouse is not plugged in. The
BIOS gets ultra confused over the amount of BIOS memory available and
trashes the memory table... but boots Linux just fine).

					-Matt
					Matthew Dillon
					<dillon@backplane.com>

:On Fri, Jul 30, 2004 at 02:28:43PM -0700, Jeremy Chadwick wrote:
:> So, in regards to the committed fix:
:>
:> This seemed to fix the issue on one of my boxes (the one which was
:> flat-out panic'ing, not the one which was reporting 0:ad(0,`) as the
:> default slice to load /boot/loader from). I'll refer to the one which
:> panic'd as "Box A" while the one which is doing the backtick as "Box B".
:>
:> After pulling cvs down last night and rebuilding world+kernel+boot
:> blocks, running disklabel -B ad0s1, all on Box B, I found the machine
:> once again spitting out "Invalid partition", trying to load loader(8)
:> off of 0:ad(0,`) instead of 0:ad(0,a). I double-checked boot2/Makefile
:> to see if -fno-unit-at-a-time was in place -- and it was.
:>
:> I've tried using /boot/boot off of Box A and applying it to Box B using
:> disklabel -B -b /boot/box_b/boot ad0s1 to no avail.
:>
:> It seems almost as if the boot2 code is broken in such a way that it
:> resembles an "off-by-one" error (ASCII 0x60 == `, ASCII 0x61 == a).
:> Why it's picking ` is beyond me...
:>
:> Can someone shed some light as to how I can go about debugging this,
:> as well as mention how I can temporarily work around this? Box B
:> happens to run mysqld, and is suffering from some issues mentioned on
:> freebsd-threads (re: machine randomly hard-locking), so it definitely
:> needs to be able to boot back up on its own without my intervention.
:>
:> Thanks!
:Hi,
:
:I guess I would like to get your /boot/boot. The one I got simply works
:on all boxes in my home :(.
:
:As another option, you can try an alternative patch which was proposed
:by Tim Robbins. Since the problem was apparently caused by me going back to
:static memcpy implementation, I am currently working on using builtin
:memcpy as it was used before. I will post it later after I've done some
:more testing and if things will look good.
:
:--
:Alexander Kabaev
:
:======== Begin quote ==============
:
:After a few hours of head-scratching, I've tracked down the problem with
:boot2 and -funit-at-a-time, and come up with a patch that makes it work:
:
:==== //depot/user/tjr/freebsd-tjr/src/sys/boot/i386/boot2/boot2.c#7 - /home/tim/p4/src/sys/boot/i386/boot2/boot2.c ====
:@@ -139,7 +139,16 @@
: static int xgetc(int);
: static int getc(int);
: 
:-static void memcpy(void *, const void *, int);
:+/*
:+ * GCC 3.4 with -funit-at-a-time (implied by -Os) may use a non-standard
:+ * calling convention for static functions, using registers to pass arguments
:+ * instead of the stack. However, GCC may emit calls to memcpy() when a
:+ * program copies a struct with the assignment operator, and the code it
:+ * emits to call memcpy() uses the standard convention, not the register
:+ * convention. This means we must declare our memcpy() implementation "__used"
:+ * to disable the register calling convention.
:+ */
:+static void memcpy(void *, const void *, int) __used;
: static void
: memcpy(void *dst, const void *src, int len)
: {
: 
:
:I think this is a bug in GCC; it should emit a warning if it's about to emit
:code to call memcpy(), but finds that memcpy() has a prototype that conflicts
:with the assumptions it makes.
:
:
:Tim

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200407310123.i6V1NKHf085934>