Date: Fri, 02 Sep 2005 13:57:57 -0400
From: Ben Thomas <bthomas@virtualiron.com>
To: freeBSD-gnats-submit@FreeBSD.org
Subject: i386/85652: [patch] deal with out-of-memory errors during booting
Message-ID: <431892A5.7090303@virtualiron.com>
Resent-Message-ID: <200509021800.j82I0aXW055224@freefall.freebsd.org>

>Number:         85652
>Category:       i386
>Synopsis:       [patch] deal with out-of-memory errors during booting
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-i386
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Fri Sep 02 18:00:35 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Ben Thomas
>Release:        FreeBSD 5.4-RELEASE i386
>Organization:
Virtual Iron Software
>Environment:
System: FreeBSD bthomas4.katana-technology.com 5.4-RELEASE FreeBSD 5.4-RELEASE #10: Sun Aug 28 13:48:00 EDT 2005 ben@bthomas4.katana-technology.com:/usr/obj/usr/home/ben/BSD/RELENG_5_4_0_RELEASE/src/sys/BEN i386
>Description:
A new machine arrived that provides a smaller amount of memory in the
first memory region, and booting stopped working. Actually, "stopped
working" is only the summary. The reality is that the systems don't
boot, and instead do some really weird and somewhat random things.
Debugging this was at first amusing. A number of changes were needed to
resolve this, or at least to get to the point of error messages rather
than weird behavior.

Booting uses the first region of memory. btx uses the very top of this
region, and grows the stack down from just beneath its own usage. The
program heap grows up. Unfortunately, the amount of memory known to main
may differ from the amount known to btx, because each obtains it by
different means. That's the first issue.

Next, the top of the heap is set to the top of the region of memory.
That, by definition, includes the stack and the btx data area. This is
bad. When the heap grows, the new space is zeroed. When the heap gets
too big, this zeroes the stack, and very, very strange behavior results.
Properly setting the top of the heap would allow for error detection and
messaging.

The best answer is to use the stack pointer (this is i386-specific
code). The stack pointer was set by btx, so it already accommodates the
btx memory usage and the btx view of memory.
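
To make the layout problem concrete, here is a small model of the two
ways of picking the top of the heap. It is only a sketch: the struct,
the function names, and the addresses in the usage note are made up for
illustration and do not appear in the loader source. The one-page
reserve is the same 0x1000 that the patch uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the first memory region. None of these names
 * exist in the loader; they only illustrate the description above.
 */
struct region_model {
    uintptr_t heap_base;    /* end of the loader image ("end" in main.c) */
    uintptr_t basemem_top;  /* top of the region as seen by main() */
    uintptr_t stack_ptr;    /* stack pointer as set up by btx */
};

#define STACK_RESERVE 0x1000u   /* one page left for stack growth */

/* Old policy: the heap may grow all the way to the top of the region. */
static uintptr_t
heap_top_from_basemem(const struct region_model *m)
{
    return (m->basemem_top);
}

/* Patched policy: the heap stops one page below the stack pointer. */
static uintptr_t
heap_top_from_stack(const struct region_model *m)
{
    assert(m->stack_ptr >= m->heap_base + STACK_RESERVE);
    return (m->stack_ptr - STACK_RESERVE);
}

/* Nonzero if a heap limited to heap_top can grow into the live stack. */
static int
heap_can_reach_stack(const struct region_model *m, uintptr_t heap_top)
{
    return (heap_top > m->stack_ptr);
}
```

With illustrative values such as heap_base 0x40000, basemem_top 0x9f000
and stack_ptr 0x9c000, the old policy leaves the stack inside the heap's
range, while the patched policy caps the heap at 0x9b000, below the
stack.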
Using the stack pointer allows setting a more realistic top of heap.

The changes are:

 - sbrk.c   - generate an error message on failure
 - interp.c - check the malloc return value, and return a message and
              an error on failure
 - main.c   - make a better attempt to properly set the top of the heap

This, by itself, doesn't solve the problem, but does make it easier to
figure out what happened.

This patch is against the 5_4_0_RELEASE code.
>How-To-Repeat:
>Fix:

--- interp.c-DIFF begins here ---
--- /usr/src.original/sys/boot/common/interp.c	Mon Aug 25 19:30:41 2003
+++ /usr/src/sys/boot/common/interp.c	Thu Aug 11 17:11:09 2005
@@ -237,6 +237,16 @@
 #endif
         /* Allocate script line structure and copy line, flags */
         sp = malloc(sizeof(struct includeline) + strlen(cp) + 1);
+        /* On malloc failure (it happens !), free as much as possible and exit */
+        if (sp == NULL) {
+            while(script != NULL) {
+                se = script;
+                script = script->next;
+                free(se);
+            }
+            sprintf(command_errbuf, "%s: file '%s' line %d: memory allocation failure - aborting\n", __FUNCTION__, filename, line);
+            return(CMD_ERROR);
+        }
         sp->text = (char *)sp + sizeof(struct includeline);
         strcpy(sp->text, cp);
 #ifndef BOOT_FORTH
--- interp.c-DIFF ends here ---

--- main.c-DIFF begins here ---
--- /usr/src.original/sys/boot/i386/loader/main.c	Sun Jan 30 07:22:08 2005
+++ /usr/src/sys/boot/i386/loader/main.c	Thu Aug 11 17:19:29 2005
@@ -72,6 +72,14 @@
 /* XXX debugging */
 extern char end[];
 
+/* 386 specific routine to return the current stack pointer */
+
+static __inline unsigned char * read_esp(void) {
+    unsigned char *data;
+    __asm __volatile("movl %%esp,%0" : "=r" (data));
+    return (data);
+}
+
 int
 main(void)
 {
@@ -88,7 +96,35 @@
      */
     bios_getmem();
 
-    setheap((void *)end, (void *)bios_basemem);
+    /*
+     * Let's take a few data points to tell this story:
+     *  - the size of the boot code has grown in the last few releases
+     *  - boot code must all fit into the very first region of memory
+     *  - that region of memory can vary from BIOS to BIOS
+     *  - the setheap call is all that's between the heap overwriting the
+     *    stack and disaster, or a more reasonable error message
+     *  - the BIOS data area size value doesn't always match the memory
+     *    information (which is in bios_basemem)
+     *  - the loader (btx) takes some space for itself and also uses the
+     *    BIOS data area information.
+     *
+     * The end result is that this code tells sbrk that the heap can
+     * grow and take over all of the memory region. At the same time, btx
+     * has taken over the top of the memory region. btx is using the memory
+     * just below bios_basemem (best case), or an even smaller value (worst
+     * case), as it turns out that btx uses a different means to get the
+     * "top" value. btx claims the very top, and sets the stack to grow
+     * down from there. Now, add a machine with a smaller available
+     * memory space and BOOM - you get sbrk zeroing the stack as it
+     * grows the heap and things get seriously weird. At this point, we know
+     * that the stack pointer has been set properly by btx. The only solid
+     * answer here is to get the current stack pointer and allow some room
+     * for growth and to use that as the top of the heap. This instantly
+     * accounts for whatever BTX was using for data and storage and
+     * allows for a reasonable failure as opposed to very strange
+     * results from stack corruption.
+     */
+    setheap((void *)end, (read_esp() - 0x1000));  /* Give a page of stack */
 
     /*
      * XXX Chicken-and-egg problem; we want to have console output early, but some
--- main.c-DIFF ends here ---

--- sbrk.c-DIFF begins here ---
--- /usr/src.original/lib/libstand/sbrk.c	Sun Sep 30 18:28:01 2001
+++ /usr/src/lib/libstand/sbrk.c	Thu Aug 11 17:15:36 2005
@@ -56,6 +56,9 @@
         heapsize += incr;
         return(ret);
     }
+    else
+        printf("%s - heap would overrun stack - aborting\n", __FUNCTION__);
+
     errno = ENOMEM;
     return((char *)-1);
 }
--- sbrk.c-DIFF ends here ---

>Release-Note:
>Audit-Trail:
>Unformatted:
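
For anyone who wants to try the failure path outside the loader, the
following is a minimal stand-alone model of a bounded heap in the style
of the sbrk.c hunk above. It is a sketch under assumptions, not the
libstand code: model_setheap and model_sbrk are hypothetical names, the
increment is a size_t here, and the error printf from the patch is left
out so the model stays silent. What it keeps is the behavior the patch
relies on: new space is zeroed, and a request past the configured top
fails with ENOMEM instead of running into whatever lies above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model state; the names mirror the sbrk.c hunk. */
static char   *heapbase;
static size_t  maxheap;
static size_t  heapsize;

/* Record the heap bounds; top is the first byte the heap must not use. */
static void
model_setheap(void *base, void *top)
{
    assert((char *)top >= (char *)base);
    heapbase = base;
    maxheap = (size_t)((char *)top - (char *)base);
    heapsize = 0;
}

/* Grow the heap by incr bytes, or fail with ENOMEM if it would pass top. */
static char *
model_sbrk(size_t incr)
{
    if (maxheap - heapsize >= incr) {
        char *ret = heapbase + heapsize;

        memset(ret, 0, incr);   /* the heap is zeroed as it grows */
        heapsize += incr;
        return (ret);
    }
    errno = ENOMEM;
    return ((char *)-1);
}
```

A caller has to check for the (char *)-1 return, in the same way that
the interp.c hunk checks the malloc return value before using the
pointer.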