From owner-freebsd-bugs@FreeBSD.ORG Tue Nov 18 16:40:03 2008
Return-Path:
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AA5CE1065672
	for ; Tue, 18 Nov 2008 16:40:03 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 9AF078FC14
	for ; Tue, 18 Nov 2008 16:40:03 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id mAIGe2eK050984
	for ; Tue, 18 Nov 2008 16:40:02 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.3/8.14.3/Submit) id mAIGe2rY050983;
	Tue, 18 Nov 2008 16:40:02 GMT
	(envelope-from gnats)
Date: Tue, 18 Nov 2008 16:40:02 GMT
Message-Id: <200811181640.mAIGe2rY050983@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: Luigi Rizzo
Cc:
Subject: Re: kern/118222: [pxeboot] FreeBSD 7.0 PXE NFS / "Can't work out which disk we are booting from" on AMD CPU
X-BeenThere: freebsd-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Luigi Rizzo
List-Id: Bug reports
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 18 Nov 2008 16:40:03 -0000

The following reply was made to PR kern/118222; it has been noted by GNATS.
From: Luigi Rizzo
To: bug-followup@FreeBSD.org, "http://www.freebsd.org/send-pr.html"@FreeBSD.org
Cc:
Subject: Re: kern/118222: [pxeboot] FreeBSD 7.0 PXE NFS / "Can't work out which disk we are booting from" on AMD CPU
Date: Tue, 18 Nov 2008 17:23:24 +0100

We have identified the problem as related to a heap overflow:
if you instrument the code at
/usr/src/sys/boot/common/interp.c::include() you will see that
the call to malloc

    sp = malloc(sizeof(struct includeline) + strlen(cp) + 1);

at some point does not return, and pxeboot is restarted.

Tracking the values returned by malloc(), we found that the last
successful return is somewhere around 0x77384, with the stack
dangerously close to it (somewhere around 0x77900 on the first call
to include(), but the function is recursive and has a 256-byte local
variable).

When the heap overflows, my system is processing line 1500 of
/boot/support.4th, which is 1700 lines long and is the last
(third or fourth) of a set of nested includes.

Why this occurs only on AMD64 is not completely clear, but it is
probably related to the BIOS on those boards making less memory
available than on the i386 machines.

In any case, the following patch saves enough memory for pxeboot
to run to completion with our set of includes. It does this in two
ways: it does not save empty lines (there are about 200 of them in
the offending file, which saves some 6k of memory), and it makes a
buffer static (saving another 1-2k of memory across the recursive
calls).
Clearly, this is not the way to go on a system with 2GB of memory,
and we need to make the entire system more robust :)

cheers
luigi

Index: common/interp.c
===================================================================
RCS file: /home/ncvs/src/sys/boot/common/interp.c,v
retrieving revision 1.29
diff -u -r1.29 interp.c
--- common/interp.c	25 Aug 2003 23:30:41 -0000	1.29
+++ common/interp.c	18 Nov 2008 16:00:57 -0000
@@ -192,7 +192,7 @@
 include(const char *filename)
 {
     struct includeline	*script, *se, *sp;
-    char		input[256];			/* big enough? */
+    static char		input[256];			/* big enough? */
 #ifdef BOOT_FORTH
     int			res;
     char		*cp;
@@ -236,6 +239,8 @@
 	}
 #endif
 	/* Allocate script line structure and copy line, flags */
+	if (*cp == '\0')
+		continue;
 	sp = malloc(sizeof(struct includeline) + strlen(cp) + 1);
 	sp->text = (char *)sp + sizeof(struct includeline);
 	strcpy(sp->text, cp);
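
For readers who want to try the empty-line change outside the loader, here is a minimal hosted-C sketch of the same idea. It is not the loader code: the struct layout below is an assumed stand-in for struct includeline, and build_script(), free_script() and struct build_stats are hypothetical names introduced only for illustration. The byte count covers only what would have been requested from malloc() for each skipped line; any per-allocation overhead of the real allocator is not modelled. The NULL check is ordinary hosted-C hygiene and does not reproduce the failure described above, where malloc() does not return at all.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed stand-in for the loader's struct includeline (illustration only). */
struct includeline {
    char *text;                 /* points just past the struct itself */
    int flags;
    int line;                   /* 1-based line number in the source file */
    struct includeline *next;
};

/* Hypothetical bookkeeping so the saving can be observed. */
struct build_stats {
    size_t kept;                /* lines stored in the list */
    size_t skipped;             /* empty lines that were not stored */
    size_t bytes_saved;         /* bytes never requested from malloc() */
};

/*
 * Build a linked list of script lines, skipping empty ones as the patch
 * does. Line numbers still count the skipped lines. If malloc() fails,
 * the list built so far is returned.
 */
struct includeline *
build_script(const char *const *lines, size_t n, struct build_stats *st)
{
    struct includeline *script = NULL, *se = NULL, *sp;
    size_t i;

    assert(lines != NULL && st != NULL);
    memset(st, 0, sizeof(*st));

    for (i = 0; i < n; i++) {
        const char *cp = lines[i];

        if (*cp == '\0') {      /* the check added by the patch */
            st->skipped++;
            st->bytes_saved += sizeof(struct includeline) + 1;
            continue;
        }
        /* One allocation holds both the struct and the text. */
        sp = malloc(sizeof(struct includeline) + strlen(cp) + 1);
        if (sp == NULL)
            break;
        sp->text = (char *)sp + sizeof(struct includeline);
        strcpy(sp->text, cp);
        sp->flags = 0;
        sp->line = (int)(i + 1);
        sp->next = NULL;
        if (script == NULL)
            script = sp;
        else
            se->next = sp;
        se = sp;
        st->kept++;
    }
    return script;
}

/* Release every node of a list returned by build_script(). */
void
free_script(struct includeline *script)
{
    while (script != NULL) {
        struct includeline *next = script->next;
        free(script);
        script = next;
    }
}
```

Calling build_script() on an array of lines and reading bytes_saved gives a lower bound on what skipping empty lines avoids; the roughly 6k reported above for about 200 lines suggests the loader's allocator adds some overhead per allocation on top of that.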