Date: Sat, 22 Nov 2008 00:14:00 +0100
From: Luigi Rizzo
To: current@freebsd.org, jhb@freebsd.org, kib@freebsd.org
Message-ID: <20081121231400.GA94863@onelab2.iet.unipi.it>
Subject: Recent versions of pxeboot hang/panic on AMD platform.

[copying some people involved with recent related commits]

As reported in kern/118222, recent versions of pxeboot hang/panic on the AMD platform. Initial reports said the RELENG_6 versions worked well. However, I found that even recent RELENG_6 code is problematic.

Specifically, I see the problem on two machines with AMD CPUs (one with an Asus M2N-VM motherboard) netbooting with pxeboot. Loading of config files or binary modules (kernel, etc.) randomly hangs with recent versions of pxeboot (RELENG_6, RELENG_7 and HEAD all give the same behaviour). The same system works fine with an old version of pxeboot from RELENG_6.

Things seem to work fine on i386 (tried a Pentium 4, an N270 and qemu) with all the versions below.

To investigate, I started with a reliable version (RELENG_6, early 2008) and moved forward to find where the problem was introduced. I found the following:

  RELENG_6 as of 2008.03.01 (svn 176674)   works
  RELENG_6 as of 2008.03.15 (svn 177190)   works (same as previous)
  RELENG_6 as of 2008.03.31 (svn 177768)   does NOT work

Changed files:

  Index: RELENG_6/sys/boot/i386/boot2/boot2.c
  Index: RELENG_6/sys/boot/i386/btx/btx/Makefile
  Index: RELENG_6/sys/boot/i386/btx/btx/btx.S
  Index: RELENG_6/sys/boot/i386/gptboot/gptboot.c
  Index: RELENG_6/sys/boot/i386/libi386/biossmap.c
  Index: RELENG_6/sys/boot/i386/libi386/biosmem.c

There is a recent related change (August 2008), but it does not seem to fix the bug.

(All of the above is basically an MFC of something applied slightly earlier to HEAD and RELENG_7. I have seen exactly the same bug with a fresh HEAD and RELENG_7, although I have not found the exact point where the problem arose there.)

The failure occurs at random times, sometimes quite early (e.g. while reading the Forth config files). This suggests that the problem may be related to interrupts arriving at the wrong time. I believe the changes to btx.S are the likely culprit, because the changes to the other files seem innocuous or unrelated. Unfortunately, btx.S is beyond my knowledge.

So, does anyone have ideas on what could be happening here? In particular, how likely is it that we might see the same problem with disk- or USB-based booting?

cheers
luigi