From owner-freebsd-current@FreeBSD.ORG Fri Sep 7 15:07:52 2007
Date: Fri, 7 Sep 2007 08:06:35 -0700
From: Luigi Rizzo
To: current@freebsd.org
Message-ID: <20070907080635.A96828@xorpc.icir.org>
References: <20070907064851.A95655@xorpc.icir.org>
In-Reply-To: <20070907064851.A95655@xorpc.icir.org>; from rizzo@icir.org on Fri, Sep 07, 2007 at 06:48:51AM -0700
Subject: Re: diskless system freeze in bios16_call() on some Intel motherboards
List-Id: Discussions about the use of FreeBSD-current

[Note: all these tests were done on -stable, though a -current kernel
exhibits the same problems in some of the tests, so I suspect there is
a common problem.]

On Fri, Sep 07, 2007 at 06:48:51AM -0700, Luigi Rizzo wrote:
> Hi,
> we are having some annoying problems with a number of Intel
> motherboards (Pentium4, ich6 and ich7 based; the latter are on
> D945PAW boards with
> SN94510J.86A bios if that matters).
>
> The symptoms are that booting a 6.x or 7.x kernel with
> etherboot causes a system freeze. This also happens if we
> try to etherboot the kernel from a 6.2 install CD.
>
> On the other hand, on the same hardware:
> - a 4.11 kernel booted with etherboot boots ok;
> - a 6.2 install CD boots ok;
> - a 6.2 install CD with the kernel replaced with ours boots ok.
>
> So it seems that at least part of the problem is how
> the execution environment is set up by etherboot as opposed to
> /boot/loader. However, it is still unclear to me why the 4.11 kernel
> works.
>
> After some instrumenting, it turns out that the freeze is in the
> call to bios16_call, and specifically in this line in
> sys/i386/i386/bioscall.s
>
>	lcallw	*bioscall_vector	/* 16-bit call */
>
> Looking at the arguments there is nothing strange - the selector is
> 0x70 as on other machines, the address seems reasonable.
> If I comment out the lcallw, then things proceed, but apparently the
> interrupt for the network card is not set up correctly because the
> subsequent bootp replies are not received (I see them on the
> server with tcpdump) and I get 'watchdog timeout' messages on the
> console of the diskless client.
>
> Any ideas on what the problem could be?

For the benefit of the archives, and upon further investigation:

the problem is definitely related to pnpbios/bios16 calls.
One of the differences between the CD and etherboot is that the CD
loads acpi.ko as a module. This apparently prevents the calls
to the offending pnpbios code, and also lets the apic code
correctly configure things.

The following causes a PANIC:

- booting from etherboot with acpi compiled in.
  The kernel itself panics in pmap_mapbios(), right after
  calling AcpiOsMapMemory().

The following causes the system to FREEZE:

- booting from etherboot without acpi, with SMP+apic, and with
  bios16_call() uncommented (essentially it is this call that
  causes the freeze).
- booting from CD without loading acpi.ko (the 'safe mode'!).
  This too results in the call to bios16_call(), which in turn freezes.

The following causes 'WATCHDOG TIMEOUT' on the network card:

- booting from etherboot with SMP+apic, bios16_call() commented out,
  and no acpi. Presumably, the apic does not route interrupts
  properly on this hardware without acpi.

Finally, the following WORKS WELL:

- booting from etherboot without SMP, without acpi, without apic,
  and with the call to bios16_call() in bios.c commented out.
  This probably uses the hardware in a similar way to what 4.11 does.
  Note however that we need to patch the kernel source.

- booting from the CD, loading acpi.ko as a module, irrespective
  of SMP+apic.

I don't know why loading acpi.ko as a module works better than
compiling it in, but perhaps it is related to the order in which the
functions are called? I cannot do more tests now, but it would surely
be interesting to see what changes in acpi between compiled-in and
kldloaded.

	cheers
	luigi
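
For quick reference, the cases reported above can be summarized as a
small lookup table. This is only an illustrative sketch: the field
names, the OBSERVED list and the outcome() helper are hypothetical and
not part of any FreeBSD tool, and the table records only the
combinations described in this message (None marks a setting the
report does not specify).

```python
from typing import NamedTuple, Optional


class Case(NamedTuple):
    boot: str                    # "etherboot" or "cd"
    acpi: str                    # "none", "compiled-in" or "module"
    smp_apic: Optional[bool]     # None = not specified in the report
    bios16_call: Optional[bool]  # None = not specified in the report
    outcome: str


# One entry per case listed in the message, in the same order.
OBSERVED = [
    Case("etherboot", "compiled-in", None, None, "panic in pmap_mapbios()"),
    Case("etherboot", "none", True, True, "freeze in bios16_call()"),
    Case("cd", "none", None, True, "freeze in bios16_call()"),
    Case("etherboot", "none", True, False, "watchdog timeout on the NIC"),
    Case("etherboot", "none", False, False, "works"),
    Case("cd", "module", None, None, "works"),
]


def outcome(boot: str, acpi: str, smp_apic: bool,
            bios16_call: bool) -> Optional[str]:
    """Return the reported outcome, or None if the combination was not tested."""
    for case in OBSERVED:
        if case.boot != boot or case.acpi != acpi:
            continue
        if case.smp_apic is not None and case.smp_apic != smp_apic:
            continue
        if case.bios16_call is not None and case.bios16_call != bios16_call:
            continue
        return case.outcome
    return None


if __name__ == "__main__":
    print(outcome("etherboot", "none", False, False))
    print(outcome("etherboot", "none", False, True))
```

A None result means only that the combination was not among the tests
described here, not that it is known to work or to fail.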