Date: Fri, 7 Sep 2007 08:06:35 -0700
From: Luigi Rizzo <rizzo@icir.org>
To: current@freebsd.org
Subject: Re: diskless system freeze in bios16_call() on some Intel motherboards
Message-ID: <20070907080635.A96828@xorpc.icir.org>
In-Reply-To: <20070907064851.A95655@xorpc.icir.org>; from rizzo@icir.org on Fri, Sep 07, 2007 at 06:48:51AM -0700
References: <20070907064851.A95655@xorpc.icir.org>
[note: all these tests have been done on -stable, though a -current
kernel exhibits the same problems in some of the tests, so I suspect
there is a common problem]

On Fri, Sep 07, 2007 at 06:48:51AM -0700, Luigi Rizzo wrote:
> Hi,
> we are having some annoying problems with a number of Intel
> motherboards (Pentium4, ich6 and ich7 based, the latter on
> D945PAW boards with SN94510J.86A bios if that matters).
>
> The symptoms are that booting a 6.x or 7.x kernel with
> etherboot causes a system freeze. This also happens if we
> try to etherboot the kernel from a 6.2 install CD.
>
> On the other hand, on the same hardware:
> - a 4.11 kernel booted with etherboot boots ok;
> - a 6.2 install CD boots ok;
> - a 6.2 install CD with the kernel replaced with ours boots ok.
>
> So it seems that at least part of the problem is how
> the execution environment is set up by etherboot as opposed to
> /boot/loader. However, it is still unclear to me why the 4.11 kernel
> works.
>
> After some instrumenting, it turns out that the freeze is in the
> call to bios16_call, and specifically in this line in
> sys/i386/i386/bioscall.s
>
>         lcallw  *bioscall_vector        /* 16-bit call */
>
> Looking at the arguments there is nothing strange: the selector is
> 0x70 as on other machines, and the address seems reasonable.
> If I comment out the lcallw, then things proceed, but apparently the
> interrupt for the network card is not set up correctly, because the
> subsequent bootp replies are not received (I see them on the
> server with tcpdump) and I get 'watchdog timeout' messages on the
> console of the diskless client.
>
> Any ideas on what the problem could be?

For the benefit of the archives, and upon further investigation:
the problem is definitely related to pnpbios/bios16 calls.
One of the differences between the CD and etherboot is that the CD
loads acpi.ko as a module. This apparently prevents the calls to the
offending pnpbios code, and also lets the apic code configure things
correctly.
The following causes a PANIC:
- booting from etherboot with acpi compiled into the kernel: the
  kernel panics in pmap_mapbios(), right after calling
  AcpiOsMapMemory().

The following causes the system to FREEZE:
- booting from etherboot without acpi, with SMP+apic, and with
  bios16_call() uncommented (essentially it is this call that
  causes the freeze).
- booting from CD without loading acpi.ko (the 'safe mode'!).
  This too causes the call to bios16_call(), which in turn freezes.

The following causes 'WATCHDOG TIMEOUT' on the network card:
- booting from etherboot with SMP+apic, bios16_call() commented out,
  and no acpi. Presumably, without acpi the apic does not route
  interrupts properly on this hardware.

Finally, the following WORKS WELL:
- booting from etherboot without SMP, without acpi, without apic,
  and with the call to bios16_call() in bios.c commented out.
  This probably uses the hardware in a similar way to what 4.11
  does. Note, however, that we need to patch the kernel source.
- booting from the CD, loading acpi.ko as a module, irrespective
  of SMP+apic.

I don't know why loading acpi.ko as a module works better than
compiling it in, but perhaps it is related to the order in which
the functions are called? I cannot do more tests now, but it would
surely be interesting to see what changes in acpi between the
compiled-in and the kldloaded cases.

cheers
luigi
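The test matrix above can be condensed into a small lookup table, which makes it easier to see which combinations were actually tried. The C sketch below is only one way to record the reported outcomes: all type, constant and function names (`boot_case`, `lookup_outcome`, `K_ANY`, etc.) are hypothetical and are not FreeBSD code, and `K_ANY` marks knobs the report does not specify. Combinations that match no reported row return `OUT_UNTESTED`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding of the reported test matrix; not FreeBSD code. */
enum boot_path { BOOT_ETHERBOOT, BOOT_CD };
enum acpi_mode { ACPI_NONE, ACPI_COMPILED_IN, ACPI_MODULE };
enum knob { K_NO, K_YES, K_ANY };   /* K_ANY: not specified in the report */
enum outcome { OUT_PANIC, OUT_FREEZE, OUT_WATCHDOG, OUT_WORKS, OUT_UNTESTED };

struct boot_case {
    enum boot_path path;
    enum acpi_mode acpi;
    enum knob smp_apic;        /* kernel built with SMP + apic */
    enum knob bios16_removed;  /* bios16_call() in bios.c commented out */
    enum outcome result;
};

static const struct boot_case reported[] = {
    /* panics in pmap_mapbios() right after AcpiOsMapMemory() */
    { BOOT_ETHERBOOT, ACPI_COMPILED_IN, K_ANY, K_ANY, OUT_PANIC },
    /* the bios16_call() itself freezes */
    { BOOT_ETHERBOOT, ACPI_NONE,        K_YES, K_NO,  OUT_FREEZE },
    /* 'safe mode': acpi.ko not loaded, bios16_call() freezes */
    { BOOT_CD,        ACPI_NONE,        K_ANY, K_NO,  OUT_FREEZE },
    /* presumably the apic misroutes the NIC interrupt without acpi */
    { BOOT_ETHERBOOT, ACPI_NONE,        K_YES, K_YES, OUT_WATCHDOG },
    /* similar to 4.11, but needs a patched kernel */
    { BOOT_ETHERBOOT, ACPI_NONE,        K_NO,  K_YES, OUT_WORKS },
    /* acpi.ko as a module: works irrespective of SMP+apic */
    { BOOT_CD,        ACPI_MODULE,      K_ANY, K_ANY, OUT_WORKS },
};

/* A knob matches if the row does not care, or if it equals the value. */
static bool knob_matches(enum knob k, bool value)
{
    return k == K_ANY || (k == K_YES) == value;
}

/* Return the first reported outcome matching a concrete configuration. */
enum outcome lookup_outcome(enum boot_path path, enum acpi_mode acpi,
                            bool smp_apic, bool bios16_removed)
{
    for (size_t i = 0; i < sizeof reported / sizeof reported[0]; i++) {
        const struct boot_case *c = &reported[i];
        if (c->path == path && c->acpi == acpi &&
            knob_matches(c->smp_apic, smp_apic) &&
            knob_matches(c->bios16_removed, bios16_removed))
            return c->result;
    }
    return OUT_UNTESTED;  /* combination not covered by the report */
}
```

For instance, `lookup_outcome(BOOT_ETHERBOOT, ACPI_NONE, false, false)` returns `OUT_UNTESTED`: a non-SMP etherboot kernel without acpi but with bios16_call() left in place is not among the reported cases.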