Date:      Fri, 7 Sep 2007 08:06:35 -0700
From:      Luigi Rizzo <rizzo@icir.org>
To:        current@freebsd.org
Subject:   Re: diskless system freeze in bios16_call() on some Intel motherboards
Message-ID:  <20070907080635.A96828@xorpc.icir.org>
In-Reply-To: <20070907064851.A95655@xorpc.icir.org>; from rizzo@icir.org on Fri, Sep 07, 2007 at 06:48:51AM -0700
References:  <20070907064851.A95655@xorpc.icir.org>

[Note: all these tests were done on -stable, though a -current
kernel exhibits the same problems in some of the tests, so I
suspect there is a common problem.]

On Fri, Sep 07, 2007 at 06:48:51AM -0700, Luigi Rizzo wrote:
> Hi,
> we are having some annoying problems with a number of Intel
> motherboards (Pentium4, ich6 and ich7 based, the latter are on
> D945PAW boards with SN94510J.86A bios if that matters).
> 
> The symptoms are that booting a 6.x or 7.x kernel with
> etherboot causes a system freeze. This happens also if we
> try to etherboot the kernel from a 6.2 install CD.
> 
> On the other hand, on the same hardware:
> - a 4.11 kernel booted with etherboot boots ok.
> - a 6.2 install CD boots ok;
> - a 6.2 install CD with the kernel replaced with ours boots ok.
> 
> So it seems that at least part of the problem is how
> the execution environment is set up by etherboot as opposed to
> /boot/loader . However, it is still unclear to me why the 4.11 kernel
> works.
> 
> After some instrumenting, it turns out that the freeze is in the
> call to bios16_call, and specifically in this line in
> sys/i386/i386/bioscall.s
> 
> 	        lcallw  *bioscall_vector        /* 16-bit call */
> 
> Looking at the arguments there is nothing strange - the selector is
> 0x70 as on other machines, the address seems reasonable.
> If I comment out the lcallw, then things proceed, but apparently the
> interrupt for the network card is not set up correctly, because the
> subsequent bootp replies are not received (I see them on the
> server with tcpdump) and there are 'watchdog timeout' messages on the
> console of the diskless client.
> 
> Any ideas on what the problem could be?

For the benefit of the archives, and upon further investigation:

    the problem is definitely related to pnpbios/bios16 calls.

One of the differences between the CD and etherboot is that the CD
loads acpi.ko as a module. This apparently prevents the calls to the
offending pnpbios code, and also lets the apic code configure things
correctly.

The following causes a PANIC:
- booting from etherboot with acpi compiled in:
  the kernel itself panics in pmap_mapbios(), right after calling
  AcpiOsMapMemory().

The following causes the system to FREEZE:
- booting from etherboot without acpi, with SMP+apic, and with bios16_call()
  uncommented (essentially it is this call that causes the freeze).

- booting from CD without loading acpi.ko (the 'safe mode'!). This too
  causes the call to bios16_call() which in turn freezes.
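
For anyone who wants to reproduce the CD case by hand instead of going
through the menu's 'safe mode' entry, something like the following at
the loader prompt should keep acpi.ko from being loaded. This is only a
sketch: it assumes that the stock loader variables acpi_load and
hint.acpi.0.disabled are what matter here, and the menu's safe mode may
well change other settings too.

```
unset acpi_load
set hint.acpi.0.disabled="1"
boot
```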

The following causes 'WATCHDOG TIMEOUT' on the network card:
- booting from etherboot with SMP+apic, bios16_call() commented out,
  and no acpi. Presumably, the apic does not route interrupts
  properly on this hardware without acpi.

Finally, the following WORKS WELL:
- booting from etherboot without SMP, without acpi, without apic, and
  with the call to bios16_call() in bios.c commented out.
  This is probably using the hardware in a similar way to what
  4.11 does. Note however that we need to patch the kernel source.

- booting from the CD, loading acpi.ko as a module, irrespective of SMP+apic.
  I don't know why loading acpi.ko as a module works better than compiled in,
  but perhaps it is related to the order in which the functions are
  called?
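
As a rough illustration of the first working case (etherboot, no SMP,
no apic, no compiled-in acpi), the relevant part of an i386 kernel
config would look something like the fragment below. This is a sketch,
not the exact config used in these tests, and it does not replace the
source patch: the call to bios16_call() still has to be commented out
by hand.

```
# Sketch of an i386 kernel config for the working etherboot case.
# Leave these lines out (or commented) so that the kernel is
# uniprocessor, does not use the I/O APIC, and has no compiled-in ACPI.
#options 	SMP		# no SMP
#device		apic		# no I/O APIC
#device		acpi		# no compiled-in ACPI
```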

I cannot do more tests now, but surely it would be interesting to
see what changes in acpi between compiled-in and kldloaded.

	cheers
	luigi


