Date: Tue, 5 Jun 2007 20:32:17 +0200
From: Peter Holm <peter@holm.cc>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-acpi@freebsd.org
Subject: Re: Possible ACPI relared panic with Tyan S2720
Message-ID: <20070605183216.GA23211@peter.osted.lan>
In-Reply-To: <200706051326.22581.jhb@freebsd.org>
References: <20070604183419.GA73268@peter.osted.lan> <200706051027.29879.jhb@freebsd.org> <20070605164402.GA18091@peter.osted.lan> <200706051326.22581.jhb@freebsd.org>

On Tue, Jun 05, 2007 at 01:26:22PM -0400, John Baldwin wrote:
> On Tuesday 05 June 2007 12:44:02 pm Peter Holm wrote:
> > On Tue, Jun 05, 2007 at 10:27:29AM -0400, John Baldwin wrote:
> > > On Tuesday 05 June 2007 04:44:54 am Nate Lawson wrote:
> > > > Peter Holm wrote:
> > > > > On Mon, Jun 04, 2007 at 12:45:23PM -0700, Nate Lawson wrote:
> > > > >> This is a really confusing issue. All the trace you have shows is that
> > > > >> it occurs while transitioning the system from legacy to ACPI mode.
> > > > >> Unfortunately, the details of what is going on are hidden in the BIOS
> > > > >> since that write to a port triggers an SMI and the BIOS does the rest.
> > > > >>
> > > > >> However, it seems like the BIOS is reserving more memory, using memory
> > > > >> it didn't reserve, or FreeBSD is using memory we shouldn't. John, any
> > > > >> insight on the SMAP output?
> > > > >>
> > > > >>> SMAP type=01 base=0000000000000000 len=000000000009fc00
> > > > >>> SMAP type=02 base=000000000009fc00 len=0000000000000400
> > > > >>> SMAP type=02 base=00000000000e0000 len=0000000000020000
> > > > >>> SMAP type=01 base=0000000000100000 len=000000003fef0000
> > > > >>> SMAP type=03 base=000000003fff0000 len=000000000000f000
> > > > >>> SMAP type=04 base=000000003ffff000 len=0000000000001000
> > > > >>> SMAP type=02 base=00000000fec00000 len=0000000000100000
> > > > >>> SMAP type=02 base=00000000fee00000 len=0000000000001000
> > > > >>> SMAP type=02 base=00000000fff80000 len=0000000000080000
> > > > >> Peter, can you figure out what phys address is getting overwritten?
> > > > >> Seems like it's the loader that sets up the module list and the loader's
> > > > >> allocator may be using RAM it shouldn't.
> > > > >>
> > > > >
> > > > > If I did it right (I used a vtophys() on the address):
> > > > >
> > > > > Address of mod->name(if_tun): 0xc3eed5ec, phys: 0x985ec
> > > >
> > > > So it's somewhere near 620K and the first region goes to 640K - 1 K.
> > > > The last 1 K is type 2 (reserved). Nothing seems to show why switching
> > > > to acpi mode results in an overwrite of data at 620K. I'm not sure
> > > > where to look.
> > > >
> > > > There should be some way to write a guard pattern to that area but I'll
> > > > have to think about it a bit first. Can you see if a BIOS update is
> > > > available and try it out? What about seeing if you can pre-alloc (by
> > > > hacking loader's SMAP code to reserve more of the first 640 K) and
> > > > writing a pattern there, then verifying it at various points during boot
> > > > to be sure we know exactly where the BIOS is writing?
> > >
> > > Err, the loader should not be storing modules that low. Did you kldload the
> > > module or load it via the loader?
> > >
> >
> > I did not load the module. It's loaded automatically by the loader.
> >
> > This is my /boot/loader.conf
> >
> > kernel_options="-D"
> > machdep.hyperthreading_allowed=1
> > hw.ata.atapi_dma=0
>
> Are you sure it isn't loaded by ifconfig during boot and thus via an implicit
> kldload? The loader only loads modules into memory > KERNLOAD (2MB for PAE,
> 4MB for non-PAE).
>

No, I'm not sure at all!

I have tried to manually load acpi.ko at the loader prompt and also
to add acpi_load="YES" to /boot/loader.conf. This still overwrites
the if_tun entry in the modules list.

Typing unset acpi_load at the loader prompt works and I can then
later load acpi:

$ kldstat
Id Refs Address    Size     Name
 1    1 0xc0400000 889124   kernel
$ kldload acpi.ko
$ kldstat
Id Refs Address    Size     Name
 1    3 0xc0400000 889124   kernel
 2    1 0xc48af000 57000    acpi.ko

Just to summarize the problem:

The memory corruption comes and goes depending on the kernel config
file. I first identified the "cause" to be files committed by scottl
at 2007/05/14 21:48, which just introduces new malloc types.
Right now GENERIC works fine again, but if I remove the newly added:

nodevice fwip # IP over FireWire (RFC 2734,3146)
nodevice dcons # Dumb console driver
nodevice dcons_crom # Configuration ROM for dcons

the problem pops up again.

-- 
Peter
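
The address arithmetic discussed in the thread can be checked with a short script. This is only a minimal sketch in Python, not anything the posters ran: the SMAP entries and the physical address 0x985ec are copied from the messages above, the type names are the standard BIOS E820 meanings, and the helper name find_region and the constant KERNLOAD_NON_PAE are made up for illustration (the 4MB value is the non-PAE figure John quotes).

```python
# SMAP entries copied from the boot output quoted in this thread:
# (type, base, length)
SMAP = [
    (1, 0x0000000000000000, 0x000000000009fc00),
    (2, 0x000000000009fc00, 0x0000000000000400),
    (2, 0x00000000000e0000, 0x0000000000020000),
    (1, 0x0000000000100000, 0x000000003fef0000),
    (3, 0x000000003fff0000, 0x000000000000f000),
    (4, 0x000000003ffff000, 0x0000000000001000),
    (2, 0x00000000fec00000, 0x0000000000100000),
    (2, 0x00000000fee00000, 0x0000000000001000),
    (2, 0x00000000fff80000, 0x0000000000080000),
]

# Standard BIOS E820/SMAP type codes.
SMAP_TYPES = {1: "usable RAM", 2: "reserved", 3: "ACPI reclaimable", 4: "ACPI NVS"}

# Hypothetical constant name; 4MB is the non-PAE KERNLOAD value given in the thread.
KERNLOAD_NON_PAE = 4 * 1024 * 1024


def find_region(phys):
    """Return the (type, base, length) SMAP entry containing phys, or None."""
    for typ, base, length in SMAP:
        if base <= phys < base + length:
            return typ, base, length
    return None


phys = 0x985ec  # physical address of the overwritten mod->name, from the thread
typ, base, length = find_region(phys)
end = base + length

print("phys 0x%x lies in type %d (%s), 0x%x-0x%x" % (phys, typ, SMAP_TYPES[typ], base, end - 1))
print("bytes remaining before the end of that region: %d" % (end - phys))
print("below non-PAE KERNLOAD: %s" % (phys < KERNLOAD_NON_PAE))
```

On these numbers the overwritten address falls inside the first usable region, a little under 30 KB below the reserved 1 K at 0x9fc00, and well below the 4MB mark above which the loader is said to place modules. That is consistent with John's doubt that the loader put the module there.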