Date: Mon, 12 Jul 2010 14:41:51 +0200
From: Markus Gebert <markus.gebert@hostpoint.ch>
To: John Baldwin <jhb@freebsd.org>
Cc: freebsd-stable <freebsd-stable@freebsd.org>
Subject: Re: 8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?
Message-ID: <FFB367B2-232D-460D-82B8-C3F03F1B53BE@hostpoint.ch>
In-Reply-To: <08562D52-02AA-46CF-BFCD-00D0A3C4DC34@hostpoint.ch>
References: <6B57591F-9FA2-45EB-825F-1DB025C0635D@hostpoint.ch> <201007091603.31843.jhb@freebsd.org> <08562D52-02AA-46CF-BFCD-00D0A3C4DC34@hostpoint.ch>

On 10.07.2010, at 01:53, Markus Gebert wrote:

>> I'm curious if disabling USB legacy support in the BIOS causes it to still die
>> even with ehci not loaded. If so, then the SMI# for the ehci controller must
>> somehow prevent the issue, perhaps by triggering frequently enough to slow the
>> rate of I/O requests down?
>
>
> I disabled usb legacy support in the BIOS and booted a kernel with usb+ohci+ukbd+ums but without ehci. Unfortunately, I cannot reproduce the MCE.

Well, the situation has changed. The machine died over the weekend while running our test load with the above kernel configuration. It seems that not having ehci in the kernel at boot just makes the MCE much less likely to occur, but it still occurs. With ehci, I can panic the machine within a minute; without ehci it seems to take at least hours. Still, I don't get why not having the ehci driver in the kernel should have any effect, especially because nothing is attached to it.

Panic message:

----
MCA: Bank 4, Status 0xb400004000030c2b
MCA: Global Cap 0x0000000000000105, Status 0x0000000000000007
MCA: Vendor "AuthenticAMD", ID 0x40f13, APIC ID 2
MCA: CPU 2 UNCOR BUSLG Observer WR I/O
MCA: Address 0xfd00000000
panic: blockable sleep lock (sleep mutex) 128 @ /usr/src/sys/vm/uma_core.c:1992
cpuid = 2
KDB: enter: panic
[thread pid 12 tid 100039 ]
Stopped at kdb_enter+0x3d: movq $0,0x69ccb0(%rip)
----

I don't know why it's not a fatal trap 28 this time, even though an MCE was detected. I have seen this before though, also with kernels that have ehci and with USB legacy support enabled, so the different panic this time does not seem related to the way the kernel was configured. Maybe it is a symptom? Or might it even be useful? If so, what should I pull out of DDB?

In the meantime, I'll try harder to reproduce the MCE on current...

Markus
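
For reference, the status word in the panic message above can be decoded by hand. What follows is a minimal sketch, assuming the generic x86 machine-check MCi_STATUS layout (flag bits 57 to 63, and the bus/interconnect error code format 0000 1PPT RRRR IILL in the low 16 bits). The function name, the label strings and the summary format are made up for illustration and are not taken from the FreeBSD kernel; the model-specific bits (16 to 56) are left undecoded.

```python
# Sketch only: decodes the architectural fields of an MCi_STATUS value.
# Labels below are illustrative, chosen to resemble the panic output.
PP = ["Source", "Responder", "Observer", "Generic"]      # participation
RRRR = ["ERR", "RD", "WR", "DRD", "DWR", "IRD",
        "PREFETCH", "EVICT", "SNOOP"]                    # request type
II = ["Memory", "Reserved", "I/O", "Generic"]            # memory or I/O
LL = ["L0", "L1", "L2", "LG"]                            # cache level


def decode_mci_status(status: int) -> dict:
    """Return the flag bits and, for bus errors, the decoded error code."""
    code = status & 0xFFFF  # architectural MCA error code
    fields = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "enabled": bool((status >> 60) & 1),
        "misc_valid": bool((status >> 59) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "context_corrupt": bool((status >> 57) & 1),
        "mca_error_code": code,
    }
    # Bus/interconnect errors have the form 0000 1PPT RRRR IILL.
    if (code & 0xF800) == 0x0800:
        rrrr = (code >> 4) & 0xF
        fields["bus_error"] = {
            "participation": PP[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 1),
            "request": RRRR[rrrr] if rrrr < len(RRRR) else "reserved",
            "memory_or_io": II[(code >> 2) & 0x3],
            "level": LL[code & 0x3],
        }
    return fields


def summarize(status: int) -> str:
    """One-line summary for bus errors, loosely modeled on the panic output."""
    f = decode_mci_status(status)
    bus = f.get("bus_error")
    if bus is None:
        return "not a bus error (code 0x%04x)" % f["mca_error_code"]
    severity = "UNCOR" if f["uncorrected"] else "COR"
    return "%s BUS%s %s %s %s" % (severity, bus["level"], bus["participation"],
                                  bus["request"], bus["memory_or_io"])


if __name__ == "__main__":
    print(summarize(0xB400004000030C2B))
```

Applied to the status value from the panic, the sketch reports an uncorrected bus error observed on an I/O write, with the address-valid bit set (consistent with the "MCA: Address" line) and the processor-context-corrupt bit clear.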