From owner-freebsd-stable@FreeBSD.ORG  Mon Jul 12 12:41:53 2010
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A8731106566B;
	Mon, 12 Jul 2010 12:41:53 +0000 (UTC)
	(envelope-from markus.gebert@hostpoint.ch)
Received: from mail.adm.hostpoint.ch (mail.adm.hostpoint.ch [217.26.48.124])
	by mx1.freebsd.org (Postfix) with ESMTP id 6B3438FC08;
	Mon, 12 Jul 2010 12:41:53 +0000 (UTC)
Received: from [77.109.131.203] (port=60539
	helo=ch4buk-en0.office.hostpoint.internal)
	by mail.adm.hostpoint.ch with esmtpsa (TLSv1:AES128-SHA:128)
	(Exim 4.69 (FreeBSD)) (envelope-from <markus.gebert@hostpoint.ch>)
	id 1OYIKW-0006iZ-47; Mon, 12 Jul 2010 14:41:52 +0200
Mime-Version: 1.0 (Apple Message framework v1078)
Content-Type: text/plain; charset=us-ascii
From: Markus Gebert <markus.gebert@hostpoint.ch>
In-Reply-To: <08562D52-02AA-46CF-BFCD-00D0A3C4DC34@hostpoint.ch>
Date: Mon, 12 Jul 2010 14:41:51 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <FFB367B2-232D-460D-82B8-C3F03F1B53BE@hostpoint.ch>
References: <6B57591F-9FA2-45EB-825F-1DB025C0635D@hostpoint.ch>
	<201007091603.31843.jhb@freebsd.org>
	<08562D52-02AA-46CF-BFCD-00D0A3C4DC34@hostpoint.ch>
To: John Baldwin <jhb@freebsd.org>
X-Mailer: Apple Mail (2.1078)
Cc: freebsd-stable <freebsd-stable@freebsd.org>
Subject: Re: 8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun
	X4100M2?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Jul 2010 12:41:53 -0000


On 10.07.2010, at 01:53, Markus Gebert wrote:

>> I'm curious if disabling USB legacy support in the BIOS causes it to still die
>> even with ehci not loaded.  If so, then the SMI# for the ehci controller must
>> somehow prevent the issue, perhaps by triggering frequently enough to slow the
>> rate of I/O requests down?
>
>
> I disabled usb legacy support in the BIOS and booted a kernel with
> usb+ohci+ukbd+ums but without ehci. Unfortunately, I cannot reproduce
> the MCE.


Well, the situation has changed. The machine died over the weekend running
our test load with the above kernel configuration. It seems that not having
ehci in the kernel at boot just makes the MCE much less likely to occur,
but it still occurs. With ehci, I can panic the machine within a minute;
without ehci it seems to take at least hours. Still, I don't understand why
leaving the ehci driver out of the kernel should have any effect, especially
because nothing is attached to it.

Panic message:

----
MCA: Bank 4, Status 0xb400004000030c2b
MCA: Global Cap 0x0000000000000105, Status 0x0000000000000007
MCA: Vendor "AuthenticAMD", ID 0x40f13, APIC ID 2
MCA: CPU 2 UNCOR BUSLG Observer WR I/O
MCA: Address 0xfd00000000
panic: blockable sleep lock (sleep mutex) 128 @ /usr/src/sys/vm/uma_core.c:1992
cpuid = 2
KDB: enter: panic
[thread pid 12 tid 100039 ]
Stopped at      kdb_enter+0x3d: movq    $0,0x69ccb0(%rip)
----

I don't know why it's not a fatal trap 28 this time even though an MCE was
detected. I've seen this before, though, also with kernels that have ehci
and with USB legacy support enabled, so the different panic this time does
not seem related to the way the kernel was configured. Maybe it's a symptom?
Or might it even be useful? If so, what should I pull out of DDB?

In the meantime, I'll try harder to reproduce the MCE on -CURRENT...


Markus