From: John Baldwin <jhb@freebsd.org>
To: freebsd-stable@freebsd.org
Cc: Markus Gebert
Date: Fri, 9 Jul 2010 16:03:31 -0400
Subject: Re: 8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?
Message-Id: <201007091603.31843.jhb@freebsd.org>
In-Reply-To: <6B57591F-9FA2-45EB-825F-1DB025C0635D@hostpoint.ch>

On Friday, July 09, 2010 11:26:00 am Markus Gebert wrote:
> --
> MCA: Bank 4, Status 0xb400004000030c2b
> MCA: Global Cap 0x0000000000000105, Status 0x0000000000000007
> MCA: Vendor "AuthenticAMD", ID 0x40f13, APIC ID 2
> MCA: CPU 2 UNCOR BUSLG Observer WR I/O
> MCA: Address 0xfd00000000

Using my local port of mcelog this is what I get for this check:

CPU 2 4 northbridge
ADDR fd00000000
  Northbridge Master abort
       link number = 4
       bit61 = error uncorrected
  bus error 'local node observed, request didn't time out
      generic write mem transaction
      i/o access, level generic'
STATUS b400004000030c2b MCGSTATUS 7
MCGCAP 105 APICID 2 SOCKETID 0
CPUID Vendor AMD Family 15 Model 65

I don't know what to tell you off hand.  Did you buy this hardware from Sun
directly?  If so, I would try bugging them about this, especially given the
error that the BIOS is logging.  It does sound like a hardware issue, but in
the chipset, not in the RAM, so you might need to swap out the main board
rather than the RAM.

I'm curious if disabling USB legacy support in the BIOS causes it to still
die even with ehci not loaded.  If so, then the SMI# for the ehci controller
must somehow prevent the issue, perhaps by triggering frequently enough to
slow the rate of I/O requests down?

-- 
John Baldwin
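
For readers who want to cross-check the decode by hand, here is a minimal sketch of how the quoted status word 0xb400004000030c2b breaks down. It is not the mcelog code or the FreeBSD kernel's MCA code. The function name decode_status and the table names are made up for illustration. It assumes the architectural MCi_STATUS flag positions (bits 63 down to 57) and the compound bus-error code layout 0000 1PPT RRRR IILL in the low 16 bits. Bits 19:16 are returned as a raw number only, since their meaning is model-specific.

```python
# Hypothetical helper: hand decode of an x86 machine-check status word.
# Assumes the architectural flag bits and the bus-error compound code layout
# 0000 1PPT RRRR IILL; anything model-specific is left as a raw number.

FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
         59: "MISCV", 58: "ADDRV", 57: "PCC"}
PP = ["SRC", "RES", "OBS", "generic"]            # participation
RRRR = ["GEN", "RD", "WR", "DRD", "DWR",         # request type
        "IRD", "PREFETCH", "EVICT", "SNOOP"]
II = ["memory", "reserved", "I/O", "generic"]    # memory or I/O
LL = ["L0", "L1", "L2", "LG"]                    # cache level


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    flags = [name for bit, name in sorted(FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {
        "flags": flags,
        "mca_code": code,
        "ext_code": (status >> 16) & 0xF,  # model-specific, not interpreted
    }
    if (code & 0xF800) == 0x0800:          # 0000 1xxx xxxx xxxx: bus error
        rrrr = (code >> 4) & 0xF
        result["bus"] = {
            "participation": PP[(code >> 9) & 0x3],
            "timeout": bool((code >> 8) & 0x1),
            "request": RRRR[rrrr] if rrrr < len(RRRR) else "reserved",
            "memory_or_io": II[(code >> 2) & 0x3],
            "level": LL[code & 0x3],
        }
    return result


if __name__ == "__main__":
    print(decode_status(0xB400004000030C2B))
```

For the status word in this thread the sketch reports the UC flag set (bit 61) and an observer, no-timeout, write, I/O, level-generic bus error, which lines up with the "UNCOR BUSLG Observer WR I/O" line in the kernel output.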