From: John Baldwin
To: Markus Gebert
Cc: freebsd-stable
Date: Mon, 12 Jul 2010 11:06:59 -0400
Subject: Re: 8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?

On Monday, July 12, 2010 9:57:29 am Markus Gebert wrote:
>
> On 12.07.2010, at 14:51, John Baldwin wrote:
>
> >> Well, the situation has changed. Machine died over the weekend running our
> >> test load with above kernel configuration. It seems that not having ehci in
> >> the kernel at boot just makes the MCE much more unlikely to occur, but it
> >> occurs. With ehci, I can panic the machine within a minute, without ehci it
> >> seems to take at least hours. Still, I don't get why not having the ehci
> >> driver in the kernel should have any effect, especially because nothing is
> >> attached to it.
> >
> > Ok, so maybe the SMI# interrupts do play a role somehow, at least as far as
> > altering the timing.
>
> Hm, if I've understood your other email correctly, disabling usb legacy
> support should get rid of SMIs just as well as loading the ehci driver. What
> I tested was kernel with ehci (panic within a minute) versus kernel without
> ehci (panic within hours), but both cases with usb legacy support disabled in
> BIOS. So, again, if I understand this correctly, the "SMI rate" should have
> been the same in both cases, because usb legacy support was turned off
> entirely, and therefore loading or not loading ehci should not impact the SMI
> rate. If this should be the case, why would there be an altering of timings
> between these two test cases?

Oh, I didn't know that USB legacy support was disabled in both cases. That
should disable all the SMIs in both cases as you say. Are you using Cx states
other than C1 for the CPUs at all?

-- 
John Baldwin