From: Markus Gebert
To: John Baldwin
Cc: freebsd-stable
Subject: Re: 8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?
Date: Mon, 12 Jul 2010 15:57:29 +0200
Message-Id: <0CF6CF2B-907C-42EF-B57E-DF50F0564455@hostpoint.ch>
In-Reply-To: <201007120851.35529.jhb@freebsd.org>

On 12.07.2010, at 14:51, John Baldwin wrote:

>> Well, the situation has changed. Machine died over the weekend running our
>> test load with above kernel configuration. It seems that not having ehci in
>> the kernel at boot just makes the MCE much more unlikely to occur, but it
>> occurs.
>> With ehci, I can panic the machine within a minute, without ehci it
>> seems to take at least hours. Still, I don't get why not having the ehci
>> driver in the kernel should have any effect, especially because nothing is
>> attached to it.
>
> Ok, so maybe the SMI# interrupts do play a role somehow, at least as far as
> altering the timing.

Hm, if I've understood your other email correctly, disabling USB legacy
support should get rid of SMIs just as well as loading the ehci driver.
What I tested was a kernel with ehci (panic within a minute) versus a
kernel without ehci (panic within hours), in both cases with USB legacy
support disabled in the BIOS. So, again, if I understand this correctly,
the SMI rate should have been the same in both cases: USB legacy support
was turned off entirely, so loading or not loading ehci should not affect
it. If that is the case, why would the timing differ between these two
test cases?

Since SMM is out of the OS's control, I guess there's no good way to
track SMIs?


Markus