From owner-freebsd-stable@FreeBSD.ORG Tue Jul 20 08:16:08 2010
Sender: "J. Hellenthal"
Date: Tue, 20 Jul 2010 04:15:59 -0400
From: jhell
To: Markus Gebert
Cc: freebsd-stable
References: <6B57591F-9FA2-45EB-825F-1DB025C0635D@hostpoint.ch>
 <201007091603.31843.jhb@freebsd.org>
 <08562D52-02AA-46CF-BFCD-00D0A3C4DC34@hostpoint.ch>
 <9DCFE2F6-D7CB-49CB-8EBC-06C1E5EBB727@hostpoint.ch>
Subject: Re: 8.1-RC2 MCE caused by some LAPIC/clock changes? (was: 8.1-RC2 -
 PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?)

On Sat, 17 Jul 2010 14:35, Markus Gebert wrote:
>
> On 13.07.2010, at 16:02, Markus Gebert wrote:
>
>> Unfortunately, I have not been able to get anything useful out of the svn
>> commit logs, which could explain this. Maybe someone else has an idea
>> what could have changed between 7 and 8 to break it, and again between
>> 8 and CURRENT to magically fix it again.
>
> I tracked this down further. I couldn't easily downgrade my 8.1
> installation to see when the problem was introduced because the zpool
> version used is 14. So I tried to figure out when the problem was
> solved in CURRENT.
>
> I started with the first possible revision that can boot off my v14 pool
> (r201143, Dec 28, zfs v14 commit). With this revision, I was able to
> trigger the MCE.
>
> Then I took some later revision (rev206010, Apr 1, chosen randomly), and
> I couldn't reproduce the problem.
> I started narrowing the revisions down
> until I found that, while on r202386 I'm still able to trigger the
> MCE, r202387 seems to solve the problem on CURRENT:
>
> http://svn.freebsd.org/viewvc/base?view=revision&revision=202387
>
> Since John Baldwin mentioned this problem could be timing related, it
> seems reasonable that a clock-related change could fix it. But this
> commit seems to have been MFC'd to 8-STABLE and 8.1 (at least as far as
> I can tell) along with some other changes to amd64-specific code. I
> thought that maybe these other changes that have been MFC'd could have
> reintroduced the problem later on, but so far I could not reproduce the
> problem with newer CURRENT revisions. So, I actually nailed this one
> down to a single commit on CURRENT, but still cannot tell what the
> actual difference is compared to 8-STABLE/8.1.
>
> Any ideas how to proceed?
>

Adding to this, I remembered some specific commits that caught my
attention when they happened. Specifically, they were to mca.c: "locate
mca" on my machine provided the file paths, and "svn log" provided the
commit logs. When you mentioned April and I saw the log, it rang a bell.

These may be of interest to you:

------------------------------------------------------------------------
r210079 | jhb | 2010-07-14 17:10:14 -0400 (Wed, 14 Jul 2010) | 13 lines

MFC 208507,208556,208621:
Add support for corrected machine check interrupts. CMCI is a new local
APIC interrupt that fires when a threshold of corrected machine check
events is reached. CMCI also includes a count of events when reporting
corrected errors in the bank's status register. Note that individual
banks may or may not support CMCI. If they do, each bank includes its own
threshold register that determines when the interrupt fires. Currently
the code uses a very simple strategy where it doubles the threshold on
each interrupt until it succeeds in throttling the interrupt to occur
only once a minute (this interval can be tuned via sysctl).
The threshold is also adjusted on each hourly poll which will lower the
threshold once events stop occurring.

------------------------------------------------------------------------
r206183 | alc | 2010-04-05 12:11:42 -0400 (Mon, 05 Apr 2010) | 6 lines

MFC r204907, r204913, r205402, r205573, r205573
Implement AMD's recommended workaround for Erratum 383 on Family 10h
processors.

Enable machine check exceptions by default.

------------------------------------------------------------------------

And a list of mca.c's within the stable/8 src tree:

/usr/src/sbin/mca/mca.c
/usr/src/sys/amd64/amd64/mca.c
/usr/src/sys/dev/aha/aha_mca.c
/usr/src/sys/dev/buslogic/bt_mca.c
/usr/src/sys/dev/ep/if_ep_mca.c
/usr/src/sys/i386/i386/mca.c
/usr/src/sys/ia64/ia64/mca.c

Regards & Good luck,

--
jhell
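
For reference, the revision narrowing described in the quoted message (r201143 triggers the MCE, r206010 does not, and the boundary turned out to be r202386/r202387) amounts to a binary search over revision numbers. The following is only a minimal sketch of that idea, not the procedure that was actually used. The function names are hypothetical, and `simulated_test` is a stand-in for the real work of checking out a revision, building and booting it, and trying to trigger the MCE. It also assumes that once the fix is in, every later revision stays fixed.

```python
def first_fixed_revision(bad, good, triggers_mce):
    """Return the first revision in (bad, good] that no longer triggers the MCE.

    Assumes `bad` reproduces the MCE, `good` does not, and that the fix,
    once committed, stays in place for every later revision.
    """
    if not triggers_mce(bad) or triggers_mce(good):
        raise ValueError("need a failing revision and a later working one")
    while good - bad > 1:
        mid = (bad + good) // 2
        if triggers_mce(mid):
            bad = mid    # still broken: the fix landed after mid
        else:
            good = mid   # already fixed: the fix landed at or before mid
    return good


def simulated_test(rev):
    """Hypothetical stand-in for 'build, boot, and try to trigger the MCE'.

    Hard-codes the boundary reported in the thread (r202386 bad, r202387 good).
    """
    return rev < 202387


if __name__ == "__main__":
    print(first_fixed_revision(201143, 206010, simulated_test))  # prints 202387
```

With the two endpoints from the thread, the search needs at most 13 test boots. The same search could be run over stable/8 revisions with the predicate inverted, to look for the revision where the problem first appears there.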