Date:      Sat, 17 Jul 2010 20:35:21 +0200
From:      Markus Gebert <markus.gebert@hostpoint.ch>
To:        freebsd-stable <freebsd-stable@freebsd.org>
Subject:   Re: 8.1-RC2 MCE caused by some LAPIC/clock changes? (was: 8.1-RC2 - PCI fatal error or MCE triggered by USB/ehci on Sun X4100M2?)
Message-ID:  <F744F475-3D2B-4BC6-856A-A5D302AA8681@hostpoint.ch>
In-Reply-To: <9DCFE2F6-D7CB-49CB-8EBC-06C1E5EBB727@hostpoint.ch>
References:  <6B57591F-9FA2-45EB-825F-1DB025C0635D@hostpoint.ch> <201007091603.31843.jhb@freebsd.org> <08562D52-02AA-46CF-BFCD-00D0A3C4DC34@hostpoint.ch> <FFB367B2-232D-460D-82B8-C3F03F1B53BE@hostpoint.ch> <9DCFE2F6-D7CB-49CB-8EBC-06C1E5EBB727@hostpoint.ch>



On 13.07.2010, at 16:02, Markus Gebert wrote:

> Unfortunately, I have not been able to get anything useful out of the svn commit logs that could explain this. Maybe someone else has an idea what could have changed between 7 and 8 to break it, and again between 8 and CURRENT to magically fix it again.

I tracked this down further. I couldn't easily downgrade my 8.1 installation to see when the problem was introduced, because the pool uses zpool version 14. So instead I tried to figure out when the problem was fixed in CURRENT.

I started with the first revision that can boot from my v14 pool (r201143, Dec 28, the ZFS v14 commit). With this revision, I was able to trigger the MCE.

Then I took a later revision (r206010, Apr 1, chosen at random) and couldn't reproduce the problem. I then narrowed the range down and found that while I can still trigger the MCE on r202386, r202387 seems to solve the problem on CURRENT:

http://svn.freebsd.org/viewvc/base?view=revision&revision=202387

Since John Baldwin mentioned this problem could be timing related, it seems plausible that a clock-related change could fix it. But this commit seems to have been MFC'd to 8-STABLE and 8.1 (at least as far as I can tell), along with some other changes to amd64-specific code. I thought that maybe some of these other MFC'd changes could have reintroduced the problem later on, but so far I have not been able to reproduce the problem with newer CURRENT revisions. So I have nailed this one down to a single commit on CURRENT, but still cannot tell what the actual difference is compared to 8-STABLE/8.1.

Any ideas how to proceed?


Markus

