Date:      Sat, 1 Oct 2011 09:41:27 -0800
From:      Royce Williams <royce.williams@gmail.com>
To:        freebsd-stable <freebsd-stable@freebsd.org>
Subject:   Re: Interpreting MCA error output
Message-ID:  <CA%2BE3k93mrADymauA8Up0XgBWxujRp1HhWBSVfw30bSzi0UCcbA@mail.gmail.com>
In-Reply-To: <20111001102327.GA37434@icarus.home.lan>
References:  <CAFU734y3WsVFTpnGoGfbPH4vVBnoz8f=qGvYS4c%2BLya8PFQP_A@mail.gmail.com> <20111001102327.GA37434@icarus.home.lan>

On Sat, Oct 1, 2011 at 2:23 AM, Jeremy Chadwick
<freebsd@jdc.parodius.com> wrote:

[snip]

> Decoding the MCE can be done using Linux's mcelog program -- you'll need
> to download the source and apply the patch by hand *and* put in place a
> heavily modified version of memstream.c -- which requires a lot of
> patching to work on FreeBSD, and can only be used to decode
> ASCII-provided MCEs; DMI support does not work.  So, you have to apply
> patches then use "mcelog --no-dmi --ascii" and provide the MCE text via
> stdin (or use --file).

I'm glad to see this thread; I have a different error, for which I
wanted to make sure I was fixing the right problem before randomly
swapping hardware.

> John Baldwin tends to keep up-to-date patches for mcelog here:
>
> http://people.freebsd.org/~jhb/mcelog/
>
> The last build of mcelog I did on FreeBSD was for mcelog-1.0pre2, which
> John's patch (at the time) did not work with.  I made my own patch
> (dated 2011/02/11), but it looks like John has since updated his patch.
> If you need/want mine, I can put it up on the web.

That would be very useful as a crosscheck.

I found one additional intermediate patch from John, posted on
2011-04-26, that appears not to have been merged into ~jhb/mcelog/:

http://lists.freebsd.org/pipermail/freebsd-hackers/2011-April/035159.html

That patch got me up and going (using 'gmake FREEBSD=yes i386=yes').
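
For anyone feeding mcelog from a log file rather than pasting by hand,
here is a minimal Python sketch of the stdin workflow Jeremy describes.
The helper names (extract_mca_lines, run_mcelog) are hypothetical, and
the syslog-style prefix it strips is an assumption about how the MCA
lines appear in your logs. The './mcelog' path is simply where my build
ended up. Only the '--no-dmi --ascii' flags come from this thread.

```python
import re
import subprocess

# Matches the "MCA: ..." payload whether or not a syslog-style prefix
# (timestamp, host, "kernel:") precedes it. The prefix format is an assumption.
MCA_LINE = re.compile(r"(MCA: .*)$")


def extract_mca_lines(log_text):
    """Return only the MCA records from log text, one per line, prefixes stripped."""
    records = []
    for line in log_text.splitlines():
        match = MCA_LINE.search(line)
        if match:
            records.append(match.group(1).rstrip())
    return "\n".join(records) + ("\n" if records else "")


def run_mcelog(mca_text, binary="./mcelog"):
    """Pipe MCA text to a patched mcelog on stdin and return its decoded output.

    Assumes a FreeBSD-patched mcelog binary exists at `binary`.
    """
    result = subprocess.run(
        [binary, "--no-dmi", "--ascii"],
        input=mca_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Something like run_mcelog(extract_mca_lines(open(path).read())) would
then decode every MCA record found in the file at 'path'.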

My problem is different from the original poster's and looks a bit more serious:

royce@heffalump$ ./mcelog --no-dmi --ascii
MCA: Bank 1, Status 0x9400000000000151
MCA: Global Cap 0x0000000000000105, Status 0x0000000000000000
MCA: Vendor "AuthenticAMD", ID 0xfc0, APIC ID 0
MCA: CPU 0 COR ICACHE L1 IRD error
MCA: Address 0xc089d890

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 1 instruction cache
ADDR c089d890
  memory/cache error 'instruction fetch mem transaction, instruction transaction, level 1'
STATUS 9400000000000151 MCGSTATUS 0
MCGCAP 105 APICID 0 SOCKETID 0
CPUID Vendor AMD Family 15 Model 12
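
As a crosscheck on the decode above, the status word can also be
unpacked by hand. This is a minimal Python sketch, assuming the
architectural MCi_STATUS layout (flag bits 63..57, MCA error code in
bits 15..0) and the memory-hierarchy error code format
0000 0001 RRRR TTLL. It ignores the model-specific bits entirely, and
the function name decode_mci_status is hypothetical, not part of mcelog.

```python
# Architectural flag bits at the top of MCi_STATUS (assumed layout).
FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
         59: "MISCV", 58: "ADDRV", 57: "PCC"}

# Field encodings for memory-hierarchy (cache) error codes.
REQUESTS = ["ERR", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"]
TYPES = ["I", "D", "G"]           # instruction, data, generic
LEVELS = ["L0", "L1", "L2", "LG"]


def decode_mci_status(status):
    """Decode the flag bits and, for cache errors, the RRRR/TT/LL fields."""
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF
    info = {"flags": flags, "corrected": "UC" not in flags, "mca_code": code}

    # Memory-hierarchy errors have the form 0000 0001 RRRR TTLL.
    if code >> 8 == 0x01:
        rrrr = (code >> 4) & 0xF
        tt = (code >> 2) & 0x3
        ll = code & 0x3
        info["request"] = REQUESTS[rrrr] if rrrr < len(REQUESTS) else "reserved"
        info["type"] = TYPES[tt] if tt < len(TYPES) else "reserved"
        info["level"] = LEVELS[ll]
    return info


decoded = decode_mci_status(0x9400000000000151)
print(decoded)
```

For the status above this gives the flags VAL, EN and ADDRV with UC
clear (a corrected error with a valid address), request IRD, type I and
level L1, which lines up with the "COR ICACHE L1 IRD" line in the
kernel message.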


> A few moments ago I tried to download mcelog from the official site, but
> ftp.kernel.org is presently returning NXDOMAIN for me (e.g. A record not
> found).  The same goes for git.kernel.org.  Great.....

kernel.org is still down from the compromise a few weeks ago.  At
least one kernel.org mirror still has mcelog-1.0pre2:

http://mirror.xmission.com/kernel.org/linux/utils/cpu/mce/mcelog-1.0pre2.tar.gz

The main mcelog page also has a link to how to get it from GitHub
while kernel.org is down:

https://github.com/andikleen/mcelog


Agreed that a port and a CGI would be higher leverage, but these
breadcrumbs should help in the short term.

Royce


