Date:      Mon, 18 May 2009 11:41:26 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        Alan Cox <alc@cs.rice.edu>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r192050 - in head/sys: amd64/amd64 amd64/include conf i386/i386 i386/include
Message-ID:  <200905181141.27355.jhb@freebsd.org>
In-Reply-To: <4A0F085D.6000202@cs.rice.edu>
References:  <200905131753.n4DHr4YL063065@svn.freebsd.org> <4A0F085D.6000202@cs.rice.edu>

On Saturday 16 May 2009 2:39:25 pm Alan Cox wrote:
> John Baldwin wrote:
> > Author: jhb
> > Date: Wed May 13 17:53:04 2009
> > New Revision: 192050
> > URL: http://svn.freebsd.org/changeset/base/192050
> >
> > Log:
> >   Implement simple machine check support for amd64 and i386.
> >   - For CPUs that only support MCE (the machine check exception) but not MCA
> >     (i.e. Pentium), all this does is print out the value of the machine check
> >     registers and then panic when a machine check exception occurs.
> >   - For CPUs that support MCA (the machine check architecture), the support is
> >     a bit more involved.
> >     - First, there is limited support for decoding the CPU-independent MCA
> >       error codes in the kernel, and the kernel uses this to output a short
> >       description of any machine check events that occur.
> >     - When a machine check exception occurs, all of the MCx banks on the
> >       current CPU are scanned and any events are reported to the console
> >       before panic'ing.
> >     - To catch events for correctable errors, a periodic timer kicks off a
> >       task which scans the MCx banks on all CPUs.  The frequency of these
> >       checks is controlled via the "hw.mca.interval" sysctl.
> >     - Userland can request an immediate scan of the MCx banks by writing
> >       a non-zero value to "hw.mca.force_scan".
> >     - If any correctable events are encountered, the appropriate details
> >       are stored in a 'struct mca_record' (defined in <machine/mca.h>).
> >       The "hw.mca.count" is a count of such records and each record may
> >       be queried via the "hw.mca.records" tree by specifying the record
> >       index (0 .. count - 1) as the next name in the MIB similar to using
> >       PIDs with the kern.proc.* sysctls.  The idea is to export machine
> >       check events to userland for more detailed processing.
> >     - The periodic timer and hw.mca sysctls are only present if the CPU
> >       supports MCA.
> >   
> >   Discussed with:	emaste (briefly)
> >   MFC after:	1 month
> >
> > Added:
> >   head/sys/amd64/amd64/mca.c   (contents, props changed)
> >   head/sys/amd64/include/mca.h   (contents, props changed)
> >   head/sys/i386/i386/mca.c   (contents, props changed)
> >   head/sys/i386/include/mca.h   (contents, props changed)
> > Modified:
> >   head/sys/amd64/amd64/machdep.c
> >   head/sys/amd64/amd64/mp_machdep.c
> >   head/sys/amd64/amd64/trap.c
> >   head/sys/amd64/include/specialreg.h
> >   head/sys/conf/files.amd64
> >   head/sys/conf/files.i386
> >   head/sys/i386/i386/machdep.c
> >   head/sys/i386/i386/mp_machdep.c
> >   head/sys/i386/i386/trap.c
> >   head/sys/i386/include/specialreg.h
> >   
> 
> After this change my Phenom II locks up hard within minutes of booting.  
> There are no messages, and I am unable to break into the debugger from a 
> serial console.
> 
> The same exact kernel is running fine on a Core 2 Quad.

I will probably add a tunable to enable machine checks and disable them by 
default then.

-- 
John Baldwin
