Date:      Mon, 24 Sep 2001 18:23:40 -0700
From:      Peter Wemm <peter@wemm.org>
To:        Andrew Gallatin <gallatin@cs.duke.edu>
Cc:        Matt Dillon <dillon@earth.backplane.com>, freebsd-hackers@FreeBSD.ORG
Subject:   Re: ecc on i386 
Message-ID:  <20010925012340.E7EF63808@overcee.netplex.com.au>
In-Reply-To: <15279.55878.110154.650940@grasshopper.cs.duke.edu> 

Andrew Gallatin wrote:
> 
> Matt Dillon writes:
>  > 
>  > :What happens on an ECC equipped PC when you have a multi-bit memory
>  > :error that hardware scrubbing can't fix?  Will there be some sort of
>  > :NMI or something that will panic the box?
>  > :
>  > :I'm used to alphas (where you'll get a fatal machine check panic) and
>  > :I am just wondering if PCs are as safe.
>  > :
>  > :Thanks,
>  > :
>  > :Drew
>  > 
>  >     ECC can typically detect and correct single bit errors and detect
>  >     double bit errors.  Anything beyond that is problematic... it may or
>  >     may not detect the problem or may mis-correct a multi-bit error. 
>  >     An NMI is generated if an uncorrectable error is detected.
>  > 
>  >     On PC's, ECC is optional.  Desktops typically do not ship with ECC
>  >     memory.  Branded servers typically do.    A year or two ago I would
>  >     have been happy to use non-ECC rams (finding bad RAM through trial
>  >     and error), but now with capacities as they are and memory prices down
>  >     ECC is definitely the way to go.
> 
> My sentiments exactly.

I wrote a poller for picking up correction events on various ServerWorks
motherboards (Compaq, Tyan) and it was *scary* how often single-bit errors
were being corrected.
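
For anyone who wants to see the behaviour Matt describes above (correct
single-bit errors, detect double-bit errors, possibly mis-correct anything
worse), here is a minimal sketch in Python.  It uses a toy extended Hamming
code with 4 data bits and 4 check bits; real ECC memory typically protects
64 data bits with 8 check bits, and the actual code a given chipset uses may
differ.  The function names encode() and decode() are made up for this
illustration.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (a list of bits)."""
    code = [0] * 8  # index 0 = overall parity, 1..7 = Hamming positions
    # Data bits live at positions 3, 5, 6, 7.
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    # Hamming check bits live at the power-of-two positions.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Overall parity bit makes the total number of 1 bits even.
    code[0] = sum(code[1:]) & 1
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # The syndrome is the XOR of the positions of all set bits; it is 0 for
    # a valid codeword and equals the flipped position after a single flip.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) & 1
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume exactly one and repair it.
        # (syndrome == 0 here means the overall parity bit itself flipped.)
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome: double-bit error.
        return "uncorrectable", None
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1010)
    one = list(word); one[5] ^= 1
    two = list(one); two[6] ^= 1
    three = list(two); three[3] ^= 1
    print(decode(one))    # single-bit error
    print(decode(two))    # double-bit error
    print(decode(three))  # triple-bit error
```

The triple-bit case is the nasty one: the decoder sees odd parity, assumes a
single flipped bit, "fixes" the wrong position and reports a correction while
returning bad data.  The double-bit case is the one where hardware raises
the uncorrectable-error signal.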

>  >     Bit errors can come from many sources, memory being only one.  Bit errors
>  >     can occur inside the cpu chip, in the L1 and L2 caches, in memory, in
>  >     controller chips... all over the place.  Many modern processors implement
>  >     parity on their caches to try to cover the problem areas.  I'm not sure
>  >     how Pentium III's and IV's are setup.
>  > 
>  > 						-Matt
> 
> Hmm.. Well, it turns out that the box I'm interested in (Thunder K7)
> can be set to send an SERR on multiple bit errors.  I wonder what
> happens when a pc gets an SERR? (that's another machine check
> on alpha)

On the Thunder K7, #SERR is routed to NMI.  Trust me, you want this.
Also set the ECC setting to ECC-SCRUB instead of "off", which is the
current default.

See my other email about how #SERR is converted to NMI via the ISA part of
the south bridge.

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5

