Date: Mon, 24 Sep 2001 18:23:40 -0700
From: Peter Wemm <peter@wemm.org>
To: Andrew Gallatin <gallatin@cs.duke.edu>
Cc: Matt Dillon <dillon@earth.backplane.com>, freebsd-hackers@FreeBSD.ORG
Subject: Re: ecc on i386
Message-ID: <20010925012340.E7EF63808@overcee.netplex.com.au>
In-Reply-To: <15279.55878.110154.650940@grasshopper.cs.duke.edu>

Andrew Gallatin wrote:
>
> Matt Dillon writes:
> >
> > :What happens on an ECC equipped PC when you have a multi-bit memory
> > :error that hardware scrubbing can't fix?  Will there be some sort of
> > :NMI or something that will panic the box?
> > :
> > :I'm used to alphas (where you'll get a fatal machine check panic) and
> > :I am just wondering if PCs are as safe.
> > :
> > :Thanks,
> > :
> > :Drew
> >
> >     ECC can typically detect and correct single bit errors and detect
> >     double bit errors.  Anything beyond that is problematic... it may or
> >     may not detect the problem or may mis-correct a multi-bit error.
> >     An NMI is generated if an uncorrectable error is detected.
> >
> >     On PCs, ECC is optional.  Desktops typically do not ship with ECC
> >     memory.  Branded servers typically do.  A year or two ago I would
> >     have been happy to use non-ECC RAM (finding bad RAM through trial
> >     and error), but now with capacities as they are and memory prices
> >     down ECC is definitely the way to go.

My sentiments exactly.  I wrote a poller for picking up correction events
on various ServerWorks motherboards (Compaq, Tyan) and it was *scary* how
often single-bit errors were being corrected.

> >     Bit errors can come from many sources, memory being only one.  Bit
> >     errors can occur inside the cpu chip, in the L1 and L2 caches, in
> >     memory, in controller chips... all over the place.  Many modern
> >     processors implement parity on their caches to try to cover the
> >     problem areas.  I'm not sure how Pentium IIIs and IVs are set up.
> >
> >                                         -Matt
>
> Hmm.. Well, it turns out that the box I'm interested in (Thunder K7)
> can be set to send an SERR on multiple bit errors.  I wonder what
> happens when a pc gets an SERR?  (that's another machine check
> on alpha)

On the Thunder K7, #SERR is routed to NMI.  Trust me, you want this.  And
set it to ECC-SCRUB instead of "off" like the default now is.
See my other email about how #SERR is converted to NMI via the ISA part
of the south bridge.

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
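
For anyone following the thread who wants to see the "correct single bit,
detect double bit, maybe mis-correct anything worse" behaviour Matt
describes, here is a minimal sketch in Python using a toy Hamming code over
4 data bits plus an overall parity bit.  Real memory controllers use much
wider codes (commonly 8 check bits per 64 data bits), and the function
names and bit layout below are illustrative assumptions, not anything taken
from the chipsets discussed here.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED codeword (a list of bits).

    Illustrative layout (an assumption for this sketch): index 0 is the
    overall parity bit, indices 1..7 are a Hamming(7,4) codeword with
    check bits at positions 1, 2 and 4.
    """
    data = [(nibble >> i) & 1 for i in range(4)]
    word = [0] * 8
    word[3], word[5], word[6], word[7] = data
    for p in (1, 2, 4):
        # Check bit p covers every position whose index has bit p set.
        word[p] = sum(word[i] for i in range(1, 8) if i & p) % 2
    # Overall parity makes the total number of 1 bits even.
    word[0] = sum(word[1:]) % 2
    return word


def decode(word):
    """Return (status, nibble); nibble is None if uncorrectable."""
    word = list(word)
    # XOR of the indices of all set bits is 0 for a valid codeword and
    # equals the flipped position after a single-bit error.
    syndrome = 0
    for i in range(1, 8):
        if word[i]:
            syndrome ^= i
    odd_parity = sum(word) % 2

    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # Odd number of flips: assume exactly one and fix it.  A syndrome
        # of 0 here means the overall parity bit itself was hit.
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.  This is
        # the case where hardware would raise the uncorrectable error.
        return "uncorrectable", None

    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return status, nibble


def flip(word, *positions):
    """Return a copy of word with the given bit positions inverted."""
    damaged = list(word)
    for pos in positions:
        damaged[pos] ^= 1
    return damaged


if __name__ == "__main__":
    good = encode(0b1011)
    print(decode(good))                  # clean read
    print(decode(flip(good, 6)))         # single-bit error
    print(decode(flip(good, 3, 6)))      # double-bit error
    print(decode(flip(good, 1, 2, 4)))   # triple-bit error
```

The last case is the nasty one: three flipped bits look exactly like a
single-bit error to the decoder, so it reports a successful correction and
hands back the wrong data.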