From: Peter Wemm
To: Andrew Gallatin
Cc: Matt Dillon, freebsd-hackers@FreeBSD.ORG
Subject: Re: ecc on i386
In-Reply-To: <15279.55878.110154.650940@grasshopper.cs.duke.edu>
Date: Mon, 24 Sep 2001 18:23:40 -0700
Message-Id: <20010925012340.E7EF63808@overcee.netplex.com.au>

Andrew Gallatin wrote:
>
> Matt Dillon writes:
> >
> > :What happens on an ECC equipped PC when you have a multi-bit memory
> > :error that hardware scrubbing can't fix?  Will there be some sort of
> > :NMI or something that will panic the box?
> > :
> > :I'm used to alphas (where you'll get a fatal machine check panic) and
> > :I am just wondering if PCs are as safe.
> > :
> > :Thanks,
> > :
> > :Drew
> >
> >     ECC can typically detect and correct single bit errors and detect
> >     double bit errors.  Anything beyond that is problematic... it may or
> >     may not detect the problem or may mis-correct a multi-bit error.
> >     An NMI is generated if an uncorrectable error is detected.
> >
> >     On PC's, ECC is optional.  Desktops typically do not ship with ECC
> >     memory.  Branded servers typically do.
> >     A year or two ago I would have been happy to use non-ECC RAM
> >     (finding bad RAM through trial and error), but now with capacities
> >     as they are and memory prices down ECC is definitely the way to go.
>

My sentiments exactly.  I wrote a poller for picking up correction events
on various ServerWorks motherboards (Compaq, Tyan) and it was *scary* how
often single-bit errors were being corrected.

> >     Bit errors can come from many sources, memory being only one.  Bit errors
> >     can occur inside the cpu chip, in the L1 and L2 caches, in memory, in
> >     controller chips... all over the place.  Many modern processors implement
> >     parity on their caches to try to cover the problem areas.  I'm not sure
> >     how Pentium III's and IV's are set up.
> >
> >                                         -Matt
>
> Hmm..  Well, it turns out that the box I'm interested in (Thunder K7)
> can be set to send an SERR on multiple bit errors.  I wonder what
> happens when a PC gets an SERR?  (that's another machine check
> on alpha)

On the Thunder K7, #SERR is routed to NMI.  Trust me, you want this.  And
set it to ECC-SCRUB instead of the current default of "off".

See my other email about how #SERR is converted to NMI via the ISA part of
the south bridge.

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5
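
To make the "correct single bit errors, detect double bit errors, may
mis-correct anything beyond that" behaviour discussed in this thread
concrete, here is a minimal sketch of a SEC-DED code.  It is an
illustration only: it uses a toy extended Hamming (8,4) layout chosen for
brevity, not the wider code a real memory controller implements, and the
names encode and decode are made up for this example.  The "uncorrectable"
result corresponds to the case where, per the thread, the hardware would
raise #SERR/NMI.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (toy layout)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    bits = [0] * 8
    # Positions 1..7 form a Hamming(7,4) word; data sits at 3, 5, 6, 7.
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]   # covers positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]   # covers positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]   # covers positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) & 1             # overall parity, makes the total even
    return sum(b << i for i, b in enumerate(bits))


def decode(word):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos                 # XOR of set positions; 0 if consistent
    odd = sum(bits) & 1                     # 1 if an odd number of bits flipped
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd:
        # Treated as a single-bit error; syndrome 0 means the parity bit itself.
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome: double-bit error.
        return None, "uncorrectable"
    nibble = bits[3] | bits[5] << 1 | bits[6] << 2 | bits[7] << 3
    return nibble, status


word = encode(0b1011)
print(decode(word))                  # clean word
print(decode(word ^ 0b00001000))     # one flipped bit
print(decode(word ^ 0b00101000))     # two flipped bits
print(decode(word ^ 0b00010110))     # three flipped bits
```

The last case is the interesting one: three flipped bits look like a single
error to the decoder, so it reports "corrected" while returning the wrong
data.  That is the mis-correction risk Matt describes.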