Date: Sun, 28 Jun 1998 23:11:54 -0700 (PDT)
From: Tom <tom@uniserve.com>
To: "Louis A. Mamakos" <louie@TransSys.COM>
Cc: "Michael R. Gile" <gilem@wsg.net>, freebsd-stable@FreeBSD.ORG
Subject: Re: determining ecc errors on freebsd-stable
Message-ID: <Pine.BSF.3.96.980628230424.23093A-100000@shell.uniserve.ca>
In-Reply-To: <199806290549.BAA02456@whizzo.transsys.com>

On Mon, 29 Jun 1998, Louis A. Mamakos wrote:

> > On Sun, 28 Jun 1998, Michael R. Gile wrote:
> >
> > > > There is no way to log ECC corrections as they are done
> > > > transparently in the hardware, and currently there is no mechanism for the
> > > > hardware to make available that kind of info.
> > >
> > > there must be some status register that records these errors. Otherwise what
> > > good is ECC? If it doesn't tell you that something is wrong, then it is useless
> >
> > Either ECC fixes the error, or if the error is unfixable, the hardware
> > generates an NMI which will cause a panic and reboot.
> >
> > Basically, if a fixable error occurs, you won't know about it. If an
> > unfixable error occurs, you'll know real fast.
>
> Well, geez, it would be nice to know that you had bum memory in the
> machine so you could replace it at some time of your choosing. ECC
> memory ought to be better than just having your system crash later
> rather than sooner.

Well, you could trap the NMI and kill whatever occupied the offending
location, and make sure it wasn't used again. This is an operating
system issue, not a hardware one.

An NMI panic is MUCH better than "crashing later", as you know
precisely what caused it. Memory corruption on non-ECC/non-parity systems
is very difficult to track. Plus, you could be corrupting valuable data
in the process. With existing ECC systems, at least you get a clean reboot
before anything serious is wrecked.

> This is the kind of thing that separates toy computers from robust,
> has to be up no matter what mission critical computers.

Yeah, yeah... Sun makes a big deal about this... fact of the matter
is, if you lose some memory containing the kernel you have to reboot
anyhow. If you don't want a toy computer, you get a cluster anyhow, since
there is way more stuff that can fail than memory (and more often too).

> louie

Tom
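
To make the two outcomes discussed in this thread concrete (a fixable error is corrected, an unfixable one is reported loudly), here is a toy sketch in Python. It assumes a small extended Hamming(8,4) single-error-correcting, double-error-detecting code; real memory controllers use wider codes implemented in hardware. The names `encode`, `decode` and `UncorrectableError` are hypothetical, the exception merely stands in for the NMI, and the returned status string models the kind of reporting asked for above. It is not a description of any actual chipset.

```python
class UncorrectableError(Exception):
    """Stands in for the NMI raised on an unfixable (double-bit) error."""


def encode(nibble):
    """Encode 4 data bits as an 8-bit codeword (list of bits).

    Index 0 holds the overall parity bit; indexes 1, 2 and 4 hold the
    Hamming parity bits; indexes 3, 5, 6 and 7 hold the data bits.
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (nibble, status); raise UncorrectableError on a double-bit error."""
    code = list(code)
    # XOR of the positions of all set bits is 0 for a valid codeword,
    # and equals the flipped position after a single-bit error.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall_bad = sum(code) % 2 == 1

    if syndrome == 0 and not overall_bad:
        status = "ok"
    elif overall_bad:
        # Single-bit error; syndrome 0 means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with good overall parity: two bits flipped.
        raise UncorrectableError("double-bit error")

    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return nibble, status


if __name__ == "__main__":
    word = encode(0b1011)
    word[6] ^= 1                      # one flipped bit: silently fixable
    print(decode(word))
    word[2] ^= 1                      # a second flipped bit: unfixable
    try:
        decode(word)
    except UncorrectableError as exc:
        print("NMI stand-in:", exc)
```

In this sketch the "corrected" status is the only trace a fixable error leaves; if the decoder did not return it, the caller would have no way to learn that a bit had flipped.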