Date:      Sat, 08 Apr 2000 11:00:48 -0700
From:      Peter Wemm <peter@netplex.com.au>
To:        wc.bulte@chello.nl
Cc:        "Koster, K.J." <K.J.Koster@kpn.com>, "'Bob.Gorichanaz@midata.com'" <Bob.Gorichanaz@midata.com>, "'FreeBSD Hackers mailing list'" <freebsd-hackers@FreeBSD.ORG>
Subject:   Re: bad memory patch? 
Message-ID:  <20000408180048.28D8C1CD7@overcee.netplex.com.au>
In-Reply-To: Message from Wilko Bulte <wkb@chello.nl>  of "Fri, 07 Apr 2000 15:36:46 +0200." <20000407153646.A7558@yedi.wbnet> 

Wilko Bulte wrote:
> On Fri, Apr 07, 2000 at 03:31:07PM +0100, Koster, K.J. wrote:
> > > 
> > > Not trying to push this idea one way or the other, I'm just 
> > > curious as to WHY so many people think this is a "bad idea"
> > > 
> > I can think of four things real quick:
> > 
> > 1) Disks are much slower, and controllers actually have time to do proper
> > error detection. Memory is built for raw, blind speed. The analogy that
> > memory is a disk does not hold for long.
> > 
> > 2) Testing memory is a nightmare. It's virtually impossible to test your RAM
> > and guarantee it is right.  If the memory test tells you your RAM is broken,
> > you have to replace it. If it tells you your RAM is fine, it may or may not
> > be fine. Much like a pregnancy test. :-) Thus, expecting the OS to find and
> > mark bad memory for you will give you a false sense of security.
> 
> And Real Systems [tm] use ECC memory. ;-)

And Real Memory (tm) fails with transient random errors that cannot be
mapped around. :-)

We recently (like 2 weeks ago) had a batch of motherboards that had loading
problems with 512MB of RAM installed.  We'd see lots and lots of random
single- and multiple-bit errors, never in the same place.  It turns out the
problem wasn't in the RAM modules themselves but in the data path between
the memory controller and the RAM.  Most errors were in the read data path,
which made debugging interesting, as we'd see data in registers that didn't
match memory in the crashdumps.  *That* caused a lot of confusion (and lack
of sleep).  Extensive memory testing didn't detect the problems, as the
failures depended on the chipset doing other things at the same time,
including PCI I/O.
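
As a rough illustration of why a clean memory test proves so little, here
is a minimal sketch of a walking-ones pattern test in C.  The names
check_pattern and walking_ones_test are made up for this example and do not
come from any FreeBSD tool.  It assumes an ordinary userland buffer, so it
exercises only whatever path the CPU happens to use to that buffer (quite
possibly just the cache).  It generates no PCI traffic and looks at each
word only at one moment in time.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fill the buffer with a position-dependent pattern,
 * read it back, and return the number of words that did not match.
 * XORing in the index makes neighbouring words differ, so a stuck or
 * shorted address line is more likely to show up as a mismatch.
 */
static size_t check_pattern(volatile uint32_t *buf, size_t words,
                            uint32_t pattern)
{
    size_t bad = 0;

    for (size_t i = 0; i < words; i++)
        buf[i] = pattern ^ (uint32_t)i;

    for (size_t i = 0; i < words; i++)
        if (buf[i] != (pattern ^ (uint32_t)i))
            bad++;

    return bad;
}

/*
 * Hypothetical walking-ones test: run check_pattern once for each of the
 * 32 single-bit patterns and return the total mismatch count.
 */
static size_t walking_ones_test(volatile uint32_t *buf, size_t words)
{
    size_t bad = 0;

    for (int bit = 0; bit < 32; bit++)
        bad += check_pattern(buf, words, (uint32_t)1 << bit);

    return bad;
}
```

Calling walking_ones_test() on a malloc'd buffer and getting 0 back only
says that no mismatch was seen on those reads.  A nonzero result is worth
acting on; a zero result says nothing about transient errors that appear
only while the chipset is busy with other work.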

Cheers,
-Peter


