To: wc.bulte@chello.nl
Cc: "Koster, K.J.", "'Bob.Gorichanaz@midata.com'", "'FreeBSD Hackers mailing list'"
Subject: Re: bad memory patch?
In-Reply-To: Message from Wilko Bulte of "Fri, 07 Apr 2000 15:36:46 +0200."
    <20000407153646.A7558@yedi.wbnet>
Date: Sat, 08 Apr 2000 11:00:48 -0700
From: Peter Wemm
Message-Id: <20000408180048.28D8C1CD7@overcee.netplex.com.au>

Wilko Bulte wrote:
> On Fri, Apr 07, 2000 at 03:31:07PM +0100, Koster, K.J. wrote:
> > >
> > > Not trying to push this idea one way or the other, I'm just
> > > curious as to WHY so many people think this is a "bad idea"
> > >
> > I can think of four things real quick:
> >
> > 1) Disks are much slower, and controllers actually have time to do proper
> > error detection. Memory is built for raw, blind speed. The analogy that
> > memory is a disk does not hold for long.
> >
> > 2) Testing memory is a nightmare. It's virtually impossible to test your RAM
> > and guarantee it is right. If the memory test tells you your RAM is broken,
> > you have to replace it. If it tells you your RAM is fine, it may or may not
> > be fine. Much like a pregnancy test. :-) Thus, expecting the OS to find and
> > mark bad memory for you will give you a false sense of security.
>
> And Real Systems [tm] use ECC memory. ;-)

And Real Memory (tm) fails with transient random errors that cannot be
mapped around.
:-)

We recently (like 2 weeks ago) had a batch of motherboards that had loading
problems with 512MB of RAM installed: we'd see lots and lots of random single-
and multiple-bit errors, never in the same place. It turns out the problem
wasn't in the RAM modules themselves but in the data path between the memory
controller and the RAM. Most errors were in the read data path, which made
debugging interesting: in the crash dumps we'd see data in registers that
didn't match what was in memory. *That* caused a lot of confusion (and lack
of sleep).

Extensive memory testing didn't detect the problem, because it only showed up
when the chipset was doing other things as well, including PCI I/O.

Cheers,
-Peter
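The claim above that transient errors "cannot be mapped around" can be illustrated with a toy model. This is a hypothetical sketch, not the "bad memory patch" under discussion: `BadPageMap`, `NPAGES`, and the fault page numbers are invented for illustration, and a real kernel would withhold bad physical pages from its free list rather than keep a Python set.

```python
from __future__ import annotations

from dataclasses import dataclass, field

NPAGES = 16  # hypothetical: a toy machine with 16 physical pages


@dataclass
class BadPageMap:
    """Hypothetical page allocator that skips pages listed as bad."""

    bad: set[int] = field(default_factory=set)
    in_use: set[int] = field(default_factory=set)

    def mark_bad(self, page: int) -> None:
        # Record a page on which a memory test found a reproducible error.
        self.bad.add(page)

    def alloc_page(self) -> int | None:
        # Hand out the lowest-numbered page that is neither bad nor in use.
        for page in range(NPAGES):
            if page not in self.bad and page not in self.in_use:
                self.in_use.add(page)
                return page
        return None  # out of usable pages

    def fault_avoided(self, page: int) -> bool:
        # Mapping around only helps when the error lands on a page that was
        # already excluded; a fault anywhere else corrupts live data.
        return page in self.bad


pm = BadPageMap()
pm.mark_bad(3)  # a stuck bit that a memory test finds every time, in page 3
print([pm.alloc_page() for _ in range(4)])  # → [0, 1, 2, 4]

# Transient data-path errors, "never in the same place" (page numbers made up).
transient = [0, 7, 2, 12]
avoided = sum(pm.fault_avoided(p) for p in transient)
print(f"{avoided} of {len(transient)} transient faults avoided")  # → 0 of 4 transient faults avoided
```

The stuck bit in page 3 is the case a bad-memory list handles well. Errors that appear at a different address each time land on pages already in use, so no list built from an earlier memory test would cover them.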