From owner-freebsd-hackers  Fri Apr  7 15:42:49 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from relay01.chello.nl (smtp.chello.nl [212.83.68.144])
	by hub.freebsd.org (Postfix) with ESMTP id 519BC37B5F1
	for <hackers@FreeBSD.ORG>; Fri,  7 Apr 2000 15:42:40 -0700 (PDT)
	(envelope-from wkb@chello.nl)
Received: from chello.nl ([213.46.78.184]) by relay01.chello.nl
          (InterMail vK.4.02.00.00 201-232-116 license 99c8f334c649856e3f2cdadc4054e412)
          with ESMTP id <20000407225115.EAOD26673.relay01@chello.nl>;
          Sat, 8 Apr 2000 00:51:15 +0200
Received: (from wkb@localhost)
	by chello.nl (8.9.3/8.9.3) id AAA32154;
	Sat, 8 Apr 2000 00:42:36 +0200 (CEST)
	(envelope-from wkb)
Date: Sat, 8 Apr 2000 00:42:36 +0200
From: Wilko Bulte <wkb@chello.nl>
To: Brooks Davis <brooks@one-eyed-alien.net>
Cc: Warner Losh <imp@village.org>, Bob.Gorichanaz@midata.com,
	hackers@FreeBSD.ORG
Subject: Re: bad memory patch?
Message-ID: <20000408004236.A29300@yedi.wbnet>
Reply-To: wc.bulte@chello.nl
References: <OF2F5C4FC5.C68B571C-ON862568BA.0045E942@midata.com> <200004072204.QAA02457@harmony.village.org> <20000407151907.A1185@orion.ac.hmc.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
In-Reply-To: <20000407151907.A1185@orion.ac.hmc.edu>; from brooks@one-eyed-alien.net on Fri, Apr 07, 2000 at 03:19:07PM -0700
X-OS: FreeBSD 3.4-STABLE
X-PGP: finger wilko@freebsd.org
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Fri, Apr 07, 2000 at 03:19:07PM -0700, Brooks Davis wrote:
> On Fri, Apr 07, 2000 at 04:04:23PM -0600, Warner Losh wrote:
> > In message <OF2F5C4FC5.C68B571C-ON862568BA.0045E942@midata.com> Bob.Gorichanaz@midata.com writes:
> > : Maybe I'm misunderstanding something, but isn't this situation
> > : analogous to bad sectors on a hard drive?  Isn't this similar, at
> > : least in theory, to remapping dead sectors and continuing to use the
> > : drive? (except that the disk's onboard controller handles the
> > : mapping instead of the OS)
> > 
> > It is not analogous to bad sectors on a hard drive.  First, it
> > is not always possible to detect a bad memory cell.  In today's world,
> > these cells are often bad only some of the time.  They work unless
> > pushed really hard in strange patterns.  They are just barely out
> > of spec, and usually work.  This makes them hard to detect.
> 
> This can be truly evil.  For instance, I was at a Myricom BOF at SC99
> and they said they had shipped a batch of cards (which they were
> replacing at their expense) that had bad static RAM chips with one bit
> (the exact same one on most of them) which would sometimes flip under
> just the right stress.  I believe they finally built a test case that
> could trigger the error within a couple of days, and that was while
> knowing exactly where it was and having some idea what caused it.
> 
> The key thing to remember about memory is that DRAM is not the nice
> little digital gate we like to think it is.  It's a big ugly analog mess
> with all sorts of boundary conditions that an ideal digital system
> wouldn't have.
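To make the "strange patterns" point a bit more concrete, here is a minimal sketch in C of a walking-ones/walking-zeros pass over a buffer. The function name walking_ones_test is hypothetical and not something from this thread. A user-space buffer like this goes through the cache and virtual memory, so it only illustrates the idea of a pattern test; it is not a substitute for a dedicated tester running outside the OS, and a marginal cell of the kind described above may pass it many times before it ever fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: for each of the 32 bit positions, fill the buffer
 * with a word that has only that bit set (walking one), read it back, then
 * repeat with the complement (walking zero). Returns the number of words
 * that did not read back as written. volatile keeps the compiler from
 * optimizing the read-back away. */
static size_t walking_ones_test(volatile uint32_t *buf, size_t nwords)
{
    size_t errors = 0;

    for (unsigned bit = 0; bit < 32; bit++) {
        const uint32_t ones = (uint32_t)1u << bit;
        const uint32_t zeros = (uint32_t)~ones;

        /* Walking one: write everything first, then verify. */
        for (size_t i = 0; i < nwords; i++)
            buf[i] = ones;
        for (size_t i = 0; i < nwords; i++)
            if (buf[i] != ones)
                errors++;

        /* Walking zero: same pass with the complemented pattern. */
        for (size_t i = 0; i < nwords; i++)
            buf[i] = zeros;
        for (size_t i = 0; i < nwords; i++)
            if (buf[i] != zeros)
                errors++;
    }
    return errors;
}
```

Called on a few megabytes obtained from malloc(), a healthy machine should report zero mismatches; the hard part, as the thread points out, is that a bad part may report zero too.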

Right. In a former life I was part of a team that spent a couple of months
tracking down mysterious DRAM errors. In our case we had parity checking on
the machine. In the end our dear memory vendor said: "Well, you know, we
might have found it. We had some mask alignment problems in manufacturing".
Until then they had always denied it was a chip problem.
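Parity checking stores one extra bit per byte so that a single flipped bit shows up as a mismatch on read. A minimal sketch in C, assuming even parity over one 8-bit byte; the helper names even_parity_bit and parity_error are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the parity bit that makes the total number of
 * 1 bits (data plus parity) even. */
static unsigned even_parity_bit(uint8_t byte)
{
    unsigned p = 0;
    while (byte) {
        p ^= byte & 1u;   /* toggle for every 1 bit seen */
        byte >>= 1;
    }
    return p;
}

/* Hypothetical helper: true when the byte read back no longer matches
 * the parity bit stored alongside it. */
static bool parity_error(uint8_t byte, unsigned stored_parity)
{
    return even_parity_bit(byte) != stored_parity;
}
```

With data 0x5A (four 1 bits) the stored parity bit is 0. Flipping one bit gives a mismatch, but flipping two bits restores even parity and slips through unnoticed. That is why parity can only detect single-bit errors, and ECC memory uses more check bits so it can correct them.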

By then we already knew: week code 37 from Hitachi was crap. Hitachi
DRAM still gives me a weird feeling when I see it ;-)

-- 
Wilko Bulte 		Powered by FreeBSD  	http://www.freebsd.org
						http://www.tcja.nl
