Date:      Sat, 8 Apr 2000 00:42:36 +0200
From:      Wilko Bulte <wkb@chello.nl>
To:        Brooks Davis <brooks@one-eyed-alien.net>
Cc:        Warner Losh <imp@village.org>, Bob.Gorichanaz@midata.com, hackers@FreeBSD.ORG
Subject:   Re: bad memory patch?
Message-ID:  <20000408004236.A29300@yedi.wbnet>
In-Reply-To: <20000407151907.A1185@orion.ac.hmc.edu>; from brooks@one-eyed-alien.net on Fri, Apr 07, 2000 at 03:19:07PM -0700
References:  <OF2F5C4FC5.C68B571C-ON862568BA.0045E942@midata.com> <200004072204.QAA02457@harmony.village.org> <20000407151907.A1185@orion.ac.hmc.edu>

On Fri, Apr 07, 2000 at 03:19:07PM -0700, Brooks Davis wrote:
> On Fri, Apr 07, 2000 at 04:04:23PM -0600, Warner Losh wrote:
> > In message <OF2F5C4FC5.C68B571C-ON862568BA.0045E942@midata.com> Bob.Gorichanaz@midata.com writes:
> > : Maybe I'm misunderstanding something, but isn't this situation
> > : analogous to bad sectors on a hard drive?  Isn't this similar, at
> > : least in theory, to remapping dead sectors and continuing to use the
> > : drive? (except that the disk's onboard controller handles the
> > : mapping instead of the OS)
> > 
> > It is not analogous to bad sectors on a hard drive.  First, it
> > is not always possible to detect a bad memory cell.  In today's world,
> > these cells are often bad only some of the time.  They work unless
> > pushed really hard in strange patterns.  They are just barely outside
> > of spec, and usually work.  This makes them hard to detect.
> 
> This can be truly evil.  For instance, I was at a Myricom BOF at SC99
> and they said they had shipped a batch of cards (which they were
> replacing at their expense) that had bad static RAM chips with one bit
> (the exact same one on most of them) which would sometimes flip under
> just the right stress.  I believe they finally built a test case that
> could trigger the error within a couple of days, knowing exactly where it
> was and having some idea what caused it.
> 
> The key to remember with memory is that DRAM is not the nice little
> digital gate we like to think it is.  It's a big ugly analog mess
> and has all sorts of boundary conditions that an ideal digital system
> wouldn't have.

Right. In a former life I was part of a team that spent a couple of months
tracking down mysterious DRAM errors. In our case we had parity checking on
the machine. In the end our dear memory vendor said: "Well, you know, we
might have found it. We had some mask alignment problems in manufacturing".
Until then they had always denied it was a chip problem.

By then we already knew that: week code 37 from Hitachi was crap. Hitachi
DRAM still gives me a weird feeling when I see it ;-)

-- 
Wilko Bulte 		Powered by FreeBSD  	http://www.freebsd.org
						http://www.tcja.nl
