From owner-freebsd-hackers Fri Apr 7 15:42:49 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from relay01.chello.nl (smtp.chello.nl [212.83.68.144])
    by hub.freebsd.org (Postfix) with ESMTP id 519BC37B5F1
    for ; Fri, 7 Apr 2000 15:42:40 -0700 (PDT)
    (envelope-from wkb@chello.nl)
Received: from chello.nl ([213.46.78.184]) by relay01.chello.nl
    (InterMail vK.4.02.00.00 201-232-116 license 99c8f334c649856e3f2cdadc4054e412)
    with ESMTP id <20000407225115.EAOD26673.relay01@chello.nl>;
    Sat, 8 Apr 2000 00:51:15 +0200
Received: (from wkb@localhost)
    by chello.nl (8.9.3/8.9.3) id AAA32154;
    Sat, 8 Apr 2000 00:42:36 +0200 (CEST)
    (envelope-from wkb)
Date: Sat, 8 Apr 2000 00:42:36 +0200
From: Wilko Bulte
To: Brooks Davis
Cc: Warner Losh, Bob.Gorichanaz@midata.com, hackers@FreeBSD.ORG
Subject: Re: bad memory patch?
Message-ID: <20000408004236.A29300@yedi.wbnet>
Reply-To: wc.bulte@chello.nl
References: <200004072204.QAA02457@harmony.village.org> <20000407151907.A1185@orion.ac.hmc.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
In-Reply-To: <20000407151907.A1185@orion.ac.hmc.edu>; from brooks@one-eyed-alien.net on Fri, Apr 07, 2000 at 03:19:07PM -0700
X-OS: FreeBSD 3.4-STABLE
X-PGP: finger wilko@freebsd.org
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Fri, Apr 07, 2000 at 03:19:07PM -0700, Brooks Davis wrote:
> On Fri, Apr 07, 2000 at 04:04:23PM -0600, Warner Losh wrote:
> > In message Bob.Gorichanaz@midata.com writes:
> > : Maybe I'm misunderstanding something, but isn't this situation
> > : analogous to bad sectors on a hard drive? Isn't this similar, at
> > : least in theory, to remapping dead sectors and continuing to use the
> > : drive? (except that the disk's onboard controller handles the
> > : mapping instead of the OS)
> >
> > It is not analogous to the bad sectors on the hard drive. First, it
> > is not always possible to detect a bad memory cell.
> > In today's world, these cells are often bad only some of the time.
> > They work unless pushed really hard in strange patterns. They are
> > just barely outside of spec, and usually work. This makes their
> > detection hard.
>
> This can be truly evil. For instance, I was at a Myricom BOF at SC99
> and they said they had shipped a batch of cards (which they were
> replacing at their expense) that had bad static RAM chips with one bit
> (the exact same one on most of them) which would sometimes flip under
> just the right stress. I believe they finally built a test case that
> could trigger the error within a couple of days, knowing exactly where
> it was and having some idea what caused it.
>
> The key thing to remember with memory is that DRAM is not the nice
> little digital gate we like to think it is. It's a big ugly analog mess
> and has all sorts of boundary conditions an ideal digital system
> wouldn't have.

Right. In a former life I was part of a team that spent a couple of
months tracking down mysterious DRAM errors. In our case we had parity
checking on the machine. In the end our dear memory vendor said: "Well,
you know, we might have found it. We had some mask alignment problems in
manufacturing". Until then they had always denied it was a chip problem.
By then we already knew that: week code 37 from Hitachi was crap.

Hitachi DRAM still gives me a weird feeling when I see it ;-)

--
Wilko Bulte    Powered by FreeBSD    http://www.freebsd.org
               http://www.tcja.nl