From: Brooks Davis
To: Warner Losh
Cc: Bob.Gorichanaz@midata.com, hackers@FreeBSD.ORG
Subject: Re: bad memory patch?
Date: Fri, 7 Apr 2000 15:19:07 -0700
Message-ID: <20000407151907.A1185@orion.ac.hmc.edu>
In-Reply-To: <200004072204.QAA02457@harmony.village.org>
References: <200004072204.QAA02457@harmony.village.org>

On Fri, Apr 07, 2000 at 04:04:23PM -0600, Warner Losh wrote:
> In message Bob.Gorichanaz@midata.com writes:
> : Maybe I'm misunderstanding something, but isn't this situation
> : analogous to bad sectors on a hard drive? Isn't this similar, at
> : least in theory, to remapping dead sectors and continuing to use the
> : drive? (except that the disk's onboard controller handles the
> : mapping instead of the OS)
>
> It is not analogous to the bad sectors on the hard drive. First, it
> is not always possible to detect a bad memory cell. In today's world,
> these cells are often bad only some of the time. They work unless
> pushed really hard in strange patterns. They are just barely outside
> of spec, and usually work. This makes their detection hard.

This can be truly evil.
For instance, I was at a Myricom BOF at SC99 and they said they had
shipped a batch of cards (which they were replacing at their expense)
that had bad static RAM chips with one bit (the exact same one on most
of them) which would sometimes flip under just the right stress. I
believe they finally built a test case that could trigger the error
within a couple of days, and that was knowing exactly where it was and
having some idea what caused it.

The key thing to remember about memory is that DRAM is not the nice
little digital gate we like to think it is. It's a big ugly analog mess
with all sorts of boundary conditions an ideal digital system wouldn't
have.

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
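
As a rough illustration of why intermittent faults like these are hard
to catch, here is a minimal sketch in C of a repeated walking-ones
pattern test over a buffer. The function names (check_pattern,
stress_test) and the choice of patterns are assumptions made for this
example only; this is not the test Myricom built. A user-space loop like
this also has no control over physical addresses, caches, timing, or
temperature, so a clean result says very little about marginal cells.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fill the buffer with one pattern, read it back,
 * and return the number of words that did not match.
 * The buffer is volatile so the compiler cannot optimize the reads away.
 */
static size_t check_pattern(volatile uint32_t *buf, size_t words,
                            uint32_t pattern)
{
    size_t errors = 0;

    for (size_t i = 0; i < words; i++)
        buf[i] = pattern;
    for (size_t i = 0; i < words; i++)
        if (buf[i] != pattern)
            errors++;
    return errors;
}

/*
 * Hypothetical driver: run a walking-ones pattern and its inverse over
 * the buffer, repeated `passes` times, and return the total mismatches.
 * A marginal cell may pass every one of these and still fail under a
 * different access pattern or load.
 */
static size_t stress_test(volatile uint32_t *buf, size_t words,
                          unsigned passes)
{
    size_t errors = 0;

    for (unsigned p = 0; p < passes; p++) {
        for (unsigned bit = 0; bit < 32; bit++) {
            uint32_t pattern = (uint32_t)1 << bit;

            errors += check_pattern(buf, words, pattern);
            errors += check_pattern(buf, words, ~pattern);
        }
    }
    return errors;
}
```

Calling stress_test(buf, words, passes) on an allocated buffer returns 0
when no mismatch was seen. Since a bit that flips only "under just the
right stress" can survive many such passes, a zero here cannot be read
as proof that the memory is good.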