Date: Thu, 6 May 2010 08:26:21 -0700 (PDT)
From: Nate Eldredge
To: Andrew Duane
Cc: "freebsd-hackers@freebsd.org", Atom Smasher
Subject: RE: bad RAM? prove it with a crash dump?
List-Id: Technical Discussions relating to FreeBSD

On Thu, 6 May 2010, Andrew Duane wrote:

> It is also useful to make sure that the garbage itself is different. As
> mentioned before, a single bit error in an otherwise valid value, or
> maybe a missing/scrambled byte, these are good indications of memory
> problems. If random places are often overwritten with something else,
> that could just be another piece of misbehaving code that is writing
> someplace it shouldn't. I've often found code that writes some buffer
> into e.g. a piece of memory it no longer owns that looks like memory
> corruption until you realize the garbage is always something specific
> like a vnode structure.

There are trickier things too. I once had a machine with bad cache memory
where once in a while you would get a cache line that had come from
somewhere else in memory. This was particularly vexing when it happened
to an I/O buffer, and I wound up with a large zip file that had 32 bytes
of libc.so somewhere in the middle... :-(

And of course, swapping out the RAM wouldn't have fixed it.

-- 
Nate Eldredge
nate@thatsmathematics.com
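
The distinction drawn in this thread (a single flipped bit versus a run of foreign data) can be checked mechanically when a known-good copy of the damaged data is available. What follows is only a rough sketch under stated assumptions, not anything posted by the participants: the function name `classify_corruption`, the returned labels, and the idea that the foreign data sits on an aligned 32-byte boundary are all hypothetical, with the 32-byte figure borrowed from the zip-file anecdote above.

```python
def classify_corruption(expected: bytes, observed: bytes, line_size: int = 32) -> str:
    """Roughly classify how `observed` differs from a known-good `expected` copy.

    `line_size` is an assumed cache-line size (32 bytes, as in the anecdote);
    real hardware may use a different size.
    """
    if len(expected) != len(observed):
        raise ValueError("buffers must have the same length")

    # Offsets of every byte that differs between the two copies.
    diff_offsets = [i for i, (a, b) in enumerate(zip(expected, observed)) if a != b]
    if not diff_offsets:
        return "identical"

    # Total number of differing bits across the whole buffer.
    flipped_bits = sum(bin(a ^ b).count("1") for a, b in zip(expected, observed))
    if flipped_bits == 1:
        return "single-bit flip"

    # Check whether all differences fit inside one aligned block of line_size bytes.
    first, last = diff_offsets[0], diff_offsets[-1]
    block_start = first - first % line_size
    if last < block_start + line_size:
        return "confined to one aligned block"

    return "spread across blocks"


good = bytes(range(128))

flipped = bytearray(good)
flipped[5] ^= 0x10                   # flip exactly one bit
print(classify_corruption(good, bytes(flipped)))

foreign = bytearray(good)
foreign[64:96] = b"\xff" * 32        # one aligned 32-byte block replaced
print(classify_corruption(good, bytes(foreign)))
```

A "single-bit flip" result points toward the memory-problem pattern described in the quoted text, while differences confined to one aligned block resemble the stray cache line case. Neither result is proof on its own, since misbehaving code can also produce small or block-sized overwrites.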