Date:      Thu, 6 May 2010 08:26:21 -0700 (PDT)
From:      Nate Eldredge <nate@thatsmathematics.com>
To:        Andrew Duane <aduane@juniper.net>
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>, Atom Smasher <atom@smasher.org>
Subject:   RE: bad RAM? prove it with a crash dump?
Message-ID:  <Pine.GSO.4.64.1005060821190.5432@zeno.ucsd.edu>
In-Reply-To: <AC6674AB7BC78549BB231821ABF7A9AE903D986659@EMBX01-WF.jnpr.net>
References:  <1005062053260.2629@smasher> <4BE2A3A1.5030805@acm.poly.edu> <1005062327340.2629@smasher> <AC6674AB7BC78549BB231821ABF7A9AE903D986659@EMBX01-WF.jnpr.net>

On Thu, 6 May 2010, Andrew Duane wrote:

> It is also useful to make sure that the garbage itself is different. As
> mentioned before, a single-bit error in an otherwise valid value, or
> maybe a missing or scrambled byte, is a good indication of memory
> problems. If random places are often overwritten with something else,
> that could just be another piece of misbehaving code that is writing
> someplace it shouldn't. I've often found code that writes some buffer
> into, e.g., a piece of memory it no longer owns; this looks like memory
> corruption until you realize the garbage is always something specific,
> like a vnode structure.

There are trickier things too.  I once had a machine with bad cache memory 
where once in a while you would get a cache line that had come from 
somewhere else in memory.  This was particularly vexing when it happened 
to an I/O buffer, and I wound up with a large zip file that had 32 bytes 
of libc.so somewhere in the middle... :-(

And of course, swapping out the RAM wouldn't have fixed it.
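
To make the distinction between these failure patterns concrete, here is a minimal sketch of how one might classify the damage when a known-good copy of the data is available for comparison. The function name classify_corruption, the returned labels, and the default 32-byte line size (taken from the anecdote above) are all assumptions for illustration, not part of any existing tool.

```python
def classify_corruption(expected, actual, line_size=32):
    """Compare a known-good buffer with a suspect copy and guess the
    kind of damage. Returns a (label, byte_offset) tuple.

    line_size is an assumed cache-line size; adjust it for the CPU.
    """
    if len(expected) != len(actual):
        # Missing or extra bytes: not a simple in-place overwrite.
        return ("length mismatch", None)

    # Offsets of every byte that differs between the two buffers.
    diff = [i for i, (a, b) in enumerate(zip(expected, actual)) if a != b]
    if not diff:
        return ("identical", None)

    # Total number of bits that differ across all damaged bytes.
    flipped = sum(bin(expected[i] ^ actual[i]).count("1") for i in diff)
    if flipped == 1:
        # Exactly one bit changed: typical of a bad memory cell.
        return ("single-bit flip", diff[0])

    first, last = diff[0], diff[-1]
    if first // line_size == last // line_size:
        # All damage sits inside one aligned line-sized block, which
        # hints at a cache line that came from somewhere else.
        return ("one aligned line", first // line_size * line_size)

    # Anything else could just as well be a stray write from buggy code.
    return ("scattered", first)
```

A single-bit flip points toward a bad memory cell, damage confined to one aligned block is consistent with a misplaced cache line, and anything else deserves a look at what the garbage actually contains before blaming the hardware.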

-- 

Nate Eldredge
nate@thatsmathematics.com


