Date: Thu, 6 May 2010 08:26:21 -0700 (PDT)
From: Nate Eldredge <nate@thatsmathematics.com>
To: Andrew Duane <aduane@juniper.net>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>, Atom Smasher <atom@smasher.org>
Subject: RE: bad RAM? prove it with a crash dump?
Message-ID: <Pine.GSO.4.64.1005060821190.5432@zeno.ucsd.edu>
In-Reply-To: <AC6674AB7BC78549BB231821ABF7A9AE903D986659@EMBX01-WF.jnpr.net>
References: <1005062053260.2629@smasher> <4BE2A3A1.5030805@acm.poly.edu> <1005062327340.2629@smasher> <AC6674AB7BC78549BB231821ABF7A9AE903D986659@EMBX01-WF.jnpr.net>
On Thu, 6 May 2010, Andrew Duane wrote:

> It is also useful to make sure that the garbage itself is different. As
> mentioned before, a single bit error in an otherwise valid value, or
> maybe a missing/scrambled byte, these are good indications of memory
> problems. If random places are often overwritten with something else,
> that could just be another piece of misbehaving code that is writing
> someplace it shouldn't. I've often found code that writes some buffer
> into e.g. a piece of memory it no longer owns that looks like memory
> corruption until you realize the garbage is always something specific
> like a vnode structure.

There are trickier things too. I once had a machine with bad cache memory
where once in a while you would get a cache line that had come from
somewhere else in memory. This was particularly vexing when it happened
to an I/O buffer, and I wound up with a large zip file that had 32 bytes
of libc.so somewhere in the middle... :-(

And of course, swapping out the RAM wouldn't have fixed it.

-- 
Nate Eldredge
nate@thatsmathematics.com
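
The distinction drawn in this exchange (a lone flipped bit, a scribble from stray code, or a whole cache line that came from somewhere else) can be sketched as a small comparison between a known-good copy of some data and the corrupted copy. This is only an illustration, not a tool anyone in the thread used: the names `diff_runs`, `classify`, and `CACHE_LINE` are made up here, and the 32-byte line size is assumed from the anecdote above. It also assumes you have a good copy to compare against, and a foreign cache line that happens to match the original in a few bytes would be split into shorter runs and misread.

```python
from dataclasses import dataclass
from typing import List

# Assumed cache line size, taken from the 32-byte chunk in the anecdote.
# Other hardware will differ.
CACHE_LINE = 32


@dataclass
class Run:
    offset: int        # byte offset where the mismatch starts
    length: int        # number of consecutive mismatched bytes
    bits_flipped: int  # total differing bits across the run


def diff_runs(expected: bytes, observed: bytes) -> List[Run]:
    """Group the differing bytes of two equal-length buffers into runs."""
    if len(expected) != len(observed):
        raise ValueError("buffers must be the same length")
    runs = []
    i, n = 0, len(expected)
    while i < n:
        if expected[i] == observed[i]:
            i += 1
            continue
        start, bits = i, 0
        while i < n and expected[i] != observed[i]:
            # XOR leaves a 1 in every bit position that differs.
            bits += bin(expected[i] ^ observed[i]).count("1")
            i += 1
        runs.append(Run(start, i - start, bits))
    return runs


def classify(run: Run) -> str:
    """Rough guess at what kind of corruption a run looks like."""
    if run.length == 1 and run.bits_flipped == 1:
        return "single-bit flip"
    if run.length == CACHE_LINE and run.offset % CACHE_LINE == 0:
        return "cache-line-sized, aligned replacement"
    return "multi-byte overwrite"
```

A single-bit flip points toward hardware; an aligned, line-sized replacement is the pattern from the bad-cache story; anything else deserves a look at what the garbage actually contains, since recognizable contents (a vnode structure, say) suggest misbehaving code rather than bad memory.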