Date: Thu, 6 May 2010 09:10:16 -0400
From: Andrew Duane <aduane@juniper.net>
To: Atom Smasher <atom@smasher.org>, "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: RE: bad RAM? prove it with a crash dump?
Message-ID: <AC6674AB7BC78549BB231821ABF7A9AE903D986659@EMBX01-WF.jnpr.net>
In-Reply-To: <1005062327340.2629@smasher>
References: <1005062053260.2629@smasher> <4BE2A3A1.5030805@acm.poly.edu> <1005062327340.2629@smasher>

owner-freebsd-hackers@freebsd.org wrote:
> On Thu, 6 May 2010, Boris Kochergin wrote:
>
>> My experience with bad memory is that if it causes the machine to
>> crash, it won't always happen while the machine is running the same
>> process (or kernel thread)--so look for it crashing in a wide
>> variety of places--and upon inspection of the core dump, a pointer
>> somewhere will be pointing to garbage.
> ============
>
> so really i'd need to collect two or more crash dumps, and if they
> point to different addresses then i can reasonably say the RAM is bad?
>
> thanks...

It's not just that they point to different addresses; it's that there is
garbage in many completely independent places. For example, pulling bad
registers/return addresses off the stack, or garbage in random unrelated
buffers/structures/pointers. On the other hand, if you often have garbage
in some structure's "foo" pointer, that indicates a problem (maybe locking)
in how your code manages setting that foo pointer. It's a subtle difference.

It is also useful to make sure that the garbage itself is different. As
mentioned before, a single-bit error in an otherwise valid value, or maybe
a missing/scrambled byte, is a good indication of a memory problem. If
random places are often overwritten with something else, that could just be
another piece of misbehaving code that is writing someplace it shouldn't.
I've often found code that writes some buffer into, e.g., a piece of memory
it no longer owns, which looks like memory corruption until you realize the
garbage is always something specific, like a vnode structure.

/Andrew
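
As a rough illustration of the "single-bit error in an otherwise valid value" check described above, here is a minimal sketch in Python. It assumes you have already pulled a suspect value out of a crash dump by hand and know (or can guess) what it should have been; the function names, the category labels, and the example values are all hypothetical and not part of any FreeBSD tool.

```python
def differing_bits(observed, expected):
    """Return the bit positions (0 = least significant) where the values differ."""
    diff = observed ^ expected
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def classify(observed, expected):
    """Heuristic label for how an observed value differs from the expected one.

    The labels are illustrative only; they are hints, not a diagnosis.
    """
    bits = differing_bits(observed, expected)
    if not bits:
        return "match"
    if len(bits) == 1:
        # One flipped bit in an otherwise valid value: suggestive of bad RAM.
        return "single-bit flip"
    if len({b // 8 for b in bits}) == 1:
        # All damage confined to one byte: a scrambled byte, also suspicious.
        return "single-byte damage"
    # Wholesale replacement looks more like other code overwriting the memory.
    return "multi-byte difference"


if __name__ == "__main__":
    expected = 0xFFFFFF8012345678           # hypothetical kernel pointer
    observed = expected ^ (1 << 21)          # same value with bit 21 flipped
    print(classify(observed, expected), differing_bits(observed, expected))
```

A single flipped bit in one dump proves little on its own. The pattern is only meaningful when it shows up across several dumps, in unrelated places, with different garbage each time.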
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?AC6674AB7BC78549BB231821ABF7A9AE903D986659>