Date:      Thu, 6 May 2010 09:10:16 -0400
From:      Andrew Duane <aduane@juniper.net>
To:        Atom Smasher <atom@smasher.org>, "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   RE: bad RAM? prove it with a crash dump?
Message-ID:  <AC6674AB7BC78549BB231821ABF7A9AE903D986659@EMBX01-WF.jnpr.net>
In-Reply-To: <1005062327340.2629@smasher>
References:  <1005062053260.2629@smasher> <4BE2A3A1.5030805@acm.poly.edu> <1005062327340.2629@smasher>

owner-freebsd-hackers@freebsd.org wrote:
> On Thu, 6 May 2010, Boris Kochergin wrote:
>
>> My experience with bad memory is that if it causes the machine to
>> crash, it won't always happen while the machine is running the same
>> process (or kernel thread)--so look for it crashing in a wide
>> variety of places--and upon inspection of the core dump, a pointer
>> somewhere will be pointing to garbage.
> ============
>
> so really i'd need to collect two or more crash dumps, and if they
> point to different addresses then i can reasonably say the RAM is bad?
>
> thanks...

It's not just that they point to different addresses; it is garbage in many
completely independent places. For example, pulling bad registers/return
addresses off the stack, or garbage in random unrelated
buffers/structures/pointers. On the other hand, if you often have garbage in
some structure's "foo" pointer, that indicates a problem (maybe locking) in
how your code manages setting that foo pointer. It's a subtle difference.

It is also useful to make sure that the garbage itself is different. As
mentioned before, a single-bit error in an otherwise valid value, or maybe a
missing/scrambled byte, is a good indication of memory problems. If random
places are often overwritten with something else, that could just be another
piece of misbehaving code that is writing someplace it shouldn't. I've often
found code that writes some buffer into e.g. a piece of memory it no longer
owns; it looks like memory corruption until you realize the garbage is always
something specific like a vnode structure.
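
To make the "single bit error in an otherwise valid value" check concrete,
here is a minimal C sketch. It is only an illustration: the names
classify_diff and enum diff_kind are hypothetical, and it assumes you already
have a good guess at what the value should have been (for instance from a
neighbouring valid pointer in the same structure). It compares a 64-bit
expected value against the one found in the dump and reports whether they
differ by one bit, within one byte, or in some unrelated way.

```c
#include <stdint.h>

/* Hypothetical labels for how an observed value differs from the expected one. */
enum diff_kind {
	DIFF_NONE,		/* values are identical */
	DIFF_SINGLE_BIT,	/* exactly one bit flipped: suggests bad RAM */
	DIFF_SINGLE_BYTE,	/* several bits, all within one byte */
	DIFF_UNRELATED		/* differences spread over several bytes */
};

static enum diff_kind
classify_diff(uint64_t expected, uint64_t observed)
{
	uint64_t x = expected ^ observed;	/* set bits mark the differences */
	int i, bytes = 0;

	if (x == 0)
		return (DIFF_NONE);
	/* A power of two has exactly one bit set. */
	if ((x & (x - 1)) == 0)
		return (DIFF_SINGLE_BIT);
	/* Count how many of the eight bytes contain at least one difference. */
	for (i = 0; i < 8; i++)
		if ((x >> (i * 8)) & 0xff)
			bytes++;
	return (bytes == 1 ? DIFF_SINGLE_BYTE : DIFF_UNRELATED);
}
```

A DIFF_SINGLE_BIT or DIFF_SINGLE_BYTE result on values taken from several
unrelated places would fit the memory-problem pattern described above, while
DIFF_UNRELATED with recognizable contents points more toward a stray write
from other code. Treat the result as a hint, not as proof either way.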

/Andrew



