Date:      Sun, 26 Apr 2015 17:02:32 -0700
From:      Mehmet Erol Sanliturk <m.e.sanliturk@gmail.com>
To:        galtsev@kicp.uchicago.edu
Cc:        Fernando Apesteguía <fernando.apesteguia@gmail.com>,  User Questions <freebsd-questions@freebsd.org>
Subject:   Re: Debugging bad memory problems
Message-ID:  <CAOgwaMs8ePhmD9%2BX6C87atHu-RxO5Q0%2Bce%2BRLMfhMDPfcmpxGQ@mail.gmail.com>
In-Reply-To: <5793.69.209.235.143.1430086547.squirrel@cosmo.uchicago.edu>
References:  <CAGwOe2Y%2BRuT7MuCTBq_swn-Ny-BS-WH1J=bZTbE9L4tuv8LmCA@mail.gmail.com> <5480.69.209.235.143.1430078703.squirrel@cosmo.uchicago.edu> <CAGwOe2a7UZxSsaV4T2pcU0K1MA-OH1=123pb%2BsM=pTgSFEDLFg@mail.gmail.com> <5793.69.209.235.143.1430086547.squirrel@cosmo.uchicago.edu>

On Sun, Apr 26, 2015 at 3:15 PM, Valeri Galtsev <galtsev@kicp.uchicago.edu>
wrote:

>
> On Sun, April 26, 2015 4:05 pm, Fernando Apesteguía wrote:
> > On Sun, Apr 26, 2015 at 10:05 PM, Valeri Galtsev
> > <galtsev@kicp.uchicago.edu> wrote:
> >>
> >> On Sun, April 26, 2015 12:11 pm, Fernando Apesteguía wrote:
> >>> Hi,
> >>>
> >>> I suspect my old and beloved AMD64 laptop is suffering from bad memory
> >>> problems: I get random crashes of well tested programs like sh, which,
> >>> etc even when I executed some of them from /rescue.
> >>
> >> If RAM is a suspect the first thing I would do is re-seat memory modules.
> >> Open the box. (Observe static precautions!) Remove memory modules. Install
> >> them again.
> >>
> >> Do memtest86 (by booting into memtest86, you can have that in your boot
> >> options, or you can boot off external media as others suggested).
> >>
> >> If you still have problems: try to run with one memory module instead of
> >> two. At some point when they went to higher RAM speeds the memory bus
> >> amplifier became more fragile (some chips, some manufacturers; as it is
> >> now part of the CPU, this may be true only of some CPU models). You
> >> can sometimes slightly fry it if you merely leave the laptop running on
> >> battery, letting the battery run down and the laptop power off because of
> >> that. With some chips this may slightly fry the memory controller portion,
> >> the address bus amplifier in particular. The bus amplifier ends up with a
> >> slightly lower usable frequency, so it handles capacitive load (which is
> >> larger if you have more RAM) more poorly; it is marginally OK but
> >> occasionally has address errors. Going to one module may resolve this.
> >> You will know this is likely the case if memtest86 succeeds with each
> >> RAM module installed alone, but fails (in random places, often not
> >> reproducibly) with both.
> >>
> >> Good luck!
> >
> > I booted from a memtest CD-ROM. It passed a couple of tests fine and
> > then it rebooted while doing a "bit fade" test at around 93%. Removing
> > the modules is tricky since this laptop has screws all around in dark
> > corners (even removing the battery needs a screwdriver). I will try
> > to limit physical memory with hw.physmem and see if it makes any
> > difference.
>
> The last will not help against what I mentioned, as capacitive load on
> memory address bus is defined by what is physically attached to it.
>
> One usually runs memtest86 for 24 hours at least. One loop will catch
> "solid defects" like adjacent lines on the board being connected (while
> they shouldn't be). Memory-related failures, on the contrary, are often
> intermittent. In the worst case I've seen, they only manifested under intense
> load of the box (whereas memtest86 is equivalent to almost zero load).
>
> Good luck!
>
> Valeri
>
> ++++++++++++++++++++++++++++++++++++++++
> Valeri Galtsev
> Sr System Administrator
> Department of Astronomy and Astrophysics
> Kavli Institute for Cosmological Physics
> University of Chicago
> Phone: 773-702-4247
> ++++++++++++++++++++++++++++++++++++++++
>



The failure may be in the memory management circuits instead of the memory
chips. To test for this, the existing memory modules may be replaced by
modules that are known to work (if that can be done).
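
Since Valeri notes above that some failures only show up while the box is
under load, a crude userland pattern check can be left running alongside
normal work until known-good modules are available. The following is only a
minimal sketch, assuming Python is installed; the names PATTERNS and
pattern_test are hypothetical, chosen for illustration. It is not a
replacement for memtest86: the OS decides which physical pages back the
buffer, and CPU caches may hide a fault.

```python
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)  # all zeros, all ones, alternating bits


def pattern_test(size_bytes, passes=1):
    """Fill a buffer with each pattern, read it back, and collect mismatches.

    Returns a list of (pass_index, pattern, offset, value_read) tuples;
    an empty list means no mismatch was observed in this run.
    """
    buf = bytearray(size_bytes)
    errors = []
    for pass_index in range(passes):
        for pattern in PATTERNS:
            expected = bytes([pattern]) * size_bytes
            buf[:] = expected      # write the pattern into the buffer
            if buf != expected:    # fast whole-buffer comparison
                # Slow path: locate each byte that differs from the pattern.
                for offset, value in enumerate(buf):
                    if value != pattern:
                        errors.append((pass_index, pattern, offset, value))
    return errors
```

For example, calling pattern_test(64 * 1024 * 1024, passes=4) repeatedly
while the machine is busy may turn up intermittent errors. An empty result
proves nothing, but a non-empty one is a strong hint that the memory path
is at fault.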


Thank you very much.


Mehmet Erol Sanliturk


