In-Reply-To: <5793.69.209.235.143.1430086547.squirrel@cosmo.uchicago.edu>
References: <5480.69.209.235.143.1430078703.squirrel@cosmo.uchicago.edu> <5793.69.209.235.143.1430086547.squirrel@cosmo.uchicago.edu>
Date: Sun, 26 Apr 2015 17:02:32 -0700
Subject: Re: Debugging bad memory problems
From: Mehmet Erol Sanliturk
To: galtsev@kicp.uchicago.edu
Cc:
Fernando Apesteguía, User Questions

On Sun, Apr 26, 2015 at 3:15 PM, Valeri Galtsev wrote:
>
> On Sun, April 26, 2015 4:05 pm, Fernando Apesteguía wrote:
> > On Sun, Apr 26, 2015 at 10:05 PM, Valeri Galtsev wrote:
> >>
> >> On Sun, April 26, 2015 12:11 pm, Fernando Apesteguía wrote:
> >>> Hi,
> >>>
> >>> I suspect my old and beloved AMD64 laptop is suffering from bad memory
> >>> problems: I get random crashes of well tested programs like sh, which,
> >>> etc even when I executed some of them from /rescue.
> >>
> >> If RAM is a suspect the first thing I would do is re-seat memory modules.
> >> Open the box. (Observe static precautions!) Remove memory modules.
> >> Install them again.
> >>
> >> Do memtest86 (by booting into memtest86, you can have that in your boot
> >> options, or you can boot off external media as others suggested).
> >>
> >> If you still have problems: try to run with one memory module instead of
> >> two. At some point when they went to higher RAM speeds memory bus
> >> amplifier became more fragile (some chips, some manufacturers, as now it
> >> is part of CPU, this may be true only about some of the CPU models). You
> >> sometimes can slightly fry it if you merely leave laptop running on
> >> battery, letting battery run down and laptop powering off due to that.
> >> With some of chips this may lead to slightly frying it - memory controller
> >> portion of it, address bus amplifier in particular.
> >> Bus amplifier becomes slightly lower frequency, which results in poorer
> >> handling capacitive load (which is larger if you have more RAM), and it
> >> is marginally OK, occasionally having address errors. Going to one module
> >> may resolve this.
> >> You will know if this is likely the case if memtest86 is successful with
> >> each of single RAM modules, but fails (in random places, often not
> >> reproducible) with both.
> >>
> >> Good luck!
> >
> > I booted from a memtest CD-ROM. It passed a couple of tests fine and
> > then it rebooted while doing a "bit fade" test at around 93%. Removing
> > the modules is tricky since this laptop has screws all around in dark
> > corners (even removing the battery needs a screwdriver). I will try
> > to limit physical memory with hw.physmem and see if it makes any
> > difference.
>
> The last will not help against what I mentioned, as capacitive load on
> memory address bus is defined by what is physically attached to it.
>
> One usually runs memtest86 for 24 hours at least. One loop will catch
> "solid defects" like adjacent lines on the board being connected (while
> they shouldn't). Memory-related failures, on the contrary, are often
> intermittent. In the worst case I've seen, they only manifested under
> intense load of the box (whereas memtest86 is equivalent to almost zero
> load).
>
> Good luck!
>
> Valeri
>
> ++++++++++++++++++++++++++++++++++++++++
> Valeri Galtsev
> Sr System Administrator
> Department of Astronomy and Astrophysics
> Kavli Institute for Cosmological Physics
> University of Chicago
> Phone: 773-702-4247
> ++++++++++++++++++++++++++++++++++++++++
>

The failure may be in the memory management circuits rather than in the
memory chips. To test this, the existing memory may be replaced with memory
chips that are known to work (if it can be done).

Thank you very much.

Mehmet Erol Sanliturk