From owner-freebsd-current@FreeBSD.ORG Tue Jul 1 15:57:32 2014
Message-ID: <53B2DA66.9010506@digiware.nl>
Date: Tue, 01 Jul 2014 17:57:26 +0200
From: Willem Jan Withagen
Organization: Digiware Management b.v.
To: "O. Hartmann"
Subject: Re: [CURRENT]: weird memory/linker problem?
References: <20140622165639.17a1ba1e.ohartman@zedat.fu-berlin.de>
 <20140623163115.03bdd675.ohartman@zedat.fu-berlin.de>
 <20140701150755.548ed6b9.ohartman@zedat.fu-berlin.de>
 <53B2D262.2040502@digiware.nl>
 <20140701173335.394414c3.ohartman@zedat.fu-berlin.de>
In-Reply-To: <20140701173335.394414c3.ohartman@zedat.fu-berlin.de>
Cc: "Rang, Anton", Adrian Chadd, FreeBSD CURRENT, Dimitry Andric
List-Id: Discussions about the use of FreeBSD-current

On 2014-07-01 17:33, O. Hartmann wrote:
> On Tue, 01 Jul 2014 17:23:14 +0200
> Willem Jan Withagen wrote:
>
>> On 2014-07-01 16:48, Rang, Anton wrote:
>>> DOT => DOD
>>>
>>> 444F54 => 444F44
>>>
>>> That's a single-bit flip. Bad memory, perhaps?
>>
>> Very likely, especially if the system does not have ECC....
>> It just happens on rare occasions that an alpha particle, power cycle, or
>> anything else disruptive damages a memory cell. And it could be that
>> it requires a special pattern of accesses to actually exhibit the error.
>>
>> In the past (199x's) 'make buildworld' used to be a rather good memory
>> tester. But nowadays look at
>> http://www.memtest.org/
>>
>> This tool has found all of the bad memory in all the systems I used
>> and/or built for others...
>> Note that it might take a few runs and some more heat to actually
>> trigger the faulty cell, but memtest86 will usually find it.
>>
>> Note that on big systems with lots of memory it can take a loooooong
>> time to run just one full testset to completion.
>>
>> --WjW
>
> I already tested via memtest86+ (had to download the linux image, the port on FreeBSD is
> broken on CURRENT). It didn't find anything strange so far.
>
> I will do another test.
>
> I realised that on that specific box, the chipset temperature is 81 degrees Celsius.
> The chipset is an Eaglelake P45 - in which the memory controller resides on that old
> platform. dmidecode gives:
>
> Manufacturer: ASUSTeK Computer INC.
> Product Name: P5Q-WS
> Version: Rev 1.xx

Hi Oliver,

I've built several (5+) systems with these boards (from memory they
date from around 2009?), and if I recall correctly, one of them is still
functional. The first one broke down within a couple of weeks, and the
others did not survive the years either.

The auxiliary chips on that board do run hot, but I never realized they
ran this hot. Is 81C the CPU temperature from sysctl, or did you measure
the heatsink on the motherboard? In the latter case it is probably just
too hot. But even if it is the temperature on the chip itself, I've
rarely seen temperatures go this high.

You may need to let memtest86 run for more than 6-10 complete passes
with all the tests enabled.

If the memtests do not reveal anything broken, then you get into even
more wizardry, like bad power etc... Especially since the problem only
occurs on occasion, it is going to be a nightmare to find the root
cause, other than by replacing hardware piece by piece, which won't be
easy given the age of the board and parts.

You could go into the BIOS and try to configure RAM access at a slower
speed, and see if the problem goes away. If it does, it could be that
you are running on the edge of the spec with regard to RAM timing.

But like I said, it is all lots of funky details that can interact in
strange and unexpected ways.

--WjW
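
For reference, the single-bit-flip reading of DOT => DOD (444F54 => 444F44) quoted earlier in the thread can be checked mechanically by XOR-ing the two byte strings and listing the set bits. The following is only a minimal Python sketch; the function name `flipped_bits` is illustrative and does not come from the thread.

```python
def flipped_bits(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return (byte_index, bit_index) for every bit that differs between a and b."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    diffs = []
    for i, (x, y) in enumerate(zip(a, b)):
        delta = x ^ y  # bits set in delta are the bits that changed
        for bit in range(8):
            if delta & (1 << bit):
                diffs.append((i, bit))
    return diffs


expected = b"DOT"  # the string as it should have been
observed = b"DOD"  # the string as it was actually seen

print(expected.hex().upper(), "=>", observed.hex().upper())
print(flipped_bits(expected, observed))
```

A result with exactly one entry means the two strings differ in a single bit, which is the pattern one would expect from a flipped memory cell rather than from a software bug.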