Message-ID: <4ED60279.10901@freebsd.org>
Date: Wed, 30 Nov 2011 11:16:25 +0100
From: Stefan Esser
To: John Baldwin
Cc: Attilio Rao, freebsd-current@freebsd.org
Subject: [SOLVED]: HW defect (was: Re: [amd64] Reproducible cold boot failure (reboot succeeds) in -CURRENT)

On 17.11.2011 17:33, John Baldwin wrote:
> On Thursday, November 17, 2011 3:59:43 am Stefan Esser wrote:
>> On 16.11.2011 17:16, John Baldwin wrote:
[...]
>>> That isn't unusual. Those are the addresses of the metadata provided by the
>>> loader, not the base address of the kernel or zfs.ko object themselves. The
>>> unexpected relocation type is interesting however. That value in hex is
>>> 0x400000b. 0xb is the R_X86_64_32S relocation type which is normal for the
>>> kernel. I think you just have a single-bit memory error due to a failing
>>> DIMM.
>>
>> Thanks for the information about the load address semantics. The other
>> unexpected relocation type I observed was 268435457 == 0x10000001, which
>> also hints at a single bit error. But today the system failed with a
>> different error:
>>
>> ath0: ...
>> ioapic0: routing interrupt 18 to ...
>> panic: vm_page_insert: page already inserted
>>
>> This could of course also be caused by a single bit error ...
>
> Yes, very likely.
>
>> Hmmm, perhaps there is a problem with components at room temperature
>> and the system is still significantly warmer after 3 hours?
>
> Yes, I strongly suspect it is a thermal effect that the RAM "works" once it
> is warmed up. If you have data you care about on the machine, I would just
> go ahead and replace the RAM now before waiting for the RAM's failure to
> become worse.

Thanks a lot, John!

I should have checked the hardware earlier, but since the system was
perfectly stable once it was up and running, I had suspected an
initialization bug rather than defective RAM.

In fact, one of the 4GB DIMMs in the system returns bogus data
(0x10000000 or 0x04000000 instead of 0) for some 40 to 50 seconds after
power-on. Once warmed up, memtest86+ runs for days without a single
further data error (I wanted an estimate of how likely it is that the
defect has damaged data in disk files).

When I was still doing hardware work, I always had a freezer aerosol on
my desk, which allowed me to quickly cool down a DUT by a few tens of
degrees, but without such a tool I had to wait overnight between tests
for the components to cool down.

Best regards, STefan
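
The single-bit reasoning in this thread can be checked mechanically: XOR the observed value with the value that was expected and see whether exactly one bit differs. The following is only a minimal sketch in Python. The helper name `flipped_bits` is made up for illustration, and the expected value 1 (R_X86_64_64) for the second relocation type is an assumption, since the thread only says the value "hints at a single bit error".

```python
def flipped_bits(observed, expected):
    """Return the bit positions at which observed differs from expected."""
    diff = observed ^ expected
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def is_single_bit_error(observed, expected):
    """True if observed differs from expected in exactly one bit."""
    return len(flipped_bits(observed, expected)) == 1


# Values quoted in the thread, paired with the value presumably intended.
cases = [
    (0x400000B, 0xB),    # relocation type seen vs. R_X86_64_32S (11)
    (268435457, 0x1),    # 0x10000001 vs. R_X86_64_64 (1), an assumption
    (0x10000000, 0x0),   # bogus DIMM read vs. expected 0
    (0x04000000, 0x0),   # bogus DIMM read vs. expected 0
]

for observed, expected in cases:
    bits = flipped_bits(observed, expected)
    print(hex(observed), "vs", hex(expected), "-> flipped bits:", bits)
```

Under these assumptions the two corrupted relocation types and the two bogus memory reads involve the same two bit positions (26 and 28), which is consistent with a single defective DIMM rather than a software bug.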