From: John Baldwin
To: rank1seeker@gmail.com
Cc: Adam Vande More, hackers@freebsd.org
Subject: Re: UFS related panic (daily <-> find)
Date: Mon, 21 Oct 2013 18:47:00 -0400
Message-Id: <201310211847.00875.jhb@freebsd.org>
In-Reply-To: <20131021.133036.045.1@DOMY-PC>
References: <20130719.174511.786.3@DOMY-PC> <20131021.133036.045.1@DOMY-PC>

On Monday, October 21, 2013 9:30:36 am rank1seeker@gmail.com wrote:
> > > Same drill as before, see what instruction this is.
> > > Actually, this looks to be in the same location as your last panic,
> > > so a NULL pointer is 0x1 instead of 0x0 again. In my experience, this
> > > would still indicate failing RAM to me, memtest86+ notwithstanding
> > > (memtest86+ is single threaded AFAIK, so it may not stress the
> > > hardware quite the same, e.g. if the error is heat related, etc.).
> >
> > memtest* cannot conclusively diagnose a dimm as good. Usually the only
> > practical solution is to swap modules with known good ones.
>
> 0xc082c552 : cmp %ecx,0x24(%eax)
> PREVIOUS we talked about
> 0xc083bd42 : cmp %ecx,0x24(%eax)
> CURRENT ONE

A different instruction pointer doesn't matter. The error is in the memory
that %eax is loaded from in a prior instruction.

> Now, after all this I recompiled kernel and world and there was no crash.
> How can it be, when it is far more stressing than daily's 'find'?!

Because the rebuild might have shuffled where the bad memory cell now lives,
by having the kernel text + data laid out differently in RAM?

> I see addresses 0xc08* and 0xc06* appearing each time, so as I have four
> DDR1 (400) modules, each of 256 MB = 1GB, can those addresses aid me in
> targeting the failing module?

The virtual addresses (0xc*) do not matter. They are not physical addresses,
which are what you would need.

> If I can't use memtest86+-4.20 to determine the failing module, then what
> is the use of it at all?
> Test RAM speed perhaps?

Swap out your dimms. That's really the only test, esp. if you have a
reproducible crash.

-- 
John Baldwin
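
A note on the 0x1-instead-of-0x0 observation quoted above: the two values
differ in exactly one bit, which is the pattern a single stuck memory cell
could produce. A minimal Python sketch of that comparison follows. The helper
name differing_bits is hypothetical and not part of any FreeBSD tool, and a
one-bit difference is only consistent with bad RAM; it does not prove it.

```python
def differing_bits(expected: int, observed: int) -> list[int]:
    """Return the bit positions (0 = least significant) where two values differ."""
    diff = expected ^ observed  # XOR leaves a 1 wherever the bits disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Values from the thread: a pointer expected to be NULL (0x0) was read as 0x1.
bits = differing_bits(0x0, 0x1)
print(bits)             # [0]
print(len(bits) == 1)   # True: exactly one bit differs
```

This works on the values read back from memory only. As stated above, the
virtual addresses in the panic (0xc*) do not identify a physical module.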