Message-ID: <20131023.090839.469.1@DOMY-PC>
From: rank1seeker@gmail.com
To: "John Baldwin"
Cc: Adam Vande More, hackers@freebsd.org
Subject: Re: UFS related panic (daily <-> find)
Date: Wed, 23 Oct 2013 11:08:39 +0200
List-Id: Technical Discussions relating to FreeBSD

> > > > Same drill as before, see what instruction this is. Actually, this
> > > > looks to be in the same location as your last panic, so a NULL
> > > > pointer is 0x1 instead of 0x0 again. In my experience, this would
> > > > still indicate failing RAM to me, memtest86+ notwithstanding
> > > > (memtest86+ is single threaded AFAIK, so it may not stress the
> > > > hardware quite the same, e.g. if the error is heat related, etc.).
> > >
> > > memtest* cannot conclusively diagnose a dimm as good. Usually the only
> > > practical solution is to swap modules with known good ones.
> >
> > 0xc082c552 : cmp %ecx,0x24(%eax)
> > PREVIOUS we talked about
> > 0xc083bd42 : cmp %ecx,0x24(%eax)
> > CURRENT ONE
>
> Different instruction pointer doesn't matter. The error is in the memory
> that %eax is loaded from in a prior instruction.
>
> > Now, after all this I recompiled kernel and world and there was no crash.
> > How can it be, when it is far more stressing than daily's 'find'?!
>
> Because it might have shuffled where the bad memory cell now lives by having
> the kernel text + data laid out differently in RAM?
>
> > I see addresses 0xc08* and 0xc06* appearing each time, so as I have four
> > DDR1 (400) modules, each of 256 MB = 1GB, can those addresses aid me in
> > targeting failing module?
>
> The virtual addresses (0xc*) do not matter. They are not physical addresses
> which are what you would need.
>
> > If I can't use memtest86+-4.20 to determine the failing module, then what
> > is the use of it at all?
> > Test RAM speed perhaps?
>
> Swap out your dimms. That's really the only test, esp. if you have a
> reproducible crash.

That is exactly what I did. I've halved the DIMMs. Depending on the result,
I'll halve them again in one direction or the other.
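The halving procedure can be sketched roughly as below. This is only an illustration under two assumptions that are not confirmed in this thread: exactly one module is bad, and each test gives a dependable crash / no-crash answer. The names `find_bad_dimm` and `crashes_with`, and the DIMM labels, are made up for the example.

```python
def find_bad_dimm(dimms, crashes_with):
    """Narrow a list of DIMMs down to one suspect by repeated halving.

    Assumes exactly one module is faulty and that crashes_with(subset)
    reliably says whether the machine crashed with only `subset` installed.
    """
    candidates = list(dimms)
    while len(candidates) > 1:
        # Install only the first half and observe the machine.
        half = candidates[: len(candidates) // 2]
        if crashes_with(half):
            candidates = half                      # fault is in the tested half
        else:
            candidates = candidates[len(half):]    # fault is in the other half
    return candidates[0]


# Hypothetical run: four 256 MB modules, the third one is the bad one.
dimms = ["DIMM0", "DIMM1", "DIMM2", "DIMM3"]
bad = "DIMM2"
tested = []  # record of which subsets were installed, in order


def crashes_with(subset):
    tested.append(list(subset))
    return bad in subset


found = find_bad_dimm(dimms, crashes_with)
print(found, tested)
```

With four modules this takes two rounds. When the crash is intermittent, a quiet observation period is only weak evidence that the installed half is good, so each round needs to run long enough to be convincing.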
Unfortunately, the crash isn't reproducible on demand, so I'll just stick with it for a month.

Domagoj