From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct 23 09:08:42 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id C40B1BD1;
 Wed, 23 Oct 2013 09:08:42 +0000 (UTC)
 (envelope-from rank1seeker@gmail.com)
Received: from mail-ea0-x230.google.com (mail-ea0-x230.google.com
 [IPv6:2a00:1450:4013:c01::230])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 39C3D2E6B;
 Wed, 23 Oct 2013 09:08:42 +0000 (UTC)
Received: by mail-ea0-f176.google.com with SMTP id q16so253590ead.7
 for <multiple recipients>; Wed, 23 Oct 2013 02:08:40 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=message-id:from:to:cc:subject:date;
 bh=lpWy4XxVUQR+7Y5whbztLr/FMbQ5jnpbsd+VnelIV5g=;
 b=RNSQROSTE5lFwvsewewVxP4qdCzImsxM7XUITHrDvGnlF/MdXG5aQtNvVrjzSRrlyx
 AQGhmkqxMWPbJXVATLUo2mRBesAiZ7k7AmpWN9bzwMBQZvxC2xrsfElNVN2YNrgO+ulM
 WEHZBI1cLF/lZg4Mzv+RsUhNIgGSvX1BeEi7xQb8vxf4uH+CdjJyMXVvVIelBE4SEA+s
 bC31H8a7NIzLEBe9RTdBqElobtEKgNz4QfiC0LuJrVioMsFhAi6gYqdfDKJiYbdfkZFO
 JEBosi3oz6sSuF+62kRgzZECjqkIW4jo+1JtM10ClcOqW/Yrfov2/QdO+Nit+ld9zrrf
 Y5yg==
X-Received: by 10.14.225.199 with SMTP id z47mr711469eep.24.1382519320673;
 Wed, 23 Oct 2013 02:08:40 -0700 (PDT)
Received: from DOMYPC ([82.193.208.225])
 by mx.google.com with ESMTPSA id a1sm67619500eem.1.2013.10.23.02.08.38
 for <multiple recipients>
 (version=TLSv1 cipher=RC4-SHA bits=128/128);
 Wed, 23 Oct 2013 02:08:39 -0700 (PDT)
Message-ID: <20131023.090839.469.1@DOMY-PC>
From: rank1seeker@gmail.com
To: "John Baldwin" <jhb@freebsd.org>
Subject: Re: UFS related panic (daily <-> find)
Date: Wed, 23 Oct 2013 11:08:39 +0200
X-Mailer: POP Peeper (3.8.1.0)
Cc: Adam Vande More <amvandemore@gmail.com>, hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 23 Oct 2013 09:08:42 -0000

> > > > Same drill as before, see what instruction this is.  Actually, this
> > > > looks to be in the same location as your last panic, so a NULL pointer
> > > > is 0x1 instead of 0x0 again.  In my experience, this would still
> > > > indicate failing RAM to me, memtest86+ notwithstanding (memtest86+ is
> > > > single threaded AFAIK, so it may not stress the hardware quite the
> > > > same, e.g. if the error is heat related, etc.).
> > > 
> > > 
> > > memtest* cannot conclusively diagnose a dimm as good.  Usually the only
> > > practical solution is to swap modules with known good ones.
> > > 
> > 
> > 
> > 0xc082c552 <inodedep_find+13>:  cmp    %ecx,0x24(%eax)
> >     PREVIOUS we talked about
> > 0xc083bd42 <inodedep_find+13>:  cmp    %ecx,0x24(%eax)
> >     CURRENT ONE
> 
> Different instruction pointer doesn't matter.  The error is in the memory
> that %eax is loaded from in a prior instruction.
> 
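
To make sure I follow the reasoning, here is a small sketch of it. It
assumes %eax really held 0x1 where 0x0 was expected (as described earlier
in the thread) and uses the 0x24 displacement from the cmp instruction
above. The function names are my own, for illustration only.

```python
def flipped_bits(expected, observed, width=32):
    """Return the bit positions at which two register values differ."""
    diff = expected ^ observed
    return [bit for bit in range(width) if (diff >> bit) & 1]


def effective_address(base, displacement, width=32):
    """Address touched by an operand of the form disp(%reg) on i386."""
    return (base + displacement) & ((1 << width) - 1)


# A NULL pointer read back as 0x1 differs from 0x0 in exactly one bit.
single_bit = flipped_bits(0x0, 0x1)

# cmp %ecx,0x24(%eax) with %eax = 0x1 touches 0x1 + 0x24.
fault_address = effective_address(0x1, 0x24)
```

A single differing bit is consistent with one bad memory cell, and the
address the cmp touches depends only on %eax, not on where inodedep_find
happens to be loaded.
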
> > Now, after all this I recompiled kernel and world and there was no crash.
> > How can it be, when it is far more stressing than daily's 'find'?!
> 
> Because it might have shuffled where the bad memory cell now lives by
> having the kernel text + data laid out differently in RAM?
> 
> > I see addresses 0xc08* and 0xc06* appearing each time, so as I have four
> > DDR1 (400) modules of 256 MB each (1 GB total), can those addresses help
> > me target the failing module?
> 
> The virtual addresses (0xc*) do not matter.  They are not physical
> addresses which are what you would need.
> 
> > If I can't use memtest86+-4.20 to determine the failing module, then what
> > is the use of it at all?
> > Test RAM speed perhaps?
> 
> Swap out your dimms.  That's really the only test, esp. if you have a
> reproducible crash.


That is exactly what I did. I've halved the DIMMs. Depending on the result,
I'll halve them again in one direction or the other.
Unfortunately, the crash isn't reproducible, so I'll just run with it for a
month.
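
The halving plan, written out as a rough sketch. It assumes exactly one
module is bad and that a crash-free observation period is enough to call
the installed half good, which is shaky given that the crash is not
reproducible. The module labels and the crashes_with callback are made up
for illustration.

```python
def find_bad_dimm(dimms, crashes_with):
    """Narrow down a single bad module by repeated halving.

    dimms: list of module labels (hypothetical names).
    crashes_with: callable that takes the list of installed modules and
        returns True if the machine crashed during the observation period.
    """
    candidates = list(dimms)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if crashes_with(half):
            # The crash followed the installed half, so the fault is in it.
            candidates = half
        else:
            # No crash: suspect the half that was pulled out.
            candidates = candidates[len(half):]
    return candidates[0]


# Four modules, pretending "C" is the bad one.
suspect = find_bad_dimm(["A", "B", "C", "D"], lambda installed: "C" in installed)
```

With four modules this takes two rounds, so about two months at one round
per month.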


Domagoj