Date: Mon, 21 Oct 2013 15:30:36 +0200
From: rank1seeker@gmail.com
To: "John Baldwin" <jhb@freebsd.org>
Cc: Adam Vande More <amvandemore@gmail.com>, hackers@freebsd.org
Subject: Re: UFS related panic (daily <-> find)
Message-ID: <20131021.133036.045.1@DOMY-PC>
In-Reply-To: <CA%2BtpaK23Wo8Nemi-xUy3-BZSUdeWkpWQh_o6Ws=mxi6jrbubvw@mail.gmail.com>
References: <20130719.174511.786.3@DOMY-PC> <201310071212.05281.jhb@freebsd.org> <20131016.104912.479.1@DOMY-PC> <201310161650.52354.jhb@freebsd.org> <CA%2BtpaK23Wo8Nemi-xUy3-BZSUdeWkpWQh_o6Ws=mxi6jrbubvw@mail.gmail.com>

> > Same drill as before, see what instruction this is.  Actually, this looks
> > to be in the same location as your last panic, so a NULL pointer is 0x1
> > instead of 0x0 again.  In my experience, this would still indicate failing
> > RAM to me, memtest86+ notwithstanding (memtest86+ is single threaded AFAIK,
> > so it may not stress the hardware quite the same, e.g. if the error is heat
> > related, etc.).
> 
> 
> memtest* cannot conclusively diagnose a dimm as good.  Usually the only
> practical solution is to swap modules with known good ones.
> 
0xc082c552 <inodedep_find+13>:  cmp    %ecx,0x24(%eax)
    PREVIOUS we talked about
0xc083bd42 <inodedep_find+13>:  cmp    %ecx,0x24(%eax)
    CURRENT ONE
Latest (a few days ago):
--
#7  0xc06d5f11 in cache_lookup_times (dvp=0xc921b470, vpp=0xe7c24ae8, cnp=0xe7c24afc, tsp=0x0, ticksp=0x0) at /usr/src/sys/kern/vfs_cache.c:548
548                     numchecks++;
(kgdb) p ncp
$1 = (struct namecache *) 0x1
(kgdb) p *ncp
Cannot access memory at address 0x1
--
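To make the "0x1 instead of 0x0" observation concrete, here is a minimal sketch that counts how many bits separate the value kgdb printed for ncp from a NULL pointer. The helper name differing_bits is made up for illustration and is not part of any FreeBSD tool. A difference of exactly one bit fits the single-bit corruption suspected above, but it does not prove a RAM fault by itself.

```python
def differing_bits(expected: int, observed: int) -> int:
    """Count the bit positions in which two pointer values differ."""
    # XOR leaves a 1 in every position where the two values disagree.
    return bin(expected ^ observed).count("1")


NULL = 0x0          # value a terminated list pointer should hold
observed_ncp = 0x1  # value kgdb printed for ncp in frame #7

print(differing_bits(NULL, observed_ncp))
```

The same check can be applied to other suspicious pointers from earlier panics to see whether they are also one bit away from a plausible value.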
Now, after all this, I recompiled the kernel and world and there was no crash.
How can that be, when it is far more stressful than daily's 'find'?!
Why is it exactly daily's 'find' that triggers this, and not every time, but
only every 10th or 20th time?
I see addresses 0xc08* and 0xc06* appearing each time. Since I have four
DDR1 (400) modules of 256 MB each (1 GB total), can those addresses help me
target the failing module?
If I can't use memtest86+-4.20 to determine the failing module, then what is
the use of it at all?
To test RAM speed, perhaps?
I might try pulling out modules 2 and 4 (dual channel) and waiting a month
to see if find crashes the machine.
Then another month ... Well, you get the point.
There must be another solution.
Thanks for your help.
Domagoj 
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20131021.133036.045.1>
