Date:      Wed, 23 Oct 2013 11:08:39 +0200
From:      rank1seeker@gmail.com
To:        "John Baldwin" <jhb@freebsd.org>
Cc:        Adam Vande More <amvandemore@gmail.com>, hackers@freebsd.org
Subject:   Re: UFS related panic (daily <-> find)
Message-ID:  <20131023.090839.469.1@DOMY-PC>

> > > > Same drill as before, see what instruction this is.  Actually, this
> > > > looks to be in the same location as your last panic, so a NULL
> > > > pointer is 0x1 instead of 0x0 again.  In my experience, this would
> > > > still indicate failing RAM to me, memtest86+ notwithstanding
> > > > (memtest86+ is single threaded AFAIK, so it may not stress the
> > > > hardware quite the same, e.g. if the error is heat related, etc.).
> > > 
> > > 
> > > memtest* cannot conclusively diagnose a dimm as good.  Usually the only
> > > practical solution is to swap modules with known good ones.
> > > 
> > 
> > 
> > 0xc082c552 <inodedep_find+13>:  cmp    %ecx,0x24(%eax)
> >     PREVIOUS we talked about
> > 0xc083bd42 <inodedep_find+13>:  cmp    %ecx,0x24(%eax)
> >     CURRENT ONE
> 
> Different instruction pointer doesn't matter.  The error is in the memory
> that %eax is loaded from in a prior instruction.
> 
> > Now, after all this I recompiled kernel and world and there was no crash.
> > How can that be, when it is far more stressful than daily's 'find'?!
> 
> Because it might have shuffled where the bad memory cell now lives by
> having the kernel text + data laid out differently in RAM?
> 
> > I see addresses 0xc08* and 0xc06* appearing each time, so as I have four
> > DDR1 (400) modules of 256 MB each (1 GB total), can those addresses aid
> > me in targeting the failing module?
> 
> The virtual addresses (0xc*) do not matter.  They are not physical
> addresses, which are what you would need.
> 
> > If I can't use memtest86+-4.20 to determine the failing module, then
> > what is the use of it at all?
> > Test RAM speed perhaps?
> 
> Swap out your dimms.  That's really the only test, esp. if you have a
> reproducible crash.


That is exactly what I did. I've halved the DIMMs. Depending on the result,
I'll halve them again in one direction or the other.
Unfortunately, the crash isn't reproducible, so I'll just run with it for a
month.
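
For the record, the halving I have in mind looks roughly like the sketch
below. It assumes exactly one module is bad, and that "no panic within the
waiting period" can be trusted as a clean result, which is a big assumption
with a crash that isn't reproducible. The slot names and the crashes()
callback are made up for illustration; in reality crashes() is me running
the box for a month with only those modules installed.

```python
def find_bad_module(modules, crashes):
    """Bisect a list of DIMMs, assuming exactly one of them is faulty.

    crashes(subset) is a hypothetical oracle: it returns True if the
    machine panics while running with only `subset` installed.
    """
    candidates = list(modules)
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        if crashes(first_half):
            # The bad module is among the ones currently installed.
            candidates = first_half
        else:
            # No panic: by assumption the bad module is in the other half.
            candidates = candidates[middle:]
    return candidates[0]


# Hypothetical slot names; pretend the module in slot B1 is the bad one.
slots = ["A1", "A2", "B1", "B2"]
suspect = find_bad_module(slots, lambda installed: "B1" in installed)
print(suspect)
```

With four modules that is two rounds of waiting. A false "no panic" in
either round points at the wrong half, so the waiting period matters more
than the search itself.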


Domagoj



