Date: Mon, 21 Oct 2013 18:47:00 -0400
From: John Baldwin <jhb@freebsd.org>
To: rank1seeker@gmail.com
Cc: Adam Vande More <amvandemore@gmail.com>, hackers@freebsd.org
Subject: Re: UFS related panic (daily <-> find)
Message-ID: <201310211847.00875.jhb@freebsd.org>
In-Reply-To: <20131021.133036.045.1@DOMY-PC>
References: <20130719.174511.786.3@DOMY-PC> <CA%2BtpaK23Wo8Nemi-xUy3-BZSUdeWkpWQh_o6Ws=mxi6jrbubvw@mail.gmail.com> <20131021.133036.045.1@DOMY-PC>

On Monday, October 21, 2013 9:30:36 am rank1seeker@gmail.com wrote:
> > > Same drill as before, see what instruction this is.  Actually, this
> > > looks to be in the same location as your last panic, so a NULL
> > > pointer is 0x1 instead of 0x0 again.  In my experience, this would
> > > still indicate failing RAM to me, memtest86+ notwithstanding
> > > (memtest86+ is single threaded AFAIK, so it may not stress the
> > > hardware quite the same, e.g. if the error is heat related, etc.).
> > >
> >
> > memtest* cannot conclusively diagnose a dimm as good.  Usually the only
> > practical solution is to swap modules with known good ones.
> >
>
> 0xc082c552 <inodedep_find+13>:  cmp    %ecx,0x24(%eax)
>      PREVIOUS we talked about
> 0xc083bd42 <inodedep_find+13>:  cmp    %ecx,0x24(%eax)
>      CURRENT ONE

Different instruction pointer doesn't matter.  The error is in the memory
that %eax is loaded from in a prior instruction.

> Now, after all this I recompiled kernel and world and there was no crash.
> How can it be, when it is far more stressing than daily's 'find'?!

Because it might have shuffled where the bad memory cell now lives by having
the kernel text + data laid out differently in RAM?

> I see addresses 0xc08* and 0xc06* appearing each time, so as I have four
> DDR1 (400) modules, each of 256 MB = 1GB, can those addresses aid me in
> targeting failing module?

The virtual addresses (0xc*) do not matter.  They are not physical
addresses, which are what you would need.

> If I can't use memtest86+-4.20, to determine failing module, then what is a
> use of it at all?
> Test RAM speed perhaps?

Swap out your DIMMs.  That's really the only test, esp. if you have a
reproducible crash.

-- 
John Baldwin
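
A small illustration of the arithmetic behind the reply above, as a hedged sketch rather than anything taken from the actual crash dump: `cmp %ecx,0x24(%eax)` dereferences `%eax + 0x24`, and a pointer of 0x1 where 0x0 was expected differs from NULL in exactly one bit. The function names are hypothetical, and the values come only from the quoted disassembly and the "0x1 instead of 0x0" remark. A single-bit difference is consistent with a bad memory cell but does not prove one.

```python
# Illustrative only: the inputs are copied from the quoted disassembly and the
# "0x1 instead of 0x0" remark in the thread; nothing here reads real hardware
# or kernel state.

DISPLACEMENT = 0x24  # the 0x24 in `cmp %ecx,0x24(%eax)`
MASK32 = 0xFFFFFFFF  # the panics were on a 32-bit (i386) kernel


def effective_address(eax, displacement=DISPLACEMENT):
    """Address the cmp dereferences: %eax + displacement, wrapped to 32 bits."""
    return (eax + displacement) & MASK32


def flipped_bits(expected, observed):
    """Bit positions (0 = least significant) where two 32-bit values differ."""
    diff = (expected ^ observed) & MASK32
    return [bit for bit in range(32) if diff & (1 << bit)]


if __name__ == "__main__":
    expected_ptr = 0x0  # a genuine NULL pointer
    observed_ptr = 0x1  # the value described in the thread
    print(hex(effective_address(observed_ptr)))      # 0x25
    print(flipped_bits(expected_ptr, observed_ptr))  # [0]
```

Both addresses involved here are virtual, so, as the reply says, they do not identify which physical module holds the affected cell.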