Date: Mon, 21 Oct 2013 15:30:36 +0200
From: rank1seeker@gmail.com
To: "John Baldwin" <jhb@freebsd.org>
Cc: Adam Vande More <amvandemore@gmail.com>, hackers@freebsd.org
Subject: Re: UFS related panic (daily <-> find)
Message-ID: <20131021.133036.045.1@DOMY-PC>
In-Reply-To: <CA%2BtpaK23Wo8Nemi-xUy3-BZSUdeWkpWQh_o6Ws=mxi6jrbubvw@mail.gmail.com>
References: <20130719.174511.786.3@DOMY-PC> <201310071212.05281.jhb@freebsd.org> <20131016.104912.479.1@DOMY-PC> <201310161650.52354.jhb@freebsd.org> <CA%2BtpaK23Wo8Nemi-xUy3-BZSUdeWkpWQh_o6Ws=mxi6jrbubvw@mail.gmail.com>

> > Same drill as before, see what instruction this is.  Actually, this looks to
> > be in the same location as your last panic, so a NULL pointer is 0x1 instead
> > of 0x0 again.  In my experience, this would still indicate failing RAM to me,
> > memtest86+ notwithstanding (memtest86+ is single threaded AFAIK, so it may
> > not stress the hardware quite the same, e.g. if the error is heat related,
> > etc.).
>
> memtest* cannot conclusively diagnose a dimm as good.  Usually the only
> practical solution is to swap modules with known good ones.

0xc082c552 <inodedep_find+13>: cmp %ecx,0x24(%eax)    PREVIOUS (the one we talked about)
0xc083bd42 <inodedep_find+13>: cmp %ecx,0x24(%eax)    CURRENT ONE

Latest (a few days ago):
--
#7  0xc06d5f11 in cache_lookup_times (dvp=0xc921b470, vpp=0xe7c24ae8, cnp=0xe7c24afc, tsp=0x0, ticksp=0x0)
    at /usr/src/sys/kern/vfs_cache.c:548
548             numchecks++;
(kgdb) p ncp
$1 = (struct namecache *) 0x1
(kgdb) p *ncp
Cannot access memory at address 0x1
--

Now, after all this, I recompiled kernel and world and there was no crash.
How can that be, when it is far more stressful than daily's 'find'?!
Why is it exactly daily's 'find' that triggers this, and not every time, but only every 10th or 20th run?

I see addresses 0xc08* and 0xc06* appearing each time. As I have four DDR1 (400) modules of 256 MB each (1 GB total), can those addresses help me target the failing module?

If I can't use memtest86+-4.20 to determine the failing module, then what is the use of it at all?
Testing RAM speed, perhaps?

I might try pulling out modules 2 and 4 (dual channel) and waiting a month to see whether find crashes the machine.
Then another month ...
Well, you get the point.
There must be another solution.

Thanks for your help.

Domagoj
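
The "0x1 instead of 0x0" pattern quoted above can be checked mechanically. The following is only a minimal sketch in Python, not part of any FreeBSD tool: the helper name `flipped_bits` is hypothetical, and the expected value 0x0 assumes that `ncp` should have been NULL at that point, as the quoted reply suggests. It shows only that the observed pointer differs from the expected one in a single bit; by itself it cannot say which module is at fault.

```python
def flipped_bits(expected: int, observed: int) -> list[int]:
    """Return the bit positions at which `observed` differs from `expected`."""
    diff = expected ^ observed  # XOR leaves a 1 wherever the two values disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Values taken from the kgdb session: ncp was 0x1 where NULL (0x0) was expected.
# Treating 0x0 as the "correct" value is an assumption based on the quoted reply.
bits = flipped_bits(0x0, 0x1)
print(f"differing bit positions: {bits} ({len(bits)} bit(s))")
```

Repeating this for each panic and comparing the results would show whether the same bit position is affected every time, which is the kind of consistency the quoted reply points at.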