Date: Thu, 21 Sep 2000 13:08:52 -0400 (EDT)
From: Michael Allman <msa@dinosauricon.com>
To: Chris Dillon <cdillon@wolves.k12.mo.us>
Cc: BSD <bsd@shell-server.com>, stable@FreeBSD.ORG
Subject: Re: Constant panics on 4.1-STABLE!
Message-ID: <Pine.BSF.4.21.0009211252260.17806-100000@dinosaur.umbc.edu>
In-Reply-To: <Pine.BSF.4.21.0009211125170.27801-100000@mail.wolves.k12.mo.us>

On Thu, 21 Sep 2000, Chris Dillon wrote:

> On Thu, 21 Sep 2000, Michael Allman wrote:
>
> > I am having problems with random panics/reboots as well. I am using two
> > sticks of Corsair 128MB ECC memory. My motherboard uses the GX chipset.
> > Crashes occur when I am using both sticks and one or the other stick.
> > Considering that I have been using this memory reliably for about a year I
> > find it hard to believe that both sticks would go bad simultaneously. I
> > have been using CAS3, ECC settings in my bios.
>
> It probably isn't the memory, then (Corsair is pretty good).
>
> > > BTW, crash dumps will be meaningless if this really is a hardware
> > > problem.
> >
> > Equivalent to this statement is the following. If the crash dumps are not
> > meaningless (meaningful?), then this is not a hardware problem. I would
> > say it is still worthwhile to look at crash dumps.
>
> Wrong. You have no way of knowing just by looking at a crashdump if
> the problem was caused by random memory corruption, CPU flakyness, or
> whatever, or if it was a real software problem. Crashdumps are only
> useful if you _know_ flaky hardware wasn't the culprit. If you hand a
> developer a crashdump caused by hardware flakyness, you are going to
> send them on a wild goose-chase and they will never find a real
> problem with the code where the failure supposedly occurred. If
> they're really lucky, they'll look at a crashdump and say "It is not
> at all possible for this to have happened because of software. It
> must have been caused by hardware". I wouldn't put that burden on any
> of these developers, however. This has already happened at least a
> few times, and usually the developer wastes days or weeks looking for
> a non-existent problem until the original finder of the problem comes
> back and says "Duh, I'm REALLY sorry guys, but I found the culprit, it
> was my hardware". You can find at least a few of these archived in
> our mailing lists.

Let's wait and see what the other guy who's having these problems comes
up with (Bart, I think).

Also, I think I know why I'm not getting crash dumps sometimes. When it
starts to take a dump, if you press a key on the keyboard it aborts,
yes? Since I use my computer for application work perhaps my typing at
the keyboard is aborting the dump before it finishes.

> > I have ECC RAM with ECC enabled. I get crashes anyway. Would you say
> > then that it's not the RAM?
>
> Then it most likely isn't the RAM. That does not, however, rule out
> the CPU, support chipsets, or even a weird expansion card that is
> spewing enough RF noise to cause data corruption on nearby devices.

I have tried using another CPU to no avail. This other CPU is currently
in use in another system without problems. I have swapped out every one
of my expansion cards, and then some. One thing that comes to mind is
that I haven't tried a different ethernet card (my ethernet is on the
motherboard). I will try that.

I am also not excluding the possibility of a bad chipset. I may try
using a different motherboard. It's really just a matter of finding the
time to do these things.

Michael

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message