Date: Thu, 21 Sep 2000 12:49:49 -0500 (CDT)
From: Chris Dillon
To: Michael Allman
Cc: BSD, stable@FreeBSD.ORG
Subject: Re: Constant panics on 4.1-STABLE!

On Thu, 21 Sep 2000, Michael Allman wrote:

> On Thu, 21 Sep 2000, Chris Dillon wrote:
>
> > On Thu, 21 Sep 2000, Michael Allman wrote:
> >
> > > I am having problems with random panics/reboots as well. I am using
> > > two sticks of Corsair 128MB ECC memory. My motherboard uses the GX
> > > chipset. Crashes occur whether I am using both sticks or only one of
> > > them. Considering that I have been using this memory reliably for
> > > about a year, I find it hard to believe that both sticks would go bad
> > > simultaneously. I have been using the CAS3 and ECC settings in my BIOS.
> >
> > It probably isn't the memory, then (Corsair is pretty good).
> >
> > > > BTW, crash dumps will be meaningless if this really is a hardware
> > > > problem.
> > >
> > > Equivalent to this statement is the following: if the crash dumps are
> > > not meaningless (meaningful?), then this is not a hardware problem. I
> > > would say it is still worthwhile to look at crash dumps.
> >
> > Wrong. You have no way of knowing, just by looking at a crash dump,
> > whether the problem was caused by random memory corruption, CPU
> > flakiness, or whatever, or by a real software problem. Crash dumps are
> > only useful if you _know_ flaky hardware wasn't the culprit. If you
> > hand a developer a crash dump caused by flaky hardware, you are going
> > to send them on a wild goose chase, and they will never find a real
> > problem in the code where the failure supposedly occurred. If they're
> > really lucky, they'll look at the crash dump and say "It is not at all
> > possible for this to have happened because of software. It must have
> > been caused by hardware." I wouldn't put that burden on any of these
> > developers, however. This has already happened at least a few times,
> > and usually the developer wastes days or weeks looking for a
> > non-existent problem until the person who originally reported it comes
> > back and says "Duh, I'm REALLY sorry guys, but I found the culprit, it
> > was my hardware." You can find at least a few of these in our mailing
> > list archives.
>
> Let's wait and see what the other guy who's having these problems comes
> up with (Bart, I think). Also, I think I know why I'm not getting crash
> dumps sometimes. When it starts to take a dump, pressing a key on the
> keyboard aborts it, yes? Since I use my computer for application work,
> perhaps my typing is aborting the dump before it finishes.

Yes, it is possible that this is what is preventing the crash dumps. If
you are in X whenever you get these unexplained reboots, it might help to
hook up a serial console so that you can monitor what is going on even if
the primary display never makes it back to the real world when the crash
occurs.

> > > I have ECC RAM with ECC enabled. I get crashes anyway. Would you say
> > > then that it's not the RAM?
> >
> > Then it most likely isn't the RAM. That does not, however, rule out
> > the CPU, the support chipsets, or even a weird expansion card that is
> > spewing enough RF noise to cause data corruption on nearby devices.
>
> I have tried using another CPU to no avail. This other CPU is currently
> in use in another system without problems. I have swapped out every one
> of my expansion cards, and then some. One thing that comes to mind is
> that I haven't tried a different ethernet card (my ethernet is on the
> motherboard). I will try that. I am also not excluding the possibility
> of a bad chipset. I may try using a different motherboard. It's really
> just a matter of finding the time to do these things.

Try a new power supply, too.

This won't make you feel any better, but I recently fought with a system
whose problems I never managed to track down. I went through many
combinations of motherboards, CPUs (both Intel and AMD; this was a
Socket 7 system), power supplies, ethernet cards, video cards, RAM, hard
drives, CD-ROM drives, etc. NOTHING was the same when I was done, yet I
kept having the same problems. It was almost as if the case itself were
possessed (or maybe I was cursed), since it was the only thing I hadn't
changed. I knew it was a hardware problem because the weirdness occurred
regularly under both FreeBSD and NT4 Server, even during initial
installations of FreeBSD or NT4. I finally gave up and just got an
entirely new system. I haven't had a single problem with it.

-- 
Chris Dillon - cdillon@wolves.k12.mo.us - cdillon@inter-linc.net
FreeBSD: The fastest and most stable server OS on the planet.
For Intel x86 and Alpha architectures. ( http://www.freebsd.org )