From: Howard Leadmon
Message-Id: <200005030925.FAA91225@account.abs.net>
Subject: Re: Debugging Kernel/System Crashes, can anyone help??
In-Reply-To: <20000503175346.S8284@freebie.lemis.com> from Greg Lehey at "May 3, 2000 05:53:46 pm"
To: Greg Lehey
Date: Wed, 3 May 2000 05:25:30 -0400 (EDT)
Cc: freebsd-stable@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG

Well, I actually have two prior crashes that I saved before I turned off
the dump saves to avoid running out of drive space. I am by no means a gdb
user, but if you can tell me what you're looking for, I'll be happy to fire
up gdb and send you the info. Here is what I grabbed out of the /var/crash
directory (hopefully this is useful), and I'll set the system to grab a
clean dump next time around:

-rw-r--r--  1 howardl  wheel         31 Apr  8 15:44 bounds.0.gz
-rw-r--r--  1 howardl  wheel         31 Apr  9 13:39 bounds.1.gz
-rw-r--r--  1 howardl  wheel     825413 Apr  8 15:46 kernel.0.gz
-rw-r--r--  1 howardl  wheel     825413 Apr  9 13:40 kernel.1.gz
-rw-rw-r--  1 howardl  wheel          5 Feb 14 19:08 minfree
-rw-r--r--  1 howardl  wheel  111631734 Apr  8 15:46 vmcore.0.gz
-rw-r--r--  1 howardl  wheel  110971933 Apr  9 13:40 vmcore.1.gz

NOTE - I gzipped the dumps to save space, as with 384M of RAM the system
was leaving some rather large files.

Also, thanks for the quick response.

> > I know I posted a few messages here in the past, but maybe someone who is
> > good at tracking kernel problems can step up and lend a hand.
> >
> > I have a machine running FBSD 4.0-STABLE, and have been experiencing almost
> > daily kernel panics or reboots on the machine.  I have replaced ALL of the
> > hardware and reloaded the OS, but am still having trouble.  I am at a bit of
> > a loss as to what is going on.  From one panic, I thought maybe this
> > is an SMP issue, but I removed one of the CPUs and the box still crashes.  As
> > I have basically replaced everything, I am at a loss as to where to go from
> > here, so I am looking for some pointers or help with this.
>
> Indeed.  We need to address this issue in some detail.  We need both
> documentation and tools.
>
> > The other day I was there, and got the following from one of the
> > crashes, as many times I am gone and, luckily in some ways, the box
> > will just panicboot and go on its way.  Here is what I was able to
> > copy down:
> >
> >
> > Fatal trap 12: page fault while in kernel mode
> > mp_lock=01000002; cpuid=1; lapic.id=01000000
> > fault virtual address   = 0x30
> > fault code              = supervisor read, page not present
> > instruction pointer     = 0x8:0xC01CAF71
> > stack pointer           = 0x10:0xFF80DE48
> > frame pointer           = 0x10:0xFF80DE4C
> > code segment            = base 0x0, limit 0xFFFFF, type 0x1B
> >                         = DPL 0, pres 1, def 32, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL=0
> > current process         = idle
> > interrupt mask          = bio <- SMP: XXX
> > trap number             = 12
> > panic: page fault
> >
> > The formatting of it may not be perfect, but the information should be
> > accurate, as I tried to be precise in what I wrote down.  Also, here are
> > a few previous messages I had posted a while back when I thought this
> > might be network related, but after trying several different NICs I still
> > have the same issues.  I will include the info below, as maybe it will
> > have some value in trying to debug this problem.
>
> The sad thing is that most of this information is almost useless.
> I'm thinking of printing out a stack trace instead (comments,
> anybody?).  Without tedious comparison with your kernel namelist,
> all we can say here is that you died somewhere in the kernel, that
> you have an SMP machine, and that the block I/O subsystem is probably
> involved.  If this is happening daily, you should build a kernel with
> debugging symbols enabled and take a dump of the next crash.  We can
> then use gdb to analyse the dump.
>
> > Hello, I am running a 4.0-STABLE machine which is being used to host an
> > Undernet IRC server, and the machine keeps dying at times, or should I say
> > the networking side of it, at least, is dying.  At first I thought it might
> > have been related to the dc (DEC Chip) based drivers, so I replaced the card
> > with an EEpro board using the fxp driver, but got the same results.
> >
>
> If all your dumps have the interrupt mask set to bio, I don't think
> it's a networking problem.  With one possible exception...
>
> > Mar 27 12:39:00 u2 /kernel: fxp0: device timeout
>
> Søren and I are trying to find out what is causing some weird Vinum
> problems.  He stated that the problem happened more frequently when
> an fxp board was in the system.  I don't believe him, and I've found
> at least one bug in Vinum that has nothing to do with networking (but
> does have to do with the bio mask); possibly, however, there's some
> other problem with the fxp driver.
>
> It's possible that the other information will be of use, but I think
> we first need to look at a dump.
>
> Greg
> --
> Finger grog@lemis.com for PGP public key
> See complete headers for address and phone numbers

---
Howard Leadmon - howardl@abs.net - http://www.abs.net
ABSnet Internet Services - Phone: 410-361-8160 - FAX: 410-361-8162
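
For reference, the "tedious comparison with your kernel namelist" mentioned in the quoted reply amounts to finding the symbol with the highest address at or below the faulting instruction pointer. Below is a minimal Python sketch of that lookup, assuming the namelist has been saved as text in the usual `nm` layout (address, type, name). The symbol names and addresses in `SAMPLE` are invented for illustration and do not come from the kernel in this thread; only the instruction pointer value is taken from the trap message.

```python
import bisect


def parse_namelist(text):
    """Parse nm-style lines ("c01caf40 T name") into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skips blank lines and undefined symbols, which have no address
        addr, _sym_type, name = parts
        try:
            symbols.append((int(addr, 16), name))
        except ValueError:
            continue  # first field was not a hexadecimal address
    symbols.sort()
    return symbols


def lookup(symbols, ip):
    """Return (name, offset) for the nearest symbol at or below ip, or None."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, ip) - 1
    if i < 0:
        return None  # ip lies below every known symbol
    addr, name = symbols[i]
    return name, ip - addr


# Hypothetical namelist excerpt: these names and addresses are made up.
SAMPLE = """\
c01cae00 T example_func_a
c01caf40 T example_func_b
c01cb100 t example_func_c
"""

if __name__ == "__main__":
    syms = parse_namelist(SAMPLE)
    ip = 0xC01CAF71  # offset part of "instruction pointer" in the trap message
    print(lookup(syms, ip))
```

With a kernel built with debugging symbols and a saved dump, gdb performs this lookup itself, so the sketch only illustrates what the manual comparison involves.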