From owner-freebsd-questions@FreeBSD.ORG Wed Jun 18 23:48:01 2008
Message-ID: <48599EAF.6050506@FreeBSD.org>
Date: Thu, 19 Jun 2008 01:47:59 +0200
From: Kris Kennaway
To: "Edwin L. Culp"
Cc: freebsd-questions@freebsd.org
Subject: Re: reboot after panic : page fault for two consecutive days now with FreeBSD stable 7.0
References: <20080611133220.198644ite5u5r778@intranet.casasponti.net> <4850203A.5080805@FreeBSD.org> <20080618181233.80176fck3bdyi680@intranet.casasponti.net>
In-Reply-To: <20080618181233.80176fck3bdyi680@intranet.casasponti.net>

Edwin L. Culp wrote:
> Kris Kennaway wrote:
>
>> eculp wrote:
>>> This is on a relatively new Dell dualcore with 4G of ram running up
>>> to date stable.  I'm not on site so I have no idea what might be
>>> provoking these crashes.  In fact in many years of running FreeBSD
>>> I've not seen something just happen like this.  It is a
>>> semi-production machine that cvsups daily and builds and installs a
>>> new world and kernel.  Ports are updated about once a week and
>>> haven't seen any issues previously.  It has been running 24/7 since
>>> new, about 8 months.
>>>
>>> 3 files were generated: info, bounds and vmcore.  The info file follows:
>>>
>>> Dump header from device /dev/mfid0s1b
>>> Architecture: i386
>>> Architecture Version: 2
>>> Dump Length: 341225472B (325 MB)
>>> Blocksize: 512
>>> Dumptime: Wed Jun 11 12:34:24 2008
>>> Hostname: casasponti.net
>>> Magic: FreeBSD Kernel Dump
>>> Version String: FreeBSD 7.0-STABLE #258: Tue Jun 10 05:54:42 CDT 2008
>>>     root@casasponti.net:/usr/obj/usr/src/sys/ENCONTACTO
>>> Panic String: page fault
>>> Dump Parity: 2395754794
>>> Bounds: 2
>>> Dump Status: good
>>>
>>> The vmcore is about 300M so I'm not attaching it ;)  I could put it
>>> online at a moment's notice.  I think that what I need is probably a
>>> crash course on debugging a crash and I really don't know where to
>>> start since after over 10 years with freebsd I've never needed it.
>>> Any help, suggestions, etc. would be greatly appreciated.
>>
>> See the developers' handbook chapter on kernel debugging.
>>
>> However, panics that "suddenly" start happening frequently on a system
>> that has been stable for a while with no OS or workload changes made,
>> are usually due to the hardware starting to fail.
>>
>> Kris
>
> I got as far as I could.  I recompiled the kernel with debugging and
> waited for a new panic.  I got the fourth one a few minutes ago and went
> as far as I could with the handbook.
>
> # kgdb kernel.debug /var/crash/vmcore.4
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x0
> fault code              = supervisor write, page not present
> instruction pointer     = 0x20:0xc0716ba9
> stack pointer           = 0x28:0xe6d2bc4c
> frame pointer           = 0x28:0xe6d2bc4c
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 13 (swi4: clock sio)
> trap number             = 12
> panic: page fault
> cpuid = 0
> Uptime: 1d4h34m22s
> Physical memory: 3315 MB
> Dumping 273 MB: 258 242 226 210 194 178 162 146 130 114 98 82 66 50 34 18 2
>
> Reading symbols from /boot/kernel/mfi_linux.ko...Reading symbols from
> /boot/kernel/mfi_linux.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/mfi_linux.ko
> Reading symbols from /boot/kernel/acpi.ko...Reading symbols from
> /boot/kernel/acpi.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/acpi.ko
> Reading symbols from /boot/kernel/fdescfs.ko...Reading symbols from
> /boot/kernel/fdescfs.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/fdescfs.ko
> #0  doadump () at pcpu.h:195
> 195             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
>
> --------------------------------------------------------------------
>
> That is as far as I got; any suggestions appreciated.  I'm going to check
> the others and see if they get further.

I believe the instructions tell you to run 'bt' :)

However, my advice re failing hardware remains in effect.

Kris
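
A rough sketch of that next step, under a few assumptions: the trap
message already gives the faulting instruction pointer (0x20:0xc0716ba9),
so besides 'bt' one could also ask kgdb which source line that address
maps to.  The shell snippet below pulls the address out of the trap text
and prints the commands to type at the (kgdb) prompt.  The variable names
are made up for illustration, and 'list *ADDR' is ordinary gdb syntax
that is assumed here to behave the same way in kgdb 6.1.1.

```shell
#!/bin/sh
# Trap line copied from the kernel message buffer quoted above.
trap_line='instruction pointer = 0x20:0xc0716ba9'

# Keep only the offset after the "selector:" prefix (0x20 is the code segment).
ip=$(printf '%s\n' "$trap_line" |
    sed -n 's/.*instruction pointer *= *0x[0-9a-f]*:\(0x[0-9a-f]*\).*/\1/p')

# Commands to type at the (kgdb) prompt: a backtrace, then the source
# line that the faulting address maps to (hypothetical follow-up step).
kgdb_cmds=$(printf 'bt\nlist *%s\n' "$ip")
printf '%s\n' "$kgdb_cmds"
```

The backtrace is usually the more informative of the two; the address
lookup only helps when the kernel was built with debugging symbols, as
kernel.debug was here.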