From owner-freebsd-current@FreeBSD.ORG Mon Jun 7 11:00:19 2004
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 30E9A16A4CE
	for ; Mon, 7 Jun 2004 11:00:19 +0000 (GMT)
Received: from hetzner.co.za (lfw.hetzner.co.za [196.7.18.226])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0CAEE43D1F
	for ; Mon, 7 Jun 2004 11:00:18 +0000 (GMT)
	(envelope-from ianf@hetzner.co.za)
Received: from localhost ([127.0.0.1])
	by hetzner.co.za with esmtp (Exim 3.36 #1)
	id 1BXHrQ-00019t-00
	for freebsd-current@freebsd.org; Mon, 07 Jun 2004 13:00:12 +0200
To: John Baldwin
From: Ian FREISLICH
In-reply-to: Your message of "Fri, 04 Jun 2004 14:34:32 -0400."
	<200406041434.32193.jhb@FreeBSD.org>
X-Attribution: BOFH
Date: Sat, 05 Jun 2004 23:44:09 +0200
Sender: ianf@hetzner.co.za
Resent-To: freebsd-current@freebsd.org
Resent-Date: Mon, 07 Jun 2004 13:00:12 +0200
Resent-From: Ian FREISLICH
cc: freebsd-current@freebsd.org
Subject: Re: It's happening again (panic early in boot)
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Mon, 07 Jun 2004 11:00:19 -0000

John Baldwin wrote:
> On Friday 04 June 2004 11:14 am, Ian FREISLICH wrote:
> > John Baldwin wrote:
> > > On Friday 04 June 2004 06:45 am, Ian FREISLICH wrote:
> > > > Hi
> > > >
> > > > Every month or so after it started working I get this panic.
> > > > The panic then goes away after a month or two, with no
> > > > explanation. During the existence of the panic I try new kernel
> > > > source once a day.
> > > >
> > > > This is an SMP machine. Using the same source UP kernels work
> > > > fine, SMP kernels don't. The last SMP kernel that worked is
> > > > circa May 17.
> > >
> > > grr, I still don't know why this happens.
> > > One thing though is
> > > that if we can fix the nested panic we might can work on the first
> > > one.
> >
> > If you want access to the box in question, I can arrange that.
> >
> > > > Booting [/boot/kernel/kernel]...
> > > > /boot/kernel/acpi.ko text=0x3a0e4 data=0x19e4+0x11ac
> > > > syms=[0x4+0x6860+0x4+0x8a87 ]
> > > > Copyright (c) 1992-2004 The FreeBSD Project.
> > > > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993,
> > > > 1994 The Regents of the University of California. All rights reserved.
> > > > FreeBSD 5.2-CURRENT #15: Fri Jun 4 10:23:23 SAST 2004
> > > >
> > > > ianf@brane-dead.freislich.nom.za:/usr/src/sys/i386/compile/BRANE-DEAD
> > > > Preloaded elf kernel "/boot/kernel/kernel" at 0xc0728000.
> > > > Preloaded elf module "/boot/kernel/acpi.ko" at 0xc0728244.
> > > > Timecounter "i8254" frequency 1193182 Hz quality 0
> > > > CPU: Pentium II/Pentium II Xeon/Celeron (267.27-MHz 686-class CPU)
> > > > Origin = "GenuineIntel" Id = 0x634 Stepping = 4
> > > >
> > > > Features=0x80fbff
> > > > MCA,CMOV,MMX>
> > > > real memory = 201261056 (191 MB)
> > > > avail memory = 191311872 (182 MB)
> > > > MPTable:
> > > > kernel trap 12 with interrupts disabled
> > > >
> > > >
> > > > Fatal trap 12: page fault while in kernel mode
> > > > cpuid = 0; apic id = 00
> > > > fault virtual address = 0x1c
> > > > fault code = supervisor write, page not present
> > > > instruction pointer = 0x8:0xc058d98e
> > >
> > > Can you do a gdb -k on kernel.debug and do 'l *' on this address? That
> > > might let us fix the panic in vm_fault().
> >
> > Is this what you're after?
> >
> > (kgdb) l * 0xc058d98e
> > 0xc058d98e is in vm_fault (machine/atomic.h:154).
> > 149     static __inline int
> > 150     atomic_cmpset_int(volatile u_int *dst, u_int exp, u_int src)
> > 151     {
> > 152             int res = exp;
> > 153
> > 154             __asm __volatile (
> > 155             " " __XSTRING(MPLOCKED) " "
> > 156             " cmpxchgl %1,%2 ; "
> > 157             " setz %%al ; "
> > 158             " movzbl %%al,%0 ; "
> >
> > Ian
>
> Hmm, darn inlines. :) Can you compile the kernel with either
> INVARIANTS or MUTEX_NOINLINE so that mutex ops aren't inlined,
> reproduce the panic and then do the same lookup using the new faulting
> IP?

(kgdb) l * 0xc04b9828
0xc04b9828 is in _mtx_lock_flags (../../../kern/kern_mutex.c:247).
242     void
243     _mtx_lock_flags(struct mtx *m, int opts, const char *file, int line)
244     {
245
246             MPASS(curthread != NULL);
247             KASSERT(m->mtx_object.lo_class == &lock_class_mtx_sleep,
248                 ("mtx_lock() of spin mutex %s @ %s:%d", m->mtx_object.lo_name,
249                 file, line));
250             WITNESS_CHECKORDER(&m->mtx_object, opts | LOP_NEWORDER | LOP_EXCLUSIVE,
251                 file, line);

Interestingly, with INVARIANTS the panic is exactly the same except for
this (new) text at the end of the multiple panic:

panic: page fault at line 815 in file ../../../i386/i386/trap.ccpuid = 0;
Uptime: 1s
panic: _mtx_lock_sleep: recursed on non-recursive mutex system map @ ../../../vm/vm_map.c:2876 at line 437 in file ../../../kern/kern_mutex.ccpuid = 0;
Uptime: 1s
panic: _mtx_lock_sleep: recursed on non-rep

Ian

--
Ian Freislich