From owner-freebsd-current@FreeBSD.ORG  Mon Jun  7 11:00:19 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 30E9A16A4CE
	for <freebsd-current@freebsd.org>;
	Mon,  7 Jun 2004 11:00:19 +0000 (GMT)
Received: from hetzner.co.za (lfw.hetzner.co.za [196.7.18.226])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 0CAEE43D1F
	for <freebsd-current@freebsd.org>;
	Mon,  7 Jun 2004 11:00:18 +0000 (GMT)
	(envelope-from ianf@hetzner.co.za)
Received: from localhost ([127.0.0.1])
	by hetzner.co.za with esmtp (Exim 3.36 #1)
	id 1BXHrQ-00019t-00
	for freebsd-current@freebsd.org; Mon, 07 Jun 2004 13:00:12 +0200
To: John Baldwin <jhb@FreeBSD.org>
From: Ian FREISLICH <if@hetzner.co.za>
In-reply-to: Your message of "Fri, 04 Jun 2004 14:34:32 -0400."
             <200406041434.32193.jhb@FreeBSD.org> 
X-Attribution: BOFH
Date: Sat, 05 Jun 2004 23:44:09 +0200
Sender: ianf@hetzner.co.za
Resent-To: freebsd-current@freebsd.org
Resent-Date: Mon, 07 Jun 2004 13:00:12 +0200
Resent-From: Ian FREISLICH <ianf@hetzner.co.za>
Resent-Message-Id: <E1BXHrQ-00019t-00@hetzner.co.za>
cc: freebsd-current@freebsd.org
Subject: Re: It's happening again (panic early in boot) 
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 07 Jun 2004 11:00:19 -0000

John Baldwin wrote:
> On Friday 04 June 2004 11:14 am, Ian FREISLICH wrote:
> > John Baldwin wrote:
> > > On Friday 04 June 2004 06:45 am, Ian FREISLICH wrote:
> > > > Hi
> > > >
> > > > Every month or so after it started working I get this panic.
> > > > The panic then goes away after a month or two, with no
> > > > explanation.  During the existence of the panic I try new kernel
> > > > source once a day.
> > > >
> > > > This is an SMP machine.  Using the same source UP kernels work
> > > > fine, SMP kernels don't.  The last SMP kernel that worked is
> > > > circa May 17.
> > >
> > > grr, I still don't know why this happens.  One thing though is
> > > > that if we can fix the nested panic we might be able to work on the
> > > > first one.
> >
> > If you want access to the box in question, I can arrange that.
> >
> > > > Booting [/boot/kernel/kernel]...
> > > > /boot/kernel/acpi.ko text=0x3a0e4 data=0x19e4+0x11ac
> > > > syms=[0x4+0x6860+0x4+0x8a87 ]
> > > > Copyright (c) 1992-2004 The FreeBSD Project.
> > > > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993,
> > > > 1994 The Regents of the University of California. All rights reserved.
> > > > FreeBSD 5.2-CURRENT #15: Fri Jun  4 10:23:23 SAST 2004
> > > >    
> > > > ianf@brane-dead.freislich.nom.za:/usr/src/sys/i386/compile/BRANE-DEAD
> > > > Preloaded elf kernel "/boot/kernel/kernel" at 0xc0728000.
> > > > Preloaded elf module "/boot/kernel/acpi.ko" at 0xc0728244.
> > > > Timecounter "i8254" frequency 1193182 Hz quality 0
> > > > CPU: Pentium II/Pentium II Xeon/Celeron (267.27-MHz 686-class CPU)
> > > >   Origin = "GenuineIntel"  Id = 0x634  Stepping = 4
> > > >
> > > > Features=0x80fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,
> > > > > MCA,CMOV,MMX>
> > > > real memory  = 201261056 (191 MB)
> > > > avail memory = 191311872 (182 MB)
> > > > MPTable: <OEM00000 PROD00000000>
> > > > kernel trap 12 with interrupts disabled
> > > >
> > > >
> > > > Fatal trap 12: page fault while in kernel mode
> > > > cpuid = 0; apic id = 00
> > > > fault virtual address   = 0x1c
> > > > fault code              = supervisor write, page not present
> > > > instruction pointer     = 0x8:0xc058d98e
> > >
> > > Can you do a gdb -k on kernel.debug and do 'l *' on this address?  That
> > > might let us fix the panic in vm_fault().
> >
> > Is this what you're after?
> >
> > (kgdb) l * 0xc058d98e
> > 0xc058d98e is in vm_fault (machine/atomic.h:154).
> > 149     static __inline int
> > 150     atomic_cmpset_int(volatile u_int *dst, u_int exp, u_int src)
> > 151     {
> > 152             int res = exp;
> > 153
> > 154             __asm __volatile (
> > 155             "       " __XSTRING(MPLOCKED) " "
> > 156             "       cmpxchgl %1,%2 ;        "
> > 157             "       setz    %%al ;          "
> > 158             "       movzbl  %%al,%0 ;       "
> >
> > Ian
>
> Hmm, darn inlines. :) Can you compile the kernel with either
> INVARIANTS or MUTEX_NOINLINE so that mutex ops aren't inlined,
> reproduce the panic and then do the same lookup using the new faulting
> IP?

(kgdb) l * 0xc04b9828
0xc04b9828 is in _mtx_lock_flags (../../../kern/kern_mutex.c:247).
242     void
243     _mtx_lock_flags(struct mtx *m, int opts, const char *file, int line)
244     {
245
246             MPASS(curthread != NULL);
247             KASSERT(m->mtx_object.lo_class == &lock_class_mtx_sleep,
248                 ("mtx_lock() of spin mutex %s @ %s:%d", m->mtx_object.lo_name,
249                 file, line));
250             WITNESS_CHECKORDER(&m->mtx_object, opts | LOP_NEWORDER | LOP_EXCLUSIVE,
251                 file, line);


Interestingly, with INVARIANTS the panic is exactly the same except
for this (new) text at the end of the multiple panic:

panic: page fault
at line 815 in file ../../../i386/i386/trap.ccpuid = 0; 
Uptime: 1s
panic: _mtx_lock_sleep: recursed on non-recursive mutex system map @ ../../../vm/vm_map.c:2876

at line 437 in file ../../../kern/kern_mutex.ccpuid = 0; 
Uptime: 1s
panic: _mtx_lock_sleep: recursed on non-rep

Ian

--
Ian Freislich