From owner-freebsd-current Fri Feb 23 22:51:11 2001
Delivered-To: freebsd-current@freebsd.org
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by hub.freebsd.org (Postfix) with ESMTP id AFA2937B491
	for ; Fri, 23 Feb 2001 22:51:05 -0800 (PST)
	(envelope-from bde@zeta.org.au)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102])
	by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id RAA02820;
	Sat, 24 Feb 2001 17:50:58 +1100
Date: Sat, 24 Feb 2001 17:41:40 +1100 (EST)
From: Bruce Evans
X-Sender: bde@besplex.bde.org
To: Warner Losh
Cc: current@FreeBSD.ORG
Subject: Re: Today's panic :-)
In-Reply-To: <200102240551.f1O5p0W85583@harmony.village.org>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Fri, 23 Feb 2001, Warner Losh wrote:

> I've added INVARIANTS and WITNESS to my kernel. Today I get a random
> panic on boot sometimes:
>
> lock order reseral (this doesn't cause the panic, but
> does seem to happen all the time)
> 1st vnode interlock last acquired @ ../../usr/ffs/ffs_fsops.c:396
> 2nd 0xc04837a0 mntvnode @ ../../ufs/ffs/ffs_vfsops.c:457
> 3rd 0xc80b9e8c vnode interlock @ ../../kern/vfs_subr.c:1872
> kernel trap 12 with interrupts disabled
> panic: runq-add: proc 0xc7b28ee0 (fsck_ufs) not SRUN
> Debugger("panic")
> Stopeed at Debugger+0x44: pushl %ebx
> db> trace
> Debugger(c03d3c03) at Debugger+0x44
> panic(c03d4040,c7b28ee0,c7b290a5,282,c7b2b960) at panic+0x70
> runq_add(c046ae20,c7b28ee0,c8a4bcra4,c0221ee5,c7b28ee0) at runq_add+0x40
> setrunqueue(c7b28ee0) at setrunqueue+0x10
> ithread_schedule(c0f0a00,1) at ithread_schedule+0x129
> sched_ithd(e) at sched_ithrd+0x3f
> Xresume14() at Xresume14+0x8
> --- interrupt, eip = 0xc03830fb, esp = 0x286, ebp = 0xc8a4bd34 ---
> trap(18,10,10,73b152,0) at trap+0x9b
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0xc03822e9, esp = 0xc8a4bd7c, ebp = 0xc8a4bd90 ---
> sw1b(0,...) at sw1b+0x6b
> msleep(...) at msleep+0x588
> physio(...) at physio+0x30d
> spec_read(...) at spec_read+0x71
> ufsspec_read(...) at ufsspec_read+0x20
> ufs_noperatespec(...) at ufs_noperatespec+0x15
> vn_read(...) at vn_read+0x128
> dofileread(...) at dofileread+0xb0
> read(...) at read+0x36
> syscall(...) at syscall+0x551
> Xint0x80_syscall() at Xint0x80_syscall+0x23
> --- syscall 0x3, eip = 0x8054770, esp = 0xbfbfef60, ebp = 0xbfbfef9c ---
> db>
>
> Anything that I can do to help? I don't have a core dump of this, but
> it is happening often enough to be a pain.

It seems to be another trap while holding sched_lock. This should be
fatal, but the problem is only detected because trap() enables
interrupts. Then an interrupt causes bad things to happen.

Unfortunately, the above omits the critical information: the
instruction at sw1b+0x6b. There is no instruction at that address
here. It is apparently just an access to a swapped-out page for the
new process. I can't see how this ever worked. The page must be
faulted in, but this can't be done while sched_lock is held (not to
mention after we have committed to switching contexts).

Bruce


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message