From owner-freebsd-current  Fri Feb 23 22:51:11 2001
Delivered-To: freebsd-current@freebsd.org
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
	by hub.freebsd.org (Postfix) with ESMTP id AFA2937B491
	for <current@FreeBSD.ORG>; Fri, 23 Feb 2001 22:51:05 -0800 (PST)
	(envelope-from bde@zeta.org.au)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102])
	by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id RAA02820;
	Sat, 24 Feb 2001 17:50:58 +1100
Date: Sat, 24 Feb 2001 17:41:40 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-Sender: bde@besplex.bde.org
To: Warner Losh <imp@harmony.village.org>
Cc: current@FreeBSD.ORG
Subject: Re: Today's panic :-)
In-Reply-To: <200102240551.f1O5p0W85583@harmony.village.org>
Message-ID: <Pine.BSF.4.21.0102241719140.26925-100000@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Fri, 23 Feb 2001, Warner Losh wrote:

> I've added INVARIANTS and WITNESS to my kernel.  Today I get a random
> panic on boot sometimes:
> 
> lock order reversal		(this doesn't cause the panic, but
> 				does seem to happen all the time)
>  1st vnode interlock last acquired @ ../../usr/ffs/ffs_fsops.c:396
>  2nd 0xc04837a0 mntvnode @ ../../ufs/ffs/ffs_vfsops.c:457
>  3rd 0xc80b9e8c vnode interlock @ ../../kern/vfs_subr.c:1872
> kernel trap 12 with interrupts disabled
> panic: runq_add: proc 0xc7b28ee0 (fsck_ufs) not SRUN
> Debugger("panic")
> Stopped at Debugger+0x44: pushl %ebx
> db> trace
> Debugger(c03d3c03) at Debugger+0x44
> panic(c03d4040,c7b28ee0,c7b290a5,282,c7b2b960) at panic+0x70
> runq_add(c046ae20,c7b28ee0,c8a4bcra4,c0221ee5,c7b28ee0) at runq_add+0x40
> setrunqueue(c7b28ee0) at setrunqueue+0x10
> ithread_schedule(c0f0a00,1) at ithread_schedule+0x129
> sched_ithd(e) at sched_ithd+0x3f
> Xresume14() at Xresume14+0x8
> --- interrupt, eip = 0xc03830fb, esp = 0x286, ebp = 0xc8a4bd34 ---
> trap(18,10,10,73b152,0) at trap+0x9b
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0xc03822e9, esp = 0xc8a4bd7c, ebp = 0xc8a4bd90 ---
> sw1b(0,...) at sw1b+0x6b
> msleep(...) at msleep+0x588
> physio(...) at physio+0x30d
> spec_read(...) at spec_read+0x71
> ufsspec_read(...) at ufsspec_read+0x20
> ufs_vnoperatespec(...) at ufs_vnoperatespec+0x15
> vn_read(...) at vn_read+0x128
> dofileread(...) at dofileread+0xb0
> read(...) at read+0x36
> syscall(...) at syscall+0x551
> Xint0x80_syscall() at Xint0x80_syscall+0x23
> --- syscall 0x3, eip = 0x8054770, esp = 0xbfbfef60, ebp = 0xbfbfef9c ---
> db>
> 
> Anything that I can do to help?  I don't have a core dump of this, but
> it is happening often enough to be a pain.

It seems to be another trap while holding sched_lock.  This should be
fatal, but the problem is only detected because trap() enables
interrupts.  Then an interrupt causes bad things to happen.  Unfortunately,
the above omits the critical information: the instruction at sw1b+0x6b.
There is no instruction at that address here.  It is apparently just an
access to a swapped-out page for the new process.  I can't see how this
ever worked.  The page must be faulted in, but this can't be done while
sched_lock is held (not to mention after we have committed to switching
contexts).
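
The sequence above can be sketched as a small user-space model.  This is
only an illustration under stated assumptions, not kernel code: every
name in it (model_cpu, model_proc, model_runq_add,
model_fault_during_switch) is hypothetical, and the optional lock check
in the trap path is an assumed way of making the fault fatal right
away, not something the kernel is known to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are real kernel APIs. */
enum model_state { MODEL_SRUN, MODEL_SSLEEP };

enum model_outcome {
	MODEL_FATAL_IN_TRAP,		/* caught at the trap itself */
	MODEL_PANIC_IN_RUNQ_ADD,	/* caught later, as in the quoted trace */
	MODEL_UNDETECTED		/* nothing noticed the problem */
};

struct model_cpu {
	bool sched_lock_held;		/* stands in for holding sched_lock */
	bool interrupts_enabled;
};

struct model_proc {
	enum model_state state;
};

/* Models the check behind "runq_add: proc ... not SRUN". */
static bool
model_runq_add(const struct model_proc *p)
{
	return (p->state == MODEL_SRUN);
}

/*
 * Models a page fault taken in the switch code while the scheduler lock
 * is held.  trap_checks_lock is an assumed option that makes the trap
 * handler refuse to continue when the lock is held.
 */
static enum model_outcome
model_fault_during_switch(bool trap_checks_lock, bool interrupt_arrives)
{
	struct model_cpu cpu = { .sched_lock_held = true,
	    .interrupts_enabled = false };
	/* The faulting process is asleep, part-way through a switch. */
	struct model_proc cur = { .state = MODEL_SSLEEP };

	if (trap_checks_lock && cpu.sched_lock_held)
		return (MODEL_FATAL_IN_TRAP);

	/* Without the check, the trap handler enables interrupts. */
	cpu.interrupts_enabled = true;

	if (interrupt_arrives) {
		assert(cpu.interrupts_enabled);
		/* The interrupt path tries to queue a non-SRUN process. */
		if (!model_runq_add(&cur))
			return (MODEL_PANIC_IN_RUNQ_ADD);
	}
	return (MODEL_UNDETECTED);
}
```

Calling model_fault_during_switch(false, true) mirrors the quoted
trace: the failure only shows up in the run queue check after an
interrupt arrives.  With (false, false) the model reports nothing at
all, which is the sense in which the problem is only detected by
accident.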

Bruce

