From owner-freebsd-stable  Sun May 19 15:22:27 2002
Delivered-To: freebsd-stable@freebsd.org
Received: from mail.deltanet.com (mail.deltanet.com [216.237.144.132])
	by hub.freebsd.org (Postfix) with ESMTP id 0528337B40A
	for <freebsd-stable@FreeBSD.ORG>; Sun, 19 May 2002 15:22:23 -0700 (PDT)
Received: from mammoth.eat.frenchfries.net (da001d0787.lax-ca.osd.concentric.net [64.0.147.20])
	by mail.deltanet.com (8.11.6/8.11.6) with ESMTP id g4JM1SO01903
	for <freebsd-stable@FreeBSD.ORG>; Sun, 19 May 2002 15:01:29 -0700
Received: by mammoth.eat.frenchfries.net (Postfix, from userid 1000)
	id 720B349C8; Sun, 19 May 2002 15:21:07 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1])
	by mammoth.eat.frenchfries.net (Postfix) with ESMTP id 6584849C7
	for <freebsd-stable@FreeBSD.ORG>; Sun, 19 May 2002 15:21:07 -0700 (PDT)
Date: Sun, 19 May 2002 15:21:07 -0700 (PDT)
From: Paul Herman <pherman@frenchfries.net>
X-X-Sender: pherman@mammoth.eat.frenchfries.net
To: freebsd-stable@FreeBSD.ORG
Subject: Re: panic: softdep_disk_write_complete: lock is held
In-Reply-To: <20020517123019.V1458-100000@mammoth.eat.frenchfries.net>
Message-ID: <20020519150606.G443-100000@mammoth.eat.frenchfries.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Fri, 17 May 2002, I wrote:

> panic: softdep_disk_write_complete: lock is held
>
> syncing disks... panic: softdep_lock: locking against myself
> Uptime: 1m59s
> [...]

OK, I'm seeing something strange here.  As the backtrace shows, the
first panic is raised at frame 10:

#7  0xc022066f in sync (p=0xc048f380, uap=0x0)
    at /usr/src/sys/kern/vfs_syscalls.c:576
#8  0xc01f0c82 in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:235
#9  0xc01f130c in poweroff_wait (junk=0xc03dad00, howto=-1015284636)
    at /usr/src/sys/kern/kern_shutdown.c:595
#10 0xc02f7a64 in softdep_disk_write_complete (bp=0xc37995ac)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:3228
#11 0xc0218571 in biodone (bp=0xc37995ac) at /usr/src/sys/kern/vfs_bio.c:2703
#12 0xc021a670 in cluster_callback (bp=0xc376fef8)
    at /usr/src/sys/kern/vfs_cluster.c:549
#13 0xc0218544 in biodone (bp=0xc376fef8) at /usr/src/sys/kern/vfs_bio.c:2698
#14 0xc015d5af in ad_interrupt (request=0xc105d8c0)

where the offending code in softdep_disk_write_complete() is:

3226    #ifdef DEBUG
3227            if (lk.lkt_held != -1)
3228                    panic("softdep_disk_write_complete: lock is held");
3229            lk.lkt_held = -2;
3230    #endif

but in this very frame...

(kgdb) print lk
$2 = {lkt_spl = 6867008, lkt_held = -1}

So, according to lk, no lock is held, yet the if condition was
satisfied anyway.  Is there a race condition somewhere?  biodone()
appears twice in the backtrace (frames 11 and 13; I don't know if
that's normal), which might cause the lock to be freed twice.  Or is
something else happening?  I have no clue here.
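
One possible reading of the mismatch, offered as a guess rather than
a diagnosis: lk is a global, so "print lk" in kgdb shows its value as
of the crash dump, not as of the moment the check at line 3227 ran.
The second panic ("softdep_lock: locking against myself") suggests
that lock code ran again in between.  The sketch below models that
sequence in plain userland C.  The names lockit_model, replay() and
the clearing step are hypothetical and are assumptions for
illustration; this is not the actual ffs_softdep.c code.

```c
/* Simplified, hypothetical model of the lock-tracking state. */
#define LK_FREE (-1)            /* no holder recorded */

struct lockit_model {
    int lkt_spl;
    int lkt_held;               /* LK_FREE, or an id for the holder */
};

static struct lockit_model lk_model = { 0, LK_FREE };

/* Mirrors the DEBUG test quoted above: nonzero means "would panic". */
static int check_would_panic(const struct lockit_model *l)
{
    return l->lkt_held != LK_FREE;
}

/*
 * Assumption: some code on the panic/shutdown path resets the holder
 * field before the dump is written.
 */
static void later_path_clears_holder(struct lockit_model *l)
{
    l->lkt_held = LK_FREE;
}

/*
 * Replays the suspected sequence.  Returns what the check decided at
 * the time it ran; *seen_in_dump receives the value a debugger would
 * read from the global afterwards.
 */
static int replay(int holder_id, int *seen_in_dump)
{
    int panicked;

    lk_model.lkt_held = holder_id;            /* lock is held */
    panicked = check_would_panic(&lk_model);  /* completion check runs */
    later_path_clears_holder(&lk_model);      /* state changes afterwards */
    *seen_in_dump = lk_model.lkt_held;        /* what "print lk" shows */
    return panicked;
}
```

Calling replay() with any holder id other than -1 reports that the
check fired, while the value left in the global is -1, which is the
same combination seen in the dump.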

I've narrowed this panic down, and I can reproduce it every time in
single-user mode with only a shell and ftp running.  It happens while
transferring a large file over a wi0 wireless link.  During the
transfer, wi0 produces about 460 interrupts per second.  Again, I'm
running 4.6-PRERELEASE.

-Paul.

