Date:      Tue, 23 Nov 2004 16:35:40 -0500
From:      John Baldwin <jhb@FreeBSD.org>
To:        Peter Holm <peter@holm.cc>
Cc:        phk@FreeBSD.org
Subject:   Re: panic: sleeping thread owns a non-sleepable lock
Message-ID:  <200411231635.40567.jhb@FreeBSD.org>
In-Reply-To: <20041123204635.GA42682@peter.osted.lan>
References:  <20041122143804.GA36649@peter.osted.lan> <200411231136.49362.jhb@FreeBSD.org> <20041123204635.GA42682@peter.osted.lan>

On Tuesday 23 November 2004 03:46 pm, Peter Holm wrote:
> On Tue, Nov 23, 2004 at 11:36:49AM -0500, John Baldwin wrote:
> > On Monday 22 November 2004 08:13 pm, Peter Holm wrote:
> > > On Mon, Nov 22, 2004 at 04:57:36PM -0500, John Baldwin wrote:
> > > > On Monday 22 November 2004 09:38 am, Peter Holm wrote:
> > > > > During a stress test with GENERIC HEAD from Nov 20 08:40 UTC I got:
> > > > > Sleeping on "fdesc" with the following non-sleepable locks held:
> > > > > exclusive sleep mutex fdesc r = 0 (0xc08d15a0) locked @
> > > > > kern/kern_descrip.c:2425 and then
> > > > > panic: sleeping thread (pid 92279) owns a non-sleepable lock
> > > > >
> > > > > http://www.holm.cc/stress/log/cons89.html
> > > >
> > > > Yes, the panic is a result of the earlier warning.  Poul-Henning
> > > > touched this code last, so it is probably something for him to look
> > > > at.  I'm unsure how msleep() is getting called, however.  The
> > > > turnstile panic is not important, can you find the thread that went
> > > > to sleep (should be pid 92279) and get a stack trace for that?
> > >
> > > The ddb trace is in the log, just before the call to doadump. Let me know if
> > > you need any gdb output.
> >
> > Ok, can you use gdb to get the source/file of 'sysctl_kern_file+0x1ae'?
>
> I've updated to HEAD from Nov 23 08:05 UTC, but was lucky to get the same
> panic again :-) http://www.holm.cc/stress/log/cons90.html
>
> (kgdb) l *sysctl_kern_file+0x1ae
> 0xc05f3526 is in sysctl_kern_file (../../../kern/kern_descrip.c:2427).
> 2422                    mtx_lock(&fdesc_mtx);
> 2423                    if ((fdp = p->p_fd) == NULL) {
> 2424                            mtx_unlock(&fdesc_mtx);
> 2425                            continue;
> 2426                    }
> 2427                    FILEDESC_LOCK(fdp);
> 2428                    for (n = 0; n < fdp->fd_nfiles; ++n) {
> 2429                            if ((fp = fdp->fd_ofiles[n]) == NULL)
> 2430                                    continue;
> 2431                            xf.xf_fd = n;

Oh, this is because of phk's home-rolled msleep() locks.  The basic problem 
is that FILEDESC_LOCK() can msleep() while fdesc_mtx is still held.  He needs 
to drop fdesc_mtx after locking the filedesc's internal mutex but before 
going to sleep.  He will also need to add a reference count (in case the fdp 
goes away while he is waiting for the xlock), bumping it before going to 
sleep and dropping it after doing the SYSCTL_OUT().  Something like:

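	lock(&fdesc_mtx);
	fdp = p->p_fd;
	FILEDESC_LOCK_SMALL(fdp);	/* take fdp's internal mutex first */
	unlock(&fdesc_mtx);		/* then drop fdesc_mtx, before any sleep */
	filedesc_hold(fdp);		/* ref so fdp can't go away while we wait */
	FILEDESC_LOCK_BIG(fdp);		/* may msleep() for the xlock */
	...
	SYSCTL_OUT();
	filedesc_free(fdp);		/* drop the reference */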
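To make the suggested ordering concrete, here is a minimal userland sketch using POSIX threads. It is an analogy, not kernel code: struct fdtable, table_mtx, proc_fd, count_files() and the other helpers are hypothetical stand-ins for struct filedesc, fdesc_mtx, p->p_fd and the sysctl handler, and the xlocked flag plus condition variable is assumed to model the msleep()-based FILEDESC_LOCK().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userland stand-in for struct filedesc. */
struct fdtable {
	pthread_mutex_t	small;		/* inner mutex ("small" lock) */
	pthread_cond_t	cv;		/* waiters for the big lock sleep here */
	bool		xlocked;	/* the "big" sleepable lock */
	int		refcnt;		/* references keeping the table alive */
	int		nfiles;
};

/* Global mutex guarding the process -> table pointer (like fdesc_mtx). */
static pthread_mutex_t table_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct fdtable *proc_fd;		/* hypothetical stand-in for p->p_fd */

static void
fdtable_attach(int nfiles)
{
	struct fdtable *fdp = calloc(1, sizeof(*fdp));

	if (fdp == NULL)
		abort();
	pthread_mutex_init(&fdp->small, NULL);
	pthread_cond_init(&fdp->cv, NULL);
	fdp->refcnt = 1;		/* the reference owned via proc_fd */
	fdp->nfiles = nfiles;
	pthread_mutex_lock(&table_mtx);
	proc_fd = fdp;
	pthread_mutex_unlock(&table_mtx);
}

/* Drop one reference; whoever drops the last one frees the table. */
static void
fdtable_drop(struct fdtable *fdp)
{
	bool last;

	pthread_mutex_lock(&fdp->small);
	assert(fdp->refcnt > 0);
	last = (--fdp->refcnt == 0);
	pthread_mutex_unlock(&fdp->small);
	if (last) {
		pthread_cond_destroy(&fdp->cv);
		pthread_mutex_destroy(&fdp->small);
		free(fdp);
	}
}

/* Like process exit: unhook the table, then drop proc_fd's reference. */
static void
fdtable_detach(void)
{
	struct fdtable *fdp;

	pthread_mutex_lock(&table_mtx);
	fdp = proc_fd;
	proc_fd = NULL;
	pthread_mutex_unlock(&table_mtx);
	if (fdp != NULL)
		fdtable_drop(fdp);
}

/* The ordering from the mail. Returns -1 if there is no table. */
static int
count_files(void)
{
	struct fdtable *fdp;
	int n;

	pthread_mutex_lock(&table_mtx);
	if ((fdp = proc_fd) == NULL) {
		pthread_mutex_unlock(&table_mtx);
		return (-1);
	}
	pthread_mutex_lock(&fdp->small);	/* inner mutex first, */
	pthread_mutex_unlock(&table_mtx);	/* then drop the global one */
	fdp->refcnt++;				/* hold: fdp can't be freed under us */
	while (fdp->xlocked)			/* may sleep; releases only fdp->small */
		pthread_cond_wait(&fdp->cv, &fdp->small);
	fdp->xlocked = true;
	pthread_mutex_unlock(&fdp->small);

	n = fdp->nfiles;			/* stands in for the loop + SYSCTL_OUT() */

	pthread_mutex_lock(&fdp->small);	/* release the big lock */
	fdp->xlocked = false;
	pthread_cond_broadcast(&fdp->cv);
	pthread_mutex_unlock(&fdp->small);
	fdtable_drop(fdp);			/* drop the hold */
	return (n);
}
```

The only lock held while a thread may block in pthread_cond_wait() is the table's own inner mutex, which the wait releases; table_mtx has already been dropped, and the extra reference keeps a concurrent fdtable_detach() from freeing the table in the meantime.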

On the other hand, since SYSCTL_OUT() can't block here, he can probably just 
use the FILEDESC_LOCK_FAST() variants, which only lock the mutex, instead of 
the full-blown sleep lock.
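A sketch of that non-sleeping alternative, under the same userland assumptions as before: the names are hypothetical, and a plain pthread mutex stands in for what a FILEDESC_LOCK_FAST()-style macro takes. Because nothing in the critical section sleeps, table_mtx (fdesc_mtx) can stay held throughout and no reference count is needed, assuming the code that frees a table always unhooks it under table_mtx first.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct filedesc, fdesc_mtx and p->p_fd. */
struct fdtable {
	pthread_mutex_t	small;	/* the mutex a "fast" lock would take */
	int		nfiles;
};

static pthread_mutex_t table_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct fdtable *proc_fd;

/*
 * Only plain mutexes are held and nothing in here blocks, so holding
 * both at once is fine.  Assumes the table is only freed after being
 * unhooked under table_mtx, so it cannot vanish while we hold that lock.
 */
static int
count_files_fast(void)
{
	struct fdtable *fdp;
	int n = -1;

	pthread_mutex_lock(&table_mtx);
	if ((fdp = proc_fd) != NULL) {
		pthread_mutex_lock(&fdp->small);
		n = fdp->nfiles;	/* the copy-out must not block here */
		assert(n >= 0);		/* sanity check on the snapshot */
		pthread_mutex_unlock(&fdp->small);
	}
	pthread_mutex_unlock(&table_mtx);
	return (n);
}
```

This trades the reference count and sleep lock for the requirement that the copy-out never blocks, which is the premise of the paragraph above.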

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200411231635.40567.jhb>