FreeBSD Mail Archives

Date:      Mon, 25 Jul 2011 16:19:37 +0200
From:      Herve Boulouis <amon@aelita.org>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        rmacklem@freebsd.org, Herve Boulouis <amon@aelita.org>, freebsd-stable@freebsd.org
Subject:   Re: Sleeping thread owns a nonsleepable lock panic (& lor)
Message-ID:  <20110725141937.GE17204@ra.aabs>
In-Reply-To: <20110725085902.GM17489@deviant.kiev.zoral.com.ua>
References:  <20110725102107.GB17204@ra.aabs> <20110725085902.GM17489@deviant.kiev.zoral.com.ua>

On 25/07/2011 11:59, Kostik Belousov wrote:
> On Mon, Jul 25, 2011 at 12:21:07PM +0200, Herve Boulouis wrote:
> > Hi list,
> > 
> > We have 2 FreeBSD 8.2-STABLE servers (cvsup'ed June 22) that keep crashing badly:
> > 
> > They are doing heavy Apache / PHP4 web serving from an NFS mount and panic at least once a day
> > with the following message (no crash dump produced, hand-copied from the console):
> > 
> > Sleeping on "vmopar" with the following non-sleepable locks held:
> > exclusive sleep mutex NFSnode lock (NFSnode lock) r =  0 (0xffffff0201798000) locked @ nfsclient/nfs_subs.c:538
> > lock order reversal:
> >  1st 0xffffffff018ff6da80 turnstile lock (turnstile lock) @ kern/subr_turnstile.c:190
> >  2nd 0xffffffffff80b52b10 scrlock (scrlock) @ dev/syscons.c:2570
> > lock order reversal:
> >  1st 0xffffffff018ff6da80 turnstile lock (turnstile lock) @ kern/subr_turnstile.c:190
> >  2nd 0xffffffffff80b78ef8 sleepq chain (sleepq chain) @ kern/subr_turnstile.c:203
> > lock order reversal:
> >  1st 0xffffffffff80b78ef8 sleepq chain (sleepq chain) @ kern/subr_turnstile.c:203
> >  2nd 0xffffffffff80b52b10 scrlock (scrlock) @ dev/syscons.c:2570
> > Sleeping thread (tid 100998, pid 20700) owns a non-sleepable lock
> > panic: sleeping thread
> > cpuid = 1
> > panic: bufwrite: buffer is not busy???
> > cpuid = 1
> > 
> > The 2 servers share the same load and panic consistently. I enabled WITNESS on both in the hope
> > it would allow the boxes to auto-reboot after a panic and give extra debug info. I got the debug info,
> > but the servers still hang after the double panic :(
> 
> Try this. Calling vnode_pager_setsize() while holding a mutex is prohibited.
> On the other hand, I remember that my attempt to add a strict assert
> that a vnode is exclusively locked in vnode_pager_setsize() had to be
> reverted because nfs_loadattrcache() is sometimes called without the vnode
> lock held.
> 
> commit 2aa7d15c38b0c01e3f724f04d7ed02ce11c82cc0
> Author: Konstantin Belousov <kostikbel@gmail.com>
> Date:   Mon Jul 25 11:56:04 2011 +0300
> 
>     Postpone the vnode_pager_setsize() call until the nfs node mutex is dropped.
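
The pattern the commit message describes (record the decision while the mutex is held, drop the mutex, and only then make the call that may sleep) can be sketched in userland C with pthreads. This is an analogy, not the actual patch and not kernel code: struct node, load_attr(), pager_setsize() and the per-thread counter nonsleepable_held are hypothetical names invented for illustration, and the counter only loosely imitates the kind of check WITNESS performs.

```c
/* Userland analogy only; every name here is hypothetical, not FreeBSD kernel API. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-thread count of held locks under which blocking is forbidden. */
static _Thread_local int nonsleepable_held;

struct node {
    pthread_mutex_t lock;   /* stands in for the NFS node mutex */
    size_t cached_size;     /* attribute protected by the lock */
    size_t pager_size;      /* state updated by the call that may block */
};

/* Stands in for vnode_pager_setsize(): it may block, so the calling
 * thread must not hold a non-sleepable lock when it gets here. */
static void pager_setsize(struct node *n, size_t size)
{
    assert(nonsleepable_held == 0);
    n->pager_size = size;
}

/* Update the cached size under the mutex, but postpone the resize
 * call until after the mutex has been dropped. */
static void load_attr(struct node *n, size_t new_size)
{
    bool need_resize = false;
    size_t size = 0;

    pthread_mutex_lock(&n->lock);
    nonsleepable_held++;
    if (n->cached_size != new_size) {
        n->cached_size = new_size;
        size = new_size;        /* remember the value for later */
        need_resize = true;
    }
    nonsleepable_held--;
    pthread_mutex_unlock(&n->lock);

    if (need_resize)
        pager_setsize(n, size); /* safe: no mutex held here */
}
```

Calling load_attr() on a node initialized with PTHREAD_MUTEX_INITIALIZER and zero sizes updates both fields without tripping the assertion; moving the pager_setsize() call inside the locked region would trip it. The sketch does not model vnode locking, so it says nothing about whether the size captured under the mutex can go stale before the deferred call runs.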

One of the boxes crashed, so it is now running a kernel with the patch. I still get
the 3 LORs when services are starting, though:

lock order reversal:
 1st 0xffffff81ee061268 bufwait (bufwait) @ kern/vfs_bio.c:2636
 2nd 0xffffff0006901000 dirhash (dirhash) @ ufs/ufs/ufs_dirhash.c:285
lock order reversal:
 1st 0xffffff0125236c88 so_snd_sx (so_snd_sx) @ kern/uipc_sockbuf.c:145
 2nd 0xffffff01256e9448 nfs (nfs) @ kern/uipc_syscalls.c:2086
lock order reversal:
 1st 0xffffff01253e1c88 so_snd_sx (so_snd_sx) @ kern/uipc_sockbuf.c:145
 2nd 0xffffff01252b5620 ufs (ufs) @ kern/uipc_syscalls.c:2086

I'll keep you posted on whether the patch improves stability.

Regards,

-- 
Herve Boulouis

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20110725141937.GE17204>
