From owner-freebsd-bugs@FreeBSD.ORG Wed Jul 5 19:40:59 2006
Date: Wed, 5 Jul 2006 19:40:22 GMT
Message-Id: <200607051940.k65JeM6N013120@freefall.freebsd.org>
To: freebsd-bugs@FreeBSD.org
From: John Baldwin
Subject: Re: kern/99094: panic: sleeping thread (Sleeping thread ... owns a non-sleepable lock)
Reply-To: John Baldwin

The following reply was made to PR kern/99094; it has been noted by GNATS.

From: John Baldwin
To: Eirik Øverby
Cc: bug-followup@freebsd.org, des@freebsd.org
Subject: Re: kern/99094: panic: sleeping thread (Sleeping thread ... owns a non-sleepable lock)
Date: Wed, 5 Jul 2006 14:25:41 -0400

On Saturday 01 July 2006 08:04, Eirik Øverby wrote:
> Hi again,
>
> I now have WITNESS and INVARIANTS in the kernel, and today it hung
> again. It looks somewhat different than before, but I am fairly
> certain it's the same error.
>
> Below you'll find the panic message, a bt, a ps, and then the output
> of a "c", which is exactly the same as the first message except it's
> not chopped off due to terminal size, and finally the panic resulting
> from the boot() call.
>
> /Eirik
>
> malloc(M_WAITOK) of "1024", forcing M_NOWAIT with the following
> non-sleepable locks held:
> exclusive sleep mutex vm object (standard object) r = 0
> (0xffffff0018f3fe00) locked @ /usr/src/sys/compat/linprocfs/lin9
> KDB: enter: witness_warn
> [thread pid 77487 tid 100323 ]
> Stopped at kdb_enter+0x2f: nop
> db>
>
>
> db> bt
> Tracing pid 77487 tid 100323 td 0xffffff00531794c0
> kdb_enter() at kdb_enter+0x2f
> witness_warn() at witness_warn+0x2e0
> uma_zalloc_arg() at uma_zalloc_arg+0x1ee
> malloc() at malloc+0xab
> vn_fullpath() at vn_fullpath+0x56
> linprocfs_doprocmaps() at linprocfs_doprocmaps+0x31e

Well, the problem is in linprocfs. It is trying to do some very expensive
things while holding a mutex. Here's the code excerpt:

	if (lobj) {
		vp = lobj->handle;
		VM_OBJECT_LOCK(lobj);
		off = IDX_TO_OFF(lobj->size);
		if (lobj->type == OBJT_VNODE && lobj->handle) {
			vn_fullpath(td, vp, &name, &freename);
			VOP_GETATTR(vp, &vat, td->td_ucred, td);
			ino = vat.va_fileid;
		}
		flags = obj->flags;
		ref_count = obj->ref_count;
		shadow_count = obj->shadow_count;
		VM_OBJECT_UNLOCK(lobj);

The VM_OBJECT_LOCK() is a mutex, and it can't really hold a mutex while
calling things like vn_fullpath() and VOP_GETATTR() as those can block, etc.
It needs to probably be reordered to grab copies of the object fields under
the object lock, take a ref on the vnode (via vref) then do the vn_fullpath()
and VOP_GETATTR() after dropping the vm object lock and finally do a vrele()
to drop the vnode reference. I'm cc'ing des@ as he's the linprocfs
maintainer and should be able to help with this further.

-- 
John Baldwin
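
The reordering described above can be sketched in userspace C. This is only a model of the locking pattern, not the linprocfs fix: a pthread mutex stands in for the VM object mutex, and every name here (fake_vnode, fake_object, map_entry_info, fake_vref, fake_vrele, fake_slow_lookup, describe_object) is hypothetical and invented for illustration. It shows the order of operations: copy the fields and take a reference under the lock, drop the lock, do the work that may block, then release the reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a vnode: a reference count plus slow-to-read data. */
struct fake_vnode {
	pthread_mutex_t ref_lock;
	int refs;
	const char *path;
	long fileid;
};

/* Hypothetical stand-in for a VM object guarded by a non-sleepable mutex. */
struct fake_object {
	pthread_mutex_t lock;
	struct fake_vnode *handle;	/* NULL when not vnode-backed */
	int flags;
	int ref_count;
	int shadow_count;
};

/* Private copies of everything the caller wants to report. */
struct map_entry_info {
	int flags;
	int ref_count;
	int shadow_count;
	long fileid;
	char path[64];
};

/* Models vref(): take a reference so the vnode outlives the object lock. */
static void
fake_vref(struct fake_vnode *vp)
{
	pthread_mutex_lock(&vp->ref_lock);
	vp->refs++;
	pthread_mutex_unlock(&vp->ref_lock);
}

/* Models vrele(): drop the reference taken by fake_vref(). */
static void
fake_vrele(struct fake_vnode *vp)
{
	pthread_mutex_lock(&vp->ref_lock);
	assert(vp->refs > 0);
	vp->refs--;
	pthread_mutex_unlock(&vp->ref_lock);
}

/*
 * Models vn_fullpath() and VOP_GETATTR(): work that may block, so it
 * must only be called with the object lock released.
 */
static void
fake_slow_lookup(struct fake_vnode *vp, struct map_entry_info *out)
{
	snprintf(out->path, sizeof(out->path), "%s", vp->path);
	out->fileid = vp->fileid;
}

static void
describe_object(struct fake_object *obj, struct map_entry_info *out)
{
	struct fake_vnode *vp;

	memset(out, 0, sizeof(*out));

	/* Step 1: under the lock, only copy fields and take a reference. */
	pthread_mutex_lock(&obj->lock);
	out->flags = obj->flags;
	out->ref_count = obj->ref_count;
	out->shadow_count = obj->shadow_count;
	vp = obj->handle;
	if (vp != NULL)
		fake_vref(vp);
	pthread_mutex_unlock(&obj->lock);

	/* Step 2: the lock is dropped, so blocking work is now safe. */
	if (vp != NULL) {
		fake_slow_lookup(vp, out);
		/* Step 3: release the reference taken in step 1. */
		fake_vrele(vp);
	}
}
```

A caller would fill in a fake_object, call describe_object(&obj, &info), and read the results out of info. The point of the design is that the only operations performed while the mutex is held are plain loads and a reference-count bump, neither of which can sleep.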