Date: Mon, 25 Jul 2011 11:59:03 +0300
From: Kostik Belousov <kostikbel@gmail.com>
To: Herve Boulouis <amon@aelita.org>
Cc: rmacklem@freebsd.org, freebsd-stable@freebsd.org
Subject: Re: Sleeping thread owns a nonsleepable lock panic (& lor)
Message-ID: <20110725085902.GM17489@deviant.kiev.zoral.com.ua>
In-Reply-To: <20110725102107.GB17204@ra.aabs>
References: <20110725102107.GB17204@ra.aabs>

On Mon, Jul 25, 2011 at 12:21:07PM +0200, Herve Boulouis wrote:
> Hi list,
>
> We have 2 freebsd 8.2-STABLE (cvsuped june 22) that keeps crashing in a bad way :
>
> The are doing heavy apache / php4 web serving from a nfs mount and panic at least once a day
> with the following message (no crash dump produced, hand copied from the console) :
>
> Sleeping on "vmopar" with the following non-sleepable locks held:
> exclusive sleep mutex NFSnode lock (NFSnode lock) r = 0 (0xffffff0201798000) locked @ nfsclient/nfs_subs.c:538
> lock order reversal:
>  1st 0xffffffff018ff6da80 turnstile lock (turnstile lock) @ kern/subr_turnstile.c:190
>  2nd 0xffffffffff80b52b10 scrlock (scrlock) @ dev/syscons.c:2570
> lock order reversal:
>  1st 0xffffffff018ff6da80 turnstile lock (turnstile lock) @ kern/subr_turnstile.c:190
>  2nd 0xffffffffff80b78ef8 sleepq chain (sleepq chain) @ kern/subr_turnstile.c:203
> lock order reversal:
>  1st 0xffffffffff80b78ef8 sleepq chain (sleepq chain) @ kern/subr_turnstile.c:203
>  2nd 0xffffffffff80b52b10 scrlock (scrlock) @ dev/syscons.c:2570
> Sleeping thread (tid 100998, pid 20700) owns a non-sleepable lock
> panic: sleeping thread
> cpuid = 1
> panic: bufwrite: buffer is not busy???
> cpuid = 1
>
> The 2 servers share the same load and panic consistently. I enabled WITNESS on the 2 in the hope
> it would allow the boxes to auto reboot after panic and get extra debug info. I got debug info
> but the servers still hangs after the double panic :(

Try this. Calling vnode_pager_setsize() while holding a mutex is
prohibited. On the other hand, I remember that my attempt to add a
strict assert that a vnode is exclusively locked in
vnode_pager_setsize() had to be reverted because nfs_loadattrcache()
is sometimes called without the vnode lock held.

commit 2aa7d15c38b0c01e3f724f04d7ed02ce11c82cc0
Author: Konstantin Belousov <kostikbel@gmail.com>
Date:   Mon Jul 25 11:56:04 2011 +0300

    Postpone the vnode_pager_setsize() call until the nfs node mutex is dropped.

diff --git a/sys/nfsclient/nfs_subs.c b/sys/nfsclient/nfs_subs.c
index 19fde06..351885a 100644
--- a/sys/nfsclient/nfs_subs.c
+++ b/sys/nfsclient/nfs_subs.c
@@ -478,7 +478,9 @@ nfs_loadattrcache(struct vnode **vpp, struct mbuf **mdp, caddr_t *dposp,
 	struct timespec mtime, mtime_save;
 	int v3 = NFS_ISV3(vp);
 	int error = 0;
+	int do_setsize;
 
+	do_setsize = 0;
 	md = *mdp;
 	t1 = (mtod(md, caddr_t) + md->m_len) - *dposp;
 	cp2 = nfsm_disct(mdp, dposp, NFSX_FATTR(v3), t1, M_WAIT);
@@ -606,7 +608,7 @@ nfs_loadattrcache(struct vnode **vpp, struct mbuf **mdp, caddr_t *dposp,
 					np->n_size = vap->va_size;
 					np->n_flag |= NSIZECHANGED;
 				}
-			vnode_pager_setsize(vp, np->n_size);
+			do_setsize = 1;
 		} else {
 			np->n_size = vap->va_size;
 		}
@@ -643,6 +645,8 @@ nfs_loadattrcache(struct vnode **vpp, struct mbuf **mdp, caddr_t *dposp,
 	KDTRACE_NFS_ATTRCACHE_LOAD_DONE(vp, &np->n_vattr, 0);
 #endif
 	mtx_unlock(&np->n_mtx);
+	if (do_setsize)
+		vnode_pager_setsize(vp, np->n_size);
 out:
 #ifdef KDTRACE_HOOKS
 	if (error)
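
For readers following the thread, the idea behind the patch can be sketched in userspace. This is only an illustration with pthreads, not kernel code: struct node, pager_setsize() and load_attr() are hypothetical stand-ins for the NFS node, vnode_pager_setsize() and nfs_loadattrcache(). The point is the same as in the diff: while the mutex is held, only record that a resize is needed, and make the call that may sleep after the mutex is dropped. Unlike the diff, the sketch also copies the size into a local variable while the lock is still held; that is a choice made for the sketch, not something the patch does.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the NFS node: fields guarded by mtx. */
struct node {
	pthread_mutex_t mtx;
	long size;		/* cached file size, protected by mtx */
	long pager_size;	/* size last passed to pager_setsize() */
	int resize_calls;	/* how many times pager_setsize() ran */
};

/*
 * Hypothetical stand-in for a routine that may sleep and therefore
 * must not be entered with mtx held.
 */
void
pager_setsize(struct node *np, long size)
{
	/*
	 * Single-threaded demo check only: trylock returns EBUSY if
	 * the caller still holds the (non-recursive) mutex.
	 */
	int rc = pthread_mutex_trylock(&np->mtx);

	assert(rc == 0);
	pthread_mutex_unlock(&np->mtx);

	np->pager_size = size;
	np->resize_calls++;
}

/* Update the cached size; defer the resize until after the unlock. */
void
load_attr(struct node *np, long new_size, int is_regular)
{
	int do_setsize = 0;
	long size;

	pthread_mutex_lock(&np->mtx);
	if (new_size != np->size) {
		np->size = new_size;
		if (is_regular)
			do_setsize = 1;	/* remember, do not call yet */
	}
	size = np->size;		/* snapshot taken under the lock */
	pthread_mutex_unlock(&np->mtx);

	if (do_setsize)
		pager_setsize(np, size);	/* safe: mtx is dropped */
}
```

With a node initialized as { PTHREAD_MUTEX_INITIALIZER, 0, 0, 0 }, calling load_attr(&n, 4096, 1) updates the cached size and runs pager_setsize() exactly once, with the mutex already released. Calling pager_setsize() from inside the locked region instead trips the assertion, which is the userspace analogue of the "Sleeping thread owns a non-sleepable lock" report quoted above.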