Date: Wed, 18 Jan 2012 20:57:58 +0200 From: Kostik Belousov <kostikbel@gmail.com> To: Eygene Ryabinkin <rea@freebsd.org> Cc: fs@freebsd.org Subject: Re: kern/164261: [patch] fix panic with NFS served from NULLFS Message-ID: <20120118185758.GZ31224@deviant.kiev.zoral.com.ua> In-Reply-To: <3YuyVoVrStnLgh/yaJvBsNy18nw@HbohoBmewgxm0atwUoKO7zhAAgw> References: <20120117202853.19F65DA81C@void.codelabs.ru> <20120118041439.GU31224@deviant.kiev.zoral.com.ua> <3YuyVoVrStnLgh/yaJvBsNy18nw@HbohoBmewgxm0atwUoKO7zhAAgw>
next in thread | previous in thread | raw e-mail | index | archive | help
--UE3U/UUWprt+cWGw Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jan 18, 2012 at 02:07:27PM +0400, Eygene Ryabinkin wrote: > Konstantin, good day. >=20 > Wed, Jan 18, 2012 at 06:14:39AM +0200, Kostik Belousov wrote: > > On Wed, Jan 18, 2012 at 12:28:53AM +0400, Eygene Ryabinkin wrote: > > > Patches > > > http://codelabs.ru/fbsd/prs/2012-jan-nullfs-LK_SHARED/0001-NULLFS-p= roperly-destroy-node-hash.patch > > This one is probably fine, assuming that the hashes are properly cleared > > on unmount. Feel free to commit. >=20 > Will do, thanks! >=20 > > > and > > > http://codelabs.ru/fbsd/prs/2012-jan-nullfs-LK_SHARED/0002-NULLFS-f= ix-panics-when-lowervp-is-locked-with-LK_SHA.patch > > > will fix the problem (in reality, the first patch is just some > > > nitpicking). > > And I do not even read this. > >=20 > > The issue that the backtrace is pointing to seems to be the misuse of v= rele(), > > after the vnode lock is switched to null vnode v_lock. Since the vnode = that > > is being thrown out is exclusively locked, cleanup path shall do vput() > > instead of vrele(). >=20 > The short story: at this vrele, vp had returned to its own v_lock as v_vn= lock, > so it is unlocked here. >=20 >=20 > The long story with some thoughts and questions. If vput() path will > end up in null_reclaim(), this seems to be unhelpful: >=20 > - VOP_RECLAIM() expects exclusively locked vnode, since it was instructed > to by vnode_if.src and thus vnode_if.c has ASSERT_VOP_ELOCKED(a->a_vp) > in VOP_RECLAIM_APV(); for nullfs vop_islocked is vop_stdislocked() and > it checks the lock status of v_vnlock, so anything that comes to null_= reclaim > will be exclusively locked with *v_vnlock; >=20 > - null_reclaim() has the following code, > {{{ > struct vnode *vp =3D ap->a_vp; > [...] > lockmgr(&vp->v_lock, LK_EXCLUSIVE, NULL) > [...]] > }}} > And when vp->v_lock is equal to *v_vnlock, this will lead to the > lockmgr panic, because the thread tries to exclusively lock the > object that was already locked by itself and has no recursion rights. >=20 > If anyone sees flaws in this explanations, please, point me to them. Ok, real flaw there is the attempt to treat half-constructed vnode as the fully-constructed one, later. It shall be decommissioned with the same code as does the insmntque1(). The complication is the fact that the vnode can be found on the mount list, but we only search for a vnode by hash. >=20 >=20 > I had recompiled the kernel with your vrele -> vput change inside > null_reclaim and with DEBUG_VFS_LOCKS: it resulted in the lock violation > from insmntque, > http://codelabs.ru/fbsd/prs/2012-jan-nullfs-LK_SHARED/DEBUG_VFS_LOCKS-p= anic.txt > So, it hadn't got to the point where null_reclaim() will come to the game. >=20 > The problem is that insmntque1() wants the passed vnode to be exclusively > locked, but nfsrvd_readdirplus() uses LK_SHARED. >=20 > By the way, the ASSERT_VOP_ELOCKED() was introduced in r182364 by you: > why do you insist on exclusive lock for MP-safe fs? The current interplay > of NFS and NULLFS makes me think that either some of filesystems isn't re= ally > MP-safe, or the requirement for exclusive locking can be relaxed. insmntque1() requires the exclusively locked vnode, because the function modifies the vnode (it inserts the vnode into mount list). nfsd is right there, nullfs is not. The filesystem shall ensure the proper locking if the requested mode is not strong enough. See how the UFS treats the lock flags if ffs_vgetf(): shared is only honored if the vnode is found in hash. So this is another bug, nullfs must switch to exclusive lock there. diff --git a/sys/fs/nullfs/null_subr.c b/sys/fs/nullfs/null_subr.c index 319e404..dd4ab61 100644 --- a/sys/fs/nullfs/null_subr.c +++ b/sys/fs/nullfs/null_subr.c @@ -169,17 +169,26 @@ null_hashins(mp, xp) } =20 static void -null_insmntque_dtr(struct vnode *vp, void *xp) +null_destroy_proto(struct vnode *vp, void *xp) { =20 - vput(((struct null_node *)xp)->null_lowervp); + VI_LOCK(vp); vp->v_data =3D NULL; vp->v_vnlock =3D &vp->v_lock; - free(xp, M_NULLFSNODE); vp->v_op =3D &dead_vnodeops; + VI_UNLOCK(vp); (void) vn_lock(vp, LK_EXCLUSIVE | LK_RETRY); vgone(vp); vput(vp); + free(xp, M_NULLFSNODE); +} + +static void +null_insmntque_dtr(struct vnode *vp, void *xp) +{ + + vput(((struct null_node *)xp)->null_lowervp); + null_destroy_proto(vp, xp); } =20 /* @@ -250,9 +259,7 @@ null_nodeget(mp, lowervp, vpp) *vpp =3D null_hashins(mp, xp); if (*vpp !=3D NULL) { vrele(lowervp); - vp->v_vnlock =3D &vp->v_lock; - xp->null_lowervp =3D NULL; - vrele(vp); + null_destroy_proto(vp, xp); return (0); } *vpp =3D vp; diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c index cf3176f..d39926f 100644 --- a/sys/fs/nullfs/null_vfsops.c +++ b/sys/fs/nullfs/null_vfsops.c @@ -307,6 +307,12 @@ nullfs_vget(mp, ino, flags, vpp) struct vnode **vpp; { int error; + + KASSERT((flags & LK_TYPE_MASK) !=3D 0, + ("nullfs_vget: no lock requested")); + flags &=3D ~LK_TYPE_MASK; + flags |=3D LK_EXCLUSIVE; + error =3D VFS_VGET(MOUNTTONULLMOUNT(mp)->nullm_vfs, ino, flags, vpp); if (error) return (error); diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c index e0645bd..b607666 100644 --- a/sys/fs/nullfs/null_vnops.c +++ b/sys/fs/nullfs/null_vnops.c @@ -697,12 +697,18 @@ null_inactive(struct vop_inactive_args *ap) static int null_reclaim(struct vop_reclaim_args *ap) { - struct vnode *vp =3D ap->a_vp; - struct null_node *xp =3D VTONULL(vp); - struct vnode *lowervp =3D xp->null_lowervp; + struct vnode *vp; + struct null_node *xp; + struct vnode *lowervp; + + vp =3D ap->a_vp; + xp =3D VTONULL(vp); + lowervp =3D xp->null_lowervp; + + KASSERT(lowervp !=3D NULL && vp->v_vnlock !=3D &vp->v_lock, + ("Reclaiming inclomplete null vnode %p", vp)); =20 - if (lowervp) - null_hashrem(xp); + null_hashrem(xp); /* * Use the interlock to protect the clearing of v_data to * prevent faults in null_lock(). @@ -713,10 +719,7 @@ null_reclaim(struct vop_reclaim_args *ap) vp->v_object =3D NULL; vp->v_vnlock =3D &vp->v_lock; VI_UNLOCK(vp); - if (lowervp) - vput(lowervp); - else - panic("null_reclaim: reclaiming a node with no lowervp"); + vput(lowervp); free(xp, M_NULLFSNODE); =20 return (0); --UE3U/UUWprt+cWGw Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk8XFjYACgkQC3+MBN1Mb4hZ0QCgomQ64o9FleVDK778Mb4MA3dz LgwAnjFKq0NM2K8tqujf4zIlUMgxpxnU =nbAC -----END PGP SIGNATURE----- --UE3U/UUWprt+cWGw--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120118185758.GZ31224>