Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 18 Jan 2012 20:57:58 +0200
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Eygene Ryabinkin <rea@freebsd.org>
Cc:        fs@freebsd.org
Subject:   Re: kern/164261: [patch] fix panic with NFS served from NULLFS
Message-ID:  <20120118185758.GZ31224@deviant.kiev.zoral.com.ua>
In-Reply-To: <3YuyVoVrStnLgh/yaJvBsNy18nw@HbohoBmewgxm0atwUoKO7zhAAgw>
References:  <20120117202853.19F65DA81C@void.codelabs.ru> <20120118041439.GU31224@deviant.kiev.zoral.com.ua> <3YuyVoVrStnLgh/yaJvBsNy18nw@HbohoBmewgxm0atwUoKO7zhAAgw>

next in thread | previous in thread | raw e-mail | index | archive | help

--UE3U/UUWprt+cWGw
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Jan 18, 2012 at 02:07:27PM +0400, Eygene Ryabinkin wrote:
> Konstantin, good day.
>=20
> Wed, Jan 18, 2012 at 06:14:39AM +0200, Kostik Belousov wrote:
> > On Wed, Jan 18, 2012 at 12:28:53AM +0400, Eygene Ryabinkin wrote:
> > > Patches
> > >   http://codelabs.ru/fbsd/prs/2012-jan-nullfs-LK_SHARED/0001-NULLFS-p=
roperly-destroy-node-hash.patch
> > This one is probably fine, assuming that the hashes are properly cleared
> > on unmount. Feel free to commit.
>=20
> Will do, thanks!
>=20
> > > and
> > >   http://codelabs.ru/fbsd/prs/2012-jan-nullfs-LK_SHARED/0002-NULLFS-f=
ix-panics-when-lowervp-is-locked-with-LK_SHA.patch
> > > will fix the problem (in reality, the first patch is just some
> > > nitpicking).
> > And I do not even read this.
> >=20
> > The issue that the backtrace is pointing to seems to be the misuse of v=
rele(),
> > after the vnode lock is switched to null vnode v_lock. Since the vnode =
that
> > is being thrown out is exclusively locked, cleanup path shall do vput()
> > instead of vrele().
>=20
> The short story: at this vrele, vp had returned to its own v_lock as v_vn=
lock,
> so it is unlocked here.
>=20
>=20
> The long story with some thoughts and questions.  If vput() path will
> end up in null_reclaim(), this seems to be unhelpful:
>=20
>  - VOP_RECLAIM() expects exclusively locked vnode, since it was instructed
>    to by vnode_if.src and thus vnode_if.c has ASSERT_VOP_ELOCKED(a->a_vp)
>    in VOP_RECLAIM_APV(); for nullfs vop_islocked is vop_stdislocked() and
>    it checks the lock status of v_vnlock, so anything that comes to null_=
reclaim
>    will be exclusively locked with *v_vnlock;
>=20
>  - null_reclaim() has the following code,
>    {{{
>    struct vnode *vp =3D ap->a_vp;
>    [...]
>    lockmgr(&vp->v_lock, LK_EXCLUSIVE, NULL)
>    [...]]
>    }}}
>    And when vp->v_lock is equal to *v_vnlock, this will lead to the
>    lockmgr panic, because the thread tries to exclusively lock the
>    object that was already locked by itself and has no recursion rights.
>=20
> If anyone sees flaws in this explanations, please, point me to them.
Ok, real flaw there is the attempt to treat half-constructed vnode as
the fully-constructed one, later. It shall be decommissioned with the
same code as does the insmntque1(). The complication is the fact that
the vnode can be found on the mount list, but we only search for a vnode
by hash.

>=20
>=20
> I had recompiled the kernel with your vrele -> vput change inside
> null_reclaim and with DEBUG_VFS_LOCKS: it resulted in the lock violation
> from insmntque,
>   http://codelabs.ru/fbsd/prs/2012-jan-nullfs-LK_SHARED/DEBUG_VFS_LOCKS-p=
anic.txt
> So, it hadn't got to the point where null_reclaim() will come to the game.
>=20
> The problem is that insmntque1() wants the passed vnode to be exclusively
> locked, but nfsrvd_readdirplus() uses LK_SHARED.
>=20
> By the way, the ASSERT_VOP_ELOCKED() was introduced in r182364 by you:
> why do you insist on exclusive lock for MP-safe fs?  The current interplay
> of NFS and NULLFS makes me think that either some of filesystems isn't re=
ally
> MP-safe, or the requirement for exclusive locking can be relaxed.
insmntque1() requires the exclusively locked vnode, because
the function modifies the vnode (it inserts the vnode into mount list).

nfsd is right there, nullfs is not. The filesystem shall ensure the
proper locking if the requested mode is not strong enough. See how the
UFS treats the lock flags if ffs_vgetf(): shared is only honored if the
vnode is found in hash. So this is another bug, nullfs must switch to
exclusive lock there.

diff --git a/sys/fs/nullfs/null_subr.c b/sys/fs/nullfs/null_subr.c
index 319e404..dd4ab61 100644
--- a/sys/fs/nullfs/null_subr.c
+++ b/sys/fs/nullfs/null_subr.c
@@ -169,17 +169,26 @@ null_hashins(mp, xp)
 }
=20
 static void
-null_insmntque_dtr(struct vnode *vp, void *xp)
+null_destroy_proto(struct vnode *vp, void *xp)
 {
=20
-	vput(((struct null_node *)xp)->null_lowervp);
+	VI_LOCK(vp);
 	vp->v_data =3D NULL;
 	vp->v_vnlock =3D &vp->v_lock;
-	free(xp, M_NULLFSNODE);
 	vp->v_op =3D &dead_vnodeops;
+	VI_UNLOCK(vp);
 	(void) vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
 	vgone(vp);
 	vput(vp);
+	free(xp, M_NULLFSNODE);
+}
+
+static void
+null_insmntque_dtr(struct vnode *vp, void *xp)
+{
+
+	vput(((struct null_node *)xp)->null_lowervp);
+	null_destroy_proto(vp, xp);
 }
=20
 /*
@@ -250,9 +259,7 @@ null_nodeget(mp, lowervp, vpp)
 	*vpp =3D null_hashins(mp, xp);
 	if (*vpp !=3D NULL) {
 		vrele(lowervp);
-		vp->v_vnlock =3D &vp->v_lock;
-		xp->null_lowervp =3D NULL;
-		vrele(vp);
+		null_destroy_proto(vp, xp);
 		return (0);
 	}
 	*vpp =3D vp;
diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c
index cf3176f..d39926f 100644
--- a/sys/fs/nullfs/null_vfsops.c
+++ b/sys/fs/nullfs/null_vfsops.c
@@ -307,6 +307,12 @@ nullfs_vget(mp, ino, flags, vpp)
 	struct vnode **vpp;
 {
 	int error;
+
+	KASSERT((flags & LK_TYPE_MASK) !=3D 0,
+	    ("nullfs_vget: no lock requested"));
+	flags &=3D ~LK_TYPE_MASK;
+	flags |=3D LK_EXCLUSIVE;
+
 	error =3D VFS_VGET(MOUNTTONULLMOUNT(mp)->nullm_vfs, ino, flags, vpp);
 	if (error)
 		return (error);
diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c
index e0645bd..b607666 100644
--- a/sys/fs/nullfs/null_vnops.c
+++ b/sys/fs/nullfs/null_vnops.c
@@ -697,12 +697,18 @@ null_inactive(struct vop_inactive_args *ap)
 static int
 null_reclaim(struct vop_reclaim_args *ap)
 {
-	struct vnode *vp =3D ap->a_vp;
-	struct null_node *xp =3D VTONULL(vp);
-	struct vnode *lowervp =3D xp->null_lowervp;
+	struct vnode *vp;
+	struct null_node *xp;
+	struct vnode *lowervp;
+
+	vp =3D ap->a_vp;
+	xp =3D VTONULL(vp);
+	lowervp =3D xp->null_lowervp;
+
+	KASSERT(lowervp !=3D NULL && vp->v_vnlock !=3D &vp->v_lock,
+	    ("Reclaiming inclomplete null vnode %p", vp));
=20
-	if (lowervp)
-		null_hashrem(xp);
+	null_hashrem(xp);
 	/*
 	 * Use the interlock to protect the clearing of v_data to
 	 * prevent faults in null_lock().
@@ -713,10 +719,7 @@ null_reclaim(struct vop_reclaim_args *ap)
 	vp->v_object =3D NULL;
 	vp->v_vnlock =3D &vp->v_lock;
 	VI_UNLOCK(vp);
-	if (lowervp)
-		vput(lowervp);
-	else
-		panic("null_reclaim: reclaiming a node with no lowervp");
+	vput(lowervp);
 	free(xp, M_NULLFSNODE);
=20
 	return (0);

--UE3U/UUWprt+cWGw
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iEYEARECAAYFAk8XFjYACgkQC3+MBN1Mb4hZ0QCgomQ64o9FleVDK778Mb4MA3dz
LgwAnjFKq0NM2K8tqujf4zIlUMgxpxnU
=nbAC
-----END PGP SIGNATURE-----

--UE3U/UUWprt+cWGw--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120118185758.GZ31224>