Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 18 Jan 2012 14:07:27 +0400
From:      Eygene Ryabinkin <rea@freebsd.org>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        fs@freebsd.org
Subject:   Re: kern/164261: [patch] fix panic with NFS served from NULLFS
Message-ID:  <3YuyVoVrStnLgh/yaJvBsNy18nw@HbohoBmewgxm0atwUoKO7zhAAgw>
In-Reply-To: <20120118041439.GU31224@deviant.kiev.zoral.com.ua>
References:  <20120117202853.19F65DA81C@void.codelabs.ru> <20120118041439.GU31224@deviant.kiev.zoral.com.ua>

next in thread | previous in thread | raw e-mail | index | archive | help

--4Ckj6UjgE2iN1+kY
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Konstantin, good day.

Wed, Jan 18, 2012 at 06:14:39AM +0200, Kostik Belousov wrote:
> On Wed, Jan 18, 2012 at 12:28:53AM +0400, Eygene Ryabinkin wrote:
> > Patches
> >   http://codelabs.ru/fbsd/prs/2012-jan-nullfs-LK_SHARED/0001-NULLFS-pro=
perly-destroy-node-hash.patch
> This one is probably fine, assuming that the hashes are properly cleared
> on unmount. Feel free to commit.

Will do, thanks!

> > and
> >   http://codelabs.ru/fbsd/prs/2012-jan-nullfs-LK_SHARED/0002-NULLFS-fix=
-panics-when-lowervp-is-locked-with-LK_SHA.patch
> > will fix the problem (in reality, the first patch is just some
> > nitpicking).
> And I do not even read this.
>=20
> The issue that the backtrace is pointing to seems to be the misuse of vre=
le(),
> after the vnode lock is switched to null vnode v_lock. Since the vnode th=
at
> is being thrown out is exclusively locked, cleanup path shall do vput()
> instead of vrele().

The short story: at this vrele, vp had returned to its own v_lock as v_vnlo=
ck,
so it is unlocked here.


The long story with some thoughts and questions.  If vput() path will
end up in null_reclaim(), this seems to be unhelpful:

 - VOP_RECLAIM() expects exclusively locked vnode, since it was instructed
   to by vnode_if.src and thus vnode_if.c has ASSERT_VOP_ELOCKED(a->a_vp)
   in VOP_RECLAIM_APV(); for nullfs vop_islocked is vop_stdislocked() and
   it checks the lock status of v_vnlock, so anything that comes to null_re=
claim
   will be exclusively locked with *v_vnlock;

 - null_reclaim() has the following code,
   {{{
   struct vnode *vp =3D ap->a_vp;
   [...]
   lockmgr(&vp->v_lock, LK_EXCLUSIVE, NULL)
   [...]]
   }}}
   And when vp->v_lock is equal to *v_vnlock, this will lead to the
   lockmgr panic, because the thread tries to exclusively lock the
   object that was already locked by itself and has no recursion rights.

If anyone sees flaws in this explanations, please, point me to them.


I had recompiled the kernel with your vrele -> vput change inside
null_reclaim and with DEBUG_VFS_LOCKS: it resulted in the lock violation
=66rom insmntque,
  http://codelabs.ru/fbsd/prs/2012-jan-nullfs-LK_SHARED/DEBUG_VFS_LOCKS-pan=
ic.txt
So, it hadn't got to the point where null_reclaim() will come to the game.

The problem is that insmntque1() wants the passed vnode to be exclusively
locked, but nfsrvd_readdirplus() uses LK_SHARED.

By the way, the ASSERT_VOP_ELOCKED() was introduced in r182364 by you:
why do you insist on exclusive lock for MP-safe fs?  The current interplay
of NFS and NULLFS makes me think that either some of filesystems isn't real=
ly
MP-safe, or the requirement for exclusive locking can be relaxed.


But to move on, I had removed ASSERT_VOP_ELOCKED() from insmntque1
and did the test once again.  Another panic, now from vputx(),
  http://codelabs.ru/fbsd/prs/2012-jan-nullfs-LK_SHARED/DEBUG_VFS_LOCKS-no-=
MPsafe-check-panic.txt
and it hits a really good assertion: we had switched to the native
lock (vp->v_lock) and it is not locked at all.  So, I think that
vrele() is the appropriate call for the race loser return path
inside null_nodeget().

> Despite above, vrele(lowervp) call is be fine, despite lowervp also
> being locked exclusively, because usecount for the vnode must be > 1,
> due to null_hashins() successfully found the ovp in the hash.
>=20
> Try the change below.
>
> diff --git a/sys/fs/nullfs/null_subr.c b/sys/fs/nullfs/null_subr.c
> index 319e404..6ba7508 100644
> --- a/sys/fs/nullfs/null_subr.c
> +++ b/sys/fs/nullfs/null_subr.c
> @@ -252,7 +252,7 @@ null_nodeget(mp, lowervp, vpp)
>  		vrele(lowervp);
>  		vp->v_vnlock =3D &vp->v_lock;
>  		xp->null_lowervp =3D NULL;
> -		vrele(vp);
> +		vput(vp);
>  		return (0);
>  	}
>  	*vpp =3D vp;

It will also need to remove
{{{
panic("null_reclaim: reclaiming a node with no lowervp");
}}}
=66rom null_reclaim(), since this function is not ready for null
vnodes released by race losers.

And my intention to add the code (the second patch) that will not pass
such half-baked vnodes into existence was precisely to make
null_reclaim to be simple and to add all internal complexity of
getting new null vnodes to the null_nodeget() and null_subr.c.
--=20
Eygene Ryabinkin                                        ,,,^..^,,,
[ Life's unfair - but root password helps!           | codelabs.ru ]
[ 82FE 06BC D497 C0DE 49EC  4FF0 16AF 9EAE 8152 ECFB | freebsd.org ]

--4Ckj6UjgE2iN1+kY
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (FreeBSD)

iF4EABEIAAYFAk8Wmd8ACgkQFq+eroFS7PtTFgD/eaU4z/EUV6XNH5/SzCXzt3h4
Bk3//7H6XToBFdpZoLcA/1JgJA3Ki0FKgIQaCTA93iYQnXFKkX8+OzP1p9ob2GC3
=I/5l
-----END PGP SIGNATURE-----

--4Ckj6UjgE2iN1+kY--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3YuyVoVrStnLgh/yaJvBsNy18nw>