From owner-freebsd-fs@FreeBSD.ORG Wed Jan 18 18:58:06 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1F3601065706; Wed, 18 Jan 2012 18:58:06 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 7E6D78FC17; Wed, 18 Jan 2012 18:58:04 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q0IIvxNo091473 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 18 Jan 2012 20:57:59 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q0IIvxDi025865; Wed, 18 Jan 2012 20:57:59 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q0IIvw5X025864; Wed, 18 Jan 2012 20:57:58 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 18 Jan 2012 20:57:58 +0200 From: Kostik Belousov To: Eygene Ryabinkin Message-ID: <20120118185758.GZ31224@deviant.kiev.zoral.com.ua> References: <20120117202853.19F65DA81C@void.codelabs.ru> <20120118041439.GU31224@deviant.kiev.zoral.com.ua> <3YuyVoVrStnLgh/yaJvBsNy18nw@HbohoBmewgxm0atwUoKO7zhAAgw> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="UE3U/UUWprt+cWGw" Content-Disposition: inline In-Reply-To: <3YuyVoVrStnLgh/yaJvBsNy18nw@HbohoBmewgxm0atwUoKO7zhAAgw> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: fs@freebsd.org Subject: Re: kern/164261: [patch] fix panic with NFS served from NULLFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Jan 2012 18:58:06 -0000 --UE3U/UUWprt+cWGw Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jan 18, 2012 at 02:07:27PM +0400, Eygene Ryabinkin wrote: > Konstantin, good day. >=20 > Wed, Jan 18, 2012 at 06:14:39AM +0200, Kostik Belousov wrote: > > On Wed, Jan 18, 2012 at 12:28:53AM +0400, Eygene Ryabinkin wrote: > > > Patches > > > http://codelabs.ru/fbsd/prs/2012-jan-nullfs-LK_SHARED/0001-NULLFS-p= roperly-destroy-node-hash.patch > > This one is probably fine, assuming that the hashes are properly cleared > > on unmount. Feel free to commit. >=20 > Will do, thanks! >=20 > > > and > > > http://codelabs.ru/fbsd/prs/2012-jan-nullfs-LK_SHARED/0002-NULLFS-f= ix-panics-when-lowervp-is-locked-with-LK_SHA.patch > > > will fix the problem (in reality, the first patch is just some > > > nitpicking). > > And I do not even read this. > >=20 > > The issue that the backtrace is pointing to seems to be the misuse of v= rele(), > > after the vnode lock is switched to null vnode v_lock. Since the vnode = that > > is being thrown out is exclusively locked, cleanup path shall do vput() > > instead of vrele(). >=20 > The short story: at this vrele, vp had returned to its own v_lock as v_vn= lock, > so it is unlocked here. >=20 >=20 > The long story with some thoughts and questions. If vput() path will > end up in null_reclaim(), this seems to be unhelpful: >=20 > - VOP_RECLAIM() expects exclusively locked vnode, since it was instructed > to by vnode_if.src and thus vnode_if.c has ASSERT_VOP_ELOCKED(a->a_vp) > in VOP_RECLAIM_APV(); for nullfs vop_islocked is vop_stdislocked() and > it checks the lock status of v_vnlock, so anything that comes to null_= reclaim > will be exclusively locked with *v_vnlock; >=20 > - null_reclaim() has the following code, > {{{ > struct vnode *vp =3D ap->a_vp; > [...] > lockmgr(&vp->v_lock, LK_EXCLUSIVE, NULL) > [...]] > }}} > And when vp->v_lock is equal to *v_vnlock, this will lead to the > lockmgr panic, because the thread tries to exclusively lock the > object that was already locked by itself and has no recursion rights. >=20 > If anyone sees flaws in this explanations, please, point me to them. Ok, real flaw there is the attempt to treat half-constructed vnode as the fully-constructed one, later. It shall be decommissioned with the same code as does the insmntque1(). The complication is the fact that the vnode can be found on the mount list, but we only search for a vnode by hash. >=20 >=20 > I had recompiled the kernel with your vrele -> vput change inside > null_reclaim and with DEBUG_VFS_LOCKS: it resulted in the lock violation > from insmntque, > http://codelabs.ru/fbsd/prs/2012-jan-nullfs-LK_SHARED/DEBUG_VFS_LOCKS-p= anic.txt > So, it hadn't got to the point where null_reclaim() will come to the game. >=20 > The problem is that insmntque1() wants the passed vnode to be exclusively > locked, but nfsrvd_readdirplus() uses LK_SHARED. >=20 > By the way, the ASSERT_VOP_ELOCKED() was introduced in r182364 by you: > why do you insist on exclusive lock for MP-safe fs? The current interplay > of NFS and NULLFS makes me think that either some of filesystems isn't re= ally > MP-safe, or the requirement for exclusive locking can be relaxed. insmntque1() requires the exclusively locked vnode, because the function modifies the vnode (it inserts the vnode into mount list). nfsd is right there, nullfs is not. The filesystem shall ensure the proper locking if the requested mode is not strong enough. See how the UFS treats the lock flags if ffs_vgetf(): shared is only honored if the vnode is found in hash. So this is another bug, nullfs must switch to exclusive lock there. diff --git a/sys/fs/nullfs/null_subr.c b/sys/fs/nullfs/null_subr.c index 319e404..dd4ab61 100644 --- a/sys/fs/nullfs/null_subr.c +++ b/sys/fs/nullfs/null_subr.c @@ -169,17 +169,26 @@ null_hashins(mp, xp) } =20 static void -null_insmntque_dtr(struct vnode *vp, void *xp) +null_destroy_proto(struct vnode *vp, void *xp) { =20 - vput(((struct null_node *)xp)->null_lowervp); + VI_LOCK(vp); vp->v_data =3D NULL; vp->v_vnlock =3D &vp->v_lock; - free(xp, M_NULLFSNODE); vp->v_op =3D &dead_vnodeops; + VI_UNLOCK(vp); (void) vn_lock(vp, LK_EXCLUSIVE | LK_RETRY); vgone(vp); vput(vp); + free(xp, M_NULLFSNODE); +} + +static void +null_insmntque_dtr(struct vnode *vp, void *xp) +{ + + vput(((struct null_node *)xp)->null_lowervp); + null_destroy_proto(vp, xp); } =20 /* @@ -250,9 +259,7 @@ null_nodeget(mp, lowervp, vpp) *vpp =3D null_hashins(mp, xp); if (*vpp !=3D NULL) { vrele(lowervp); - vp->v_vnlock =3D &vp->v_lock; - xp->null_lowervp =3D NULL; - vrele(vp); + null_destroy_proto(vp, xp); return (0); } *vpp =3D vp; diff --git a/sys/fs/nullfs/null_vfsops.c b/sys/fs/nullfs/null_vfsops.c index cf3176f..d39926f 100644 --- a/sys/fs/nullfs/null_vfsops.c +++ b/sys/fs/nullfs/null_vfsops.c @@ -307,6 +307,12 @@ nullfs_vget(mp, ino, flags, vpp) struct vnode **vpp; { int error; + + KASSERT((flags & LK_TYPE_MASK) !=3D 0, + ("nullfs_vget: no lock requested")); + flags &=3D ~LK_TYPE_MASK; + flags |=3D LK_EXCLUSIVE; + error =3D VFS_VGET(MOUNTTONULLMOUNT(mp)->nullm_vfs, ino, flags, vpp); if (error) return (error); diff --git a/sys/fs/nullfs/null_vnops.c b/sys/fs/nullfs/null_vnops.c index e0645bd..b607666 100644 --- a/sys/fs/nullfs/null_vnops.c +++ b/sys/fs/nullfs/null_vnops.c @@ -697,12 +697,18 @@ null_inactive(struct vop_inactive_args *ap) static int null_reclaim(struct vop_reclaim_args *ap) { - struct vnode *vp =3D ap->a_vp; - struct null_node *xp =3D VTONULL(vp); - struct vnode *lowervp =3D xp->null_lowervp; + struct vnode *vp; + struct null_node *xp; + struct vnode *lowervp; + + vp =3D ap->a_vp; + xp =3D VTONULL(vp); + lowervp =3D xp->null_lowervp; + + KASSERT(lowervp !=3D NULL && vp->v_vnlock !=3D &vp->v_lock, + ("Reclaiming inclomplete null vnode %p", vp)); =20 - if (lowervp) - null_hashrem(xp); + null_hashrem(xp); /* * Use the interlock to protect the clearing of v_data to * prevent faults in null_lock(). @@ -713,10 +719,7 @@ null_reclaim(struct vop_reclaim_args *ap) vp->v_object =3D NULL; vp->v_vnlock =3D &vp->v_lock; VI_UNLOCK(vp); - if (lowervp) - vput(lowervp); - else - panic("null_reclaim: reclaiming a node with no lowervp"); + vput(lowervp); free(xp, M_NULLFSNODE); =20 return (0); --UE3U/UUWprt+cWGw Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (FreeBSD) iEYEARECAAYFAk8XFjYACgkQC3+MBN1Mb4hZ0QCgomQ64o9FleVDK778Mb4MA3dz LgwAnjFKq0NM2K8tqujf4zIlUMgxpxnU =nbAC -----END PGP SIGNATURE----- --UE3U/UUWprt+cWGw--