Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 13 Jun 2010 17:42:40 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Attilio Rao <attilio@freebsd.org>
Cc:        freebsd-fs@freebsd.org, Jaakko Heinonen <jh@freebsd.org>
Subject:   Re: syncer vnode leak because of nmount() race
Message-ID:  <20100613144240.GV13238@deviant.kiev.zoral.com.ua>
In-Reply-To: <AANLkTimuJJcZ0D1TMDvTHjpb3advHetSB0aJw2IevSC1@mail.gmail.com>
References:  <20100603143501.GA3176@a91-153-117-195.elisa-laajakaista.fi> <AANLkTimuJJcZ0D1TMDvTHjpb3advHetSB0aJw2IevSC1@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help

--3jK+0sHr6j/jwA0V
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, Jun 04, 2010 at 05:00:49PM +0200, Attilio Rao wrote:
> 2010/6/3 Jaakko Heinonen <jh@freebsd.org>:
> >
> > Hi,
> >
> > I have been playing with devfs and stress2 recently and I was able to
> > trigger a syncer vnode leak with devfs.sh. As far as I can tell it is a
> > nmount(2) (initial) mount race against update mount. The code in
> > vfs_domount() is protected with vfs_busy()/vfs_unbusy() but it doesn't
> > protect against update mounts. The protection by Giant is defeated
> > because devfs_root() may sleep due to sx_xlock(). vfs_domount() calls
> > VFS_ROOT() before allocating the syncer vnode. VI_MOUNT vnode flag
> > doesn't provide sufficient protection against update mounts either.
>=20
> Thanks for this bug report.
> I think that, luckilly, it is not a very common condition to have the
> mount still in flight and get updates... :)
> However, I think that the real breakage here is that the check on
> mnt->mnt_syncer is done lockless and it is unsafe. It might really be
> protected by the mount interlock but that's tricky because
> vfs_allocate_syncvnode() sleeps. Also re-checking the condition (after
> the allocation takes place) is not too simple here.
Is it  ? I think that the patch below would fix the issue, by
syncronizing mnt_syncer by syncer mutex.

On the other hand, I prefer the jh' patch, because it seemingly disallows
parallel updates of the mount point. I believe that mp->mnt_vnodecovered
should be stable after the call to vfs_busy() succeeded.


> I found also this bug when rewriting the syncer and I resolved that by
> using a separate flag for that (in my case it was simpler and more
> beneficial actually for some other reasons, but you may do the same
> thing with a mnt_kern_flag entry).
> If you have no time I can work actively on the patch, even if I'm
> confident, luckilly, this problem is very hard to happen in the real
> life.
>=20
> Additively, note that vfs_busy() here is not intended to protect
> against such situations but against unmount.
> Also, I'm very unsure about what Giant protection might bring here.
> IMHO we might probabilly remove it at all as long as all the sleeping
> point present make it completely unuseful if anything (and I don't see
> a reason why it may be needed).
>=20
> > PS. vfs_unbusy(9) manual page is out of date after r184554 and IMO
> > =9A =9Avfs_busy(9) manual page is misleading because it talks about
> > =9A =9Asynchronizing access to a mount point.
>=20
> May you be more precise on what is misleading please?

diff --git a/sys/kern/vfs_mount.c b/sys/kern/vfs_mount.c
index 088d939..053bd68 100644
--- a/sys/kern/vfs_mount.c
+++ b/sys/kern/vfs_mount.c
@@ -1034,14 +1034,10 @@ vfs_domount(
 		else
 			mp->mnt_kern_flag &=3D ~MNTK_ASYNC;
 		MNT_IUNLOCK(mp);
-		if ((mp->mnt_flag & MNT_RDONLY) =3D=3D 0) {
-			if (mp->mnt_syncer =3D=3D NULL)
-				error =3D vfs_allocate_syncvnode(mp);
-		} else {
-			if (mp->mnt_syncer !=3D NULL)
-				vrele(mp->mnt_syncer);
-			mp->mnt_syncer =3D NULL;
-		}
+		if ((mp->mnt_flag & MNT_RDONLY) =3D=3D 0)
+			error =3D vfs_allocate_syncvnode(mp);
+		else
+			vfs_deallocate_syncvnode(mp);
 		vfs_unbusy(mp);
 		VI_LOCK(vp);
 		vp->v_iflag &=3D ~VI_MOUNT;
diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index ff6744a..6e4bb12 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -3400,12 +3400,31 @@ vfs_allocate_syncvnode(struct mount *mp)
 	/* XXX - vn_syncer_add_to_worklist() also grabs and drops sync_mtx. */
 	mtx_lock(&sync_mtx);
 	sync_vnode_count++;
+	if (mp->mnt_syncer =3D=3D NULL) {
+		mp->mnt_syncer =3D vp;
+		vp =3D NULL;
+	}
 	mtx_unlock(&sync_mtx);
 	BO_UNLOCK(bo);
-	mp->mnt_syncer =3D vp;
+	if (vp !=3D NULL)
+		vgone(vp);
 	return (0);
 }
=20
+void
+vfs_deallocate_syncvnode(struct mount *mp)
+{
+	struct vnode *vp;
+
+	mtx_lock(&sync_mtx);
+	vp =3D mp->mnt_syncer;
+	if (vp !=3D NULL)
+		mp->mnt_syncer =3D NULL;
+	mtx_unlock(&sync_mtx);
+	if (vp !=3D NULL)
+		vrele(vp);
+}
+
 /*
  * Do a lazy sync of the filesystem.
  */
@@ -3484,15 +3503,16 @@ sync_reclaim(struct vop_reclaim_args *ap)
=20
 	bo =3D &vp->v_bufobj;
 	BO_LOCK(bo);
-	vp->v_mount->mnt_syncer =3D NULL;
+	mtx_lock(&sync_mtx);
+	if (vp->v_mount->mnt_syncer =3D=3D vp)
+		vp->v_mount->mnt_syncer =3D NULL;
 	if (bo->bo_flag & BO_ONWORKLST) {
-		mtx_lock(&sync_mtx);
 		LIST_REMOVE(bo, bo_synclist);
 		syncer_worklist_len--;
 		sync_vnode_count--;
-		mtx_unlock(&sync_mtx);
 		bo->bo_flag &=3D ~BO_ONWORKLST;
 	}
+	mtx_unlock(&sync_mtx);
 	BO_UNLOCK(bo);
=20
 	return (0);
diff --git a/sys/sys/mount.h b/sys/sys/mount.h
index 20dcf64..dcff206 100644
--- a/sys/sys/mount.h
+++ b/sys/sys/mount.h
@@ -731,6 +731,7 @@ int	vfs_busy(struct mount *, int);
 int	vfs_export			 /* process mount export info */
 	    (struct mount *, struct export_args *);
 int	vfs_allocate_syncvnode(struct mount *);
+void	vfs_deallocate_syncvnode(struct mount *);
 int	vfs_donmount(struct thread *td, int fsflags, struct uio *fsoptions);
 void	vfs_getnewfsid(struct mount *);
 struct cdev *vfs_getrootfsid(struct mount *);

--3jK+0sHr6j/jwA0V
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (FreeBSD)

iEYEARECAAYFAkwU7l8ACgkQC3+MBN1Mb4hljgCfaImXDvro2UDwP2GHdnHVvgDS
ziYAoNtaKSPCKsLL89I3OmV//8arTa9N
=BR4w
-----END PGP SIGNATURE-----

--3jK+0sHr6j/jwA0V--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20100613144240.GV13238>