Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 8 Jul 2013 08:43:01 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Andreas Longwitz <longwitz@incore.de>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Shutdown hangs on unmount of a gjournaled file system in 8-Stable
Message-ID:  <20130708054301.GI91021@kib.kiev.ua>
In-Reply-To: <51D9EB23.4070505@incore.de>
References:  <51D9EB23.4070505@incore.de>

next in thread | previous in thread | raw e-mail | index | archive | help

--HmXnia/qGvWh3jry
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Jul 08, 2013 at 12:26:43AM +0200, Andreas Longwitz wrote:
> The deadlock can be explained now: pid 1 (init) sleeps on "mount drain"
> because mp->mnt_lockref was 1. This setting was done by pid 18 (gjournal
> switcher) by calling vfs_busy(). pid 18 now sleeps on "suspwt" because
> mp->mnt_writeopcount was 1. This setting was done by pid 1 before going
> to sleep by calling vn_start_write() in dounmount().
>=20
> I think the reason for this deadlock is the commit r249055 which seems
> not to be compatible with gjournal.
Thank you for the analysis. I think 'not compatible' is some
understatement. The situation clearly causes a deadlock, you are right.

The vfs_busy(); vfs_write_suspend(); call sequence is somewhat dubious,
in fact, exactly because unmount could start in between. I think that
vfs_write_suspend() must avoid setting MNT_SUSPEND if unmount was
started. Patch below, for HEAD, should fix the problem, by marking the
callers of vfs_write_suspend(), which are not protected by the covered
vnode lock, with the VS_SKIP_UNMOUNT flag.

I believe that the conflicts on stable/8 should be trivial, if any.

diff --git a/sys/geom/journal/g_journal.c b/sys/geom/journal/g_journal.c
index a3c996c..3ce2785 100644
--- a/sys/geom/journal/g_journal.c
+++ b/sys/geom/journal/g_journal.c
@@ -2960,7 +2960,7 @@ g_journal_do_switch(struct g_class *classp)
 		GJ_TIMER_STOP(1, &bt, "BIO_FLUSH time of %s", sc->sc_name);
=20
 		GJ_TIMER_START(1, &bt);
-		error =3D vfs_write_suspend(mp);
+		error =3D vfs_write_suspend(mp, VS_SKIP_UNMOUNT);
 		GJ_TIMER_STOP(1, &bt, "Suspend time of %s", mountpoint);
 		if (error !=3D 0) {
 			GJ_DEBUG(0, "Cannot suspend file system %s (error=3D%d).",
diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
index 7eac0ef..06e59f9 100644
--- a/sys/kern/vfs_vnops.c
+++ b/sys/kern/vfs_vnops.c
@@ -1668,8 +1668,7 @@ vn_finished_secondary_write(mp)
  * Request a filesystem to suspend write operations.
  */
 int
-vfs_write_suspend(mp)
-	struct mount *mp;
+vfs_write_suspend(struct mount *mp, int flags)
 {
 	int error;
=20
@@ -1680,6 +1679,21 @@ vfs_write_suspend(mp)
 	}
 	while (mp->mnt_kern_flag & MNTK_SUSPEND)
 		msleep(&mp->mnt_flag, MNT_MTX(mp), PUSER - 1, "wsuspfs", 0);
+
+	/*
+	 * Unmount holds a write reference on the mount point.  If we
+	 * own busy reference and drain for writers, we deadlock with
+	 * the reference draining in the unmount path.  Callers of
+	 * vfs_write_suspend() must specify VS_SKIP_UNMOUNT if
+	 * vfs_busy() reference is owned and caller is not in the
+	 * unmount context.
+	 */
+	if ((flags & VS_SKIP_UNMOUNT) !=3D 0 &&
+	    (mp->mnt_kern_flag & MNTK_UNMOUNT) !=3D 0) {
+		MNT_IUNLOCK(mp);
+		return (EBUSY);
+	}
+
 	mp->mnt_kern_flag |=3D MNTK_SUSPEND;
 	mp->mnt_susp_owner =3D curthread;
 	if (mp->mnt_writeopcount > 0)
diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h
index 42bfb65..b0cbcc0 100644
--- a/sys/sys/vnode.h
+++ b/sys/sys/vnode.h
@@ -398,6 +398,9 @@ extern int		vttoif_tab[];
 #define	VR_START_WRITE	0x0001	/* vfs_write_resume: start write atomically =
*/
 #define	VR_NO_SUSPCLR	0x0002	/* vfs_write_resume: do not clear suspension =
*/
=20
+#define	VS_SKIP_UNMOUNT	0x0001	/* vfs_write_suspend: fail if the
+				   filesystem is being unmounted */
+
 #define	VREF(vp)	vref(vp)
=20
 #ifdef DIAGNOSTIC
@@ -711,7 +714,7 @@ int	vn_io_fault_pgmove(vm_page_t ma[], vm_offset_t offs=
et, int xfersize,
 int	vfs_cache_lookup(struct vop_lookup_args *ap);
 void	vfs_timestamp(struct timespec *);
 void	vfs_write_resume(struct mount *mp, int flags);
-int	vfs_write_suspend(struct mount *mp);
+int	vfs_write_suspend(struct mount *mp, int flags);
 int	vop_stdbmap(struct vop_bmap_args *);
 int	vop_stdfsync(struct vop_fsync_args *);
 int	vop_stdgetwritemount(struct vop_getwritemount_args *);
diff --git a/sys/ufs/ffs/ffs_snapshot.c b/sys/ufs/ffs/ffs_snapshot.c
index 9a9c88a..ad157aa 100644
--- a/sys/ufs/ffs/ffs_snapshot.c
+++ b/sys/ufs/ffs/ffs_snapshot.c
@@ -423,7 +423,7 @@ restart:
 	 */
 	for (;;) {
 		vn_finished_write(wrtmp);
-		if ((error =3D vfs_write_suspend(vp->v_mount)) !=3D 0) {
+		if ((error =3D vfs_write_suspend(vp->v_mount, 0)) !=3D 0) {
 			vn_start_write(NULL, &wrtmp, V_WAIT);
 			vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
 			goto out;
diff --git a/sys/ufs/ffs/ffs_suspend.c b/sys/ufs/ffs/ffs_suspend.c
index 3198c1a..a8c4578 100644
--- a/sys/ufs/ffs/ffs_suspend.c
+++ b/sys/ufs/ffs/ffs_suspend.c
@@ -206,7 +206,7 @@ ffs_susp_suspend(struct mount *mp)
 		return (EPERM);
 #endif
=20
-	if ((error =3D vfs_write_suspend(mp)) !=3D 0)
+	if ((error =3D vfs_write_suspend(mp, VS_SKIP_UNMOUNT)) !=3D 0)
 		return (error);
=20
 	ump->um_writesuspended =3D 1;
diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
index 57f092c..a87fdfa 100644
--- a/sys/ufs/ffs/ffs_vfsops.c
+++ b/sys/ufs/ffs/ffs_vfsops.c
@@ -257,7 +257,7 @@ ffs_mount(struct mount *mp)
 				return (error);
 			for (;;) {
 				vn_finished_write(mp);
-				if ((error =3D vfs_write_suspend(mp)) !=3D 0)
+				if ((error =3D vfs_write_suspend(mp, 0)) !=3D 0)
 					return (error);
 				MNT_ILOCK(mp);
 				if (mp->mnt_kern_flag & MNTK_SUSPENDED) {
@@ -1255,7 +1255,7 @@ ffs_unmount(mp, mntflags)
 		 */
 		for (;;) {
 			vn_finished_write(mp);
-			if ((error =3D vfs_write_suspend(mp)) !=3D 0)
+			if ((error =3D vfs_write_suspend(mp, 0)) !=3D 0)
 				return (error);
 			MNT_ILOCK(mp);
 			if (mp->mnt_kern_flag & MNTK_SUSPENDED) {

--HmXnia/qGvWh3jry
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (FreeBSD)

iQIcBAEBAgAGBQJR2lFlAAoJEJDCuSvBvK1Bs/wQAIsAa78Lxaj34I1LNgp01SAS
E2xWUTHvCSEVQIUdMbopqtHagjEgMaIj2mJctYkBMOF6z5QkUh8wdeqq8KeI+EJh
osBQEXZ7zUiXGN2o6PoSQP7+FRXakVX3r40jCzV0H5lvcpJ74I/lYuLz818swksf
ufXMUxqsd6mQ6NOaOvwgt5ze6LW6y4aQ2h9GWXyeHey8a7Tg0NKcbqHoId+tDkIu
hamLTkz/72OFnbvHlER5L5L0KSglmAk5aeuzySoBUk5g/SGFSDgHvdvAzZamdwsE
zfZ0Ec2D+thooInS3yyq+gI0B2e9sxpWGekXS7s+S63GRA/0IITHBo0vpU8mPqUS
3JmecCvgsNGi43o/ddHAdyBH9nSw6Ew9RdCmuNk2P85LvAgBobCFkFlYdzI5V5cj
FugG4FtKgv79wkYYcWB1FfeDFVOYs0+YeQME0uZNSyCJtpZaGNX0PT+o46hszQ2b
IO+Oxnko5vSrfI6IzDOLEk0g6BsZe84YnILo1auLsQQGO6G9jlzp6vhWu+IZJKib
pos4quRlvfkvkEbcIx4+9oKuHIbPv6uCHG3nQwpS+mIAAaSn+82LSiO0bFJ+wEzc
U+QrwOAwMWdTLcusrziCHmFvU+mHDy4yvnzlWNTAt8zE5Nyso11HOnmHhIH+9rsZ
oMojlqI9V98Ew8nEp4LS
=fUQf
-----END PGP SIGNATURE-----

--HmXnia/qGvWh3jry--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130708054301.GI91021>