Date:      Thu, 13 Dec 2012 06:31:32 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Rick Macklem <rmacklem@uoguelph.ca>
Cc:        Tim Kientzle <kientzle@freebsd.org>, freebsd-current Current <freebsd-current@freebsd.org>
Subject:   Re: r244036 kernel hangs under load.
Message-ID:  <20121213043132.GB71906@kib.kiev.ua>
In-Reply-To: <1645494460.1368157.1355367699558.JavaMail.root@erie.cs.uoguelph.ca>
References:  <20121212043520.GF3013@kib.kiev.ua> <1645494460.1368157.1355367699558.JavaMail.root@erie.cs.uoguelph.ca>


On Wed, Dec 12, 2012 at 10:01:39PM -0500, Rick Macklem wrote:
> Konstantin Belousov wrote:
> > On Tue, Dec 11, 2012 at 08:58:47PM -0500, Rick Macklem wrote:
> > > Ok, I'll test r243598 and then r243599 and r243835, just to
> > > see if it really is this.
> > >
> > > I'll email when I have done this.
> > If you test only r243598, I am sure that you would experience
> > corruption.
> > The r243599 should cause the deadlocks.
> >=20
> I think you meant r243599 will result in corruptions and
> r243835 deadlocks.
>=20
> I have run r243598 for a while without a hang. (r243599 doesn't
> build) I haven't tried r243835 yet.
>=20
> > >
> > > > >
> > > > >  Also, do you use the post-r244095 kernel ?
> > > > >
> > > > > Before and after. The most recent tests were post-r244095.
> > > > > (If anything the more recent kernels hang more easily.)
> > > > >
> > > > >
> > > > > >
> > > > > > Is your machine SMP ?
> > > > >
> > > > > Old, slow single core i386.
> > > >
> > > > Try this. Please note that this is mostly a debugging facility.
> > > >
> > > It seemed to help, but didn't stop the hangs completely.
> > > r244125 without the patch would hang somewhere in a kernel
> > > build. r244125 plus this patch ran almost 2 kernel builds
> > > before it got hung.
> >=20
> > Can you try to look into the system state on the hang (on the kernel
> > with the if (1 || patch applied) ? Using the ddb and recipe from the
> > web page. Basically watch for a thread looping in the mnt_active
> > iterator and threads owning vnode interlocks.
>=20
> Ok, there is only one process in the mnt_active iterator and its
> trace is as follows (syncer):
> dle+0x12d/frame 0xdfe33adc   (I suspect the screen lost an 'I')
> intr_execute_handlers(c5e3d064,dfe33b20,0,dfe33b64,c0ec2115,...) at intr_execute_handlers+0x49/frame 0xdfe33afc
> lapic_handle_intr(3c,dfe33b20) at lapic_handle_intr+0x36/frame 0xdfe33b10
> Xapic_isr1() at Xapic_isr1+0x35/frame 0xdfe33b10
> --- interrupt, eip = 0xc0eca8db, esp = 0xdfe33b60, ebp = 0xdfe33b64 ---
> spinlock_exit(c128be90,4,c10b5017,130,1710,...) at spinlock_exit+0x2b/frame 0xdfe33b64
> __mtx_unlock_spin_flags(c128be90,0,c10b80be,25d,0,...) at __mtx_unlock_spin_flags+0x112/frame 0xdfe33b90
> kern_yield(ffffffff,0,c10c75c9,127b,c8b05238,...) at kern_yield+0x125/frame 0xdfe33bbc
> __mnt_vnode_next_active(dfe33c08,c857ba80,c10c75c9,dac,5d7,...) at __mnt_vnode_next_active+0xda/frame 0xdfe33be0
> vfs_msync(c857ba80,2,2,e6b,c857ba80,...) at vfs_msync+0x175/frame 0xdfe33c18
> sync_fsync(dfe33ca8,c85cf470,80400,c10c75c9,6f4,...) at sync_fsync+0x141/frame 0xdfe33c64
> VOP_FSYNC_APV(c12008a0,dfe33ca8,c10c75c9,6f4,4e20,...) at VOP_FSYNC_APV+0xb4/frame 0xdfe33c64
> sched_sync(0,dfe33d08,c10b0e10,3db,c85395a0,...) at sched_sync+0x399/frame 0xdfe33cc8
> fork_exit(c0b79420,0,dfe33d08) at fork_exit+0xc0/frame 0xdfe33cf4
> fork_trampoline() at fork_trampoline+0x8/frame 0xdfe33cf4
> --- trap 0, eip = 0, esp = 0xdfe33d40, ebp = 0 ---
>
> This process holds:
> exclusive lockmgr syncer (syncer) r = 0 (0xc85cf4c8) locked @ kern/vfs_subr.c:1780
>
> The only other process that is doing anything in the VFS subsystem
> holds the vnode interlock. Its trace is:
> dle+0x12d/frame 0xdfe6a850
> intr_execute_handlers(c5f721c0,dfe6a894,0,dfe6a908,c0ec2115,...) at intr_execute_handlers+0x49/frame 0xdfe6a870
> lapic_handle_intr(31,dfe6a894) at lapic_handle_intr+0x36/frame 0xdfe6a884
> Xapic_isr1() at Xapic_isr1+0x35/frame 0xdfe6a884
> --- interrupt, eip = 0xc0b2206a, esp = 0xdfe6a8d4, ebp = 0xdfe6a908 ---
> witness_unlock(c8972a74,8,c10c75c9,965,0,...) at witness_unlock+0x3a/frame 0xdfe6a908
> __mtx_unlock_flags(c8972a84,0,c10c75c9,965,c89729fc,...) at __mtx_unlock_flags+0x9f/frame 0xdfe6a938
> vdropl(c89729fc,dfe6a974,c10c75c9,8e7,c1238020,...) at vdropl+0x63/frame 0xdfe6a95c
> vputx(dfe6aa04,c0b67acc,c89729fc,dfe6a9e4,dfe6abc4,...) at vputx+0x300/frame 0xdfe6a994
> vput(c89729fc,dfe6a9e4,dfe6abc4,31d,dfe6a9e4,...) at vput+0x10/frame 0xdfe6a99c
> lookup(dfe6ab84,c857e000,0,ce,c13c83c8,...) at lookup+0x9bc/frame 0xdfe6aa04
> namei(dfe6ab84,0,0,246,0,...) at namei+0x4fe/frame 0xdfe6aa80
> vn_open_cred(dfe6ab84,dfe6ac24,1a4,0,c5dd4580,...) at vn_open_cred+0x2c0/frame 0xdfe6ab40
> vn_open(dfe6ab84,dfe6ac24,1a4,c85922a0,c853a2d0,...) at vn_open+0x3b/frame 0xdfe6ab60
> kern_openat(c85c55e0,ffffff9c,2882dcc0,0,8001,...) at kern_openat+0x1e2/frame 0xdfe6ac0c
> kern_open(c85c55e0,2882dcc0,0,8000,1b6,...) at kern_open+0x35/frame 0xdfe6ac2c
> sys_open(c85c55e0,dfe6accc,c02acde7,7307f55d,5e5b00,...) at sys_open+0x30/frame 0xdfe6ac48
> syscall(dfe6ad08) at syscall+0x2e5/frame 0xdfe6acfc
> Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xdfe6acfc
> --- syscall (5, FreeBSD ELF32, sys_open), eip = 0x84a1667, esp = 0xbfbfcffc, ebp = 0xbfbfd018 ---
>
> The locks this process holds are:
> exclusive sleep mutex vnode interlock (vnode interlock) r = 0 (0x8972a74) locked @ kern/vfs_subr.c:2513
> shared lockmgr ufs (ufs) r = 0 (0xc8bd181c) locked @ kern/vfs_subr.c:2161
>
> The only other lock held by any thread/process is:
> Process 12 (intr) thread 0xc5dfc5e0 (100012)
> exclusive sleep mutex Giant (Giant) r = 1 (0xc127b690) locked @ dev/syscons/syscons.c:724
>
> The only 2 locked vnodes are the ufs one and the syncer one, as
> shown above.
>
> The rest of the processes/threads don't hold any locks and don't seem
> to be in vfs code.
>
> db> show pcpu (this machine is single core)
> cpuid = 0
> dynamic pcpu = 0x5e5b00
> curthread = 0xc5dfc5e0: pid 12 "swi6: Giant taskq"
> curpcb = 0xc592fd60
> fpcurthread = none
> idlethread = 0xc5dfb8d0: tid 100003 "idle: cpu0"
> APIC ID = 0
> currentldt = 0x50
> spin locks held:
> db>
>
> When I do a "db> cont", wait a while and then go
> back into the debugger with <ctl><alt><esc>, everything
> looks the same.
>
> I'll leave this machine like this, in case you want me
> to do anything else in the debugger. The kernel is
> r244125 + your little patch and it took about 5 kernel
> build cycles to get it hung. (Without your little
> patch, it usually hangs in the first kernel build cycle.)
>
> Do you happen to know how I can tell where it was executing
> when I <ctl><alt><esc>? (show registers just has the eip in
> kdb_enter)
> Can I work up the stack referenced by esp and find a saved
> eip?
The thread that was executing is shown as curthread in the 'show pcpu' output.
Was it always the taskq, or was the taskq only sporadic, with the syncer
process there the rest of the time?
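
On walking the stack by hand: with frame pointers, each i386 frame stores the
caller's saved %ebp followed by the return %eip, which is the chain that ddb's
"trace" command follows. A rough, illustrative sketch (the struct and helper
names below are made up for the example and are not kernel API):

	/* Illustrative i386 frame layout: saved %ebp, then return %eip. */
	struct frame_chain {
		struct frame_chain	*fc_prev;	/* caller's saved %ebp */
		uint32_t		 fc_retaddr;	/* saved return %eip */
	};

	/* Hypothetical helper: start from the %ebp of the interrupted frame. */
	static void
	walk_frames(struct frame_chain *fp)
	{

		while (fp != NULL) {
			printf("return eip = %#x\n", fp->fc_retaddr);
			/* The chain stops making sense across trap/interrupt frames. */
			fp = fp->fc_prev;
		}
	}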

FWIW, you could try this patch too; at least I now have a theory for why
the if (1) patch did not help. My guess is that kern_yield() effectively
returned the same high-priority syncer process back to the CPU.
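
As a rough illustration of that theory, the pre-patch iterator retries a busy
vnode interlock in the following shape (a condensed sketch of the lines removed
in the diff below, not the literal code):

	while (!VI_TRYLOCK(vp)) {
		if (should_yield()) {
			mtx_unlock(&vnode_free_list_mtx);
			/* On a single CPU this can hand the CPU right back to us. */
			kern_yield(PRI_UNCHANGED);
			mtx_lock(&vnode_free_list_mtx);
		}
		/* Otherwise spin and retry the same vnode. */
	}

On a uniprocessor, if the looping thread (the syncer here) remains the
highest-priority runnable thread, the yield just reselects it and the thread
owning the interlock never runs to release it; the patch below therefore always
yields on a single-CPU system and yields at PRI_USER so the owner can make
progress.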

diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 67e078d..2325201 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -4710,32 +4710,54 @@ __mnt_vnode_markerfree_all(struct vnode **mvp, struct mount *mp)
  * These are helper functions for filesystems to traverse their
  * active vnodes.  See MNT_VNODE_FOREACH_ACTIVE() in sys/mount.h
  */
-struct vnode *
-__mnt_vnode_next_active(struct vnode **mvp, struct mount *mp)
+static void
+mnt_vnode_markerfree_active(struct vnode **mvp, struct mount *mp)
+{
+
+	KASSERT((*mvp)->v_mount == mp, ("marker vnode mount list mismatch"));
+
+	MNT_ILOCK(mp);
+	MNT_REL(mp);
+	MNT_IUNLOCK(mp);
+	free(*mvp, M_VNODE_MARKER);
+	*mvp = NULL;
+}
+
+#ifdef SMP
+#define	ALWAYS_YIELD	(mp_ncpus == 1)
+#else
+#define	ALWAYS_YIELD	1
+#endif
+
+static struct vnode *
+mnt_vnode_next_active(struct vnode **mvp, struct mount *mp)
 {
 	struct vnode *vp, *nvp;
 
-	if (should_yield())
-		kern_yield(PRI_UNCHANGED);
-	mtx_lock(&vnode_free_list_mtx);
-restart:
+	mtx_assert(&vnode_free_list_mtx, MA_OWNED);
 	KASSERT((*mvp)->v_mount == mp, ("marker vnode mount list mismatch"));
+restart:
 	vp = TAILQ_NEXT(*mvp, v_actfreelist);
+	TAILQ_REMOVE(&mp->mnt_activevnodelist, *mvp, v_actfreelist);
 	while (vp != NULL) {
 		if (vp->v_type == VMARKER) {
 			vp = TAILQ_NEXT(vp, v_actfreelist);
 			continue;
 		}
 		if (!VI_TRYLOCK(vp)) {
-			if (should_yield()) {
+			if (ALWAYS_YIELD || should_yield()) {
+				TAILQ_INSERT_BEFORE(vp, *mvp, v_actfreelist);
 				mtx_unlock(&vnode_free_list_mtx);
-				kern_yield(PRI_UNCHANGED);
+				kern_yield(PRI_USER);
 				mtx_lock(&vnode_free_list_mtx);
+				goto restart;
 			}
-			goto restart;
+			continue;
 		}
-		if (vp->v_mount == mp && vp->v_type != VMARKER &&
-		    (vp->v_iflag & VI_DOOMED) == 0)
+		KASSERT(vp->v_type != VMARKER, ("locked marker %p", vp));
+		KASSERT(vp->v_mount == mp || vp->v_mount == NULL,
+		    ("alien vnode on the active list %p %p", vp, mp));
+		if (vp->v_mount == mp && (vp->v_iflag & VI_DOOMED) == 0)
 			break;
 		nvp = TAILQ_NEXT(vp, v_actfreelist);
 		VI_UNLOCK(vp);
@@ -4745,22 +4767,31 @@ restart:
 	/* Check if we are done */
 	if (vp == NULL) {
 		mtx_unlock(&vnode_free_list_mtx);
-		__mnt_vnode_markerfree_active(mvp, mp);
-		mtx_assert(MNT_MTX(mp), MA_NOTOWNED);
+		mnt_vnode_markerfree_active(mvp, mp);
 		return (NULL);
 	}
-	TAILQ_REMOVE(&mp->mnt_activevnodelist, *mvp, v_actfreelist);
 	TAILQ_INSERT_AFTER(&mp->mnt_activevnodelist, vp, *mvp, v_actfreelist);
 	mtx_unlock(&vnode_free_list_mtx);
 	ASSERT_VI_LOCKED(vp, "active iter");
 	KASSERT((vp->v_iflag & VI_ACTIVE) != 0, ("Non-active vp %p", vp));
 	return (vp);
 }
+#undef ALWAYS_YIELD
+
+struct vnode *
+__mnt_vnode_next_active(struct vnode **mvp, struct mount *mp)
+{
+
+	if (should_yield())
+		kern_yield(PRI_UNCHANGED);
+	mtx_lock(&vnode_free_list_mtx);
+	return (mnt_vnode_next_active(mvp, mp));
+}
 
 struct vnode *
 __mnt_vnode_first_active(struct vnode **mvp, struct mount *mp)
 {
-	struct vnode *vp, *nvp;
+	struct vnode *vp;
 
 	*mvp = malloc(sizeof(struct vnode), M_VNODE_MARKER, M_WAITOK | M_ZERO);
 	MNT_ILOCK(mp);
@@ -4770,44 +4801,14 @@ __mnt_vnode_first_active(struct vnode **mvp, struct mount *mp)
 	(*mvp)->v_mount = mp;
 
 	mtx_lock(&vnode_free_list_mtx);
-restart:
 	vp = TAILQ_FIRST(&mp->mnt_activevnodelist);
-	while (vp != NULL) {
-		if (vp->v_type == VMARKER) {
-			vp = TAILQ_NEXT(vp, v_actfreelist);
-			continue;
-		}
-		if (!VI_TRYLOCK(vp)) {
-			if (should_yield()) {
-				mtx_unlock(&vnode_free_list_mtx);
-				kern_yield(PRI_UNCHANGED);
-				mtx_lock(&vnode_free_list_mtx);
-			}
-			goto restart;
-		}
-		if (vp->v_mount == mp && vp->v_type != VMARKER &&
-		    (vp->v_iflag & VI_DOOMED) == 0)
-			break;
-		nvp = TAILQ_NEXT(vp, v_actfreelist);
-		VI_UNLOCK(vp);
-		vp = nvp;
-	}
-
-	/* Check if we are done */
 	if (vp == NULL) {
 		mtx_unlock(&vnode_free_list_mtx);
-		MNT_ILOCK(mp);
-		MNT_REL(mp);
-		MNT_IUNLOCK(mp);
-		free(*mvp, M_VNODE_MARKER);
-		*mvp = NULL;
+		mnt_vnode_markerfree_active(mvp, mp);
 		return (NULL);
 	}
-	TAILQ_INSERT_AFTER(&mp->mnt_activevnodelist, vp, *mvp, v_actfreelist);
-	mtx_unlock(&vnode_free_list_mtx);
-	ASSERT_VI_LOCKED(vp, "active iter first");
-	KASSERT((vp->v_iflag & VI_ACTIVE) != 0, ("Non-active vp %p", vp));
-	return (vp);
+	TAILQ_INSERT_BEFORE(vp, *mvp, v_actfreelist);
+	return (mnt_vnode_next_active(mvp, mp));
 }
 
 void
@@ -4817,14 +4818,8 @@ __mnt_vnode_markerfree_active(struct vnode **mvp, struct mount *mp)
 	if (*mvp == NULL)
 		return;
 
-	KASSERT((*mvp)->v_mount == mp, ("marker vnode mount list mismatch"));
-
 	mtx_lock(&vnode_free_list_mtx);
 	TAILQ_REMOVE(&mp->mnt_activevnodelist, *mvp, v_actfreelist);
 	mtx_unlock(&vnode_free_list_mtx);
-	MNT_ILOCK(mp);
-	MNT_REL(mp);
-	MNT_IUNLOCK(mp);
-	free(*mvp, M_VNODE_MARKER);
-	*mvp = NULL;
+	mnt_vnode_markerfree_active(mvp, mp);
 }
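
For context on how this iterator is consumed: callers such as vfs_msync()
(visible in the syncer backtrace above) walk the active vnode list with
MNT_VNODE_FOREACH_ACTIVE() and get each vnode back with its interlock held.
A rough, illustrative caller, not the actual vfs_msync() body:

	struct vnode *vp, *mvp;

	MNT_VNODE_FOREACH_ACTIVE(vp, mp, mvp) {
		/* The iterator returns vp with its vnode interlock held. */
		if (vp->v_object == NULL) {
			VI_UNLOCK(vp);
			continue;
		}
		/* ... examine or flush the vnode ... */
		VI_UNLOCK(vp);
	}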



