From owner-freebsd-current@FreeBSD.ORG Thu Dec 13 04:31:43 2012
Date: Thu, 13 Dec 2012 06:31:32 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Rick Macklem
Cc: Tim Kientzle, freebsd-current@freebsd.org
Subject: Re: r244036 kernel hangs under load.
Message-ID: <20121213043132.GB71906@kib.kiev.ua>
References: <20121212043520.GF3013@kib.kiev.ua> <1645494460.1368157.1355367699558.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <1645494460.1368157.1355367699558.JavaMail.root@erie.cs.uoguelph.ca>

On Wed, Dec 12, 2012 at 10:01:39PM -0500, Rick Macklem wrote:
> Konstantin Belousov wrote:
> > On Tue, Dec 11, 2012 at 08:58:47PM -0500, Rick Macklem wrote:
> > > Ok, I'll test r243598 and then r243599 and r243835, just to
> > > see if it really is this.
> > >
> > > I'll email when I have done this.
> > If you test only r243598, I am sure that you would experience
> > corruption.
> > r243599 should cause the deadlocks.
> 
> I think you meant that r243599 will result in corruption and
> r243835 in deadlocks.
> 
> I have run r243598 for a while without a hang. (r243599 doesn't
> build.) I haven't tried r243835 yet.
> 
> > > > > > Also, do you use the post-r244095 kernel?
> > > > >
> > > > > Before and after. The most recent tests were post-r244095.
> > > > > (If anything the more recent kernels hang more easily.)
> > > > > >
> > > > > > Is your machine SMP?
> > > > >
> > > > > Old, slow single core i386.
> > > >
> > > > Try this. Please note that this is mostly a debugging facility.
> > > >
> > > It seemed to help, but didn't stop the hangs completely.
> > > r244125 without the patch would hang somewhere in a kernel
> > > build. r244125 plus this patch ran almost 2 kernel builds
> > > before it got hung.
> >
> > Can you try to look into the system state on the hang (on the kernel
> > with the "if (1 ||" patch applied), using ddb and the recipe from the
> > web page? Basically, watch for a thread looping in the mnt_active
> > iterator and for threads owning vnode interlocks.
> 
> Ok, there is only one process in the mnt_active iterator and its
> trace is as follows (syncer):
> dle+0x12d/frame 0xdfe33adc (I suspect the screen lost an 'I')
> intr_execute_handlers(c5e3d064,dfe33b20,0,dfe33b64,c0ec2115,...) at intr_execute_handlers+0x49/frame 0xdfe33afc
> lapic_handle_intr(3c,dfe33b20) at lapic_handle_intr+0x36/frame 0xdfe33b10
> Xapic_isr1() at Xapic_isr1+0x35/frame 0xdfe33b10
> --- interrupt, eip = 0xc0eca8db, esp = 0xdfe33b60, ebp = 0xdfe33b64 ---
> spinlock_exit(c128be90,4,c10b5017,130,1710,...) at spinlock_exit+0x2b/frame 0xdfe33b64
> __mtx_unlock_spin_flags(c128be90,0,c10b80be,25d,0,...) at __mtx_unlock_spin_flags+0x112/frame 0xdfe33b90
> kern_yield(ffffffff,0,c10c75c9,127b,c8b05238,...) at kern_yield+0x125/frame 0xdfe33bbc
> __mnt_vnode_next_active(dfe33c08,c857ba80,c10c75c9,dac,5d7,...) at __mnt_vnode_next_active+0xda/frame 0xdfe33be0
> vfs_msync(c857ba80,2,2,e6b,c857ba80,...) at vfs_msync+0x175/frame 0xdfe33c18
> sync_fsync(dfe33ca8,c85cf470,80400,c10c75c9,6f4,...) at sync_fsync+0x141/frame 0xdfe33c64
> VOP_FSYNC_APV(c12008a0,dfe33ca8,c10c75c9,6f4,4e20,...) at VOP_FSYNC_APV+0xb4/frame 0xdfe33c64
> sched_sync(0,dfe33d08,c10b0e10,3db,c85395a0,...) at sched_sync+0x399/frame 0xdfe33cc8
> fork_exit(c0b79420,0,dfe33d08) at fork_exit+0xc0/frame 0xdfe33cf4
> fork_trampoline() at fork_trampoline+0x8/frame 0xdfe33cf4
> --- trap 0, eip = 0, esp = 0xdfe33d40, ebp = 0 ---
> 
> This process holds:
> exclusive lockmgr syncer (syncer) r = 0 (0xc85cf4c8) locked @ kern/vfs_subr.c:1780
> 
> The only other process that is doing anything in the VFS subsystem
> holds the vnode interlock. Its trace is:
> dle+0x12d/frame 0xdfe6a850
> intr_execute_handlers(c5f721c0,dfe6a894,0,dfe6a908,c0ec2115,...) at intr_execute_handlers+0x49/frame 0xdfe6a870
> lapic_handle_intr(31,dfe6a894) at lapic_handle_intr+0x36/frame 0xdfe6a884
> Xapic_isr1() at Xapic_isr1+0x35/frame 0xdfe6a884
> --- interrupt, eip = 0xc0b2206a, esp = 0xdfe6a8d4, ebp = 0xdfe6a908 ---
> witness_unlock(c8972a74,8,c10c75c9,965,0,...) at witness_unlock+0x3a/frame 0xdfe6a908
> __mtx_unlock_flags(c8972a84,0,c10c75c9,965,c89729fc,...) at __mtx_unlock_flags+0x9f/frame 0xdfe6a938
> vdropl(c89729fc,dfe6a974,c10c75c9,8e7,c1238020,...) at vdropl+0x63/frame 0xdfe6a95c
> vputx(dfe6aa04,c0b67acc,c89729fc,dfe6a9e4,dfe6abc4,...) at vputx+0x300/frame 0xdfe6a994
> vput(c89729fc,dfe6a9e4,dfe6abc4,31d,dfe6a9e4,...) at vput+0x10/frame 0xdfe6a99c
> lookup(dfe6ab84,c857e000,0,ce,c13c83c8,...) at lookup+0x9bc/frame 0xdfe6aa04
> namei(dfe6ab84,0,0,246,0,...) at namei+0x4fe/frame 0xdfe6aa80
> vn_open_cred(dfe6ab84,dfe6ac24,1a4,0,c5dd4580,...) at vn_open_cred+0x2c0/frame 0xdfe6ab40
> vn_open(dfe6ab84,dfe6ac24,1a4,c85922a0,c853a2d0,...) at vn_open+0x3b/frame 0xdfe6ab60
> kern_openat(c85c55e0,ffffff9c,2882dcc0,0,8001,...) at kern_openat+0x1e2/frame 0xdfe6ac0c
> kern_open(c85c55e0,2882dcc0,0,8000,1b6,...) at kern_open+0x35/frame 0xdfe6ac2c
> sys_open(c85c55e0,dfe6accc,c02acde7,7307f55d,5e5b00,...) at sys_open+0x30/frame 0xdfe6ac48
> syscall(dfe6ad08) at syscall+0x2e5/frame 0xdfe6acfc
> Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xdfe6acfc
> --- syscall (5, FreeBSD ELF32, sys_open), eip = 0x84a1667, esp = 0xbfbfcffc, ebp = 0xbfbfd018 ---
> 
> The locks this process holds are:
> exclusive sleep mutex vnode interlock (vnode interlock) r = 0 (0x8972a74) locked @ kern/vfs_subr.c:2513
> shared lockmgr ufs (ufs) r = 0 (0xc8bd181c) locked @ kern/vfs_subr.c:2161
> 
> The only other lock held by any thread/process is:
> Process 12 (intr) thread 0xc5dfc5e0 (100012)
> exclusive sleep mutex Giant (Giant) r = 1 (0xc127b690) locked @ dev/syscons/syscons.c:724
> 
> The only 2 locked vnodes are the ufs one and the syncer one, as
> shown above.
> 
> The rest of the processes/threads don't hold any locks and don't seem
> to be in VFS code.
> 
> db> show pcpu    (this machine is single core)
> cpuid        = 0
> dynamic pcpu = 0x5e5b00
> curthread    = 0xc5dfc5e0: pid 12 "swi6: Giant taskq"
> curpcb       = 0xc592fd60
> fpcurthread  = none
> idlethread   = 0xc5dfb8d0: tid 100003 "idle: cpu0"
> APIC ID      = 0
> currentldt   = 0x50
> spin locks held:
> db>
> 
> When I do a "db> cont", wait a while, and then go back into the
> debugger, everything looks the same.
> 
> I'll leave this machine like this, in case you want me
> to do anything else in the debugger. The kernel is
> r244125 + your little patch, and it took about 5 kernel
> build cycles to get it hung. (Without your little
> patch, it usually hangs in the first kernel build cycle.)
> 
> Do you happen to know how I can tell where it was executing
> when I broke into the debugger? (show registers just has the eip
> in kdb_enter.)
> Can I walk up the stack referenced by esp and find a saved
> eip?

The thread that was executing is shown as curthread in the 'show pcpu'
output. Was it always the taskq, or was the taskq only sporadic, with
the syncer process there the rest of the time?

FWIW, you could try the patch below too; at least I now have a theory
for why the "if (1)" patch did not help. My guess is that kern_yield()
effectively returned the same high-priority syncer process right back
to the CPU.
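For reference, the pre-patch retry path in __mnt_vnode_next_active() looks
like this (reassembled from the lines the diff below removes; the comments
are only an illustration of the theory above and are not in the tree):

        if (!VI_TRYLOCK(vp)) {
                /* Interlock is held, e.g. by the sys_open thread above. */
                if (should_yield()) {
                        mtx_unlock(&vnode_free_list_mtx);
                        /*
                         * PRI_UNCHANGED keeps the syncer's elevated
                         * priority, so on a single-CPU machine the
                         * scheduler picks the syncer again immediately
                         * and the interlock owner never gets to run
                         * and release the lock.
                         */
                        kern_yield(PRI_UNCHANGED);
                        mtx_lock(&vnode_free_list_mtx);
                }
                /* Marker not advanced: the rescan hits the same vnode. */
                goto restart;
        }

The patch below yields at PRI_USER instead, always yields on a
uniprocessor (ALWAYS_YIELD), and re-inserts the marker right before the
contested vnode, so the iterator keeps its place instead of spinning at
the syncer's priority.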
diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 67e078d..2325201 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -4710,32 +4710,54 @@ __mnt_vnode_markerfree_all(struct vnode **mvp, struct mount *mp)
  * These are helper functions for filesystems to traverse their
  * active vnodes.  See MNT_VNODE_FOREACH_ACTIVE() in sys/mount.h
  */
-struct vnode *
-__mnt_vnode_next_active(struct vnode **mvp, struct mount *mp)
+static void
+mnt_vnode_markerfree_active(struct vnode **mvp, struct mount *mp)
+{
+
+        KASSERT((*mvp)->v_mount == mp, ("marker vnode mount list mismatch"));
+
+        MNT_ILOCK(mp);
+        MNT_REL(mp);
+        MNT_IUNLOCK(mp);
+        free(*mvp, M_VNODE_MARKER);
+        *mvp = NULL;
+}
+
+#ifdef SMP
+#define ALWAYS_YIELD    (mp_ncpus == 1)
+#else
+#define ALWAYS_YIELD    1
+#endif
+
+static struct vnode *
+mnt_vnode_next_active(struct vnode **mvp, struct mount *mp)
 {
         struct vnode *vp, *nvp;
 
-        if (should_yield())
-                kern_yield(PRI_UNCHANGED);
-        mtx_lock(&vnode_free_list_mtx);
-restart:
+        mtx_assert(&vnode_free_list_mtx, MA_OWNED);
         KASSERT((*mvp)->v_mount == mp, ("marker vnode mount list mismatch"));
+restart:
         vp = TAILQ_NEXT(*mvp, v_actfreelist);
+        TAILQ_REMOVE(&mp->mnt_activevnodelist, *mvp, v_actfreelist);
         while (vp != NULL) {
                 if (vp->v_type == VMARKER) {
                         vp = TAILQ_NEXT(vp, v_actfreelist);
                         continue;
                 }
                 if (!VI_TRYLOCK(vp)) {
-                        if (should_yield()) {
+                        if (ALWAYS_YIELD || should_yield()) {
+                                TAILQ_INSERT_BEFORE(vp, *mvp, v_actfreelist);
                                 mtx_unlock(&vnode_free_list_mtx);
-                                kern_yield(PRI_UNCHANGED);
+                                kern_yield(PRI_USER);
                                 mtx_lock(&vnode_free_list_mtx);
+                                goto restart;
                         }
-                        goto restart;
+                        continue;
                 }
-                if (vp->v_mount == mp && vp->v_type != VMARKER &&
-                    (vp->v_iflag & VI_DOOMED) == 0)
+                KASSERT(vp->v_type != VMARKER, ("locked marker %p", vp));
+                KASSERT(vp->v_mount == mp || vp->v_mount == NULL,
+                    ("alien vnode on the active list %p %p", vp, mp));
+                if (vp->v_mount == mp && (vp->v_iflag & VI_DOOMED) == 0)
                         break;
                 nvp = TAILQ_NEXT(vp, v_actfreelist);
                 VI_UNLOCK(vp);
@@ -4745,22 +4767,31 @@ restart:
         /* Check if we are done */
         if (vp == NULL) {
                 mtx_unlock(&vnode_free_list_mtx);
-                __mnt_vnode_markerfree_active(mvp, mp);
-                mtx_assert(MNT_MTX(mp), MA_NOTOWNED);
+                mnt_vnode_markerfree_active(mvp, mp);
                 return (NULL);
         }
-        TAILQ_REMOVE(&mp->mnt_activevnodelist, *mvp, v_actfreelist);
         TAILQ_INSERT_AFTER(&mp->mnt_activevnodelist, vp, *mvp, v_actfreelist);
         mtx_unlock(&vnode_free_list_mtx);
         ASSERT_VI_LOCKED(vp, "active iter");
         KASSERT((vp->v_iflag & VI_ACTIVE) != 0, ("Non-active vp %p", vp));
         return (vp);
 }
+#undef ALWAYS_YIELD
+
+struct vnode *
+__mnt_vnode_next_active(struct vnode **mvp, struct mount *mp)
+{
+
+        if (should_yield())
+                kern_yield(PRI_UNCHANGED);
+        mtx_lock(&vnode_free_list_mtx);
+        return (mnt_vnode_next_active(mvp, mp));
+}
 
 struct vnode *
 __mnt_vnode_first_active(struct vnode **mvp, struct mount *mp)
 {
-        struct vnode *vp, *nvp;
+        struct vnode *vp;
 
         *mvp = malloc(sizeof(struct vnode), M_VNODE_MARKER, M_WAITOK | M_ZERO);
         MNT_ILOCK(mp);
@@ -4770,44 +4801,14 @@ __mnt_vnode_first_active(struct vnode **mvp, struct mount *mp)
         (*mvp)->v_mount = mp;
 
         mtx_lock(&vnode_free_list_mtx);
-restart:
         vp = TAILQ_FIRST(&mp->mnt_activevnodelist);
-        while (vp != NULL) {
-                if (vp->v_type == VMARKER) {
-                        vp = TAILQ_NEXT(vp, v_actfreelist);
-                        continue;
-                }
-                if (!VI_TRYLOCK(vp)) {
-                        if (should_yield()) {
-                                mtx_unlock(&vnode_free_list_mtx);
-                                kern_yield(PRI_UNCHANGED);
-                                mtx_lock(&vnode_free_list_mtx);
-                        }
-                        goto restart;
-                }
-                if (vp->v_mount == mp && vp->v_type != VMARKER &&
-                    (vp->v_iflag & VI_DOOMED) == 0)
-                        break;
-                nvp = TAILQ_NEXT(vp, v_actfreelist);
-                VI_UNLOCK(vp);
-                vp = nvp;
-        }
-
-        /* Check if we are done */
         if (vp == NULL) {
                 mtx_unlock(&vnode_free_list_mtx);
-                MNT_ILOCK(mp);
-                MNT_REL(mp);
-                MNT_IUNLOCK(mp);
-                free(*mvp, M_VNODE_MARKER);
-                *mvp = NULL;
+                mnt_vnode_markerfree_active(mvp, mp);
                 return (NULL);
         }
-        TAILQ_INSERT_AFTER(&mp->mnt_activevnodelist, vp, *mvp, v_actfreelist);
-        mtx_unlock(&vnode_free_list_mtx);
-        ASSERT_VI_LOCKED(vp, "active iter first");
-        KASSERT((vp->v_iflag & VI_ACTIVE) != 0, ("Non-active vp %p", vp));
-        return (vp);
+        TAILQ_INSERT_BEFORE(vp, *mvp, v_actfreelist);
+        return (mnt_vnode_next_active(mvp, mp));
 }
 
 void
@@ -4817,14 +4818,8 @@ __mnt_vnode_markerfree_active(struct vnode **mvp, struct mount *mp)
         if (*mvp == NULL)
                 return;
 
-        KASSERT((*mvp)->v_mount == mp, ("marker vnode mount list mismatch"));
-
         mtx_lock(&vnode_free_list_mtx);
         TAILQ_REMOVE(&mp->mnt_activevnodelist, *mvp, v_actfreelist);
         mtx_unlock(&vnode_free_list_mtx);
-        MNT_ILOCK(mp);
-        MNT_REL(mp);
-        MNT_IUNLOCK(mp);
-        free(*mvp, M_VNODE_MARKER);
-        *mvp = NULL;
+        mnt_vnode_markerfree_active(mvp, mp);
 }