Date: Thu, 13 Dec 2012 22:14:29 -0500 (EST) From: Rick Macklem <rmacklem@uoguelph.ca> To: Konstantin Belousov <kostikbel@gmail.com> Cc: Tim Kientzle <kientzle@freebsd.org>, freebsd-current Current <freebsd-current@freebsd.org> Subject: Re: r244036 kernel hangs under load. Message-ID: <1783866028.1396164.1355454869890.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <20121213043132.GB71906@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
Konstantin Belousov wrote: > On Wed, Dec 12, 2012 at 10:01:39PM -0500, Rick Macklem wrote: > > Konstantin Belousov wrote: > > > On Tue, Dec 11, 2012 at 08:58:47PM -0500, Rick Macklem wrote: > > > > Ok, I'll test r243598 and then r243599 and r243835, just to > > > > see if it really is this. > > > > > > > > I'll email when I have done this. > > > If you test only r243598, I am sure that you would experience > > > corruption. > > > The r243599 should cause the deadlocks. > > > > > I think you meant r243599 will result in corruptions and > > r243835 deadlocks. > > > > I have run r243598 for a while without a hang. (r243599 doesn't > > build) I haven't tried r243835 yet. > > > > > > > > > > > > > > > > > > Also, do you use the post-r244095 kernel ? > > > > > > > > > > > > Before and after. The most recent tests were post-r244095. > > > > > > (If anything the more recent kernels hang more easily.) > > > > > > > > > > > > > > > > > > > > > > > > > > Is your machine SMP ? > > > > > > > > > > > > Old, slow single core i386. > > > > > > > > > > Try this. Please note that this is mostly a debugging > > > > > facility. > > > > > > > > > It seemed to help, but didn't stop the hangs completely. > > > > r244125 without the patch would hang somewhere in a kernel > > > > build. r244125 plus this patch ran almost 2 kernel builds > > > > before it got hung. > > > > > > Can you try to look into the system state on the hang (on the > > > kernel > > > with the if (1 || patch applied) ? Using the ddb and recipe from > > > the > > > web page. Basically watch for a thread looping in the mnt_active > > > iterator and threads owning vnode interlocks. > > > > Ok, there is only one process in the mnt_active iterator and its > > trace is as follows (syncer): > > dle+0x12d/frame 0xdfe33adc (I suspect the screen lost an 'I') > > intr_execute_handlers(c5e3d064,dfe33b20,0,dfe33b64,c0ec2115,...) at > > intr_execute_handlers+0x49/frame 0xdfe33afc > > lapic_handle_intr(3c,dfe33b20) at lapic_handle_intr+0x36/frame > > 0xdfe33b10 > > Xapic_isr1() at Xapic_isr1+0x35/frame 0xdfe33b10 > > --- interrupt, eip = 0xc0eca8db, esp = 0xdfe33b60, ebp = 0xdfe33b64 > > --- > > spinlock_exit(c128be90,4,c10b5017,130,1710,...) at > > spinlock_exit+0x2b/frame 0xdfe33b64 > > __mtx_unlock_spin_flags(c128be90,0,c10b80be,25d,0,...) at > > __mtx_unlock_spin_flags+0x112/frame 0xdfe33b90 > > kern_yield(ffffffff,0,c10c75c9,127b,c8b05238,...) at > > kern_yield+0x125/frame 0xdfe33bbc > > __mnt_vnode_next_active(dfe33c08,c857ba80,c10c75c9,dac,5d7,...) at > > __mnt_vnode_next_active+0xda/frame 0xdfe33be0 > > vfs_msync(c857ba80,2,2,e6b,c857ba80,...) at vfs_msync+0x175/frame > > 0xdfe33c18 > > sync_fsync(dfe33ca8,c85cf470,80400,c10c75c9,6f4,...) at > > sync_fsync+0x141/frame 0xdfe33c64 > > VOP_FSYNC_APV(c12008a0,dfe33ca8,c10c75c9,6f4,4e20,...) at > > VOP_FSYNC_APV+0xb4/frame 0xdfe33c64 > > sched_sync(0,dfe33d08,c10b0e10,3db,c85395a0,...) at > > sched_sync+0x399/frame 0xdfe33cc8 > > fork_exit(c0b79420,0,dfe33d08) at fork_exit+0xc0/frame 0xdfe33cf4 > > fork_trampoline() at fork_trampoline+0x8/frame 0xdfe33cf4 > > --- trap 0, eip = 0, esp = 0xdfe33d40, ebp = 0 --- > > > > This process holds: > > exclusive lockmgr syncer (syncer) r = 0 (0xc85cf4c8) locked @ > > kern/vfs_subr.c:1780 > > > > The only other process that is doing anything in the VFS subsystem > > holds the vnode interlock. It's trace is: > > dle+0x12d/frame 0xdfe6a850 > > intr_execute_handlers(c5f721c0,dfe6a894,0,dfe6a908,c0ec2115,...) at > > intr_execute_handlers+0x49/frame 0xdfe6a870 > > lapic_handle_intr(31,dfe6a894) at lapic_handle_intr+0x36/frame > > 0xdfe6a884 > > Xapic_isr1() at Xapic_isr1+0x35/frame 0xdfe6a884 > > --- interrupt, eip = 0xc0b2206a, esp = 0xdfe6a8d4, ebp = 0xdfe6a908 > > --- > > witness_unlock(c8972a74,8,c10c75c9,965,0,...) at > > witness_unlock+0x3a/frame 0xdfe6a908 > > __mtx_unlock_flags(c8972a84,0,c10c75c9,965,c89729fc,...) at > > __mtx_unlock_flags+0x9f/frame 0xdfe6a938 > > vdropl(c89729fc,dfe6a974,c10c75c9,8e7,c1238020,...) at > > vdropl+0x63/frame 0xdfe6a95c > > vputx(dfe6aa04,c0b67acc,c89729fc,dfe6a9e4,dfe6abc4,...) at > > vputx+0x300/frame 0xdfe6a994 > > vput(c89729fc,dfe6a9e4,dfe6abc4,31d,dfe6a9e4,...) at vput+0x10/frame > > 0xdfe6a99c > > lookup(dfe6ab84,c857e000,0,ce,c13c83c8,...) at lookup+0x9bc/frame > > 0xdfe6aa04 > > namei(dfe6ab84,0,0,246,0,...) at namei+0x4fe/frame 0xdfe6aa80 > > vn_open_cred(dfe6ab84,dfe6ac24,1a4,0,c5dd4580,...) at > > vn_open_cred+0x2c0/frame 0xdfe6ab40 > > vn_open(dfe6ab84,dfe6ac24,1a4,c85922a0,c853a2d0,...) at > > vn_open+0x3b/frame 0xdfe6ab60 > > kern_openat(c85c55e0,ffffff9c,2882dcc0,0,8001,...) at > > kern_openat+0x1e2/frame 0xdfe6ac0c > > kern_open(c85c55e0,2882dcc0,0,8000,1b6,...) at kern_open+0x35/frame > > 0xdfe6ac2c > > sys_open(c85c55e0,dfe6accc,c02acde7,7307f55d,5e5b00,...) at > > sys_open+0x30/frame 0xdfe6ac48 > > syscall(dfe6ad08) at syscall+0x2e5/frame 0xdfe6acfc > > Xint0x80_syscall() at Xint0x80_syscall+0x21/frame 0xdfe6acfc > > --- syscall (5, FreeBSD ELF32, sys_open), eip = 0x84a1667, esp = > > 0xbfbfcffc, ebp = 0xbfbfd018 --- > > > > The locks this process holds are: > > exclusive sleep mutex vnode interlock (vnode interlock) r = 0 > > (0x8972a74) locked @ kern/vfs_subr.c:2513 > > shared lockmgr ufs (ufs) r = 0 (0xc8bd181c) locked @ > > kern/vfs_subr.c:2161 > > > > The only other lock held by any thread/process is: > > Process 12 (intr) thread 0xc5dfc5e0 (100012) > > exclusive sleep mutex Giant (Giant) r = 1 (0xc127b690) locked @ > > dev/syscons/syscons.c:724 > > > > The only 2 locked vnodes are the ufs one and the syncer one, as > > shown above. > > > > The rest of the processes/threads don't hold any locks and don't > > seem > > to be in vfs code. > > > > db> show pcpu (this machine is single core) > > cpuid = 0 > > dynamic pcpu = 0x5e5b00 > > curthread = 0xc5dfc5e0: pid 12 "swi6: Giant taskq" > > curpcb = 0xc592fd60 > > fpcurthread = none > > idlethread = 0xc5dfb8d0: tid 100003 "idle: cpu0" > > APIC ID = 0 > > currentldt = 0x50 > > spin locks held: > > db> > > > > When I do a "db> cont", wait a while and then go > > back into the debugger with <ctl><alt><esc>, everything > > looks the same. > > > > I'll leave this machine like this, in case you want me > > to do anything else in the debugger. The kernel is > > r244125 + your little patch and it took about 5 kernel > > build cycles to get it hung. (Without your little > > patch, it usually hangs in the first kernel build cycle.) > > > > Do you happen to know how I can tell where it was executing > > when I <ctl><alt><esc>? (show registers just has the eip in > > kdb_enter) > > Can I work up a the stack referenced by esp and find a saved > > eip? > The thread that was executing is shown as curthread from 'show pcpu'. > Was it taskq always, or was taskq only sporadic, and syncer process > rest of the time ? > > FWIW, you could try this patch too, at least I have a theory now why > the if (1) patch did not helped. My guess is that kern_yield() > effectively > returned the same high-priority syncer process back to the CPU. > > diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c > index 67e078d..2325201 100644 > --- a/sys/kern/vfs_subr.c > +++ b/sys/kern/vfs_subr.c > @@ -4710,32 +4710,54 @@ __mnt_vnode_markerfree_all(struct vnode **mvp, > struct mount *mp) > * These are helper functions for filesystems to traverse their > * active vnodes. See MNT_VNODE_FOREACH_ACTIVE() in sys/mount.h > */ > -struct vnode * > -__mnt_vnode_next_active(struct vnode **mvp, struct mount *mp) > +static void > +mnt_vnode_markerfree_active(struct vnode **mvp, struct mount *mp) > +{ > + > + KASSERT((*mvp)->v_mount == mp, ("marker vnode mount list > mismatch")); > + > + MNT_ILOCK(mp); > + MNT_REL(mp); > + MNT_IUNLOCK(mp); > + free(*mvp, M_VNODE_MARKER); > + *mvp = NULL; > +} > + > +#ifdef SMP > +#define ALWAYS_YIELD (mp_ncpus == 1) > +#else > +#define ALWAYS_YIELD 1 > +#endif > + > +static struct vnode * > +mnt_vnode_next_active(struct vnode **mvp, struct mount *mp) > { > struct vnode *vp, *nvp; > > - if (should_yield()) > - kern_yield(PRI_UNCHANGED); > - mtx_lock(&vnode_free_list_mtx); > -restart: > + mtx_assert(&vnode_free_list_mtx, MA_OWNED); > KASSERT((*mvp)->v_mount == mp, ("marker vnode mount list mismatch")); > +restart: > vp = TAILQ_NEXT(*mvp, v_actfreelist); > + TAILQ_REMOVE(&mp->mnt_activevnodelist, *mvp, v_actfreelist); > while (vp != NULL) { > if (vp->v_type == VMARKER) { > vp = TAILQ_NEXT(vp, v_actfreelist); > continue; > } > if (!VI_TRYLOCK(vp)) { > - if (should_yield()) { > + if (ALWAYS_YIELD || should_yield()) { > + TAILQ_INSERT_BEFORE(vp, *mvp, v_actfreelist); > mtx_unlock(&vnode_free_list_mtx); > - kern_yield(PRI_UNCHANGED); > + kern_yield(PRI_USER); > mtx_lock(&vnode_free_list_mtx); > + goto restart; > } > - goto restart; > + continue; > } > - if (vp->v_mount == mp && vp->v_type != VMARKER && > - (vp->v_iflag & VI_DOOMED) == 0) > + KASSERT(vp->v_type != VMARKER, ("locked marker %p", vp)); > + KASSERT(vp->v_mount == mp || vp->v_mount == NULL, > + ("alien vnode on the active list %p %p", vp, mp)); > + if (vp->v_mount == mp && (vp->v_iflag & VI_DOOMED) == 0) > break; > nvp = TAILQ_NEXT(vp, v_actfreelist); > VI_UNLOCK(vp); > @@ -4745,22 +4767,31 @@ restart: > /* Check if we are done */ > if (vp == NULL) { > mtx_unlock(&vnode_free_list_mtx); > - __mnt_vnode_markerfree_active(mvp, mp); > - mtx_assert(MNT_MTX(mp), MA_NOTOWNED); > + mnt_vnode_markerfree_active(mvp, mp); > return (NULL); > } > - TAILQ_REMOVE(&mp->mnt_activevnodelist, *mvp, v_actfreelist); > TAILQ_INSERT_AFTER(&mp->mnt_activevnodelist, vp, *mvp, v_actfreelist); > mtx_unlock(&vnode_free_list_mtx); > ASSERT_VI_LOCKED(vp, "active iter"); > KASSERT((vp->v_iflag & VI_ACTIVE) != 0, ("Non-active vp %p", vp)); > return (vp); > } > +#undef ALWAYS_YIELD > + > +struct vnode * > +__mnt_vnode_next_active(struct vnode **mvp, struct mount *mp) > +{ > + > + if (should_yield()) > + kern_yield(PRI_UNCHANGED); > + mtx_lock(&vnode_free_list_mtx); > + return (mnt_vnode_next_active(mvp, mp)); > +} > > struct vnode * > __mnt_vnode_first_active(struct vnode **mvp, struct mount *mp) > { > - struct vnode *vp, *nvp; > + struct vnode *vp; > > *mvp = malloc(sizeof(struct vnode), M_VNODE_MARKER, M_WAITOK | > M_ZERO); > MNT_ILOCK(mp); > @@ -4770,44 +4801,14 @@ __mnt_vnode_first_active(struct vnode **mvp, > struct mount *mp) > (*mvp)->v_mount = mp; > > mtx_lock(&vnode_free_list_mtx); > -restart: > vp = TAILQ_FIRST(&mp->mnt_activevnodelist); > - while (vp != NULL) { > - if (vp->v_type == VMARKER) { > - vp = TAILQ_NEXT(vp, v_actfreelist); > - continue; > - } > - if (!VI_TRYLOCK(vp)) { > - if (should_yield()) { > - mtx_unlock(&vnode_free_list_mtx); > - kern_yield(PRI_UNCHANGED); > - mtx_lock(&vnode_free_list_mtx); > - } > - goto restart; > - } > - if (vp->v_mount == mp && vp->v_type != VMARKER && > - (vp->v_iflag & VI_DOOMED) == 0) > - break; > - nvp = TAILQ_NEXT(vp, v_actfreelist); > - VI_UNLOCK(vp); > - vp = nvp; > - } > - > - /* Check if we are done */ > if (vp == NULL) { > mtx_unlock(&vnode_free_list_mtx); > - MNT_ILOCK(mp); > - MNT_REL(mp); > - MNT_IUNLOCK(mp); > - free(*mvp, M_VNODE_MARKER); > - *mvp = NULL; > + mnt_vnode_markerfree_active(mvp, mp); > return (NULL); > } > - TAILQ_INSERT_AFTER(&mp->mnt_activevnodelist, vp, *mvp, > v_actfreelist); > - mtx_unlock(&vnode_free_list_mtx); > - ASSERT_VI_LOCKED(vp, "active iter first"); > - KASSERT((vp->v_iflag & VI_ACTIVE) != 0, ("Non-active vp %p", vp)); > - return (vp); > + TAILQ_INSERT_BEFORE(vp, *mvp, v_actfreelist); > + return (mnt_vnode_next_active(mvp, mp)); > } > > void > @@ -4817,14 +4818,8 @@ __mnt_vnode_markerfree_active(struct vnode > **mvp, struct mount *mp) > if (*mvp == NULL) > return; > > - KASSERT((*mvp)->v_mount == mp, ("marker vnode mount list > mismatch")); > - > mtx_lock(&vnode_free_list_mtx); > TAILQ_REMOVE(&mp->mnt_activevnodelist, *mvp, v_actfreelist); > mtx_unlock(&vnode_free_list_mtx); > - MNT_ILOCK(mp); > - MNT_REL(mp); > - MNT_IUNLOCK(mp); > - free(*mvp, M_VNODE_MARKER); > - *mvp = NULL; > + mnt_vnode_markerfree_active(mvp, mp); > } Good work. This patch seems to have done the trick. I've run quite a few kernel build cycles without a hang. I'll keep running them, but I would have expected to see a hang by now. Maybe Tim can test the patch as well? (I needed to add #include <sys/smp.h> btw.) rick
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1783866028.1396164.1355454869890.JavaMail.root>