Date:      Thu, 26 Feb 2009 10:12:24 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-stable@freebsd.org
Cc:        Guy Helmer <ghelmer@palisadesys.com>
Subject:   Re: 7.1 hangs in cache_lookup mutex?
Message-ID:  <200902261012.24325.jhb@freebsd.org>
In-Reply-To: <49A55D78.1070604@palisadesys.com>
References:  <49A46AB4.3080003@palisadesys.com> <200902250915.57562.jhb@freebsd.org> <49A55D78.1070604@palisadesys.com>

On Wednesday 25 February 2009 10:02:16 am Guy Helmer wrote:
> John Baldwin wrote:
> > On Tuesday 24 February 2009 4:46:28 pm Guy Helmer wrote:
> >   
> >> I think I may have found a clue regarding some of the hangs I'm seeing 
> >> on FreeBSD 7.1.
> >> I have a program (kvoop), compiled under FreeBSD 6 and using 
> >> compatibility libraries under FreeBSD 7, that seems to be consistently 
> >> involved during these hangs.  This time, I noticed that many processes 
> >> are stopped, waiting on the ufs lock.  I can't tell for certain, but I 
> >> assume this kvoop process 33203 is blocking the other processes.  The 
> >> kvoop process looks to me like it is waiting for a mutex, but nothing 
> >> shows up being deadlocked.  Am I on the right track?  Is there something 
> >> else I should be looking for?
> >>
> >> (kgdb) proc 33203
> >> [Switching to thread 129 (Thread 100241)]#0  sched_switch (
> >>     td=0xffffff0019109a50, newtd=0x0, flags=1)
> >>     at ../../../kern/sched_ule.c:1944
> >> 1944                    cpuid = PCPU_GET(cpuid);
> >> (kgdb) where
> >> #0  sched_switch (td=0xffffff0019109a50, newtd=0x0, flags=1)
> >>     at ../../../kern/sched_ule.c:1944
> >> #1  0xffffffff803a573b in mi_switch (flags=1, newtd=0x0)
> >>     at ../../../kern/kern_synch.c:440
> >> #2  0xffffffff803d1214 in turnstile_wait (ts=Variable "ts" is not available.
> >> )
> >>     at ../../../kern/subr_turnstile.c:748
> >> #3  0xffffffff80392db0 in _mtx_lock_sleep (m=0xffffffff8083c220,
> >>     tid=18446742974618442320, opts=Variable "opts" is not available.
> >> ) at ../../../kern/kern_mutex.c:420
> >> #4  0xffffffff80392ea6 in _mtx_lock_flags (m=0xffffffff8083c220, opts=0,
> >>     file=0xffffffff805bd438 "../../../kern/vfs_cache.c", line=327)
> >>     at ../../../kern/kern_mutex.c:186
> >> #5  0xffffffff80403bc6 in cache_lookup (dvp=0xffffff00013e0dc8,
> >>     vpp=0xffffffffa2d93a18, cnp=0xffffffffa2d93a40)
> >>     at ../../../kern/vfs_cache.c:327
> >> #6  0xffffffff80404093 in vfs_cache_lookup (ap=Variable "ap" is not available.
> >> )
> >>     at ../../../kern/vfs_cache.c:674
> >> #7  0xffffffff805628a0 in VOP_LOOKUP_APV (vop=0xffffffff8076e440,
> >>     a=0xffffffffa2d936b0) at vnode_if.c:99
> >> #8  0xffffffff80409afd in lookup (ndp=0xffffffffa2d939f0) at vnode_if.h:57
> >> #9  0xffffffff8040a824 in namei (ndp=0xffffffffa2d939f0)
> >>     at ../../../kern/vfs_lookup.c:219
> >> #10 0xffffffff8041dc4f in vn_open_cred (ndp=0xffffffffa2d939f0,
> >>     flagp=0xffffffffa2d9393c, cmode=0, cred=0xffffff0001588600,
> >>     fp=0xffffff0001964200) at ../../../kern/vfs_vnops.c:188
> >> #11 0xffffffff8041ccc7 in kern_open (td=0xffffff0019109a50,
> >>     path=0xffffac30 <Address 0xffffac30 out of bounds>, pathseg=Variable "pathseg" is not available.
> >> )
> >>     at ../../../kern/vfs_syscalls.c:1032
> >> #12 0xffffffff80544660 in ia32_syscall (frame=0xffffffffa2d93c80)
> >>     at ../../../amd64/ia32/ia32_syscall.c:182
> >> #13 0xffffffff805014d0 in Xint0x80_syscall () at ia32_exception.S:65
> >> #14 0x0000000028284037 in ?? ()
> >>
> >> (kgdb) frame 4
> >> #4  0xffffffff80392ea6 in _mtx_lock_flags (m=0xffffffff8083c220, opts=0,
> >>     file=0xffffffff805bd438 "../../../kern/vfs_cache.c", line=327)
> >>     at ../../../kern/kern_mutex.c:186
> >> 186             _get_sleep_lock(m, curthread, opts, file, line);
> >> (kgdb) list
> >> 181                 ("mtx_lock() of spin mutex %s @ %s:%d", m->lock_object.lo_name,
> >> 182                 file, line));
> >> 183             WITNESS_CHECKORDER(&m->lock_object, opts | LOP_NEWORDER | LOP_EXCLUSIVE,
> >> 184                 file, line);
> >> 185
> >> 186             _get_sleep_lock(m, curthread, opts, file, line);
> >> 187             LOCK_LOG_LOCK("LOCK", &m->lock_object, opts, m->mtx_recurse, file,
> >> 188                 line);
> >> 189             WITNESS_LOCK(&m->lock_object, opts | LOP_EXCLUSIVE, file, line);
> >> 190             curthread->td_locks++;
> >>
> >> (kgdb) print *m
> >> $2 = {lock_object = {lo_name = 0xffffffff805bd5d2 "Name Cache",
> >>     lo_type = 0xffffffff805bd5d2 "Name Cache", lo_flags = 16973824,
> >>     lo_witness_data = {lod_list = {stqe_next = 0xffffffff807cd600},
> >>       lod_witness = 0xffffffff807cd600}}, mtx_lock = 4, mtx_recurse = 0}
> >>     
> >
> > Is this from a coredump or while the system is live?  mtx_lock = 4 indicates 
> > the mutex is unlocked, so there shouldn't be any threads waiting on it.
> >
> >   
> That was from running kgdb on a vmcore file.  Would the state of the 
> mutex be different than if I was running kgdb on a live kernel?

Well, if it was live perhaps the thread wasn't hung but was changing states
while you were watching it.  However, that is not true in a coredump.  This is
a very disturbing bug as it means something is broken in the mutex implementation
itself.  Have you reproduced this on multiple machines?
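For anyone following along: in the sleep mutex implementation the mtx_lock
word holds either the owning thread pointer (plus a couple of low flag bits)
or the MTX_UNOWNED cookie (0x4) when the lock is free, so a thread parked in
turnstile_wait() on a mutex whose lock word reads 4 is self-contradictory.
Below is a minimal userland sketch of that encoding, assuming the flag values
from sys/sys/mutex.h; decode_mtx_lock() is only an illustrative helper, not
kernel code.

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Flag bits kept in the low bits of mtx_lock, mirroring sys/sys/mutex.h.
     * When a sleep mutex is held, the rest of the word is the owning thread
     * pointer; when it is free, the whole word is the MTX_UNOWNED cookie.
     */
    #define MTX_RECURSED   0x01    /* lock recursed (MTX_DEF only) */
    #define MTX_CONTESTED  0x02    /* waiters are blocked on the lock */
    #define MTX_UNOWNED    0x04    /* cookie stored in a free mutex */
    #define MTX_FLAGMASK   (MTX_RECURSED | MTX_CONTESTED | MTX_UNOWNED)

    /* Illustrative helper (not kernel code): describe one mtx_lock value. */
    static void
    decode_mtx_lock(uint64_t lock)
    {
            if (lock == MTX_UNOWNED) {
                    printf("%#llx: unlocked (MTX_UNOWNED)\n",
                        (unsigned long long)lock);
                    return;
            }
            printf("%#llx: owned by thread %#llx%s%s\n",
                (unsigned long long)lock,
                (unsigned long long)(lock & ~(uint64_t)MTX_FLAGMASK),
                (lock & MTX_CONTESTED) ? ", contested (waiters queued)" : "",
                (lock & MTX_RECURSED) ? ", recursed" : "");
    }

    int
    main(void)
    {
            /* The value seen in the vmcore: the "Name Cache" mutex is free. */
            decode_mtx_lock(0x4);
            /* What a held, contested mutex would look like instead. */
            decode_mtx_lock(0xffffff0019109a50ULL | MTX_CONTESTED);
            return (0);
    }

The same check can be made directly against the vmcore in kgdb, e.g.
print/x ((struct mtx *)0xffffffff8083c220)->mtx_lock, which would show an
owning thread pointer (with the contested bit set) rather than 4 if the
"Name Cache" mutex were genuinely held.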

-- 
John Baldwin


