Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 26 Feb 2009 16:27:07 -0600
From:      Guy Helmer <ghelmer@palisadesys.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: 7.1 hangs in cache_lookup mutex?
Message-ID:  <49A7173B.4030608@palisadesys.com>
In-Reply-To: <200902261648.32845.jhb@freebsd.org>
References:  <49A46AB4.3080003@palisadesys.com> <200902261012.24325.jhb@freebsd.org> <49A70807.9020600@palisadesys.com> <200902261648.32845.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
John Baldwin wrote:
> On Thursday 26 February 2009 4:22:15 pm Guy Helmer wrote:
>   
>> db> show sleepchain 23110
>> thread 100181 (pid 23110, vmstat) blocked on sx "user map" XLOCK
>> thread 100208 (pid 23092, kvoop) is on a run queue
>> db> show sleepchain 23092
>> thread 100208 (pid 23092, kvoop) is on a run queue
>>     
>
> Ah, so this is normal (well, mostly) in that kvoop is simply on the run queue 
> waiting for a CPU.  Can you find the thread pointer for kvoop and check on 
> things such as if it is pinned and if so to which CPU (td_pinned will tell 
> you the first, and td_sched->ts_cpu will tell you the second with ULE).
>   
(kgdb) print td->td_pinned
$2 = 0
(kgdb) print td->td_sched->ts_cpu
$3 = 3 '\003'
(kgdb) print td->td_state
$4 = TDS_RUNQ

 From my captured ddb run:
cpuid        = 3
curthread    = 0xc5e2f000: pid 23090 "filter"
curpcb       = 0xe6f90d90
fpcurthread  = none
idlethread   = 0xc442daf0: pid 11 "idle: cpu3"
APIC ID      = 7
currentldt   = 0x50
spin locks held:

Back to kgdb:
(kgdb) proc 23090
[Switching to thread 131 (Thread 100199)]#0  cpustop_handler () at 
atomic.h:253
253     ATOMIC_ASM(set,      int,   "orl %1,%0",   "ir",  v);
(kgdb) where
#0  cpustop_handler () at atomic.h:253
#1  0xc07eedef in ipi_nmi_handler () at ../../../i386/i386/mp_machdep.c:1300
#2  0xc07f85e0 in trap (frame=0xe6f90b64) at ../../../i386/i386/trap.c:216
#3  0xc07e02db in calltrap () at ../../../i386/i386/exception.s:159
#4  0xc068a066 in witness_unlock (lock=0xc5e4cb2c, flags=0,
    file=0xc08529f0 "../../../kern/kern_descrip.c", line=2083)
    at ../../../kern/subr_witness.c:1266
#5  0xc065c95e in _sx_sunlock (sx=0xc5e4cb2c,
    file=0xc08529f0 "../../../kern/kern_descrip.c", line=2083)
    at ../../../kern/kern_sx.c:294
#6  0xc062e1ab in fget_read (td=0xc5e2f000, fd=15, fpp=0xe6f90c34)
    at ../../../kern/kern_descrip.c:2083
#7  0xc068c2e8 in kern_readv (td=0xc5e2f000, fd=15, auio=0xe6f90c60)
    at ../../../kern/sys_generic.c:189
#8  0xc068c3ff in read (td=0xc5e2f000, uap=0xe6f90cfc)
    at ../../../kern/sys_generic.c:108
#9  0xc07f8343 in syscall (frame=0xe6f90d38) at 
../../../i386/i386/trap.c:1090
#10 0xc07e0340 in Xint0x80_syscall () at ../../../i386/i386/exception.s:255
#11 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 4
#4  0xc068a066 in witness_unlock (lock=0xc5e4cb2c, flags=0,
    file=0xc08529f0 "../../../kern/kern_descrip.c", line=2083)
    at ../../../kern/subr_witness.c:1266
1266            if (witness_cold || witness_watch == 0 || 
lock->lo_witness == NULL ||
(kgdb) print *lock
$5 = {lo_name = 0xc0852b03 "filedesc structure",
  lo_type = 0xc0852b03 "filedesc structure", lo_flags = 37421056,
  lo_witness_data = {lod_list = {stqe_next = 0xc091c228},
    lod_witness = 0xc091c228}}

> Then you will want to see what is running on that CPU.  You might want to 
> check your other coredump and find the td_state member of the thread for 
> kvoop there as well.
>   

On the amd64, kvoop was on the run queue for cpu 1 and another process, 
filter, looks like it must have been running in user mode when I broke 
into ddb:
(kgdb) proc 33201
[Switching to thread 127 (Thread 100113)]#0  cpustop_handler () at 
atomic.h:264
264     ATOMIC_ASM(set,      int,   "orl %1,%0",   "ir",  v);
(kgdb) where
#0  cpustop_handler () at atomic.h:264
#1  0xffffffff8050c560 in ipi_nmi_handler ()
    at ../../../amd64/amd64/mp_machdep.c:1119
#2  0xffffffff8051aba7 in trap (frame=0xffffffff9fe18f40)
    at ../../../amd64/amd64/trap.c:198
#3  0xffffffff805013eb in nmi_calltrap ()
    at ../../../amd64/amd64/exception.S:427
#4  0x00000000280af9d4 in ?? ()
Previous frame inner to this frame (corrupt stack?)



I sure wish I could find the root cause of the hangs.  On a hunch, I 
tried setting "machdep.cpu_idle_hlt=0" on the amd64 machine, and it has 
run 32 hours without a hang.  It could just be coincidence, though...

Thanks for your help,
Guy



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?49A7173B.4030608>