Date: Thu, 26 Feb 2009 16:27:07 -0600 From: Guy Helmer <ghelmer@palisadesys.com> To: John Baldwin <jhb@freebsd.org> Cc: freebsd-stable@freebsd.org Subject: Re: 7.1 hangs in cache_lookup mutex? Message-ID: <49A7173B.4030608@palisadesys.com> In-Reply-To: <200902261648.32845.jhb@freebsd.org> References: <49A46AB4.3080003@palisadesys.com> <200902261012.24325.jhb@freebsd.org> <49A70807.9020600@palisadesys.com> <200902261648.32845.jhb@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
John Baldwin wrote: > On Thursday 26 February 2009 4:22:15 pm Guy Helmer wrote: > >> db> show sleepchain 23110 >> thread 100181 (pid 23110, vmstat) blocked on sx "user map" XLOCK >> thread 100208 (pid 23092, kvoop) is on a run queue >> db> show sleepchain 23092 >> thread 100208 (pid 23092, kvoop) is on a run queue >> > > Ah, so this is normal (well, mostly) in that kvoop is simply on the run queue > waiting for a CPU. Can you find the thread pointer for kvoop and check on > things such as if it is pinned and if so to which CPU (td_pinned will tell > you the first, and td_sched->ts_cpu will tell you the second with ULE). > (kgdb) print td->td_pinned $2 = 0 (kgdb) print td->td_sched->ts_cpu $3 = 3 '\003' (kgdb) print td->td_state $4 = TDS_RUNQ From my captured ddb run: cpuid = 3 curthread = 0xc5e2f000: pid 23090 "filter" curpcb = 0xe6f90d90 fpcurthread = none idlethread = 0xc442daf0: pid 11 "idle: cpu3" APIC ID = 7 currentldt = 0x50 spin locks held: Back to kgdb: (kgdb) proc 23090 [Switching to thread 131 (Thread 100199)]#0 cpustop_handler () at atomic.h:253 253 ATOMIC_ASM(set, int, "orl %1,%0", "ir", v); (kgdb) where #0 cpustop_handler () at atomic.h:253 #1 0xc07eedef in ipi_nmi_handler () at ../../../i386/i386/mp_machdep.c:1300 #2 0xc07f85e0 in trap (frame=0xe6f90b64) at ../../../i386/i386/trap.c:216 #3 0xc07e02db in calltrap () at ../../../i386/i386/exception.s:159 #4 0xc068a066 in witness_unlock (lock=0xc5e4cb2c, flags=0, file=0xc08529f0 "../../../kern/kern_descrip.c", line=2083) at ../../../kern/subr_witness.c:1266 #5 0xc065c95e in _sx_sunlock (sx=0xc5e4cb2c, file=0xc08529f0 "../../../kern/kern_descrip.c", line=2083) at ../../../kern/kern_sx.c:294 #6 0xc062e1ab in fget_read (td=0xc5e2f000, fd=15, fpp=0xe6f90c34) at ../../../kern/kern_descrip.c:2083 #7 0xc068c2e8 in kern_readv (td=0xc5e2f000, fd=15, auio=0xe6f90c60) at ../../../kern/sys_generic.c:189 #8 0xc068c3ff in read (td=0xc5e2f000, uap=0xe6f90cfc) at ../../../kern/sys_generic.c:108 #9 0xc07f8343 in syscall (frame=0xe6f90d38) at ../../../i386/i386/trap.c:1090 #10 0xc07e0340 in Xint0x80_syscall () at ../../../i386/i386/exception.s:255 #11 0x00000033 in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) frame 4 #4 0xc068a066 in witness_unlock (lock=0xc5e4cb2c, flags=0, file=0xc08529f0 "../../../kern/kern_descrip.c", line=2083) at ../../../kern/subr_witness.c:1266 1266 if (witness_cold || witness_watch == 0 || lock->lo_witness == NULL || (kgdb) print *lock $5 = {lo_name = 0xc0852b03 "filedesc structure", lo_type = 0xc0852b03 "filedesc structure", lo_flags = 37421056, lo_witness_data = {lod_list = {stqe_next = 0xc091c228}, lod_witness = 0xc091c228}} > Then you will want to see what is running on that CPU. You might want to > check your other coredump and find the td_state member of the thread for > kvoop there as well. > On the amd64, kvoop was on the run queue for cpu 1 and another process, filter, looks like it must have been running in user mode when I broke into ddb: (kgdb) proc 33201 [Switching to thread 127 (Thread 100113)]#0 cpustop_handler () at atomic.h:264 264 ATOMIC_ASM(set, int, "orl %1,%0", "ir", v); (kgdb) where #0 cpustop_handler () at atomic.h:264 #1 0xffffffff8050c560 in ipi_nmi_handler () at ../../../amd64/amd64/mp_machdep.c:1119 #2 0xffffffff8051aba7 in trap (frame=0xffffffff9fe18f40) at ../../../amd64/amd64/trap.c:198 #3 0xffffffff805013eb in nmi_calltrap () at ../../../amd64/amd64/exception.S:427 #4 0x00000000280af9d4 in ?? () Previous frame inner to this frame (corrupt stack?) I sure wish I could find the root cause of the hangs. On a hunch, I tried setting "machdep.cpu_idle_hlt=0" on the amd64 machine, and it has run 32 hours without a hang. It could just be coincidence, though... Thanks for your help, Guy
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?49A7173B.4030608>