Date: Wed, 10 Aug 2016 17:19:48 +0300 From: Konstantin Belousov <kostikbel@gmail.com> To: Hooman Fazaeli <hoomanfazaeli@gmail.com> Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org> Subject: Re: 9.3-RELEASE panic: spin lock held too long Message-ID: <20160810141948.GP83214@kib.kiev.ua> In-Reply-To: <57AB349B.2010805@gmail.com> References: <57AB349B.2010805@gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Aug 10, 2016 at 06:35:15PM +0430, Hooman Fazaeli wrote: > > Hi > > on a 9.3-REL i386 box we have occasional "spin lock held too long" panics. > > System info: > ------------- > - Intel(R) Core(TM) i5-4440 CPU @ 3.10GHz CPU (4 cores, no hyper theading) > - 4G non-ECC RAM > - asterisk-1.8.30.0 from ports > - dahdi-kmod26-2.6.1.r10738 from ports > - powerd disabled. > - Workload: ISDN & SIP call processing. > ------------ > > The panics are either on 'sched lock' or 'turnstile lock' spin locks. > > PANIC 1 > ======= > As below trace shows: > > 1- input arrives on a UDP socket > 2- doselwakeup is called. > 3- That wakeup call ends up in sched_add. > 4- sched_add grabs 'sched lock 0' spin lock, and aparenlty, holds it for a too long time. > 5- The pancing thread does the same calls as owner thread but panics because > it can't grab the the same spin lock. > > > kgdb /boot/kernel/kernel /var/crash/vmcore.14 > ... > kernel trap 12 with interrupts disabled > spin lock 0xc140a4c0 (sched lock 0) held by 0xc807a2f0 (tid 100045) too long > panic: spin lock held too long > cpuid = 3 > KDB: stack backtrace: > #0 0xc0b17eaf at kdb_backtrace+0x4f > #1 0xc0adeaef at panic+0x16f > #2 0xc0ac9cff at _mtx_lock_spin_failed+0x3f > #3 0xc0ac9e75 at _mtx_lock_spin+0x165 > #4 0xc0b096c5 at sched_add+0xf5 > #5 0xc0b09890 at sched_wakeup+0x70 > #6 0xc0ae8968 at setrunnable+0x88 > #7 0xc0b2227e at sleepq_resume_thread+0x12e > #8 0xc0b22fd3 at sleepq_broadcast+0xd3 > #9 0xc0a8c4cd at cv_broadcastpri+0x4d > #10 0xc0b2a406 at doselwakeup+0xe6 > #11 0xc0b2a4be at selwakeuppri+0xe > #12 0xc0a9fa59 at knote_enqueue+0x59 > #13 0xc0aa073f at kqueue_register+0x84f > #14 0xc0aa09f3 at kern_kevent+0xf3 > #15 0xc0aa16ce at sys_kevent+0x19e > #16 0xc0fcc8c3 at syscall+0x443 > #17 0xc0fb60f1 at Xint0x80_syscall+0x21 > Uptime: 7m44s > > > bt > #0 doadump (textdump=1) at pcpu.h:250 > #1 0xc0ade835 in kern_reboot (howto=260) at ../../../kern/kern_shutdown.c:454 > #2 0xc0adeb32 in panic (fmt=<value optimized out>) at ../../../kern/kern_shutdown.c:642 > #3 0xc0ac9cff in _mtx_lock_spin_failed (m=0x0) at ../../../kern/kern_mutex.c:515 > #4 0xc0ac9e75 in _mtx_lock_spin (m=0xc140a4c0, tid=3384060112, opts=0, file=0x0, line=0) at ../../../kern/kern_mutex.c:557 > #5 0xc0b096c5 in sched_add (td=0xc9b00bc0, flags=0) at ../../../kern/sched_ule.c:1153 > #6 0xc0b09890 in sched_wakeup (td=0xc9b00bc0) at ../../../kern/sched_ule.c:1991 > #7 0xc0ae8968 in setrunnable (td=0xc9b00bc0) at ../../../kern/kern_synch.c:537 > #8 0xc0b2227e in sleepq_resume_thread (sq=0xc869fd40, td=0xc9b00bc0, pri=104) at ../../../kern/subr_sleepqueue.c:763 > #9 0xc0b22fd3 in sleepq_broadcast (wchan=0xc95741e4, flags=1, pri=104, queue=0) at ../../../kern/subr_sleepqueue.c:865 > #10 0xc0a8c4cd in cv_broadcastpri (cvp=0xc95741e4, pri=104) at ../../../kern/kern_condvar.c:448 > #11 0xc0b2a406 in doselwakeup (sip=0xc963faac, pri=104) at ../../../kern/sys_generic.c:1683 > #12 0xc0b2a4be in selwakeuppri (sip=0xc963faac, pri=104) at ../../../kern/sys_generic.c:1651 > #13 0xc0a9fa59 in knote_enqueue (kn=<value optimized out>) at ../../../kern/kern_event.c:1786 > #14 0xc0aa073f in kqueue_register (kq=0xc963fa80, kev=0xf0e07b20, td=0xc9b4a8d0, waitok=1) at ../../../kern/kern_event.c:1154 > #15 0xc0aa09f3 in kern_kevent (td=0xc9b4a8d0, fd=152, nchanges=2, nevents=0, k_ops=0xf0e07c20, timeout=0x0) at ../../../kern/kern_event.c:850 > #16 0xc0aa16ce in sys_kevent (td=0xc9b4a8d0, uap=0xf0e07ccc) at ../../../kern/kern_event.c:771 > #17 0xc0fcc8c3 in syscall (frame=0xf0e07d08) at subr_syscall.c:135 > #18 0xc0fb60f1 in Xint0x80_syscall () at ../../../i386/i386/exception.s:270 > #19 0x00000033 in ?? () > Previous frame inner to this frame (corrupt stack?) > > (kgdb) tid 100045 > [Switching to thread 34 (Thread 100045)]#0 cpustop_handler () at ../../../i386/i386/mp_machdep.c:1496 > 1496 ../../../i386/i386/mp_machdep.c: No such file or directory. > in ../../../i386/i386/mp_machdep.c > > (kgdb) bt > #0 cpustop_handler () at ../../../i386/i386/mp_machdep.c:1496 > #1 0xc0fc1805 in ipi_nmi_handler () at ../../../i386/i386/mp_machdep.c:1478 > #2 0xc0fccf38 in trap (frame=0xe1be9620) at ../../../i386/i386/trap.c:227 > #3 0xc0fb605c in calltrap () at ../../../i386/i386/exception.s:170 > #4 0xc0fddf45 in DELAY (n=1) at ../../../x86/isa/clock.c:283 > #5 0xc0ac9e6c in _mtx_lock_spin (m=0xc140b1c0, tid=3355943664, opts=0, file=0x0, line=0) at ../../../kern/kern_mutex.c:555 > #6 0xc0b096c5 in sched_add (td=0xc7de0000, flags=4) at ../../../kern/sched_ule.c:1153 > #7 0xc0aae008 in intr_event_schedule_thread (ie=<value optimized out>) at ../../../kern/kern_intr.c:921 > #8 0xc0aaf8fd in swi_sched (cookie=0xc7d95300, flags=<value optimized out>) at ../../../kern/kern_intr.c:1174 > #9 0xc0af5b53 in callout_tick () at ../../../kern/kern_timeout.c:361 > #10 0xc0a8c3e1 in hardclock_cnt (cnt=1, usermode=0) at ../../../kern/kern_clock.c:554 > #11 0xc0fd403f in handleevents (now=0xe1be9870, fake=0) at ../../../kern/kern_clocksource.c:215 > #12 0xc0fd5d3f in timercb (et=0xc148c860, arg=0x0) at ../../../kern/kern_clocksource.c:390 > #13 0xc0fe731f in lapic_handle_timer (frame=0xe1be98b4) at ../../../x86/x86/local_apic.c:818 > #14 0xc0fb6600 in Xtimerint () at apic_vector.s:108 > #15 0xc0aca0df in _mtx_lock_sleep (m=0xc95741d0, tid=3355943664, opts=<value optimized out>, file=0x0, line=0) at ../../../kern/kern_mutex.c:400 > #16 0xc0b2a3e9 in doselwakeup (sip=0xc9a491f4, pri=104) at ../../../kern/sys_generic.c:1681 > #17 0xc0b2a4be in selwakeuppri (sip=0xc9a491f4, pri=104) at ../../../kern/sys_generic.c:1651 > #18 0xc0b4886f in sowakeup (so=0xc9a491a0, sb=0xc9a491f4) at ../../../kern/uipc_sockbuf.c:182 > #19 0xc0c9680d in udp_append (inp=0xc87bb0fc, ip=0xc8ead80e, n=0xc8ebf500, off=28, udp_in=0xe1be9a7c) at ../../../netinet/udp_usrreq.c:330 > #20 0xc0c98ad6 in udp_input (m=0xc8ebf500, off=20) at ../../../netinet/udp_usrreq.c:616 > #21 0xc0c0d5e7 in ip_input (m=0xc8ebf500) at ../../../netinet/ip_input.c:760 > #22 0xc0bae68f in netisr_dispatch_src (proto=1, source=0, m=0xc8ebf500) at ../../../net/netisr.c:1013 > #23 0xc0bae930 in netisr_dispatch (proto=1, m=0xc8ebf500) at ../../../net/netisr.c:1104 > #24 0xc0ba50c1 in ether_demux (ifp=0xc807f000, m=0xc8ebf500) at ../../../net/if_ethersubr.c:943 > #25 0xc0ba551f in ether_nh_input (m=0xc8ebf500) at ../../../net/if_ethersubr.c:762 > #26 0xc0bae68f in netisr_dispatch_src (proto=9, source=0, m=0xc8ebf500) at ../../../net/netisr.c:1013 > #27 0xc0bae930 in netisr_dispatch (proto=9, m=0xc8ebf500) at ../../../net/netisr.c:1104 > #28 0xc0ba4c09 in ether_input (ifp=0xc807f000, m=0xc8ebf500) at ../../../net/if_ethersubr.c:803 > #29 0xc06930c2 in igb_rxeof (que=0xc8078180, count=99, done=0x0) at ../../../dev/e1000/if_igb.c:4735 > #30 0xc0693328 in igb_msix_que (arg=0xc8078180) at ../../../dev/e1000/if_igb.c:1601 > #31 0xc0aae18b in intr_event_execute_handlers (p=0xc7d9e5b0, ie=0xc8077a00) at ../../../kern/kern_intr.c:1272 > #32 0xc0aaf990 in ithread_loop (arg=0xc80a43e0) at ../../../kern/kern_intr.c:1285 > #33 0xc0aaa96f in fork_exit (callout=0xc0aaf910 <ithread_loop>, arg=0xc80a43e0, frame=0xe1be9d08) at ../../../kern/kern_fork.c:996 > #34 0xc0fb6104 in fork_trampoline () at ../../../i386/i386/exception.s:279 I am not convinced that the thread 100045 is the owner of the spin lock. mtx_lock_spin()->DELAY() frames on the stack mean that the thread is trying to obtain the spin lock but did not succeeded yet. Or did you not mean that ? You may obtain the current owner by printing the mutex content from kgdb. Look at the mtx_lock field, it is a pointer to the struct thread owning the spin lock, modulo some flags in lower bits.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20160810141948.GP83214>