Date: Wed, 10 Aug 2016 19:11:37 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Hooman Fazaeli <hoomanfazaeli@gmail.com>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: 9.3-RELEASE panic: spin lock held too long
Message-ID: <20160810161137.GU83214@kib.kiev.ua>
In-Reply-To: <57AB462A.2080608@gmail.com>
References: <57AB349B.2010805@gmail.com> <20160810141948.GP83214@kib.kiev.ua> <57AB462A.2080608@gmail.com>
On Wed, Aug 10, 2016 at 07:50:10PM +0430, Hooman Fazaeli wrote:
> On 2016-08-10 18:49, Konstantin Belousov wrote:
> > On Wed, Aug 10, 2016 at 06:35:15PM +0430, Hooman Fazaeli wrote:
> >> Hi
> >>
> >> on a 9.3-REL i386 box we have occasional "spin lock held too long" panics.
> >>
> >> System info:
> >> -------------
> >> - Intel(R) Core(TM) i5-4440 CPU @ 3.10GHz CPU (4 cores, no hyper-threading)
> >> - 4G non-ECC RAM
> >> - asterisk-1.8.30.0 from ports
> >> - dahdi-kmod26-2.6.1.r10738 from ports
> >> - powerd disabled.
> >> - Workload: ISDN & SIP call processing.
> >> ------------
> >>
> >> The panics are either on 'sched lock' or 'turnstile lock' spin locks.
> >>
> >> PANIC 1
> >> =======
> >> As the trace below shows:
> >>
> >> 1- input arrives on a UDP socket
> >> 2- doselwakeup is called.
> >> 3- That wakeup call ends up in sched_add.
> >> 4- sched_add grabs the 'sched lock 0' spin lock and, apparently, holds it for too long.
> >> 5- The panicking thread makes the same calls as the owner thread but panics because
> >> it can't grab the same spin lock.
> >>
> >> > kgdb /boot/kernel/kernel /var/crash/vmcore.14
> >> ...
> >> kernel trap 12 with interrupts disabled
> >> spin lock 0xc140a4c0 (sched lock 0) held by 0xc807a2f0 (tid 100045) too long
> (kgdb) up 4
> #4 0xc0ac9e75 in _mtx_lock_spin (m=0xc140a4c0, tid=3384060112, opts=0, file=0x0, line=0) at ../../../kern/kern_mutex.c:557
> 557 ../../../kern/kern_mutex.c: No such file or directory.
> in ../../../kern/kern_mutex.c
>
> (kgdb) p *m
> $1 = {lock_object = {lo_name = 0xc140ab08 "sched lock 0", lo_flags = 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock = 3355943664}
>
> ------------
>
> As you see, mtx_lock is 3355943664 (0xc807a2f0), the same thread pointer reported in the panic string.
>
> (kgdb) info threads
> ...
> 34 Thread 100045 (PID=12: intr/irq267: igb0:que 0) sched_switch (td=0xc807a2f0, newtd=0xc7da18d0, flags=265) at ../../../kern/sched_ule.c:1904
> ...
>

I see. The other possibility is a spin lock leak.
Can you _try_ to enable WITNESS, without the WITNESS_SKIPSPIN option?
Then 'show alllocks' from the ddb prompt after the panic could reveal the
place which originally locked it.
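
For reference, a minimal sketch of the kernel configuration this suggestion implies. It assumes a custom config file copied from GENERIC. The KDB and DDB lines are an assumption, needed only if the running config lacks them, since the ddb prompt must be available at panic time.

```
# Hypothetical additions to a copy of GENERIC (the file name is up to you).
options 	KDB		# kernel debugger framework (skip if already present)
options 	DDB		# interactive in-kernel debugger, provides the db> prompt
options 	WITNESS		# track lock acquisitions and lock order

# Deliberately absent: "options WITNESS_SKIPSPIN".
# With that option set, WITNESS ignores spin mutexes, which are exactly
# the locks under suspicion here.
```

After rebuilding and booting the new kernel, the next panic should stop at the `db>` prompt, where `show alllocks` can be entered. If the machine reboots instead of entering the debugger, the `debug.debugger_on_panic` sysctl is worth checking. WITNESS adds noticeable overhead, so the timing of the problem may shift.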