Date: Thu, 16 Oct 2014 15:23:32 -0400
From: John Baldwin <jhb@freebsd.org>
To: Jason Wolfe <nitroboost@gmail.com>
Cc: Sean Bruno <sbruno@llnw.com>, freebsd-net@freebsd.org
Subject: Re: ixgbe(4) spin lock held too long
Message-ID: <201410161523.32415.jhb@freebsd.org>
In-Reply-To: <CAAAm0r01DqqNLUr0Mu15o2RLLdgnKntnFNKrw8TJut8XE7KNOw@mail.gmail.com>
References: <1410203348.1343.1.camel@bruno> <3567780.Mf6fMnzmYG@ralph.baldwin.cx> <CAAAm0r01DqqNLUr0Mu15o2RLLdgnKntnFNKrw8TJut8XE7KNOw@mail.gmail.com>
On Saturday, October 11, 2014 2:19:19 am Jason Wolfe wrote:
> On Fri, Oct 10, 2014 at 8:53 AM, John Baldwin <jhb@freebsd.org> wrote:
>
> > On Thursday, October 09, 2014 02:31:32 PM Jason Wolfe wrote:
> > > On Wed, Oct 8, 2014 at 12:29 PM, John Baldwin <jhb@freebsd.org> wrote:
> > > > My only other thought is if a direct timeout routine ran for a long
> > > > time.
> > > >
> > > > I just committed a change to current that can let you capture KTR
> > > > traces of callout routines for use with schedgraph (r272757).
> > > > Unfortunately, enabling KTR_SCHED can be slow.  If you are up for
> > > > trying it, I do think that building a kernel with KTR and KTR_SCHED
> > > > enabled (KTR_COMPILE=KTR_SCHED and KTR_MASK=KTR_SCHED) with the
> > > > kernel part of the commit I referenced above applied is the best bet
> > > > to figure out why it is spinning so long.  If you can try that (and
> > > > if it doesn't slow things down too much) and get a panic with those
> > > > options enabled, then capture the output of
> > > > 'ktrdump -e /path/to/kernel -m /var/crash/vmcore.X -ct', we can use
> > > > Src/tools/sched/schedgraph.py to look at that output to get a
> > > > picture of what was going on just prior to the crash.
> > > >
> > > > --
> > > > John Baldwin
> > >
> > > As luck would have it.. caught one of the boxes with the new KTR
> > > code/options after only an hour.  Crashed in the same way w tid 100003
> > > and looking the same in callout_process
> > >
> > > spin lock 0xffffffff81262d00 (callout) held by 0xfffff800151fe000 (tid
> > > 100003) too long
> > > spin lock 0xffffffff81262d00 (callout) held by 0xfffff800151fe000 (tid
> > > 100003) too long
> > >
> > > #4 0xffffffff8070d6fa in callout_process (now=7915682202423) at
> > > /usr/src/sys/kern/kern_timeout.c:490
> > >
> > > The ktrdump oddly only seems to have the last 702, though
> > > debug.ktr.entries is set to 1024.  It appears the buffer may also
> > > start after 100003 had already hung?  I've bumped debug.ktr.entries up
> > > in case we don't have enough history here.
> > >
> > > http://nitrology.com/ktrdump-spinlock.txt
> >
> > Hmm, schedgraph.py chokes on this, but it's a bit interesting.  It seems
> > that in this time sample, CPUs 1, 2, 4, and 5 were constantly running
> > ixgbe interrupt handlers.  No actual thread state changes are logged
> > (which is why schedgraph choked).
> >
> > Try setting the sysctl 'net.inet.tcp.per_cpu_timers' to 1 and see if
> > that makes a difference.  I'm guessing they are all contending on the
> > default callout lock which is causing your headache.
> >
> > --
> > John Baldwin
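[Editor's sketch, for context on the suggestion just quoted: the
net.inet.tcp.per_cpu_timers sysctl selects whether TCP timer callouts are
scheduled on the current CPU's callout wheel rather than all landing on the
default wheel, so each wheel's spin lock only serves its own CPU's timers.
The following simplified kernel-style C is an illustration of that idea
only, not the actual tcp_timer.c code; example_arm_timer(), example_timer(),
and the choice of CPU 0 as the "default" wheel are assumptions made for the
sketch.]

/*
 * Simplified sketch only -- not the FreeBSD implementation.  A timer is
 * armed either on the default callout wheel or on the current CPU's wheel
 * via callout_reset_on(), which is roughly the choice that
 * net.inet.tcp.per_cpu_timers controls for the TCP timers.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/callout.h>
#include <sys/pcpu.h>

static struct callout example_callout;	/* assume callout_init() elsewhere */

static void
example_timer(void *arg)
{
	/* Timer work would go here. */
}

static void
example_arm_timer(int delta_ticks, int per_cpu_timers)
{
	/*
	 * With per-CPU timers the callout stays on the scheduling CPU's
	 * wheel, so its spin lock is not shared with every other CPU's
	 * timer traffic; otherwise everything funnels through one wheel
	 * (assumed here to be CPU 0's) and its single spin lock.
	 */
	int cpu = per_cpu_timers ? curcpu : 0;

	callout_reset_on(&example_callout, delta_ticks, example_timer,
	    NULL, cpu);
}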
> net.inet.tcp.per_cpu_timers=1 triggered some other fun :)  Saw this same
> panic across a handful of machines:
>
> panic: tcp_setpersist: retransmit pending
> cpuid = 3
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe1f9e9ab800
> panic() at panic+0x155/frame 0xfffffe1f9e9ab880
> tcp_setpersist() at tcp_setpersist+0xa3/frame 0xfffffe1f9e9ab8b0
> tcp_timer_persist() at tcp_timer_persist+0x176/frame 0xfffffe1f9e9ab8e0
> softclock_call_cc() at softclock_call_cc+0x1ce/frame 0xfffffe1f9e9ab9e0
> softclock() at softclock+0x44/frame 0xfffffe1f9e9aba00
> intr_event_execute_handlers() at intr_event_execute_handlers+0x83/frame 0xfffffe1f9e9aba30
> ithread_loop() at ithread_loop+0x96/frame 0xfffffe1f9e9aba70
> fork_exit() at fork_exit+0x71/frame 0xfffffe1f9e9abab0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe1f9e9abab0
> --- trap 0, rip = 0, rsp = 0xfffffe1f9e9abb70, rbp = 0 ---
>
> (kgdb) up 3
> #3  0xffffffff808467d3 in tcp_setpersist (tp=<value optimized out>) at
> /usr/src/sys/netinet/tcp_output.c:1619
> 1619                    panic("tcp_setpersist: retransmit pending");
> (kgdb) list
> 1614            int t = ((tp->t_srtt >> 2) + tp->t_rttvar) >> 1;
> 1615            int tt;
> 1616
> 1617            tp->t_flags &= ~TF_PREVVALID;
> 1618            if (tcp_timer_active(tp, TT_REXMT))
> 1619                    panic("tcp_setpersist: retransmit pending");
> 1620            /*
> 1621             * Start/restart persistance timer.
> 1622             */
> 1623            TCPT_RANGESET(tt, t * tcp_backoff[tp->t_rxtshift],
>
> I have debug.ktr.entries set to 204800 on the machines compiled with KTR
> options, maybe a bit more history can provide some extra context.

I looked at the other trace and I don't think it disagrees with my previous
theory.  I do have more KTR patches to log when we spin on locks which would
really confirm this, but I haven't tested those fully on HEAD yet.  However,
I'd rather spend time debugging this panic.  I think I added that assertion
myself.  The root problem is that the persist and retransmit timers share
state, so only one should ever be active at a time.  In this case, the
persist timer has fired and is rescheduling itself, but some other thread
has scheduled the retransmit timer to fire.  The bug is probably in that
other thread.  It should either not have scheduled the retransmit timer, or
it should first cancel the persist timer.  A new assertion should be able to
find this.  Note, this is definitely going to panic, but one panic is
probably enough to find this.

Index: tcp_timer.c
===================================================================
--- tcp_timer.c	(revision 272718)
+++ tcp_timer.c	(working copy)
@@ -713,10 +713,14 @@ tcp_timer_activate(struct tcpcb *tp, int timer_typ
 	case TT_REXMT:
 		t_callout = &tp->t_timers->tt_rexmt;
 		f_callout = tcp_timer_rexmt;
+		if (callout_active(&tp->t_timers->tt_persist))
+			panic("scheduling retransmit with persist active");
 		break;
 	case TT_PERSIST:
 		t_callout = &tp->t_timers->tt_persist;
 		f_callout = tcp_timer_persist;
+		if (callout_active(&tp->t_timers->tt_rexmt))
+			panic("scheduling persist with retransmit active");
 		break;
 	case TT_KEEP:
 		t_callout = &tp->t_timers->tt_keep;

-- 
John Baldwin
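[Editor's sketch of the caller-side pattern John describes above: stop the
persist timer before arming retransmit, using the existing
tcp_timer_active()/tcp_timer_activate() interfaces.  This is not code from
the thread or the eventual fix; example_switch_to_rexmt() and its
rexmt_ticks argument are hypothetical, and the include list is abbreviated.]

/*
 * Hedged sketch: a code path that wants to arm the retransmit timer while
 * the connection may be in persist state should stop the persist timer
 * first, since the two timers share state and must never be active at the
 * same time.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/in_pcb.h>
#include <netinet/tcp_var.h>
#include <netinet/tcp_timer.h>

static void
example_switch_to_rexmt(struct tcpcb *tp, u_int rexmt_ticks)
{
	/* A delta of 0 asks tcp_timer_activate() to stop the timer. */
	if (tcp_timer_active(tp, TT_PERSIST))
		tcp_timer_activate(tp, TT_PERSIST, 0);

	/* Only now is it safe to arm the retransmit timer. */
	tcp_timer_activate(tp, TT_REXMT, rexmt_ticks);
}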