Date: Sat, 17 Jan 2015 12:21:57 -0700
From: Jason Wolfe <nitroboost@gmail.com>
To: John Baldwin <jhb@freebsd.org>, freebsd-net@freebsd.org
Subject: Re: ixgbe(4) spin lock held too long
Message-ID: <CAAAm0r2Y2r6h91fsypZEf=h6RR8izdymfM=mH9zxRxaXADBCFw@mail.gmail.com>
In-Reply-To: <6335518.Tis0CbBHTO@ralph.baldwin.cx>
References: <1410203348.1343.1.camel@bruno> <CAAAm0r0BYxk-bPN=R15Xs1S9B8QuN6t7u=Uh2oantNnM0HTYfA@mail.gmail.com> <CAAAm0r3Fje34JQuk3zp0mWJW-Aa1s3uAAatUm6yNhs3j+GNYoA@mail.gmail.com> <6335518.Tis0CbBHTO@ralph.baldwin.cx>
On Mon, Dec 15, 2014 at 9:23 AM, John Baldwin <jhb@freebsd.org> wrote:
> On Wednesday, December 10, 2014 12:47:02 PM Jason Wolfe wrote:
>> John,
>>
>> So apparently the concurrent timer scheduling was not fixed, though it
>> does seem rarer. We had about 2 weeks of stability, then last night
>> we had a crash on a machine with the 4 KTR options listed in the prior
>> email. I'm not sure if this is relevant, but the only request for the
>> tcp lock that was 'blocked' in the ktrdump (38k entries) happened just
>> before the persist timer was armed:
>>
>> 37898 3 23540056055651738 KTRGRAPH group:"thread", id:"swi4: clock tid 100012", state:"spinning", attributes: lockname:"tcp"
>> 34794 0 23540056027080726 KTRGRAPH group:"thread", id:"irq278: ix0:que 0 tid 100059", state:"blocked", attributes: prio:8, wmesg:"(null)", lockname:"tcp"
>
> I was hoping to see when the rexmit timer was scheduled, but it isn't
> mentioned at all in the ktrdump (granted, it only covers ~125 milliseconds).
> The line at 37897 is the only one that mentions a timer for this connection.
> The dump doesn't include when either timer was scheduled. One thing that you
> can tell from the schedgraph (and from above) is that some other thread was
> accessing this connection's TCB at the same time, as the persist timer spun for
> 2 microseconds on the lock before it was able to run, so something did frob
> the state just before this ran. However, all the threads that were running at
> the time were either idle or in the process of switching out.
>
>> 32909 1 23540056008857406 KTRGRAPH group:"thread", id:"swi4: clock tid 100010", state:"spinning", attributes: lockname:"tcp"
>> 29394 4 23540055974204380 KTRGRAPH group:"thread", id:"irq282: ix0:que 4 tid 100067", state:"spinning", attributes: lockname:"tcp"
>> 29390 1 23540055974198218 KTRGRAPH group:"thread", id:"irq279: ix0:que 1 tid 100061", state:"spinning", attributes: lockname:"tcp"
>> 7130 0 23540055791854144 KTRGRAPH group:"thread", id:"irq278: ix0:que 0 tid 100059", state:"spinning", attributes: lockname:"tcp"
>> 5558 5 23540055779687890 KTRGRAPH group:"thread", id:"squid tid 100207", state:"spinning", attributes: lockname:"tcp"
>> 5306 1 23540055776901006 KTRGRAPH group:"thread", id:"irq279: ix0:que 1 tid 100061", state:"spinning", attributes: lockname:"tcp"
>>
>> panic: tcp_setpersist: retransmit pending
>>
>> (kgdb) bt
>> #0  doadump (textdump=1) at pcpu.h:219
>> #1  0xffffffff806fcb11 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:452
>> #2  0xffffffff806fce74 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
>> #3  0xffffffff8084c853 in tcp_setpersist (tp=<value optimized out>) at /usr/src/sys/netinet/tcp_output.c:1718
>> #4  0xffffffff80854856 in tcp_timer_persist (xtp=0xfffff8003e58e800) at /usr/src/sys/netinet/tcp_timer.c:518
>> #5  0xffffffff8070faeb in softclock_call_cc (c=0xfffff8003e58eac0, cc=0xffffffff8126ce00, direct=0) at /usr/src/sys/kern/kern_timeout.c:682
>> #6  0xffffffff8070feb4 in softclock (arg=<value optimized out>) at /usr/src/sys/kern/kern_timeout.c:810
>>
>> Jason
>>
>> On Tue, Dec 2, 2014 at 1:18 PM, Jason Wolfe <nitroboost@gmail.com> wrote:
>> > So we've resolved some of our other issues, including the concurrent
>> > retrans/persist timer scheduling we saw with
>> > net.inet.tcp.per_cpu_timers on. Seems it may have been fixed by
>> > disabling flowdirector in ixgbe as Adrian mentioned.
>> > Running on
>> > 10.1-STABLE from 11/20 (kern.osreldate: 1001502), we are still seeing
>> > the spin lock held too long panic though. I have a kernel compiled
>> > with these options, and we hit it:
>> >
>> > options KTR
>> > options KTR_COMPILE=(KTR_CALLOUT|KTR_SCHED)
>> > options KTR_MASK=(KTR_CALLOUT|KTR_SCHED)
>> > options KTR_ENTRIES=131072
>> >
>> > I had to grab the last 30000 to see actions other than the statclock
>> > entries on the ix queues, if you look at the last 2500 lines or so you
>> > can see the last callouts happening.
>
> I did look at the ktrdump via schedgraph and it is definitely true that
> several threads are all blocked on the callout lock:
>
> 62185 5 19987364517280198 KTRGRAPH group:"thread", id:"pagedaemon tid 100096", state:"spinning", attributes: lockname:"callout"
> 62184 2 19987364505968920 KTRGRAPH group:"thread", id:"irq280: ix0:que 2 tid 100063", state:"spinning", attributes: lockname:"callout"
> 62178 4 19987364504400868 KTRGRAPH group:"thread", id:"irq282: ix0:que 4 tid 100067", state:"spinning", attributes: lockname:"callout"
> 62172 1 19987364504141702 KTRGRAPH group:"thread", id:"irq279: ix0:que 1 tid 100061", state:"spinning", attributes: lockname:"callout"
> 62166 3 19987364503806636 KTRGRAPH group:"thread", id:"irq281: ix0:que 3 tid 100065", state:"spinning", attributes: lockname:"callout"
>
> In fact, the only CPU not spinning on this lock is CPU 0. According to
> ktrdump, it was running the idle thread.
> That said, the ktrdump also claims
> that these threads started running callouts (ones that timeout sleeps
> like msleep(..., hz)) (which is odd because it didn't log a "running" event
> when it got the lock):
>
> 65919 1 19987382901783634 callout 0xffffffff81253030 finished
> 65917 1 19987382901782106 KTRGRAPH group:"thread", id:"irq279: ix0:que 1 tid 100061", point:"wokeup", attributes: linkedto:"swapper tid 100000"
> 65915 1 19987382901775874 callout 0xffffffff81253030 func 0xffffffff8073f9e0 arg 0xffffffff81252ca0
> 64142 1 19987373824962362 callout 0xfffff801e5394850 finished
> 64140 1 19987373824961286 KTRGRAPH group:"thread", id:"irq279: ix0:que 1 tid 100061", point:"wokeup", attributes: linkedto:"sendmail tid 100167"
> 64138 1 19987373824960070 callout 0xfffff801e5394850 func 0xffffffff8073f9e0 arg 0xfffff801e53944c0
> 62721 4 19987366581196864 callout 0xfffff80022426850 finished
> 62719 4 19987366581189888 KTRGRAPH group:"thread", id:"irq282: ix0:que 4 tid 100067", point:"wokeup", attributes: linkedto:"named tid 100158"
> 62717 4 19987366581189192 callout 0xfffff80022426850 func 0xffffffff8073f9e0 arg 0xfffff800224264c0
> 62680 2 19987366404396696 callout 0xfffff800220add10
> ...
>
> So these aren't TCP timers, but td_slpcallout timers. Those are already
> scheduled on multiple CPUs.
> Maybe we can try moving those all to zero
> instead:
>
> Index: subr_sleepqueue.c
> ===================================================================
> --- subr_sleepqueue.c	(revision 275806)
> +++ subr_sleepqueue.c	(working copy)
> @@ -382,8 +382,8 @@ sleepq_set_timeout_sbt(void *wchan, sbintime_t sbt
>  	MPASS(TD_ON_SLEEPQ(td));
>  	MPASS(td->td_sleepqueue == NULL);
>  	MPASS(wchan != NULL);
> -	callout_reset_sbt_on(&td->td_slpcallout, sbt, pr,
> -	    sleepq_timeout, td, PCPU_GET(cpuid), flags | C_DIRECT_EXEC);
> +	callout_reset_sbt(&td->td_slpcallout, sbt, pr,
> +	    sleepq_timeout, td, flags | C_DIRECT_EXEC);
>  }
>
>  /*
>
> --
> John Baldwin

Just to round this thread out for anyone who comes around hunting for similar problems: it was found to be an issue in the td_slpcallout timers. HPS later came to the same conclusion based on some issues he was having, and produced a patch that went into CURRENT a few days ago:

https://svnweb.freebsd.org/base?view=revision&revision=277213

Thanks to both for the time/work.

Jason