Date: Wed, 08 Oct 2014 15:29:32 -0400
From: John Baldwin <jhb@freebsd.org>
To: Jason Wolfe <nitroboost@gmail.com>
Cc: Sean Bruno <sbruno@llnw.com>, freebsd-net@freebsd.org,
    Eric van Gyzen <eric@vangyzen.net>, davide@freebsd.org
Subject: Re: ixgbe(4) spin lock held too long
Message-ID: <2077446.sYcZo9xEXb@ralph.baldwin.cx>
In-Reply-To: <CAAAm0r0=7G=MgKEpXGDLqYE5+K9gfV3PLWJ=hLfvf2KWvPtd7w@mail.gmail.com>
References: <1410203348.1343.1.camel@bruno>
    <201410071428.15753.jhb@freebsd.org>
    <CAAAm0r0=7G=MgKEpXGDLqYE5+K9gfV3PLWJ=hLfvf2KWvPtd7w@mail.gmail.com>
On Wednesday, October 08, 2014 10:56:56 AM Jason Wolfe wrote:
> On Tue, Oct 7, 2014 at 11:28 AM, John Baldwin <jhb@freebsd.org> wrote:
> > On Tuesday, October 07, 2014 2:06:42 pm Jason Wolfe wrote:
> > > Hey John,
> > >
> > > Happy to do this, but the pool of boxes is about 500 large, which is
> > > the reason I'm able to see a crash every day or so.  I've pulled a
> > > portion of them out of service to switch them over to a kernel with
> > > WITNESS options; might that assist in the same manner if we catch a
> > > crash there?  If you're thinking the KTR traces are required to
> > > diagnose this, I'll get started on that though.  I've also cut
> > > 10.1-RC1 with some ixgbe fixes for future use, but noticed
> > > kern_timeout.c has been static for the past 8 months.  Thanks again
> > > for your help.
> >
> > I don't think WITNESS will really help in this case.  ixgbe just
> > happens to be trying to schedule a timer and runs into the problem
> > that the timer code is spinning on another CPU.  While the
> > kern_timeout code hasn't changed in a while, it may be an edge case
> > provoked by another bug elsewhere.  It's also possible, I suppose,
> > that a lot of callouts are scheduled for some reason.
> >
> > Hmm, maybe we can still figure this out from your dumps.  We can get
> > to '*cc' via an alternate route.  First, IIRC, the thread is an idle
> > thread, so its name should indicate which CPU it is for:
> >
> > % sudo procstat -at | grep 100003
> >    11 100003 idle             idle: cpu0         0  255 run     -
> >
> > (Hmm, I bet that is always CPU 0.)  That means we can find 'cc' by
> > doing 'p cc_cpu[0]' in kgdb.  Can you do that from the last core dump,
> > along with 'p last' in frame 4?
> >
> > Grrr.  Looks like cc->cc_lastscan is already overwritten by the time
> > you've crashed, so we can't work back to the original value of
> > 'firstb'.
> >
> > However, there is a sysctl you can invoke to get callout stats.  Maybe
> > run 'sysctl kern.callout_stats=1' on one of your existing servers and
> > then check dmesg.  On an idle machine sitting here running stable/10 I
> > have 86 callouts scheduled currently:
>
> Correct, 100003 is always CPU0, on our end at least.  Jackpot on the
> cc_cpu[0] also; here are cores across 3 different 'spin lock callout
> held by tid 100003 too long' machines.  Callouts on an idle machine are
> in line with yours at 90, and a somewhat idle box pushing ~500Mb/s has
> in the 3000-4000 range.

Unfortunately, the thing I wanted out of cc_cpu[0] is the previous value
of cc_lastscan, so I could work back to what 'firstb' is and see if it
sticks in the loop forever for some reason.  However, cc_lastscan is
overwritten by the time this cores.  Still, we can perhaps tell something:

> (kgdb) tid 100003
> [Switching to thread 40 (Thread 100003)]#0  0xffffffff80ac39b8 in
> cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1432
> 1432            savectx(&stoppcbs[cpu]);
> (kgdb) up 4
> #4  0xffffffff80733f79 in callout_process (now=3549882673091690)
>     at /usr/src/sys/kern/kern_timeout.c:443
> 443                     tmp = LIST_FIRST(sc);
> (kgdb) list
> 438             }
> 439
> 440             /* Iterate callwheel from firstb to nowb and then up to lastb. */
> 441             do {
> 442                     sc = &cc->cc_callwheel[firstb & callwheelmask];
> 443                     tmp = LIST_FIRST(sc);
> 444                     while (tmp != NULL) {
> 445                             /* Run the callout if present time within allowed. */
> 446                             if (tmp->c_time <= now) {
> 447                                     /*
> (kgdb) p cc_cpu[0]
> $1 = {cc_lock = {lock_object = {lo_name = 0xffffffff80ccbec7 "callout",
>       lo_flags = 720896, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4},
>   cc_exec_entity = {{cc_next = 0x0, cc_curr = 0x0, ce_migration_func = 0,
>       ce_migration_arg = 0x0, ce_migration_cpu = 64, ce_migration_time = 0,
>       ce_migration_prec = 0, cc_cancel = false, cc_waiting = false},
>     {cc_next = 0xffffffff812a7af0, cc_curr = 0x0, ce_migration_func = 0,
>       ce_migration_arg = 0x0, ce_migration_cpu = 64, ce_migration_time = 0,
>       ce_migration_prec = 0, cc_cancel = false, cc_waiting = false}},
>   cc_callout = 0xfffffe000069f000, cc_callwheel = 0xfffffe00007c1000,
>   cc_expireq = {tqh_first = 0x0, tqh_last = 0xffffffff812a0510},
>   cc_callfree = {slh_first = 0xfffffe00007c02c0},
>   cc_firstevent = 3549882673088716, cc_lastscan = 3549882673091690,
>   cc_cookie = 0xfffff800151f4b80, cc_bucket = 6515}
> (kgdb) p last
> $2 = 3549882929381376
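For reference, the bucket numbers in the arithmetic below come from the
callwheel hash in kern_timeout.c: if memory serves, the constant is
CC_HASH_SHIFT (8), applied to the 32 fractional bits of an sbintime_t, so
each bucket covers 1/256 of a second.  A minimal, self-contained sketch of
that math (a throwaway userland program, not a paste from the tree, so
treat the names and constants as approximate):

#include <stdio.h>
#include <stdint.h>

typedef int64_t sbintime_t;	/* 32.32 fixed-point time, as in sys/time.h */

#define	CC_HASH_SHIFT	8	/* from memory; check kern_timeout.c */

static unsigned int
callout_hash(sbintime_t sbt)
{
	/* Keep the top bits: one bucket per 1/256 second of sbintime_t. */
	return (sbt >> (32 - CC_HASH_SHIFT));
}

int
main(void)
{
	sbintime_t lastscan = 3549882673091690;	/* 'now' in the core above */
	sbintime_t last = 3549882929381376;	/* 'p last' from frame 4 */

	/* The actual wheel index is this hash masked with callwheelmask. */
	printf("lastscan bucket %u, last bucket %u, %u buckets apart\n",
	    callout_hash(lastscan), callout_hash(last),
	    callout_hash(last) - callout_hash(lastscan));
	return (0);
}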
So, the bucket for 'cc_lastscan' (which is 'now' when this was called) is
only 16 buckets behind the bucket for 'last'.  That doesn't seem like a
lot of work to be done:

(gdb) p (3549882673091690 >> (32 - 8))
$4 = 211589495
(gdb) p (3549882929381376 >> (32 - 8))
$5 = 211589511
(gdb) p $5 - $4
$6 = 16

OTOH, we know that cc_lastscan when this function was started was less
than cc_firstevent.  However, that is the same bucket as for
'cc_lastscan':

(gdb) p (3549882673088716 >> (32 - 8))
$7 = 211589495

My only other thought is if a direct timeout routine ran for a long time.
I just committed a change to current that lets you capture KTR traces of
callout routines for use with schedgraph (r272757).  Unfortunately,
enabling KTR_SCHED can be slow.  If you are up for trying it, I do think
that building a kernel with KTR and KTR_SCHED enabled
(KTR_COMPILE=KTR_SCHED and KTR_MASK=KTR_SCHED), with the kernel part of
the commit referenced above applied, is the best bet to figure out why it
is spinning so long.  If you can try that (and if it doesn't slow things
down too much) and get a panic with those options enabled, then capture
the output of 'ktrdump -e /path/to/kernel -m /var/crash/vmcore.X -ct' and
we can use src/tools/sched/schedgraph.py on that output to get a picture
of what was going on just prior to the crash.

-- 
John Baldwin
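P.S.  In case it saves a round trip, the pieces I have in mind look
roughly like this; the option spellings are from memory and the
KTR_ENTRIES value is only a suggestion, so check both against
sys/conf/NOTES before building:

options 	KTR
options 	KTR_ENTRIES=262144
options 	KTR_COMPILE=(KTR_SCHED)
options 	KTR_MASK=(KTR_SCHED)

Then, after a panic:

# ktrdump -e /path/to/kernel -m /var/crash/vmcore.X -ct > ktr.out
# python src/tools/sched/schedgraph.py ktr.out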