Date: Thu, 9 Oct 2014 14:31:32 -0700 From: Jason Wolfe <nitroboost@gmail.com> To: John Baldwin <jhb@freebsd.org> Cc: Sean Bruno <sbruno@llnw.com>, freebsd-net@freebsd.org Subject: Re: ixgbe(4) spin lock held too long Message-ID: <CAAAm0r1N=F6wgroVJZx3CR-bzti45=VA==dv1VaVhx1hSufUEQ@mail.gmail.com> In-Reply-To: <2077446.sYcZo9xEXb@ralph.baldwin.cx> References: <1410203348.1343.1.camel@bruno> <201410071428.15753.jhb@freebsd.org> <CAAAm0r0=7G=MgKEpXGDLqYE5%2BK9gfV3PLWJ=hLfvf2KWvPtd7w@mail.gmail.com> <2077446.sYcZo9xEXb@ralph.baldwin.cx>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Oct 8, 2014 at 12:29 PM, John Baldwin <jhb@freebsd.org> wrote: > My only other thought is if a direct timeout routine ran for a long time. > > I just committed a change to current that can let you capture KTR traces of > callout routines for use with schedgraph (r272757). Unfortunately, > enabling KTR_SCHED can be slow. If you are up for trying it, I do think > that > building a kernel with KTR and KTR_SCHED enabled (KTR_COMPILE=KTR_SCHED and > KTR_MASK=KTR_SCHED) with the kernel part of the commit I referenced above > applied is the best bet to figure out why it is spinning so long. If you > can > try that (and if it doesn't slow things down too much) and get a panic with > those options enabled, then capture the output of > 'ktrdump -e /path/to/kernel -m /var/crash/vmcore.X -ct', we can use > Src/tools/sched/schedgraph.py to look at that output to get a picture of > what > was going on just prior to the crash. > > -- > John Baldwin > As luck would have it.. caught one of the boxes with the new KTR code/options after only an hour. Crashed in the same way w tid 100003 and looking the same in callout_process spin lock 0xffffffff81262d00 (callout) held by 0xfffff800151fe000 (tid 100003) too long spin lock 0xffffffff81262d00 (callout) held by 0xfffff800151fe000 (tid 100003) too long #4 0xffffffff8070d6fa in callout_process (now=7915682202423) at /usr/src/sys/kern/kern_ timeout.c:490 The ktrdump oddly only seems to have the last 702, though debug.ktr.entries is set to 1024. It appears the buffer may also start after 100003 had already hung? I've bumped debug.ktr.entries up in case we don't have enough history here. http://nitrology.com/ktrdump-spinlock.txt Jason
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAAAm0r1N=F6wgroVJZx3CR-bzti45=VA==dv1VaVhx1hSufUEQ>