Date:      Fri, 03 Oct 2014 08:52:10 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        Jason Wolfe <nitroboost@gmail.com>
Cc:        freebsd-net@freebsd.org, Eric van Gyzen <eric@vangyzen.net>
Subject:   Re: ixgbe(4) spin lock held too long
Message-ID:  <2951452.cFrDFFRBbl@ralph.baldwin.cx>
In-Reply-To: <CAAAm0r1zXL0eCkfFijDg_1XcFQ48DjuoQBdGWTk4HFDaYviRCQ@mail.gmail.com>
References:  <1410203348.1343.1.camel@bruno> <1577813.IPE4JfnhZd@ralph.baldwin.cx> <CAAAm0r1zXL0eCkfFijDg_1XcFQ48DjuoQBdGWTk4HFDaYviRCQ@mail.gmail.com>

On Thursday, October 02, 2014 06:40:21 PM Jason Wolfe wrote:
> On Wed, Sep 10, 2014 at 8:24 AM, John Baldwin <jhb@freebsd.org> wrote:
> > On Monday, September 08, 2014 03:34:02 PM Eric van Gyzen wrote:
> > > On 09/08/2014 15:19, Sean Bruno wrote:
> > > > On Mon, 2014-09-08 at 12:09 -0700, Sean Bruno wrote:
> > > >> This sort of looks like the hardware failed to respond to us in time?
> > > >> Too busy?
> > > >> 
> > > >> sean
> > > > 
> > > > This seems to be affecting my 10/stable machines from 15Aug2014.
> > > > 
> > > > Not a lot of churn in the code so I don't think this is new.  The
> > > > afflicted machines, quite a few by my count, appear to have not been
> > > > super busy (pushing about 200 Mb/s).
> > > > 
> > > > sean
> > > > 
> > > >> panic: spin lock held too long
> > > >> 
> > > >> GNU gdb 6.1.1 [FreeBSD]
> > > >> Copyright 2004 Free Software Foundation, Inc.
> > > >> GDB is free software, covered by the GNU General Public License, and you are
> > > >> welcome to change it and/or distribute copies of it under certain
> > > >> conditions.
> > > >> Type "show copying" to see the conditions.
> > > >> There is absolutely no warranty for GDB.  Type "show warranty" for
> > > >> details.
> > > >> This GDB was configured as "amd64-marcel-freebsd"...
> > > >> 
> > > >> Unread portion of the kernel message buffer:
> > > >> spin lock 0xffffffff812a0400 (callout) held by 0xfffff800151fe000 (tid 100003) too long
> > > 
> > > TID 100003 is usually a kernel idle thread, which would seem to indicate
> > > a dangling lock.  Can you enable WITNESS (without WITNESS_SKIPSPIN) on
> > > this box?
> > 
> > Also, do 'tid 100003' and 'bt' in kgdb to see what the thread holding the
> > lock was doing.
> > 
> > --
> > John Baldwin
> 
> Sorry for the delay, I've been hoping to catch a crash on one of our
> machines running the WITNESS kernel.  Our luck seems to be in short supply,
> the machines running sans WITNESS crash in the same manner at a rate of 2/3
> a day.  I may have to grow the pool to catch this, but in the meantime here
> is the bt/tid.
> 
> (kgdb) bt 1000003
> #0  0xffffffff80ac39b8 in cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1432
> #1  0xffffffff80ac397f in ipi_nmi_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1417
> #2  0xffffffff80ad2d5a in trap (frame=0xffffffff81242830) at /usr/src/sys/amd64/amd64/trap.c:190
> #3  0xffffffff80ab93c3 in nmi_calltrap () at /usr/src/sys/amd64/amd64/exception.S:505
> #4  0xffffffff80734066 in callout_process (now=3278964590047193) at /usr/src/sys/kern/kern_timeout.c:487
> (kgdb) tid 100003
> [Switching to thread 40 (Thread 100003)]
> #0  0xffffffff80ac39b8 in cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1432
> 1432            savectx(&stoppcbs[cpu]);

OK, so it is processing C_DIRECT callouts.  Can you go to frame 4 and see
where it is?
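
For example, after switching to the thread with 'tid 100003', something
along these lines in kgdb should do it (a rough sketch using standard gdb
commands; what you actually see depends on your kernel and source tree):

(kgdb) frame 4
(kgdb) list
(kgdb) info locals

'frame 4' selects the callout_process frame from your backtrace, 'list'
shows the source around the current line, and 'info locals' prints the
local variables in that frame.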

-- 
John Baldwin


