From: Jason Wolfe
To: John Baldwin
Cc: freebsd-net@freebsd.org, Eric van Gyzen
Date: Thu, 2 Oct 2014 18:40:21 -0700
Subject: Re: ixgbe(4) spin lock held too long
In-Reply-To: <1577813.IPE4JfnhZd@ralph.baldwin.cx>
References: <1410203348.1343.1.camel@bruno> <1410203965.1343.3.camel@bruno> <540E04AA.80806@vangyzen.net> <1577813.IPE4JfnhZd@ralph.baldwin.cx>

On Wed, Sep 10, 2014 at 8:24 AM, John Baldwin wrote:

> On Monday, September 08, 2014 03:34:02 PM Eric van Gyzen wrote:
> > On 09/08/2014 15:19, Sean Bruno wrote:
> > > On Mon, 2014-09-08 at 12:09 -0700, Sean Bruno wrote:
> > >> This sort of looks like the hardware failed to respond to us in time?
> > >> Too busy?
> > >>
> > >> sean
> > >
> > > This seems to be affecting my 10/stable machines from 15Aug2014.
> > >
> > > Not a lot of churn in the code so I don't think this is new. The
> > > afflicted machines, quite a few by my count, appear to have not been
> > > super busy (pushing about 200 Mb/s).
> > >
> > > sean
> > >
> > >> panic: spin lock held too long
> > >>
> > >> GNU gdb 6.1.1 [FreeBSD]
> > >> Copyright 2004 Free Software Foundation, Inc.
> > >> GDB is free software, covered by the GNU General Public License, and you are
> > >> welcome to change it and/or distribute copies of it under certain conditions.
> > >> Type "show copying" to see the conditions.
> > >> There is absolutely no warranty for GDB. Type "show warranty" for details.
> > >> This GDB was configured as "amd64-marcel-freebsd"...
> > >>
> > >> Unread portion of the kernel message buffer:
> > >> spin lock 0xffffffff812a0400 (callout) held by 0xfffff800151fe000 (tid 100003) too long
> >
> > TID 100003 is usually a kernel idle thread, which would seem to indicate
> > a dangling lock. Can you enable WITNESS (without WITNESS_SKIPSPIN) on
> > this box?
>
> Also, do 'tid 100003' and 'bt' in kgdb to see what the thread holding the lock
> was doing.
>
> --
> John Baldwin

Sorry for the delay, I've been hoping to catch a crash on one of our machines running the WITNESS kernel.
Our luck seems to be in short supply; the machines running sans WITNESS crash in the same manner at a rate of 2/3 a day. I may have to grow the pool to catch this, but in the meantime here is the bt/tid.

(kgdb) bt 1000003
#0  0xffffffff80ac39b8 in cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1432
#1  0xffffffff80ac397f in ipi_nmi_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1417
#2  0xffffffff80ad2d5a in trap (frame=0xffffffff81242830) at /usr/src/sys/amd64/amd64/trap.c:190
#3  0xffffffff80ab93c3 in nmi_calltrap () at /usr/src/sys/amd64/amd64/exception.S:505
#4  0xffffffff80734066 in callout_process (now=3278964590047193) at /usr/src/sys/kern/kern_timeout.c:487

(kgdb) tid 100003
[Switching to thread 40 (Thread 100003)]#0  0xffffffff80ac39b8 in cpustop_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1432
1432		savectx(&stoppcbs[cpu]);
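
For reference, a kernel with WITNESS enabled and spin locks still checked, as requested earlier in the thread, might be configured roughly as below. This is only a sketch: the ident WITNESS-DEBUG is a made-up name, and it assumes the config includes a stock GENERIC that does not already set WITNESS_SKIPSPIN. It is not the exact config used on the machines discussed here.

```
# Hypothetical kernel config file; the ident below is illustrative only.
include		GENERIC
ident		WITNESS-DEBUG

# Enable lock-order checking.
options 	WITNESS

# WITNESS_SKIPSPIN is deliberately not set, so spin locks are checked too.
# If the included config does set it, it can be removed with:
# nooptions	WITNESS_SKIPSPIN
```

Leaving spin locks in scope makes WITNESS noticeably more expensive, which may be part of why a box running this kernel is slower to reproduce the panic.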