From: Christopher Forgeron
To: Markus Gebert
Cc: FreeBSD Net, Rick Macklem, Jack Vogel
Date: Fri, 21 Mar 2014 12:22:01 -0300
Subject: Re: 9.2 ixgbe tx queue hang

Markus,

I don't know why I didn't notice this before. I copied your cpuset ping verbatim, not realizing that I should be using 172.16.0.x, as that's my network on the ix's. On this tester box, 10.0.0.1 goes out a different interface, so it never reported back any problems.

Now that I've corrected that, I see I have problems on the same queues:

CPU0
ping: sendto: No buffer space available
ping: sendto: No buffer space available
CPU1
CPU2
CPU3
CPU4
CPU5
CPU6
CPU7
CPU8
ping: sendto: No buffer space available
ping: sendto: No buffer space available
CPU9
CPU10
CPU11
CPU12
CPU13
CPU14
CPU15
CPU16
ping: sendto: No buffer space available
ping: sendto: No buffer space available
CPU17
CPU18
CPU19
CPU20
CPU21
CPU22
CPU23

I can run that three times and get the same CPUs. I'll try a reboot and see if they always fail on the same queues, though I don't know if that would show anything.

At this stage, NFS connections coming into the box are down, but I can still ping out.
Incoming pings show 'host is down'.

Here is the dump of ix0's sysctls (only ix0 is in use on this machine for testing):

dev.ix.0.queue0.interrupt_rate: 500000
dev.ix.0.queue0.irqs: 100179
dev.ix.0.queue0.txd_head: 0
dev.ix.0.queue0.txd_tail: 0
dev.ix.0.queue0.tso_tx: 104156
dev.ix.0.queue0.no_tx_dma_setup: 0
dev.ix.0.queue0.no_desc_avail: 5
dev.ix.0.queue0.tx_packets: 279480
dev.ix.0.queue0.rxd_head: 513
dev.ix.0.queue0.rxd_tail: 512
dev.ix.0.queue0.rx_packets: 774424
dev.ix.0.queue0.rx_bytes: 281916
dev.ix.0.queue0.rx_copies: 4609
dev.ix.0.queue0.lro_queued: 0
dev.ix.0.queue0.lro_flushed: 0
dev.ix.0.queue1.interrupt_rate: 71428
dev.ix.0.queue1.irqs: 540682
dev.ix.0.queue1.txd_head: 1295
dev.ix.0.queue1.txd_tail: 1295
dev.ix.0.queue1.tso_tx: 15
dev.ix.0.queue1.no_tx_dma_setup: 0
dev.ix.0.queue1.no_desc_avail: 0
dev.ix.0.queue1.tx_packets: 93248
dev.ix.0.queue1.rxd_head: 0
dev.ix.0.queue1.rxd_tail: 2047
dev.ix.0.queue1.rx_packets: 462225
dev.ix.0.queue1.rx_bytes: 0
dev.ix.0.queue1.rx_copies: 0
dev.ix.0.queue1.lro_queued: 0
dev.ix.0.queue1.lro_flushed: 0
dev.ix.0.queue2.interrupt_rate: 71428
dev.ix.0.queue2.irqs: 282801
dev.ix.0.queue2.txd_head: 367
dev.ix.0.queue2.txd_tail: 367
dev.ix.0.queue2.tso_tx: 312757
dev.ix.0.queue2.no_tx_dma_setup: 0
dev.ix.0.queue2.no_desc_avail: 0
dev.ix.0.queue2.tx_packets: 876533
dev.ix.0.queue2.rxd_head: 0
dev.ix.0.queue2.rxd_tail: 2047
dev.ix.0.queue2.rx_packets: 2324954
dev.ix.0.queue2.rx_bytes: 0
dev.ix.0.queue2.rx_copies: 0
dev.ix.0.queue2.lro_queued: 0
dev.ix.0.queue2.lro_flushed: 0
dev.ix.0.queue3.interrupt_rate: 71428
dev.ix.0.queue3.irqs: 1424108
dev.ix.0.queue3.txd_head: 499
dev.ix.0.queue3.txd_tail: 499
dev.ix.0.queue3.tso_tx: 1263116
dev.ix.0.queue3.no_tx_dma_setup: 0
dev.ix.0.queue3.no_desc_avail: 0
dev.ix.0.queue3.tx_packets: 1590798
dev.ix.0.queue3.rxd_head: 0
dev.ix.0.queue3.rxd_tail: 2047
dev.ix.0.queue3.rx_packets: 8319143
dev.ix.0.queue3.rx_bytes: 0
dev.ix.0.queue3.rx_copies: 0
dev.ix.0.queue3.lro_queued: 0
dev.ix.0.queue3.lro_flushed: 0
dev.ix.0.queue4.interrupt_rate: 71428
dev.ix.0.queue4.irqs: 138019
dev.ix.0.queue4.txd_head: 1620
dev.ix.0.queue4.txd_tail: 1620
dev.ix.0.queue4.tso_tx: 29235
dev.ix.0.queue4.no_tx_dma_setup: 0
dev.ix.0.queue4.no_desc_avail: 0
dev.ix.0.queue4.tx_packets: 200853
dev.ix.0.queue4.rxd_head: 6
dev.ix.0.queue4.rxd_tail: 5
dev.ix.0.queue4.rx_packets: 218327
dev.ix.0.queue4.rx_bytes: 1527
dev.ix.0.queue4.rx_copies: 0
dev.ix.0.queue4.lro_queued: 0
dev.ix.0.queue4.lro_flushed: 0
dev.ix.0.queue5.interrupt_rate: 71428
dev.ix.0.queue5.irqs: 131367
dev.ix.0.queue5.txd_head: 330
dev.ix.0.queue5.txd_tail: 330
dev.ix.0.queue5.tso_tx: 9907
dev.ix.0.queue5.no_tx_dma_setup: 0
dev.ix.0.queue5.no_desc_avail: 0
dev.ix.0.queue5.tx_packets: 150955
dev.ix.0.queue5.rxd_head: 0
dev.ix.0.queue5.rxd_tail: 2047
dev.ix.0.queue5.rx_packets: 72814
dev.ix.0.queue5.rx_bytes: 0
dev.ix.0.queue5.rx_copies: 0
dev.ix.0.queue5.lro_queued: 0
dev.ix.0.queue5.lro_flushed: 0
dev.ix.0.queue6.interrupt_rate: 71428
dev.ix.0.queue6.irqs: 839814
dev.ix.0.queue6.txd_head: 1402
dev.ix.0.queue6.txd_tail: 1402
dev.ix.0.queue6.tso_tx: 327633
dev.ix.0.queue6.no_tx_dma_setup: 0
dev.ix.0.queue6.no_desc_avail: 0
dev.ix.0.queue6.tx_packets: 1371262
dev.ix.0.queue6.rxd_head: 0
dev.ix.0.queue6.rxd_tail: 2047
dev.ix.0.queue6.rx_packets: 2559592
dev.ix.0.queue6.rx_bytes: 0
dev.ix.0.queue6.rx_copies: 0
dev.ix.0.queue6.lro_queued: 0
dev.ix.0.queue6.lro_flushed: 0
dev.ix.0.queue7.interrupt_rate: 71428
dev.ix.0.queue7.irqs: 150693
dev.ix.0.queue7.txd_head: 1965
dev.ix.0.queue7.txd_tail: 1965
dev.ix.0.queue7.tso_tx: 248
dev.ix.0.queue7.no_tx_dma_setup: 0
dev.ix.0.queue7.no_desc_avail: 0
dev.ix.0.queue7.tx_packets: 145736
dev.ix.0.queue7.rxd_head: 0
dev.ix.0.queue7.rxd_tail: 2047
dev.ix.0.queue7.rx_packets: 19030
dev.ix.0.queue7.rx_bytes: 0
dev.ix.0.queue7.rx_copies: 0
dev.ix.0.queue7.lro_queued: 0
dev.ix.0.queue7.lro_flushed: 0

On Thu, Mar 20, 2014 at 7:40 AM, Markus Gebert wrote:
>
> Can you try this
> when the problem occurs?
>
> for CPU in {0..7}; do echo "CPU${CPU}"; cpuset -l ${CPU} ping -i 0.2 -c 2 -W 1 10.0.0.1 | grep sendto; done
>
> It will tie ping to certain cpus to test the different tx queues of your
> ix interface. If the pings reliably fail only on some queues, then your
> problem is more likely to be the same as ours.
>
> Also, if you have dtrace available:
>
> kldload dtraceall
> dtrace -n 'fbt:::return / arg1 == EFBIG && execname == "ping" / { stack(); }'
>
> while you run pings over the interface affected. This will give you hints
> about where the EFBIG error comes from.
>
> [...]
>
> Markus
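
The per-CPU ping results and the sysctl dump can be cross-checked with a small script. What follows is only a sketch, not something taken from the thread: the function names `cpu_to_queue` and `queues_without_descriptors` are made up for illustration, and the mapping of a CPU to TX queue `cpu % number_of_queues` is an assumption about how the driver picks a queue for packets that carry no flow ID.

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: for packets without a flow ID the driver selects
# the TX queue as (cpu % number_of_queues). This is not stated in the thread.

# Usage: cpu_to_queue NQUEUES CPU [CPU ...]
# Prints the queue index each CPU would map to under the assumption above.
cpu_to_queue() {
    nqueues=$1
    shift
    for cpu in "$@"; do
        echo "CPU${cpu} -> queue$((cpu % nqueues))"
    done
}

# Reads lines such as "dev.ix.0.queue0.no_desc_avail: 5" from stdin and
# prints the queue name and counter wherever no_desc_avail is non-zero.
queues_without_descriptors() {
    awk -F'[.:] *' '$5 == "no_desc_avail" && $6 > 0 { print $4, $6 }'
}
```

Calling `cpu_to_queue 8 0 8 16` with the three CPUs that reported "No buffer space available" puts all of them on queue0 under that assumption, and piping the dump above through `queues_without_descriptors` shows queue0 as the only queue with a non-zero `no_desc_avail` counter.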