Date: Fri, 21 Mar 2014 12:22:01 -0300
From: Christopher Forgeron <csforgeron@gmail.com>
To: Markus Gebert <markus.gebert@hostpoint.ch>
Cc: FreeBSD Net <freebsd-net@freebsd.org>, Rick Macklem <rmacklem@uoguelph.ca>, Jack Vogel <jfvogel@gmail.com>
Subject: Re: 9.2 ixgbe tx queue hang
Message-ID: <CAB2_NwBE=i7-HsyhsVAvOvNwJ6pD=hDTZBzdmV9KwJTXW45Hvw@mail.gmail.com>
In-Reply-To: <FA262955-B3A9-48EC-828B-FF0D4D5D0498@hostpoint.ch>
References: <CAB2_NwDG=gB1WCJ7JKTHpkJCrvPuAhipkn+vPyT+xXzOBrTGkg@mail.gmail.com> <FA262955-B3A9-48EC-828B-FF0D4D5D0498@hostpoint.ch>
Markus,

I don't know why I didn't notice this before. I copied your cpuset ping loop verbatim, not realizing that I should be using 172.16.0.x, since that's the network on my ix interfaces. On this tester box, 10.0.0.1 goes out a different interface, so the test never reported any problems.

Now that I've corrected that, I see failures on the same queues:

CPU0
ping: sendto: No buffer space available
ping: sendto: No buffer space available
CPU1
CPU2
CPU3
CPU4
CPU5
CPU6
CPU7
CPU8
ping: sendto: No buffer space available
ping: sendto: No buffer space available
CPU9
CPU10
CPU11
CPU12
CPU13
CPU14
CPU15
CPU16
ping: sendto: No buffer space available
ping: sendto: No buffer space available
CPU17
CPU18
CPU19
CPU20
CPU21
CPU22
CPU23

I can run that three times and get the same CPUs. I'll try a reboot and see if they always fail on the same queues, though I don't know if that would show anything.

At this stage, NFS connections coming into the box are down, but I can still ping out. Incoming pings report 'host is down'.

Here is the dump of ix0's sysctls (only ix0 is in use on this machine for testing):

dev.ix.0.queue0.interrupt_rate: 500000
dev.ix.0.queue0.irqs: 100179
dev.ix.0.queue0.txd_head: 0
dev.ix.0.queue0.txd_tail: 0
dev.ix.0.queue0.tso_tx: 104156
dev.ix.0.queue0.no_tx_dma_setup: 0
dev.ix.0.queue0.no_desc_avail: 5
dev.ix.0.queue0.tx_packets: 279480
dev.ix.0.queue0.rxd_head: 513
dev.ix.0.queue0.rxd_tail: 512
dev.ix.0.queue0.rx_packets: 774424
dev.ix.0.queue0.rx_bytes: 281916
dev.ix.0.queue0.rx_copies: 4609
dev.ix.0.queue0.lro_queued: 0
dev.ix.0.queue0.lro_flushed: 0
dev.ix.0.queue1.interrupt_rate: 71428
dev.ix.0.queue1.irqs: 540682
dev.ix.0.queue1.txd_head: 1295
dev.ix.0.queue1.txd_tail: 1295
dev.ix.0.queue1.tso_tx: 15
dev.ix.0.queue1.no_tx_dma_setup: 0
dev.ix.0.queue1.no_desc_avail: 0
dev.ix.0.queue1.tx_packets: 93248
dev.ix.0.queue1.rxd_head: 0
dev.ix.0.queue1.rxd_tail: 2047
dev.ix.0.queue1.rx_packets: 462225
dev.ix.0.queue1.rx_bytes: 0
dev.ix.0.queue1.rx_copies: 0
dev.ix.0.queue1.lro_queued: 0
dev.ix.0.queue1.lro_flushed: 0
dev.ix.0.queue2.interrupt_rate: 71428
dev.ix.0.queue2.irqs: 282801
dev.ix.0.queue2.txd_head: 367
dev.ix.0.queue2.txd_tail: 367
dev.ix.0.queue2.tso_tx: 312757
dev.ix.0.queue2.no_tx_dma_setup: 0
dev.ix.0.queue2.no_desc_avail: 0
dev.ix.0.queue2.tx_packets: 876533
dev.ix.0.queue2.rxd_head: 0
dev.ix.0.queue2.rxd_tail: 2047
dev.ix.0.queue2.rx_packets: 2324954
dev.ix.0.queue2.rx_bytes: 0
dev.ix.0.queue2.rx_copies: 0
dev.ix.0.queue2.lro_queued: 0
dev.ix.0.queue2.lro_flushed: 0
dev.ix.0.queue3.interrupt_rate: 71428
dev.ix.0.queue3.irqs: 1424108
dev.ix.0.queue3.txd_head: 499
dev.ix.0.queue3.txd_tail: 499
dev.ix.0.queue3.tso_tx: 1263116
dev.ix.0.queue3.no_tx_dma_setup: 0
dev.ix.0.queue3.no_desc_avail: 0
dev.ix.0.queue3.tx_packets: 1590798
dev.ix.0.queue3.rxd_head: 0
dev.ix.0.queue3.rxd_tail: 2047
dev.ix.0.queue3.rx_packets: 8319143
dev.ix.0.queue3.rx_bytes: 0
dev.ix.0.queue3.rx_copies: 0
dev.ix.0.queue3.lro_queued: 0
dev.ix.0.queue3.lro_flushed: 0
dev.ix.0.queue4.interrupt_rate: 71428
dev.ix.0.queue4.irqs: 138019
dev.ix.0.queue4.txd_head: 1620
dev.ix.0.queue4.txd_tail: 1620
dev.ix.0.queue4.tso_tx: 29235
dev.ix.0.queue4.no_tx_dma_setup: 0
dev.ix.0.queue4.no_desc_avail: 0
dev.ix.0.queue4.tx_packets: 200853
dev.ix.0.queue4.rxd_head: 6
dev.ix.0.queue4.rxd_tail: 5
dev.ix.0.queue4.rx_packets: 218327
dev.ix.0.queue4.rx_bytes: 1527
dev.ix.0.queue4.rx_copies: 0
dev.ix.0.queue4.lro_queued: 0
dev.ix.0.queue4.lro_flushed: 0
dev.ix.0.queue5.interrupt_rate: 71428
dev.ix.0.queue5.irqs: 131367
dev.ix.0.queue5.txd_head: 330
dev.ix.0.queue5.txd_tail: 330
dev.ix.0.queue5.tso_tx: 9907
dev.ix.0.queue5.no_tx_dma_setup: 0
dev.ix.0.queue5.no_desc_avail: 0
dev.ix.0.queue5.tx_packets: 150955
dev.ix.0.queue5.rxd_head: 0
dev.ix.0.queue5.rxd_tail: 2047
dev.ix.0.queue5.rx_packets: 72814
dev.ix.0.queue5.rx_bytes: 0
dev.ix.0.queue5.rx_copies: 0
dev.ix.0.queue5.lro_queued: 0
dev.ix.0.queue5.lro_flushed: 0
dev.ix.0.queue6.interrupt_rate: 71428
dev.ix.0.queue6.irqs: 839814
dev.ix.0.queue6.txd_head: 1402
dev.ix.0.queue6.txd_tail: 1402
dev.ix.0.queue6.tso_tx: 327633
dev.ix.0.queue6.no_tx_dma_setup: 0
dev.ix.0.queue6.no_desc_avail: 0
dev.ix.0.queue6.tx_packets: 1371262
dev.ix.0.queue6.rxd_head: 0
dev.ix.0.queue6.rxd_tail: 2047
dev.ix.0.queue6.rx_packets: 2559592
dev.ix.0.queue6.rx_bytes: 0
dev.ix.0.queue6.rx_copies: 0
dev.ix.0.queue6.lro_queued: 0
dev.ix.0.queue6.lro_flushed: 0
dev.ix.0.queue7.interrupt_rate: 71428
dev.ix.0.queue7.irqs: 150693
dev.ix.0.queue7.txd_head: 1965
dev.ix.0.queue7.txd_tail: 1965
dev.ix.0.queue7.tso_tx: 248
dev.ix.0.queue7.no_tx_dma_setup: 0
dev.ix.0.queue7.no_desc_avail: 0
dev.ix.0.queue7.tx_packets: 145736
dev.ix.0.queue7.rxd_head: 0
dev.ix.0.queue7.rxd_tail: 2047
dev.ix.0.queue7.rx_packets: 19030
dev.ix.0.queue7.rx_bytes: 0
dev.ix.0.queue7.rx_copies: 0
dev.ix.0.queue7.lro_queued: 0
dev.ix.0.queue7.lro_flushed: 0

On Thu, Mar 20, 2014 at 7:40 AM, Markus Gebert <markus.gebert@hostpoint.ch> wrote:
>
> Can you try this when the problem occurs?
>
> for CPU in {0..7}; do echo "CPU${CPU}"; cpuset -l ${CPU} ping -i 0.2 -c 2 -W 1 10.0.0.1 | grep sendto; done
>
> It will tie ping to certain cpus to test the different tx queues of your
> ix interface. If the pings reliably fail only on some queues, then your
> problem is more likely to be the same as ours.
>
> Also, if you have dtrace available:
>
> kldload dtraceall
> dtrace -n 'fbt:::return / arg1 == EFBIG && execname == "ping" / { stack(); }'
>
> while you run pings over the interface affected. This will give you hints
> about where the EFBIG error comes from.
>
> [...]
>
> Markus
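For anyone following along, the corrected form of Markus's loop as I'm running it here is sketched below. The target address 172.16.0.1 is illustrative (any reachable host on the ix0 subnet will do), and the range covers all 24 CPUs on this box rather than the original 8; the cpuset(1) availability check is just so the script degrades gracefully on a non-FreeBSD machine.

```shell
#!/bin/sh
# Sketch of the corrected per-queue test. Pinning each ping to one CPU
# exercises the ix tx queue mapped to that CPU; a wedged queue shows up as:
#   ping: sendto: No buffer space available
TARGET=172.16.0.1   # assumption: a reachable host on the ix0 network

run_test() {
    for CPU in $(seq 0 23); do      # CPU0..CPU23, matching this 24-core box
        echo "CPU${CPU}"
        cpuset -l "${CPU}" ping -i 0.2 -c 2 -W 1 "${TARGET}" | grep sendto
    done
}

if command -v cpuset >/dev/null 2>&1; then
    run_test
else
    echo "cpuset(1) not found; this loop needs FreeBSD"
fi
```

Running it repeatedly should show whether the failing queues stay pinned to the same CPUs (as they do here on CPU0, CPU8, and CPU16).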