Date:      Fri, 21 Mar 2014 12:22:01 -0300
From:      Christopher Forgeron <csforgeron@gmail.com>
To:        Markus Gebert <markus.gebert@hostpoint.ch>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>, Rick Macklem <rmacklem@uoguelph.ca>, Jack Vogel <jfvogel@gmail.com>
Subject:   Re: 9.2 ixgbe tx queue hang
Message-ID:  <CAB2_NwBE=i7-HsyhsVAvOvNwJ6pD=hDTZBzdmV9KwJTXW45Hvw@mail.gmail.com>
In-Reply-To: <FA262955-B3A9-48EC-828B-FF0D4D5D0498@hostpoint.ch>
References:  <CAB2_NwDG=gB1WCJ7JKTHpkJCrvPuAhipkn%2BvPyT%2BxXzOBrTGkg@mail.gmail.com> <FA262955-B3A9-48EC-828B-FF0D4D5D0498@hostpoint.ch>

Markus,

 I don't know why I didn't notice this before. I copied your cpuset ping
verbatim, not realizing that I should be using 172.16.0.x, as that's my
network on the ix interfaces.

 On this tester box, 10.0.0.1 goes out a different interface, thus it never
reported back any problems.

 Now that I've corrected that, I see I have problems on the same queues:

CPU0
ping: sendto: No buffer space available
ping: sendto: No buffer space available
CPU1
CPU2
CPU3
CPU4
CPU5
CPU6
CPU7
CPU8
ping: sendto: No buffer space available
ping: sendto: No buffer space available
CPU9
CPU10
CPU11
CPU12
CPU13
CPU14
CPU15
CPU16
ping: sendto: No buffer space available
ping: sendto: No buffer space available
CPU17
CPU18
CPU19
CPU20
CPU21
CPU22
CPU23

I can run that three times and get the same CPUs each time. I'll try a
reboot and see if they always fail on the same queues, though I don't know
if that would show anything.
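
To make repeated runs easier to compare, the loop output can be boiled down
to just the failing CPUs. This is only a rough sketch: failing_cpus is a
made-up helper name, and the queue column assumes the driver picks the tx
queue as (CPU number mod number of queues) for this traffic, which I have
not verified. Under that assumption, with 8 queues, CPUs 0, 8 and 16 would
all land on queue0.

```shell
# failing_cpus [NQUEUES]
# Reads the cpuset/ping loop output on stdin (CPU labels, each optionally
# followed by "sendto" error lines) and prints, per failing CPU:
#   <cpu> <assumed queue = cpu % NQUEUES> <number of sendto errors>
# NQUEUES defaults to 8; the cpu-to-queue mapping is an assumption.
failing_cpus() {
    nq="${1:-8}"
    awk -v nq="$nq" '
        /^CPU[0-9]+$/          { cpu = substr($0, 4) + 0; next }
        /sendto:/ && cpu != "" { fails[cpu]++ }
        END { for (c in fails) printf "%d %d %d\n", c, c % nq, fails[c] }
    ' | sort -n
}

# Example use (TARGET is a placeholder for a host reachable via the ix
# interface; 2>&1 is there because ping reports sendto errors on stderr):
#   for CPU in $(jot 24 0); do echo "CPU${CPU}"; \
#     cpuset -l ${CPU} ping -i 0.2 -c 2 -W 1 "$TARGET" 2>&1 | grep sendto; \
#   done | failing_cpus 8
```

Saving the summary from each run to a file and diffing the files would show
whether the same CPUs fail after a reboot.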

At this stage, NFS connections coming into the box are down, but I can
still ping out. Incoming pings show 'host is down'.

Here is the dump of ix0's sysctls (only ix0 is in use on this machine for
testing):

dev.ix.0.queue0.interrupt_rate: 500000
dev.ix.0.queue0.irqs: 100179
dev.ix.0.queue0.txd_head: 0
dev.ix.0.queue0.txd_tail: 0
dev.ix.0.queue0.tso_tx: 104156
dev.ix.0.queue0.no_tx_dma_setup: 0
dev.ix.0.queue0.no_desc_avail: 5
dev.ix.0.queue0.tx_packets: 279480
dev.ix.0.queue0.rxd_head: 513
dev.ix.0.queue0.rxd_tail: 512
dev.ix.0.queue0.rx_packets: 774424
dev.ix.0.queue0.rx_bytes: 281916
dev.ix.0.queue0.rx_copies: 4609
dev.ix.0.queue0.lro_queued: 0
dev.ix.0.queue0.lro_flushed: 0
dev.ix.0.queue1.interrupt_rate: 71428
dev.ix.0.queue1.irqs: 540682
dev.ix.0.queue1.txd_head: 1295
dev.ix.0.queue1.txd_tail: 1295
dev.ix.0.queue1.tso_tx: 15
dev.ix.0.queue1.no_tx_dma_setup: 0
dev.ix.0.queue1.no_desc_avail: 0
dev.ix.0.queue1.tx_packets: 93248
dev.ix.0.queue1.rxd_head: 0
dev.ix.0.queue1.rxd_tail: 2047
dev.ix.0.queue1.rx_packets: 462225
dev.ix.0.queue1.rx_bytes: 0
dev.ix.0.queue1.rx_copies: 0
dev.ix.0.queue1.lro_queued: 0
dev.ix.0.queue1.lro_flushed: 0
dev.ix.0.queue2.interrupt_rate: 71428
dev.ix.0.queue2.irqs: 282801
dev.ix.0.queue2.txd_head: 367
dev.ix.0.queue2.txd_tail: 367
dev.ix.0.queue2.tso_tx: 312757
dev.ix.0.queue2.no_tx_dma_setup: 0
dev.ix.0.queue2.no_desc_avail: 0
dev.ix.0.queue2.tx_packets: 876533
dev.ix.0.queue2.rxd_head: 0
dev.ix.0.queue2.rxd_tail: 2047
dev.ix.0.queue2.rx_packets: 2324954
dev.ix.0.queue2.rx_bytes: 0
dev.ix.0.queue2.rx_copies: 0
dev.ix.0.queue2.lro_queued: 0
dev.ix.0.queue2.lro_flushed: 0
dev.ix.0.queue3.interrupt_rate: 71428
dev.ix.0.queue3.irqs: 1424108
dev.ix.0.queue3.txd_head: 499
dev.ix.0.queue3.txd_tail: 499
dev.ix.0.queue3.tso_tx: 1263116
dev.ix.0.queue3.no_tx_dma_setup: 0
dev.ix.0.queue3.no_desc_avail: 0
dev.ix.0.queue3.tx_packets: 1590798
dev.ix.0.queue3.rxd_head: 0
dev.ix.0.queue3.rxd_tail: 2047
dev.ix.0.queue3.rx_packets: 8319143
dev.ix.0.queue3.rx_bytes: 0
dev.ix.0.queue3.rx_copies: 0
dev.ix.0.queue3.lro_queued: 0
dev.ix.0.queue3.lro_flushed: 0
dev.ix.0.queue4.interrupt_rate: 71428
dev.ix.0.queue4.irqs: 138019
dev.ix.0.queue4.txd_head: 1620
dev.ix.0.queue4.txd_tail: 1620
dev.ix.0.queue4.tso_tx: 29235
dev.ix.0.queue4.no_tx_dma_setup: 0
dev.ix.0.queue4.no_desc_avail: 0
dev.ix.0.queue4.tx_packets: 200853
dev.ix.0.queue4.rxd_head: 6
dev.ix.0.queue4.rxd_tail: 5
dev.ix.0.queue4.rx_packets: 218327
dev.ix.0.queue4.rx_bytes: 1527
dev.ix.0.queue4.rx_copies: 0
dev.ix.0.queue4.lro_queued: 0
dev.ix.0.queue4.lro_flushed: 0
dev.ix.0.queue5.interrupt_rate: 71428
dev.ix.0.queue5.irqs: 131367
dev.ix.0.queue5.txd_head: 330
dev.ix.0.queue5.txd_tail: 330
dev.ix.0.queue5.tso_tx: 9907
dev.ix.0.queue5.no_tx_dma_setup: 0
dev.ix.0.queue5.no_desc_avail: 0
dev.ix.0.queue5.tx_packets: 150955
dev.ix.0.queue5.rxd_head: 0
dev.ix.0.queue5.rxd_tail: 2047
dev.ix.0.queue5.rx_packets: 72814
dev.ix.0.queue5.rx_bytes: 0
dev.ix.0.queue5.rx_copies: 0
dev.ix.0.queue5.lro_queued: 0
dev.ix.0.queue5.lro_flushed: 0
dev.ix.0.queue6.interrupt_rate: 71428
dev.ix.0.queue6.irqs: 839814
dev.ix.0.queue6.txd_head: 1402
dev.ix.0.queue6.txd_tail: 1402
dev.ix.0.queue6.tso_tx: 327633
dev.ix.0.queue6.no_tx_dma_setup: 0
dev.ix.0.queue6.no_desc_avail: 0
dev.ix.0.queue6.tx_packets: 1371262
dev.ix.0.queue6.rxd_head: 0
dev.ix.0.queue6.rxd_tail: 2047
dev.ix.0.queue6.rx_packets: 2559592
dev.ix.0.queue6.rx_bytes: 0
dev.ix.0.queue6.rx_copies: 0
dev.ix.0.queue6.lro_queued: 0
dev.ix.0.queue6.lro_flushed: 0
dev.ix.0.queue7.interrupt_rate: 71428
dev.ix.0.queue7.irqs: 150693
dev.ix.0.queue7.txd_head: 1965
dev.ix.0.queue7.txd_tail: 1965
dev.ix.0.queue7.tso_tx: 248
dev.ix.0.queue7.no_tx_dma_setup: 0
dev.ix.0.queue7.no_desc_avail: 0
dev.ix.0.queue7.tx_packets: 145736
dev.ix.0.queue7.rxd_head: 0
dev.ix.0.queue7.rxd_tail: 2047
dev.ix.0.queue7.rx_packets: 19030
dev.ix.0.queue7.rx_bytes: 0
dev.ix.0.queue7.rx_copies: 0
dev.ix.0.queue7.lro_queued: 0
dev.ix.0.queue7.lro_flushed: 0
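
A dump like this is long to scan by eye, so here is a small filter that
prints only the queues whose no_desc_avail counter is non-zero. It is a
sketch, not a diagnosis: queues_short_on_descriptors is a made-up name, and
it assumes the counter means what its name suggests (transmit attempts that
found no free descriptor). On the dump above it would print only queue0.

```shell
# queues_short_on_descriptors
# Reads "sysctl dev.ix.N" style output on stdin and prints
#   <queue name> <no_desc_avail value>
# for every queue where the counter is greater than zero.
queues_short_on_descriptors() {
    awk -F': ' '
        /\.no_desc_avail: / && $2 + 0 > 0 {
            # $1 looks like dev.ix.0.queue0.no_desc_avail; the queue name
            # is the second-to-last dot-separated component.
            n = split($1, parts, "[.]")
            print parts[n - 1], $2
        }
    '
}

# Example use:
#   sysctl dev.ix.0 | queues_short_on_descriptors
```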



On Thu, Mar 20, 2014 at 7:40 AM, Markus Gebert
<markus.gebert@hostpoint.ch> wrote:

>
>
> Can you try this when the problem occurs?
>
> for CPU in {0..7}; do echo "CPU${CPU}"; cpuset -l ${CPU} ping -i 0.2 -c 2 -W 1 10.0.0.1 | grep sendto; done
>
> It will tie ping to certain cpus to test the different tx queues of your
> ix interface. If the pings reliably fail only on some queues, then your
> problem is more likely to be the same as ours.
>
> Also, if you have dtrace available:
>
> kldload dtraceall
> dtrace -n 'fbt:::return / arg1 == EFBIG && execname == "ping" / { stack(); }'
>
> while you run pings over the interface affected. This will give you hints
> about where the EFBIG error comes from.
>
> > [...]
>
>
> Markus
>
>
>


