Date: Tue, 8 Feb 2011 11:39:45 -0500
From: Karim Fodil-Lemelin <fodillemlinkarim@gmail.com>
To: freebsd-net@freebsd.org
Subject: Re: igb driver RX (was TX) hangs when out of mbuf clusters
Message-ID: <AANLkTikMuFRY=W0%2BVtGKdWkJcOFVbdy=OOZNe_xFUC3R@mail.gmail.com>
In-Reply-To: <AANLkTinLg6QZz67e3Hhda-bzTX69XWNcdEkr3EZHFmSZ@mail.gmail.com>
References: <AANLkTikrjkHDaBq%2Bx6MTZhzOeqWA=xtFpqQPsthFGmuf@mail.gmail.com>
 <D70A2DA6-23B7-442D-856C-4267359D66A5@lurchi.franken.de>
 <AANLkTinLg6QZz67e3Hhda-bzTX69XWNcdEkr3EZHFmSZ@mail.gmail.com>

> 2011/2/8 Michael Tüxen <Michael.Tuexen@lurchi.franken.de>
>
>> On Feb 8, 2011, at 4:29 AM, Karim Fodil-Lemelin wrote:
>>
>> > 2011/2/7 Pyun YongHyeon <pyunyh@gmail.com>
>> >
>> >> On Mon, Feb 07, 2011 at 09:21:45PM -0500, Karim Fodil-Lemelin wrote:
>> >>> 2011/2/7 Pyun YongHyeon <pyunyh@gmail.com>
>> >>>
>> >>>> On Mon, Feb 07, 2011 at 05:33:47PM -0500, Karim Fodil-Lemelin wrote:
>> >>>>> Subject: Re: igb driver tx hangs when out of mbuf clusters
>> >>>>>
>> >>>>>> To: Lev Serebryakov <lev@serebryakov.spb.ru>
>> >>>>>> Cc: freebsd-net@freebsd.org
>> >>>>>>
>> >>>>>>
>> >>>>>> 2011/2/7 Lev Serebryakov <lev@serebryakov.spb.ru>
>> >>>>>>
>> >>>>>> Hello, Karim.
>> >>>>>>> You wrote 7 February 2011, 19:58:04:
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>> The issue is with the igb driver from 7.4 RC3 r218406. If the driver runs
>> >>>>>>>> out of mbuf clusters it simply stops receiving even after the clusters have
>> >>>>>>>> been freed.
>> >>>>>>> It looks like my problems with em0 (see thread "em0 hangs without
>> >>>>>>> any messages like "Watchdog timeout", only down/up reset it.")...
>> >>>>>>> Codebase for em and igb is somewhat common...
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> // Black Lion AKA Lev Serebryakov <lev@serebryakov.spb.ru>
>> >>>>>>>
>> >>>>>>> I agree.
>> >>>>>>
>> >>>>>> Do you get missed packets in mac_stats (sysctl dev.em | grep missed)?
>> >>>>>>
>> >>>>>> I might not have mentioned but I can also 'fix' the problem by doing
>> >>>>>> ifconfig igb0 down/up.
>> >>>>>>
>> >>>>>> I will try using POLLING to 'automatize' the reset as you mentioned in your
>> >>>>>> thread.
>> >>>>>>
>> >>>>>> Karim.
>> >>>>>>
>> >>>>>>
>> >>>>> Follow up on tests with POLLING: The problem is still occurring although it
>> >>>>> takes more time ... Outputs of sysctl dev.igb0 and netstat -m will follow:
>> >>>>>
>> >>>>> 9219/99426/108645 mbufs in use (current/cache/total)
>> >>>>> 9217/90783/100000/100000 mbuf clusters in use (current/cache/total/max)
>> >>>>
>> >>>> Do you see network processes stuck in keglim state? If you see
>> >>>> that, I think that's not trivial to solve. You wouldn't even be able to kill
>> >>>> that process if it is in keglim state unless some more mbuf
>> >>>> clusters are freed from other places.
>> >>>>
>> >>>
>> >>> No keglim state, here is a snapshot of top -SH while the problem is
>> >>> happening:
>> >>>
>> >>>   12 root 171 ki31     0K     8K CPU5   5  19:27 100.00% idle: cpu5
>> >>>   10 root 171 ki31     0K     8K CPU7   7  19:26 100.00% idle: cpu7
>> >>>   14 root 171 ki31     0K     8K CPU3   3  19:25 100.00% idle: cpu3
>> >>>   11 root 171 ki31     0K     8K CPU6   6  19:25 100.00% idle: cpu6
>> >>>   13 root 171 ki31     0K     8K CPU4   4  19:24 100.00% idle: cpu4
>> >>>   15 root 171 ki31     0K     8K CPU2   2  19:22 100.00% idle: cpu2
>> >>>   16 root 171 ki31     0K     8K CPU1   1  19:18 100.00% idle: cpu1
>> >>>   17 root 171 ki31     0K     8K RUN    0  19:12 100.00% idle: cpu0
>> >>>   18 root -32    -     0K     8K WAIT   6   0:04   0.10% swi4: clock s
>> >>>   20 root -44    -     0K     8K WAIT   4   0:08   0.00% swi1: net
>> >>>   29 root -68    -     0K     8K -      0   0:02   0.00% igb0 que
>> >>>   35 root -68    -     0K     8K -      2   0:02   0.00% em1 taskq
>> >>>   28 root -68    -     0K     8K WAIT   5   0:01   0.00% irq256: igb0
>> >>>
>> >>> Keep in mind that num_queues has been forced to 1.
>> >>>
>> >>>
>> >>>> I think both igb(4) and em(4) pass received frame to upper stack
>> >>>> before allocating new RX buffer. If driver fails to allocate new RX
>> >>>> buffer driver will try to refill RX buffers in next run. Under
>> >>>> extreme resource shortage case, this situation can produce no more
>> >>>> RX buffers in RX descriptor ring and this will take the box out of
>> >>>> network. Other drivers avoid that situation by allocating new RX
>> >>>> buffer before passing received frame to upper stack. If RX buffer
>> >>>> allocation fails driver will just reuse old RX buffer without
>> >>>> passing received frame to upper stack. That does not completely
>> >>>> solve the keglim issue though. I think you should have enough mbuf
>> >>>> clusters to avoid keglim.
>> >>>>
>> >>>> However the output above indicates you have enough free mbuf
>> >>>> clusters. So I guess igb(4) encountered zero available RX buffer
>> >>>> situation in past but failed to refill the RX buffer again. I guess
>> >>>> driver may be able to periodically check available RX buffers.
>> >>>> Jack may have a better idea if this was the case. (CCed)
>> >>>>
>> >>>
>> >>> That is exactly the pattern. The driver runs out of clusters but they
>> >>> eventually get consumed and freed although the driver refuses to process any
>> >>> new frames. It is, on the other hand, perfectly capable of sending out
>> >>> packets.
>> >>>
>> >>
>> >> Ok, this clearly indicates igb(4) failed to refill RX buffers since
>> >> you can still send frames. I'm not sure whether igb(4) controllers
>> >> could be configured to generate "no RX buffer" interrupts but that
>> >> interrupt would be better suited to trigger RX refilling than timer
>> >> based refilling. Since igb(4) keeps track of available RX buffers,
>> >> igb(4) can selectively enable that interrupt once it sees no RX
>> >> buffers in the RX descriptor ring. However this does not work with
>> >> polling.
>> >>
>> >
>> > I think that your evaluation of the problem is correct although I do not
>> > understand the selective interrupt mechanism you described.
>> >
>> > Precisely, the exact same behavior happens (RX hang) if options
>> > DEVICE_POLLING is _not_ used in the kernel configuration file. I tried with
>> > POLLING since someone mentioned that it helped in a case mentioned earlier
>> > today. Unfortunately igb, with or without polling, yields the same RX ring
>> > filling problem.
>> >
>> > By the way I fixed the subject where I erroneously said TX was hanging while
>> > in fact RX is hanging and TX is just fine.
>>
>> Could you apply the attached patch and report what the value of
>> rx_nxt_check and rx_nxt_refresh is when the interface hangs?
>> You get the values using sysctl -a dev.igb
>>
>>
> Thanks for the help, here is the complete dev.igb.0 output including the
> two new sysctls. Some observations:
>
> 1) As always this only happens when we run out of mbuf clusters:
>
> # netstat -m
> 0/222/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>
> 2) Both rx_nxt_check and rx_nxt_refresh are essentially the same throughout
> the test but as soon as RX hangs, rx_nxt_check lags behind by 1.
>
> # sysctl dev.igb.0
> dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 2.0.7
> dev.igb.0.%driver: igb
> dev.igb.0.%location: slot=0 function=0
> dev.igb.0.%pnpinfo: vendor=0x8086 device=0x10a7 subvendor=0x8086 subdevice=0x0000 class=0x020000
> dev.igb.0.%parent: pci7
> dev.igb.0.nvm: -1
> dev.igb.0.flow_control: 3
> dev.igb.0.enable_aim: 1
> dev.igb.0.rx_processing_limit: 100
> dev.igb.0.link_irq: 4
> dev.igb.0.dropped: 0
> dev.igb.0.tx_dma_fail: 0
> dev.igb.0.rx_overruns: 0
> dev.igb.0.watchdog_timeouts: 0
> dev.igb.0.device_control: 1490027073
> dev.igb.0.rx_control: 67141658
> dev.igb.0.interrupt_mask: 4
> dev.igb.0.extended_int_mask: 2147483905
> dev.igb.0.tx_buf_alloc: 14
> dev.igb.0.rx_buf_alloc: 34
> dev.igb.0.fc_high_water: 29488
> dev.igb.0.fc_low_water: 29480
> dev.igb.0.queue0.interrupt_rate: 100000
> dev.igb.0.queue0.txd_head: 825
> dev.igb.0.queue0.txd_tail: 825
> dev.igb.0.queue0.no_desc_avail: 0
> dev.igb.0.queue0.tx_packets: 75577
> dev.igb.0.queue0.rxd_head: 714
> dev.igb.0.queue0.rxd_tail: 714
> dev.igb.0.queue0.rx_packets: 141002
> dev.igb.0.queue0.rx_bytes: 208279964
> dev.igb.0.queue0.rx_nxt_refresh: 715
> dev.igb.0.queue0.rx_nxt_check: 714
> dev.igb.0.queue0.lro_queued: 0
> dev.igb.0.queue0.lro_flushed: 0
> dev.igb.0.mac_stats.excess_coll: 0
> dev.igb.0.mac_stats.single_coll: 0
> dev.igb.0.mac_stats.multiple_coll: 0
> dev.igb.0.mac_stats.late_coll: 0
> dev.igb.0.mac_stats.collision_count: 0
> dev.igb.0.mac_stats.symbol_errors: 0
> dev.igb.0.mac_stats.sequence_errors: 0
> dev.igb.0.mac_stats.defer_count: 0
> dev.igb.0.mac_stats.missed_packets: 2844
> dev.igb.0.mac_stats.recv_no_buff: 495
> dev.igb.0.mac_stats.recv_undersize: 0
> dev.igb.0.mac_stats.recv_fragmented: 0
> dev.igb.0.mac_stats.recv_oversize: 0
> dev.igb.0.mac_stats.recv_jabber: 0
> dev.igb.0.mac_stats.recv_errs: 0
> dev.igb.0.mac_stats.crc_errs: 0
> dev.igb.0.mac_stats.alignment_errs: 0
> dev.igb.0.mac_stats.coll_ext_errs: 0
> dev.igb.0.mac_stats.xon_recvd: 0
> dev.igb.0.mac_stats.xon_txd: 127
> dev.igb.0.mac_stats.xoff_recvd: 0
> dev.igb.0.mac_stats.xoff_txd: 2967
> dev.igb.0.mac_stats.total_pkts_recvd: 143870
> dev.igb.0.mac_stats.good_pkts_recvd: 141026
> dev.igb.0.mac_stats.bcast_pkts_recvd: 0
> dev.igb.0.mac_stats.mcast_pkts_recvd: 11
> dev.igb.0.mac_stats.rx_frames_64: 70
> dev.igb.0.mac_stats.rx_frames_65_127: 693
> dev.igb.0.mac_stats.rx_frames_128_255: 11
> dev.igb.0.mac_stats.rx_frames_256_511: 12
> dev.igb.0.mac_stats.rx_frames_512_1023: 20
> dev.igb.0.mac_stats.rx_frames_1024_1522: 140220
> dev.igb.0.mac_stats.good_octets_recvd: 208878321
> dev.igb.0.mac_stats.good_octets_txd: 4864240
> dev.igb.0.mac_stats.total_pkts_txd: 78671
> dev.igb.0.mac_stats.good_pkts_txd: 75577
> dev.igb.0.mac_stats.bcast_pkts_txd: 693
> dev.igb.0.mac_stats.mcast_pkts_txd: 346
> dev.igb.0.mac_stats.tx_frames_64: 74175
> dev.igb.0.mac_stats.tx_frames_65_127: 1402
> dev.igb.0.mac_stats.tx_frames_128_255: 0
> dev.igb.0.mac_stats.tx_frames_256_511: 0
> dev.igb.0.mac_stats.tx_frames_512_1023: 0
> dev.igb.0.mac_stats.tx_frames_1024_1522: 0
> dev.igb.0.mac_stats.tso_txd: 0
> dev.igb.0.mac_stats.tso_ctx_fail: 0
> dev.igb.0.interrupts.asserts: 5716
> dev.igb.0.interrupts.rx_pkt_timer: 141023
> dev.igb.0.interrupts.rx_abs_timer: 140978
> dev.igb.0.interrupts.tx_pkt_timer: 75577
> dev.igb.0.interrupts.tx_abs_timer: 0
> dev.igb.0.interrupts.tx_queue_empty: 75577
> dev.igb.0.interrupts.tx_queue_min_thresh: 0
> dev.igb.0.interrupts.rx_desc_min_thresh: 16
> dev.igb.0.interrupts.rx_overrun: 0
> dev.igb.0.host.breaker_tx_pkt: 0
> dev.igb.0.host.host_tx_pkt_discard: 0
> dev.igb.0.host.rx_pkt: 3
> dev.igb.0.host.breaker_rx_pkts: 0
> dev.igb.0.host.breaker_rx_pkt_drop: 0
> dev.igb.0.host.tx_good_pkt: 0
> dev.igb.0.host.breaker_tx_pkt_drop: 0
> dev.igb.0.host.rx_good_bytes: 208878321
> dev.igb.0.host.tx_good_bytes: 4864304
> dev.igb.0.host.length_errors: 0
> dev.igb.0.host.serdes_violation_pkt: 0
> dev.igb.0.host.header_redir_missed: 0
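
The refill strategy Pyun describes above (allocate the replacement cluster before handing the received frame to the stack, and recycle the old buffer when allocation fails so the RX ring never runs dry) looks roughly like the following sketch. The rx_ring and rx_buffer structures here are hypothetical and simplified, DMA map handling is omitted, and this is not the actual igb(4)/em(4) code.

/*
 * Sketch of the "allocate before handing up" RX path described above.
 * Hypothetical, simplified structures; bus_dma map load/sync omitted.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

struct rx_buffer {
	struct mbuf	*m_head;		/* cluster owned by this slot */
};

struct rx_ring {
	struct ifnet	*ifp;
	struct rx_buffer rx_buffers[1024];	/* example ring size */
};

static void
rx_ring_input(struct rx_ring *rxr, int idx, int len)
{
	struct ifnet *ifp = rxr->ifp;
	struct rx_buffer *rxbuf = &rxr->rx_buffers[idx];
	struct mbuf *m, *mnew;

	/* Allocate the replacement mbuf + cluster first. */
	mnew = m_getcl(M_DONTWAIT, MT_DATA, M_PKTHDR);
	if (mnew == NULL) {
		/*
		 * Out of clusters: leave the old buffer in the ring so the
		 * descriptor can be handed straight back to the hardware,
		 * count a drop and return.  The RX ring never runs dry.
		 */
		ifp->if_iqdrops++;
		return;
	}

	/* Detach the filled buffer and pass it to the stack. */
	m = rxbuf->m_head;
	m->m_pkthdr.len = m->m_len = len;
	m->m_pkthdr.rcvif = ifp;

	/* Install the fresh cluster in this descriptor slot. */
	rxbuf->m_head = mnew;

	(*ifp->if_input)(ifp, m);
}

In contrast, the order Pyun attributes to igb(4)/em(4), input first and allocate afterwards, lets a long run of failed allocations empty the ring, which matches the hang being discussed.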
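To watch for the stall signature Karim reports, rx_nxt_check stuck one step behind rx_nxt_refresh once RX hangs, a small userland poller along the following lines could sample the two new sysctls. This is only an illustration; it assumes the patch exports both counters as plain unsigned integers, and a short-lived difference between the two is normal during operation.

/*
 * Illustrative userland poller for the rx_nxt_check / rx_nxt_refresh
 * sysctls added by the patch.  Assumes both are plain unsigned integers.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <unistd.h>

static unsigned int
read_counter(const char *name)
{
	unsigned int val = 0;
	size_t len = sizeof(val);

	if (sysctlbyname(name, &val, &len, NULL, 0) == -1)
		perror(name);
	return (val);
}

int
main(void)
{
	unsigned int check, refresh, prev_check = 0, prev_refresh = 0;

	for (;;) {
		check = read_counter("dev.igb.0.queue0.rx_nxt_check");
		refresh = read_counter("dev.igb.0.queue0.rx_nxt_refresh");

		/*
		 * A transient difference is normal; the hang reported above
		 * shows both indices frozen with check one behind refresh.
		 */
		if (check != refresh && check == prev_check &&
		    refresh == prev_refresh)
			printf("possible RX stall: rx_nxt_check=%u "
			    "rx_nxt_refresh=%u\n", check, refresh);

		prev_check = check;
		prev_refresh = refresh;
		sleep(1);
	}
	return (0);
}

Left running while reproducing the mbuf cluster exhaustion, it prints a line once the two indices stop moving while still differing.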