Date: Tue, 8 Feb 2011 11:39:45 -0500
From: Karim Fodil-Lemelin <fodillemlinkarim@gmail.com>
To: freebsd-net@freebsd.org
Subject: Re: igb driver RX (was TX) hangs when out of mbuf clusters
Message-ID: <AANLkTikMuFRY=W0%2BVtGKdWkJcOFVbdy=OOZNe_xFUC3R@mail.gmail.com>
In-Reply-To: <AANLkTinLg6QZz67e3Hhda-bzTX69XWNcdEkr3EZHFmSZ@mail.gmail.com>
References: <AANLkTikrjkHDaBq%2Bx6MTZhzOeqWA=xtFpqQPsthFGmuf@mail.gmail.com>
 <D70A2DA6-23B7-442D-856C-4267359D66A5@lurchi.franken.de>
 <AANLkTinLg6QZz67e3Hhda-bzTX69XWNcdEkr3EZHFmSZ@mail.gmail.com>

> 2011/2/8 Michael Tüxen <Michael.Tuexen@lurchi.franken.de>
>
>> On Feb 8, 2011, at 4:29 AM, Karim Fodil-Lemelin wrote:
>>
>> > 2011/2/7 Pyun YongHyeon <pyunyh@gmail.com>
>> >
>> >> On Mon, Feb 07, 2011 at 09:21:45PM -0500, Karim Fodil-Lemelin wrote:
>> >>> 2011/2/7 Pyun YongHyeon <pyunyh@gmail.com>
>> >>>
>> >>>> On Mon, Feb 07, 2011 at 05:33:47PM -0500, Karim Fodil-Lemelin wrote:
>> >>>>> Subject: Re: igb driver tx hangs when out of mbuf clusters
>> >>>>>
>> >>>>>> To: Lev Serebryakov <lev@serebryakov.spb.ru>
>> >>>>>> Cc: freebsd-net@freebsd.org
>> >>>>>>
>> >>>>>>
>> >>>>>> 2011/2/7 Lev Serebryakov <lev@serebryakov.spb.ru>
>> >>>>>>
>> >>>>>> Hello, Karim.
>> >>>>>>> You wrote 7 February 2011, 19:58:04:
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>> The issue is with the igb driver from 7.4 RC3 r218406. If the driver runs
>> >>>>>>>> out of mbuf clusters it simply stops receiving even after the clusters have
>> >>>>>>>> been freed.
>> >>>>>>> It looks like my problems with em0 (see thread "em0 hangs without
>> >>>>>>> any messages like "Watchdog timeout", only down/up reset it.")...
>> >>>>>>> Codebase for em and igb is somewhat common...
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> // Black Lion AKA Lev Serebryakov <lev@serebryakov.spb.ru>
>> >>>>>>>
>> >>>>>>> I agree.
>> >>>>>>
>> >>>>>> Do you get missed packets in mac_stats (sysctl dev.em | grep missed)?
>> >>>>>>
>> >>>>>> I might not have mentioned but I can also 'fix' the problem by doing
>> >>>>>> ifconfig igb0 down/up.
>> >>>>>>
>> >>>>>> I will try using POLLING to 'automatize' the reset as you mentioned in your
>> >>>>>> thread.
>> >>>>>>
>> >>>>>> Karim.
>> >>>>>>
>> >>>>>>
>> >>>>> Follow up on tests with POLLING: The problem is still occurring although it
>> >>>>> takes more time ... Outputs of sysctl dev.igb0 and netstat -m will follow:
>> >>>>>
>> >>>>> 9219/99426/108645 mbufs in use (current/cache/total)
>> >>>>> 9217/90783/100000/100000 mbuf clusters in use (current/cache/total/max)
>> >>>>
>> >>>> Do you see network processes stuck in keglim state? If you see
>> >>>> that, I think that's not trivial to solve. You wouldn't even be able to kill
>> >>>> that process if it is in keglim state unless some more mbuf
>> >>>> clusters are freed from other places.
>> >>>>
>> >>>
>> >>> No keglim state, here is a snapshot of top -SH while the problem is
>> >>> happening:
>> >>>
>> >>>   12 root 171 ki31     0K     8K CPU5   5  19:27 100.00% idle: cpu5
>> >>>   10 root 171 ki31     0K     8K CPU7   7  19:26 100.00% idle: cpu7
>> >>>   14 root 171 ki31     0K     8K CPU3   3  19:25 100.00% idle: cpu3
>> >>>   11 root 171 ki31     0K     8K CPU6   6  19:25 100.00% idle: cpu6
>> >>>   13 root 171 ki31     0K     8K CPU4   4  19:24 100.00% idle: cpu4
>> >>>   15 root 171 ki31     0K     8K CPU2   2  19:22 100.00% idle: cpu2
>> >>>   16 root 171 ki31     0K     8K CPU1   1  19:18 100.00% idle: cpu1
>> >>>   17 root 171 ki31     0K     8K RUN    0  19:12 100.00% idle: cpu0
>> >>>   18 root -32    -     0K     8K WAIT   6   0:04   0.10% swi4: clock s
>> >>>   20 root -44    -     0K     8K WAIT   4   0:08   0.00% swi1: net
>> >>>   29 root -68    -     0K     8K -      0   0:02   0.00% igb0 que
>> >>>   35 root -68    -     0K     8K -      2   0:02   0.00% em1 taskq
>> >>>   28 root -68    -     0K     8K WAIT   5   0:01   0.00% irq256: igb0
>> >>>
>> >>> Keep in mind that num_queues has been forced to 1.
>> >>>
>> >>>
>> >>>> I think both igb(4) and em(4) pass received frame to upper stack
>> >>>> before allocating new RX buffer. If driver fails to allocate new RX
>> >>>> buffer driver will try to refill RX buffers in next run. Under
>> >>>> extreme resource shortage case, this situation can produce no more
>> >>>> RX buffers in RX descriptor ring and this will take the box out of
>> >>>> network. Other drivers avoid that situation by allocating new RX
>> >>>> buffer before passing received frame to upper stack. If RX buffer
>> >>>> allocation fails driver will just reuse old RX buffer without
>> >>>> passing received frame to upper stack. That does not completely
>> >>>> solve the keglim issue though. I think you should have enough mbuf
>> >>>> clusters to avoid keglim.
>> >>>>
>> >>>> However the output above indicates you have enough free mbuf
>> >>>> clusters. So I guess igb(4) encountered zero available RX buffer
>> >>>> situation in past but failed to refill the RX buffer again. I guess
>> >>>> driver may be able to periodically check available RX buffers.
>> >>>> Jack may have a better idea if this was the case. (CCed)
>> >>>>
>> >>>
>> >>> That is exactly the pattern. The driver runs out of clusters but they
>> >>> eventually get consumed and freed although the driver refuses to process any
>> >>> new frames. It is, on the other hand, perfectly capable of sending out
>> >>> packets.
>> >>>
>> >>
>> >> Ok, this clearly indicates igb(4) failed to refill RX buffers since
>> >> you can still send frames. I'm not sure whether igb(4) controllers
>> >> could be configured to generate "no RX buffer" interrupts but that
>> >> interrupt would be better suited to trigger RX refilling than timer
>> >> based refilling. Since igb(4) keeps track of available RX buffers,
>> >> igb(4) can selectively enable that interrupt once it sees no RX
>> >> buffers in the RX descriptor ring. However this does not work with
>> >> polling.
>> >>
>> >
>> > I think that your evaluation of the problem is correct although I do not
>> > understand the selective interrupt mechanism you described.
>> >
>> > Precisely, the exact same behavior happens (RX hang) if options
>> > DEVICE_POLLING is _not_ used in the kernel configuration file. I tried with
>> > POLLING since someone mentioned that it helped in a case mentioned earlier
>> > today. Unfortunately igb, with or without polling, yields the same RX ring
>> > filling problem.
>> >
>> > By the way I fixed the subject where I erroneously said TX was hanging while
>> > in fact RX is hanging and TX is just fine.
>>
>> Could you apply the attached patch and report what the value of
>> rx_nxt_check and rx_nxt_refresh is when the interface hangs?
>> You get the values using sysctl -a dev.igb
>>
>>
> Thanks for the help, here is the complete dev.igb.0 output including the
> two new sysctls. Some observations:
>
> 1) As always this only happens when we run out of mbuf clusters:
>
> # netstat -m
> 0/222/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>
> 2) Both rx_nxt_check and rx_nxt_refresh are essentially the same throughout
> the test but as soon as RX hangs, rx_nxt_check lags behind by 1.
>
> # sysctl dev.igb.0
> dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 2.0.7
> dev.igb.0.%driver: igb
> dev.igb.0.%location: slot=0 function=0
> dev.igb.0.%pnpinfo: vendor=0x8086 device=0x10a7 subvendor=0x8086 subdevice=0x0000 class=0x020000
> dev.igb.0.%parent: pci7
> dev.igb.0.nvm: -1
> dev.igb.0.flow_control: 3
> dev.igb.0.enable_aim: 1
> dev.igb.0.rx_processing_limit: 100
> dev.igb.0.link_irq: 4
> dev.igb.0.dropped: 0
> dev.igb.0.tx_dma_fail: 0
> dev.igb.0.rx_overruns: 0
> dev.igb.0.watchdog_timeouts: 0
> dev.igb.0.device_control: 1490027073
> dev.igb.0.rx_control: 67141658
> dev.igb.0.interrupt_mask: 4
> dev.igb.0.extended_int_mask: 2147483905
> dev.igb.0.tx_buf_alloc: 14
> dev.igb.0.rx_buf_alloc: 34
> dev.igb.0.fc_high_water: 29488
> dev.igb.0.fc_low_water: 29480
> dev.igb.0.queue0.interrupt_rate: 100000
> dev.igb.0.queue0.txd_head: 825
> dev.igb.0.queue0.txd_tail: 825
> dev.igb.0.queue0.no_desc_avail: 0
> dev.igb.0.queue0.tx_packets: 75577
> dev.igb.0.queue0.rxd_head: 714
> dev.igb.0.queue0.rxd_tail: 714
> dev.igb.0.queue0.rx_packets: 141002
> dev.igb.0.queue0.rx_bytes: 208279964
> dev.igb.0.queue0.rx_nxt_refresh: 715
> dev.igb.0.queue0.rx_nxt_check: 714
> dev.igb.0.queue0.lro_queued: 0
> dev.igb.0.queue0.lro_flushed: 0
> dev.igb.0.mac_stats.excess_coll: 0
> dev.igb.0.mac_stats.single_coll: 0
> dev.igb.0.mac_stats.multiple_coll: 0
> dev.igb.0.mac_stats.late_coll: 0
> dev.igb.0.mac_stats.collision_count: 0
> dev.igb.0.mac_stats.symbol_errors: 0
> dev.igb.0.mac_stats.sequence_errors: 0
> dev.igb.0.mac_stats.defer_count: 0
> dev.igb.0.mac_stats.missed_packets: 2844
> dev.igb.0.mac_stats.recv_no_buff: 495
> dev.igb.0.mac_stats.recv_undersize: 0
> dev.igb.0.mac_stats.recv_fragmented: 0
> dev.igb.0.mac_stats.recv_oversize: 0
> dev.igb.0.mac_stats.recv_jabber: 0
> dev.igb.0.mac_stats.recv_errs: 0
> dev.igb.0.mac_stats.crc_errs: 0
> dev.igb.0.mac_stats.alignment_errs: 0
> dev.igb.0.mac_stats.coll_ext_errs: 0
> dev.igb.0.mac_stats.xon_recvd: 0
> dev.igb.0.mac_stats.xon_txd: 127
> dev.igb.0.mac_stats.xoff_recvd: 0
> dev.igb.0.mac_stats.xoff_txd: 2967
> dev.igb.0.mac_stats.total_pkts_recvd: 143870
> dev.igb.0.mac_stats.good_pkts_recvd: 141026
> dev.igb.0.mac_stats.bcast_pkts_recvd: 0
> dev.igb.0.mac_stats.mcast_pkts_recvd: 11
> dev.igb.0.mac_stats.rx_frames_64: 70
> dev.igb.0.mac_stats.rx_frames_65_127: 693
> dev.igb.0.mac_stats.rx_frames_128_255: 11
> dev.igb.0.mac_stats.rx_frames_256_511: 12
> dev.igb.0.mac_stats.rx_frames_512_1023: 20
> dev.igb.0.mac_stats.rx_frames_1024_1522: 140220
> dev.igb.0.mac_stats.good_octets_recvd: 208878321
> dev.igb.0.mac_stats.good_octets_txd: 4864240
> dev.igb.0.mac_stats.total_pkts_txd: 78671
> dev.igb.0.mac_stats.good_pkts_txd: 75577
> dev.igb.0.mac_stats.bcast_pkts_txd: 693
> dev.igb.0.mac_stats.mcast_pkts_txd: 346
> dev.igb.0.mac_stats.tx_frames_64: 74175
> dev.igb.0.mac_stats.tx_frames_65_127: 1402
> dev.igb.0.mac_stats.tx_frames_128_255: 0
> dev.igb.0.mac_stats.tx_frames_256_511: 0
> dev.igb.0.mac_stats.tx_frames_512_1023: 0
> dev.igb.0.mac_stats.tx_frames_1024_1522: 0
> dev.igb.0.mac_stats.tso_txd: 0
> dev.igb.0.mac_stats.tso_ctx_fail: 0
> dev.igb.0.interrupts.asserts: 5716
> dev.igb.0.interrupts.rx_pkt_timer: 141023
> dev.igb.0.interrupts.rx_abs_timer: 140978
> dev.igb.0.interrupts.tx_pkt_timer: 75577
> dev.igb.0.interrupts.tx_abs_timer: 0
> dev.igb.0.interrupts.tx_queue_empty: 75577
> dev.igb.0.interrupts.tx_queue_min_thresh: 0
> dev.igb.0.interrupts.rx_desc_min_thresh: 16
> dev.igb.0.interrupts.rx_overrun: 0
> dev.igb.0.host.breaker_tx_pkt: 0
> dev.igb.0.host.host_tx_pkt_discard: 0
> dev.igb.0.host.rx_pkt: 3
> dev.igb.0.host.breaker_rx_pkts: 0
> dev.igb.0.host.breaker_rx_pkt_drop: 0
> dev.igb.0.host.tx_good_pkt: 0
> dev.igb.0.host.breaker_tx_pkt_drop: 0
> dev.igb.0.host.rx_good_bytes: 208878321
> dev.igb.0.host.tx_good_bytes: 4864304
> dev.igb.0.host.length_errors: 0
> dev.igb.0.host.serdes_violation_pkt: 0
> dev.igb.0.host.header_redir_missed: 0
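
The refill strategy Pyun describes above (allocate the replacement cluster before handing the received frame to the stack, and recycle the old buffer when allocation fails so the RX ring never runs dry) looks roughly like the following sketch. The rx_ring and rx_buffer structures here are hypothetical and simplified, DMA map handling is omitted, and this is not the actual igb(4)/em(4) code.

/*
 * Sketch of the "allocate before handing up" RX path described above.
 * Hypothetical, simplified structures; bus_dma map load/sync omitted.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

struct rx_buffer {
	struct mbuf	*m_head;		/* cluster owned by this slot */
};

struct rx_ring {
	struct ifnet	*ifp;
	struct rx_buffer rx_buffers[1024];	/* example ring size */
};

static void
rx_ring_input(struct rx_ring *rxr, int idx, int len)
{
	struct ifnet *ifp = rxr->ifp;
	struct rx_buffer *rxbuf = &rxr->rx_buffers[idx];
	struct mbuf *m, *mnew;

	/* Allocate the replacement mbuf + cluster first. */
	mnew = m_getcl(M_DONTWAIT, MT_DATA, M_PKTHDR);
	if (mnew == NULL) {
		/*
		 * Out of clusters: leave the old buffer in the ring so the
		 * descriptor can be handed straight back to the hardware,
		 * count a drop and return.  The RX ring never runs dry.
		 */
		ifp->if_iqdrops++;
		return;
	}

	/* Detach the filled buffer and pass it to the stack. */
	m = rxbuf->m_head;
	m->m_pkthdr.len = m->m_len = len;
	m->m_pkthdr.rcvif = ifp;

	/* Install the fresh cluster in this descriptor slot. */
	rxbuf->m_head = mnew;

	(*ifp->if_input)(ifp, m);
}

In contrast, the order Pyun attributes to igb(4)/em(4), input first and allocate afterwards, lets a long run of failed allocations empty the ring, which matches the hang being discussed.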
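To watch for the stall signature Karim reports, rx_nxt_check stuck one step behind rx_nxt_refresh once RX hangs, a small userland poller along the following lines could sample the two new sysctls. This is only an illustration; it assumes the patch exports both counters as plain unsigned integers, and a short-lived difference between the two is normal during operation.

/*
 * Illustrative userland poller for the rx_nxt_check / rx_nxt_refresh
 * sysctls added by the patch.  Assumes both are plain unsigned integers.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <unistd.h>

static unsigned int
read_counter(const char *name)
{
	unsigned int val = 0;
	size_t len = sizeof(val);

	if (sysctlbyname(name, &val, &len, NULL, 0) == -1)
		perror(name);
	return (val);
}

int
main(void)
{
	unsigned int check, refresh, prev_check = 0, prev_refresh = 0;

	for (;;) {
		check = read_counter("dev.igb.0.queue0.rx_nxt_check");
		refresh = read_counter("dev.igb.0.queue0.rx_nxt_refresh");

		/*
		 * A transient difference is normal; the hang reported above
		 * shows both indices frozen with check one behind refresh.
		 */
		if (check != refresh && check == prev_check &&
		    refresh == prev_refresh)
			printf("possible RX stall: rx_nxt_check=%u "
			    "rx_nxt_refresh=%u\n", check, refresh);

		prev_check = check;
		prev_refresh = refresh;
		sleep(1);
	}
	return (0);
}

Left running while reproducing the mbuf cluster exhaustion, it prints a line once the two indices stop moving while still differing.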