From: Michael Tüxen
Date: Tue, 8 Feb 2011 11:44:36 +0100
To: Karim Fodil-Lemelin
Cc: pyunyh@gmail.com, jfv@freebsd.org, freebsd-net@freebsd.org
Subject: Re: igb driver RX (was TX) hangs when out of mbuf clusters

On Feb 8, 2011, at 4:29 AM, Karim Fodil-Lemelin wrote:

> 2011/2/7 Pyun YongHyeon
>
>> On Mon, Feb 07, 2011 at 09:21:45PM -0500, Karim Fodil-Lemelin wrote:
>>> 2011/2/7 Pyun YongHyeon
>>>
>>>> On Mon, Feb 07, 2011 at 05:33:47PM -0500, Karim Fodil-Lemelin wrote:
>>>>> Subject: Re: igb driver tx hangs when out of mbuf clusters
>>>>>
>>>>>> To: Lev Serebryakov
>>>>>> Cc: freebsd-net@freebsd.org
>>>>>>
>>>>>>
>>>>>> 2011/2/7 Lev Serebryakov
>>>>>>
>>>>>> Hello, Karim.
>>>>>>> You wrote on 7 February 2011, 19:58:04:
>>>>>>>
>>>>>>>
>>>>>>>> The issue is with the igb driver from 7.4 RC3 r218406. If the driver runs
>>>>>>>> out of mbuf clusters, it simply stops receiving, even after the clusters
>>>>>>>> have been freed.
>>>>>>> It looks like my problems with em0 (see thread "em0 hangs without
>>>>>>> any messages like "Watchdog timeout", only down/up reset it.")...
>>>>>>> The em and igb codebases are partly shared...
>>>>>>>
>>>>>>> --
>>>>>>> // Black Lion AKA Lev Serebryakov
>>>>>>>
>>>>>>> I agree.
>>>>>>
>>>>>> Do you get missed packets in mac_stats (sysctl dev.em | grep missed)?
>>>>>>
>>>>>> I might not have mentioned it, but I can also 'fix' the problem by doing
>>>>>> ifconfig igb0 down/up.
>>>>>>
>>>>>> I will try using POLLING to 'automatize' the reset, as you mentioned in
>>>>>> your thread.
>>>>>>
>>>>>> Karim.
>>>>>>
>>>>>>
>>>>> Follow-up on tests with POLLING: the problem still occurs, although it
>>>>> takes more time... Outputs of sysctl dev.igb0 and netstat -m follow:
>>>>>
>>>>> 9219/99426/108645 mbufs in use (current/cache/total)
>>>>> 9217/90783/100000/100000 mbuf clusters in use (current/cache/total/max)
>>>>
>>>> Do you see network processes stuck in keglim state? If so, I think
>>>> that's not trivial to solve. You couldn't even kill such a process
>>>> while it is in keglim state unless more mbuf clusters are freed
>>>> elsewhere.
>>>>
>>>
>>> No keglim state; here is a snapshot of top -SH while the problem is
>>> happening:
>>>
>>>    12 root      171 ki31     0K     8K CPU5   5  19:27 100.00% idle: cpu5
>>>    10 root      171 ki31     0K     8K CPU7   7  19:26 100.00% idle: cpu7
>>>    14 root      171 ki31     0K     8K CPU3   3  19:25 100.00% idle: cpu3
>>>    11 root      171 ki31     0K     8K CPU6   6  19:25 100.00% idle: cpu6
>>>    13 root      171 ki31     0K     8K CPU4   4  19:24 100.00% idle: cpu4
>>>    15 root      171 ki31     0K     8K CPU2   2  19:22 100.00% idle: cpu2
>>>    16 root      171 ki31     0K     8K CPU1   1  19:18 100.00% idle: cpu1
>>>    17 root      171 ki31     0K     8K RUN    0  19:12 100.00% idle: cpu0
>>>    18 root      -32    -     0K     8K WAIT   6   0:04  0.10% swi4: clock s
>>>    20 root      -44    -     0K     8K WAIT   4   0:08  0.00% swi1: net
>>>    29 root      -68    -     0K     8K -      0   0:02  0.00% igb0 que
>>>    35 root      -68    -     0K     8K -      2   0:02  0.00% em1 taskq
>>>    28 root      -68    -     0K     8K WAIT   5   0:01  0.00% irq256: igb0
>>>
>>> Keep in mind that num_queues has been forced to 1.
>>>
>>>
>>>>
>>>> I think both igb(4) and em(4) pass the received frame to the upper
>>>> stack before allocating a new RX buffer. If the driver fails to
>>>> allocate a new RX buffer, it will try to refill RX buffers on the next
>>>> run. Under an extreme resource shortage, this can leave no RX buffers
>>>> at all in the RX descriptor ring, which takes the box off the
>>>> network. Other drivers avoid that situation by allocating a new RX
>>>> buffer before passing the received frame to the upper stack. If RX
>>>> buffer allocation fails, the driver just reuses the old RX buffer
>>>> without passing the received frame to the upper stack. That does not
>>>> completely solve the keglim issue, though. I think you should have
>>>> enough mbuf clusters to avoid keglim.
>>>>
>>>> However, the output above indicates you have enough free mbuf
>>>> clusters. So I guess igb(4) hit a zero-available-RX-buffer situation
>>>> in the past but failed to refill the RX buffers afterwards.
>>>> I guess the driver may be able to periodically check available RX
>>>> buffers. Jack may have a better idea if this was the case. (CCed)
>>>>
>>>
>>> That is exactly the pattern. The driver runs out of clusters, but they
>>> eventually get consumed and freed, although the driver refuses to process
>>> any new frames. It is, on the other hand, perfectly capable of sending out
>>> packets.
>>>
>>
>> Ok, this clearly indicates igb(4) failed to refill RX buffers, since
>> you can still send frames. I'm not sure whether igb(4) controllers
>> can be configured to generate "no RX buffer" interrupts, but such an
>> interrupt would be better suited to trigger RX refilling than
>> timer-based refilling. Since igb(4) keeps track of available RX
>> buffers, it could selectively enable that interrupt once it sees no
>> RX buffers in the RX descriptor ring. However, this does not work
>> with polling.
>>
>
> I think that your evaluation of the problem is correct, although I do not
> understand the selective interrupt mechanism you described.
>
> Precisely, the exact same behavior (RX hang) happens if options
> DEVICE_POLLING is _not_ used in the kernel configuration file. I tried
> POLLING since someone mentioned earlier today that it helped in a similar
> case. Unfortunately, for igb, running with or without polling yields the
> same RX ring filling problem.
>
> By the way, I fixed the subject, where I erroneously said TX was hanging
> while in fact RX is hanging and TX is just fine.

Karim, could you apply the attached patch and report the values of
rx_nxt_check and rx_nxt_refresh when the interface hangs?
You get the values using
sysctl -a dev.igb

Best regards
Michael

[Attachment: patch]

Index: if_igb.c
===================================================================
--- if_igb.c	(revision 218406)
+++ if_igb.c	(working copy)
@@ -5158,6 +5158,12 @@
 		SYSCTL_ADD_UQUAD(ctx, queue_list, OID_AUTO, "rx_bytes",
 				CTLFLAG_RD, &rxr->rx_bytes,
 				"Queue Bytes Received");
+		SYSCTL_ADD_UINT(ctx, queue_list, OID_AUTO, "rx_nxt_refresh",
+				CTLFLAG_RD, &rxr->next_to_refresh, 0,
+				"Next to refresh");
+		SYSCTL_ADD_UINT(ctx, queue_list, OID_AUTO, "rx_nxt_check",
+				CTLFLAG_RD, &rxr->next_to_check, 0,
+				"Next to check");
 		SYSCTL_ADD_INT(ctx, queue_list, OID_AUTO, "lro_queued",
 				CTLFLAG_RD, &lro->lro_queued, 0,
 				"LRO Queued");
Index: if_em.c
===================================================================
--- if_em.c	(revision 218406)
+++ if_em.c	(working copy)
@@ -5137,6 +5137,12 @@
 		SYSCTL_ADD_ULONG(ctx, queue_list, OID_AUTO, "rx_irq",
 				CTLFLAG_RD, &rxr->rx_irq,
 				"Queue MSI-X Receive Interrupts");
+		SYSCTL_ADD_UINT(ctx, queue_list, OID_AUTO, "rx_nxt_refresh",
+				CTLFLAG_RD, &rxr->next_to_refresh, 0,
+				"Next to refresh");
+		SYSCTL_ADD_UINT(ctx, queue_list, OID_AUTO, "rx_nxt_check",
+				CTLFLAG_RD, &rxr->next_to_check, 0,
+				"Next to check");
 	}
 
 	/* MAC stats get their own sub node */