From owner-freebsd-net@FreeBSD.ORG  Tue Feb  8 10:44:38 2011
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 74ED8106564A;
	Tue,  8 Feb 2011 10:44:38 +0000 (UTC)
	(envelope-from Michael.Tuexen@lurchi.franken.de)
Received: from mail-n.franken.de (drew.ipv6.franken.de
	[IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa])
	by mx1.freebsd.org (Postfix) with ESMTP id DB8F98FC15;
	Tue,  8 Feb 2011 10:44:37 +0000 (UTC)
Received: from [212.201.127.66] (unknown [212.201.127.66])
	(Authenticated sender: macmic)
	by mail-n.franken.de (Postfix) with ESMTP id 004F01C0B4619;
	Tue,  8 Feb 2011 11:44:36 +0100 (CET)
Mime-Version: 1.0 (Apple Message framework v1082)
Content-Type: multipart/mixed; boundary=Apple-Mail-4-1027096974
From: =?iso-8859-1?Q?Michael_T=FCxen?= <Michael.Tuexen@lurchi.franken.de>
In-Reply-To: <AANLkTikrjkHDaBq+x6MTZhzOeqWA=xtFpqQPsthFGmuf@mail.gmail.com>
Date: Tue, 8 Feb 2011 11:44:36 +0100
Message-Id: <D70A2DA6-23B7-442D-856C-4267359D66A5@lurchi.franken.de>
References: <AANLkTikrjkHDaBq+x6MTZhzOeqWA=xtFpqQPsthFGmuf@mail.gmail.com>
To: Karim Fodil-Lemelin <fodillemlinkarim@gmail.com>
X-Mailer: Apple Mail (2.1082)
Cc: pyunyh@gmail.com, jfv@freebsd.org, freebsd-net@freebsd.org
Subject: Re: igb driver RX (was TX) hangs when out of mbuf clusters
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 08 Feb 2011 10:44:38 -0000


--Apple-Mail-4-1027096974
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=utf-8

On Feb 8, 2011, at 4:29 AM, Karim Fodil-Lemelin wrote:

> 2011/2/7 Pyun YongHyeon <pyunyh@gmail.com>
>=20
>> On Mon, Feb 07, 2011 at 09:21:45PM -0500, Karim Fodil-Lemelin wrote:
>>> 2011/2/7 Pyun YongHyeon <pyunyh@gmail.com>
>>>=20
>>>> On Mon, Feb 07, 2011 at 05:33:47PM -0500, Karim Fodil-Lemelin wrote:
>>>>> Subject: Re: igb driver tx hangs when out of mbuf clusters
>>>>>=20
>>>>>> To: Lev Serebryakov <lev@serebryakov.spb.ru>
>>>>>> Cc: freebsd-net@freebsd.org
>>>>>>=20
>>>>>>=20
>>>>>> 2011/2/7 Lev Serebryakov <lev@serebryakov.spb.ru>
>>>>>>=20
>>>>>> Hello, Karim.
>>>>>>> You wrote 7 February 2011, 19:58:04:
>>>>>>>=20
>>>>>>>=20
>>>>>>>> The issue is with the igb driver from 7.4 RC3 r218406. If the driver
>>>>>>>> runs out of mbuf clusters it simply stops receiving even after the
>>>>>>>> clusters have been freed.
>>>>>>>  It looks like my problems with em0 (see thread "em0 hangs without
>>>>>>> any messages like "Watchdog timeout", only down/up reset it.")...
>>>>>>> Codebase for em and igb is somewhat common...
>>>>>>>=20
>>>>>>> --
>>>>>>> // Black Lion AKA Lev Serebryakov <lev@serebryakov.spb.ru>
>>>>>>>=20
>>>>>>> I agree.
>>>>>>=20
>>>>>> Do you get missed packets in mac_stats (sysctl dev.em | grep missed)?
>>>>>>=20
>>>>>> I might not have mentioned but I can also 'fix' the problem by doing
>>>>>> ifconfig igb0 down/up.
>>>>>>=20
>>>>>> I will try using POLLING to 'automatize' the reset as you mentioned
>>>>>> in your thread.
>>>>>>=20
>>>>>> Karim.
>>>>>>=20
>>>>>>=20
>>>>> Follow up on tests with POLLING: The problem is still occurring although
>>>>> it takes more time ... Outputs of sysctl dev.igb0 and netstat -m will
>>>>> follow:
>>>>>=20
>>>>> 9219/99426/108645 mbufs in use (current/cache/total)
>>>>> 9217/90783/100000/100000 mbuf clusters in use (current/cache/total/max)
>>>>=20
>>>> Do you see network processes are stuck in keglim state? If you see
>>>> that I think that's not trivial to solve. You wouldn't even kill
>>>> that process if it is under keglim state unless some more mbuf
>>>> clusters are freed from other places.
>>>>=20
>>>=20
>>> No keglim state, here is a snapshot of top -SH while the problem is
>>> happening:
>>>=20
>>>   12 root          171 ki31     0K     8K CPU5   5  19:27 100.00% idle: cpu5
>>>   10 root          171 ki31     0K     8K CPU7   7  19:26 100.00% idle: cpu7
>>>   14 root          171 ki31     0K     8K CPU3   3  19:25 100.00% idle: cpu3
>>>   11 root          171 ki31     0K     8K CPU6   6  19:25 100.00% idle: cpu6
>>>   13 root          171 ki31     0K     8K CPU4   4  19:24 100.00% idle: cpu4
>>>   15 root          171 ki31     0K     8K CPU2   2  19:22 100.00% idle: cpu2
>>>   16 root          171 ki31     0K     8K CPU1   1  19:18 100.00% idle: cpu1
>>>   17 root          171 ki31     0K     8K RUN    0  19:12 100.00% idle: cpu0
>>>   18 root          -32    -     0K     8K WAIT   6   0:04  0.10% swi4: clock s
>>>   20 root          -44    -     0K     8K WAIT   4   0:08  0.00% swi1: net
>>>   29 root          -68    -     0K     8K -      0   0:02  0.00% igb0 que
>>>   35 root          -68    -     0K     8K -      2   0:02  0.00% em1 taskq
>>>   28 root          -68    -     0K     8K WAIT   5   0:01  0.00% irq256: igb0
>>>=20
>>> keep in mind that num_queues has been forced to 1.
>>>=20
>>>=20
>>>>=20
>>>> I think both igb(4) and em(4) pass received frame to upper stack
>>>> before allocating new RX buffer. If driver fails to allocate new RX
>>>> buffer driver will try to refill RX buffers in next run. Under
>>>> extreme resource shortage case, this situation can produce no more
>>>> RX buffers in RX descriptor ring and this will take the box out of
>>>> network. Other drivers avoid that situation by allocating new RX
>>>> buffer before passing received frame to upper stack. If RX buffer
>>>> allocation fails driver will just reuse old RX buffer without
>>>> passing received frame to upper stack. That does not completely
>>>> solve the keglim issue though. I think you should have enough mbuf
>>>> cluters to avoid keglim.
>>>>=20
>>>> However the output above indicates you have enough free mbuf
>>>> clusters. So I guess igb(4) encountered zero available RX buffer
>>>> situation in past but failed to refill the RX buffer again. I guess
>>>> driver may be able to periodically check available RX buffers.
>>>> Jack may have better idea if this was the case.(CCed)
>>>>=20
>>>=20
>>> That is exactly the pattern. The driver runs out of clusters but they
>>> eventually get consumed and freed although the driver refuses to process
>>> any new frames. It is, on the other hand, perfectly capable of sending out
>>> packets.
>>>=20
>>=20
>> Ok, this clearly indicates igb(4) failed to refill RX buffers since
>> you can still send frames. I'm not sure whether igb(4) controllers
>> could be configured to generate no RX buffer interrupts but that
>> interrupt would be better suited to trigger RX refilling than timer
>> based refilling. Since igb(4) keeps track of available RX buffers,
>> igb(4) can selectively enable that interrupt once it see no RX
>> buffers in the RX descriptor ring. However this does not work with
>> polling.
>>=20
>=20
> I think that your evaluation of the problem is correct although I do not
> understand the selective interrupt mechanism you described.
>=20
> Precisely, the exact same behavior happens (RX hang) if options
> DEVICE_POLLING is _not_ used in the kernel configuration file. I tried with
> POLLING since someone mentioned that it helped in a case mentioned earlier
> today. Unfortunately for igb with or without polling yields the same rx ring
> filing problem.
>=20
> By the way I fixed the subject where I erroneously said TX was hanging while
> in fact RX is hanging and TX is just fine.
Karim,

Could you apply the attached patch and report the values of
rx_nxt_check and rx_nxt_refresh when the interface hangs?
You can read them with sysctl -a dev.igb

Best regards
Michael


--Apple-Mail-4-1027096974
Content-Disposition: attachment;
	filename=patch
Content-Type: application/octet-stream;
	x-unix-mode=0644;
	name="patch"
Content-Transfer-Encoding: 7bit

Index: if_igb.c
===================================================================
--- if_igb.c	(revision 218406)
+++ if_igb.c	(working copy)
@@ -5158,6 +5158,12 @@
 		SYSCTL_ADD_UQUAD(ctx, queue_list, OID_AUTO, "rx_bytes",
 				CTLFLAG_RD, &rxr->rx_bytes,
 				"Queue Bytes Received");
+		SYSCTL_ADD_UINT(ctx, queue_list, OID_AUTO, "rx_nxt_refresh",
+				CTLFLAG_RD, &rxr->next_to_refresh, 0,
+				"Next to refresh");
+		SYSCTL_ADD_UINT(ctx, queue_list, OID_AUTO, "rx_nxt_check",
+				CTLFLAG_RD, &rxr->next_to_check, 0,
+				"Next to check");
 		SYSCTL_ADD_INT(ctx, queue_list, OID_AUTO, "lro_queued",
 				CTLFLAG_RD, &lro->lro_queued, 0,
 				"LRO Queued");
Index: if_em.c
===================================================================
--- if_em.c	(revision 218406)
+++ if_em.c	(working copy)
@@ -5137,6 +5137,12 @@
 		SYSCTL_ADD_ULONG(ctx, queue_list, OID_AUTO, "rx_irq",
 				CTLFLAG_RD, &rxr->rx_irq,
 				"Queue MSI-X Receive Interrupts");
+		SYSCTL_ADD_UINT(ctx, queue_list, OID_AUTO, "rx_nxt_refresh",
+				CTLFLAG_RD, &rxr->next_to_refresh, 0,
+				"Next to refresh");
+		SYSCTL_ADD_UINT(ctx, queue_list, OID_AUTO, "rx_nxt_check",
+				CTLFLAG_RD, &rxr->next_to_check, 0,
+				"Next to check");
 	}
 
 	/* MAC stats get their own sub node */

--Apple-Mail-4-1027096974
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
	charset=us-ascii




--Apple-Mail-4-1027096974--