From: Pyun YongHyeon
Date: Mon, 7 Feb 2011 19:12:06 -0800
To: Karim Fodil-Lemelin
Message-ID: <20110208031206.GC1306@michelle.cdnetworks.com>
References: <10510673199.20110207203507@serebryakov.spb.ru> <20110207235811.GA1306@michelle.cdnetworks.com>
Cc: jfv@freebsd.org, freebsd-net@freebsd.org
Subject: Re: Fwd: igb driver tx hangs when out of mbuf clusters
Reply-To: pyunyh@gmail.com
List-Id: Networking and TCP/IP with FreeBSD

On Mon, Feb 07, 2011 at 09:21:45PM -0500, Karim Fodil-Lemelin wrote:
> 2011/2/7 Pyun YongHyeon
>
> > On Mon, Feb 07, 2011 at 05:33:47PM -0500, Karim Fodil-Lemelin wrote:
> > > Subject: Re: igb driver tx hangs when out of mbuf clusters
> > >
> > > > To: Lev Serebryakov
> > > > Cc: freebsd-net@freebsd.org
> > > >
> > > >
> > > > 2011/2/7 Lev Serebryakov
> > > >
> > > > Hello, Karim.
> > > >> You wrote on February 7, 2011, 19:58:04:
> > > >>
> > > >>
> > > >> > The issue is with the igb driver from 7.4 RC3 r218406. If the driver runs
> > > >> > out of mbuf clusters it simply stops receiving even after the clusters have
> > > >> > been freed.
> > > >> It looks like my problems with em0 (see thread "em0 hangs without
> > > >> any messages like "Watchdog timeout", only down/up reset it.")...
> > > >> Codebase for em and igb is somewhat common...
> > > >>
> > > >> --
> > > >> // Black Lion AKA Lev Serebryakov
> > > >>
> > > >> I agree.
> > > >
> > > > Do you get missed packets in mac_stats (sysctl dev.em | grep missed)?
> > > >
> > > > I might not have mentioned but I can also 'fix' the problem by doing
> > > > ifconfig igb0 down/up.
> > > >
> > > > I will try using POLLING to 'automatize' the reset as you mentioned in your
> > > > thread.
> > > >
> > > > Karim.
> > > >
> > > >
> > > Follow up on tests with POLLING: The problem is still occurring although it
> > > takes more time ... Outputs of sysctl dev.igb0 and netstat -m will follow:
> > >
> > > 9219/99426/108645 mbufs in use (current/cache/total)
> > > 9217/90783/100000/100000 mbuf clusters in use (current/cache/total/max)
> >
> > Do you see network processes are stuck in keglim state? If you see
> > that I think that's not trivial to solve. You wouldn't even kill
> > that process if it is under keglim state unless some more mbuf
> > clusters are freed from other places.
> >
>
> No keglim state, here is a snapshot of top -SH while the problem is
> happening:
>
>    12 root  171 ki31  0K  8K CPU5  5 19:27 100.00% idle: cpu5
>    10 root  171 ki31  0K  8K CPU7  7 19:26 100.00% idle: cpu7
>    14 root  171 ki31  0K  8K CPU3  3 19:25 100.00% idle: cpu3
>    11 root  171 ki31  0K  8K CPU6  6 19:25 100.00% idle: cpu6
>    13 root  171 ki31  0K  8K CPU4  4 19:24 100.00% idle: cpu4
>    15 root  171 ki31  0K  8K CPU2  2 19:22 100.00% idle: cpu2
>    16 root  171 ki31  0K  8K CPU1  1 19:18 100.00% idle: cpu1
>    17 root  171 ki31  0K  8K RUN   0 19:12 100.00% idle: cpu0
>    18 root  -32    -  0K  8K WAIT  6  0:04   0.10% swi4: clock s
>    20 root  -44    -  0K  8K WAIT  4  0:08   0.00% swi1: net
>    29 root  -68    -  0K  8K -     0  0:02   0.00% igb0 que
>    35 root  -68    -  0K  8K -     2  0:02   0.00% em1 taskq
>    28 root  -68    -  0K  8K WAIT  5  0:01   0.00% irq256: igb0
>
> keep in mind that num_queues has been forced to 1.
>
>
> >
> > I think both igb(4) and em(4) pass received frame to upper stack
> > before allocating new RX buffer. If driver fails to allocate new RX
> > buffer driver will try to refill RX buffers in next run. Under
> > extreme resource shortage case, this situation can produce no more
> > RX buffers in RX descriptor ring and this will take the box out of
> > network. Other drivers avoid that situation by allocating new RX
> > buffer before passing received frame to upper stack.
> > If RX buffer allocation fails driver will just reuse old RX buffer
> > without passing received frame to upper stack. That does not
> > completely solve the keglim issue though. I think you should have
> > enough mbuf clusters to avoid keglim.
> >
> > However the output above indicates you have enough free mbuf
> > clusters. So I guess igb(4) encountered zero available RX buffer
> > situation in past but failed to refill the RX buffer again. I guess
> > driver may be able to periodically check available RX buffers.
> > Jack may have better idea if this was the case. (CCed)
> >
>
> That is exactly the pattern. The driver runs out of clusters but they
> eventually get consumed and freed although the driver refuses to process any
> new frames. It is, on the other hand, perfectly capable of sending out
> packets.
>

Ok, this clearly indicates igb(4) failed to refill RX buffers, since
you can still send frames. I'm not sure whether igb(4) controllers
can be configured to generate a "no RX buffer" interrupt, but such an
interrupt would be better suited to trigger RX refilling than
timer-based refilling. Since igb(4) keeps track of available RX
buffers, igb(4) can selectively enable that interrupt once it sees no
RX buffers in the RX descriptor ring. However, this does not work with
polling.
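
The two RX refill orderings described in the quoted text can be contrasted with a toy simulation. This is only a rough sketch in Python, not igb(4) or em(4) code: ClusterPool, RxRing and simulate are hypothetical names, the ring size of 4 is arbitrary, and it assumes that RX processing (and therefore refilling) runs only when a frame actually lands in a posted buffer.

```python
class ClusterPool:
    """Hypothetical stand-in for the mbuf cluster zone (just a counter)."""

    def __init__(self, free):
        self.free = free

    def alloc(self):
        """Take one cluster; return False when none are free."""
        if self.free == 0:
            return False
        self.free -= 1
        return True

    def release(self, count):
        """Model clusters being freed by other consumers."""
        self.free += count


class RxRing:
    """Toy RX descriptor ring; only counts how many slots own a buffer."""

    def __init__(self, size, pool, alloc_first):
        self.pool = pool
        self.alloc_first = alloc_first
        self.buffers = sum(pool.alloc() for _ in range(size))

    def receive(self):
        """Handle one incoming frame; return True if it reaches the stack."""
        if self.buffers == 0:
            # Assumption: with no buffer posted, the frame is lost and no RX
            # processing runs, so nothing ever retries the refill.
            return False
        if self.alloc_first:
            # Allocate the replacement first; on failure keep the old buffer
            # in the ring and drop the frame.
            return self.pool.alloc()
        # Pass the buffer up first, then try to replace it.
        self.buffers -= 1
        if self.pool.alloc():
            self.buffers += 1
        return True


def simulate(alloc_first):
    """Return (buffers left in ring, frames delivered after the shortage)."""
    pool = ClusterPool(free=4)
    ring = RxRing(size=4, pool=pool, alloc_first=alloc_first)  # pool is now empty
    for _ in range(4):   # burst of frames while no clusters are free
        ring.receive()
    pool.release(8)      # clusters are freed elsewhere
    delivered = sum(ring.receive() for _ in range(4))
    return ring.buffers, delivered
```

Under these assumptions, simulate(alloc_first=False) ends with an empty ring that never recovers even though clusters are free again, which matches the reported symptom. simulate(alloc_first=True) drops frames during the shortage but keeps the ring populated, so reception resumes once clusters are available.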