From owner-freebsd-net@FreeBSD.ORG  Fri Feb 11 22:24:34 2011
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 066A51065670
	for <freebsd-net@freebsd.org>; Fri, 11 Feb 2011 22:24:34 +0000 (UTC)
	(envelope-from fodillemlinkarim@gmail.com)
Received: from mail-pz0-f54.google.com (mail-pz0-f54.google.com
	[209.85.210.54])
	by mx1.freebsd.org (Postfix) with ESMTP id C24408FC1D
	for <freebsd-net@freebsd.org>; Fri, 11 Feb 2011 22:24:33 +0000 (UTC)
Received: by pzk32 with SMTP id 32so543593pzk.13
	for <freebsd-net@freebsd.org>; Fri, 11 Feb 2011 14:24:33 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:in-reply-to:references:date
	:message-id:subject:from:to:cc:content-type;
	bh=bL099MCNGOS2aU/b2tvspGYtFjTTkKqUg9uFBc26QBc=;
	b=wz+BdiSFLCMYCpSdgqQ3JhqHJDM7l14jhNWfdF8lvcWJ83+1GV2n8sCgUtV2zwv2ea
	XnV/U58iIgPWP96im8u4DrInIf40xc4c7kNpo4CbKLV+uJt5GkHitD7wCnXHgKwbmN1i
	+RLwqnu+zdcTQ06EvD3NaJbM7r6vJIwzoqLRI=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=JEZjMbFJZo0JeANyrzQAnGbYb7Bnbb/G4Ndfn+LMqd1tyTMk8FPPe55+vWGlvzr4bC
	W1Z2s+6v80ZN3zPFXX7rCfrtl+ltDN3G21x/2pByn2U3utLN3uTbRukIO7rF0R7enzM7
	kJ9sZX59IcznxO2dhaRU59iLAbbLhEoDrq6cc=
MIME-Version: 1.0
Received: by 10.142.131.17 with SMTP id e17mr755747wfd.371.1297463073215; Fri,
	11 Feb 2011 14:24:33 -0800 (PST)
Received: by 10.142.246.3 with HTTP; Fri, 11 Feb 2011 14:24:33 -0800 (PST)
In-Reply-To: <AANLkTik2=_Vg92t6qNTNvT-y3OdAQC=MQ40r=OrHER=b@mail.gmail.com>
References: <AANLkTikrjkHDaBq+x6MTZhzOeqWA=xtFpqQPsthFGmuf@mail.gmail.com>
	<D70A2DA6-23B7-442D-856C-4267359D66A5@lurchi.franken.de>
	<AANLkTinLg6QZz67e3Hhda-bzTX69XWNcdEkr3EZHFmSZ@mail.gmail.com>
	<AANLkTikMuFRY=W0+VtGKdWkJcOFVbdy=OOZNe_xFUC3R@mail.gmail.com>
	<AANLkTin5DZBnr_VcXRyUmpcH2Gsr3GuaW4EsBtKJ6omd@mail.gmail.com>
	<AANLkTinaftP09MxxpXQwhLaO3dybSep2q4SWZRP4ycHB@mail.gmail.com>
	<AANLkTikaFRh-3OK0xjO8a+nY5aoPnMVFGPCnR1CGDVPk@mail.gmail.com>
	<F06CCA42-610F-41CA-897F-7029CCAE991B@freebsd.org>
	<AANLkTinMHSTMqskxTz2d3ysooadF5AwjTOGHnAbOhAj-@mail.gmail.com>
	<12838373-FE96-443E-8979-AF5408705BF0@freebsd.org>
	<AANLkTinSFycBZx31A-QQoweEVAD-tsEBnuZW5+pZgP2Z@mail.gmail.com>
	<AANLkTik2=_Vg92t6qNTNvT-y3OdAQC=MQ40r=OrHER=b@mail.gmail.com>
Date: Fri, 11 Feb 2011 17:24:33 -0500
Message-ID: <AANLkTimzVPsD2N228vSSwW4h6wVYSh8xtsTb-eo_EQiA@mail.gmail.com>
From: Karim Fodil-Lemelin <fodillemlinkarim@gmail.com>
To: Jack Vogel <jfvogel@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-net@freebsd.org
Subject: Re: igb driver RX (was TX) hangs when out of mbuf clusters
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 11 Feb 2011 22:24:34 -0000

This is great news! Thanks!

Btw we've confirmed that your patch (from current) is indeed working for us
as well.

2011/2/11 Jack Vogel <jfvogel@gmail.com>

> Yes, it was the fix for this, Mike has tested and confirmed it eliminates
> the problem, and yes
> I will MFC the relevant bits back to 8 and 7 for you asap.
>
> Jack
>
>
>
> On Fri, Feb 11, 2011 at 11:53 AM, Karim Fodil-Lemelin <
> fodillemlinkarim@gmail.com> wrote:
>
>> Hi,
>>
>> I see a commit was made in current (r218530 | jfv | 2011-02-10 20:00:26
>> -0500 (Thu, 10 Feb 2011)). Is that commit done to address this issue?
>>
>> And if so Is there any MFC planned for 7.4 for this?
>>
>> Thanks,
>>
>> Karim.
>>
>> 2011/2/9 Michael Tuexen <tuexen@freebsd.org>
>>
>> On Feb 9, 2011, at 6:35 PM, Jack Vogel wrote:
>>>
>>> > OK, but the question is why does the ring get totally consumed this
>>> way, the
>>> > ring has 1024 descriptors, it seems unintuitive that that whole
>>> quantity can be
>>> > used without some being recharged. Do you see the system mbuf pool
>>> being
>>> > depleted at the same time?
>>> That was the test case I created: I set up a server accepting connections
>>> but not reading anything. So the driver passes the mbufs to the transport
>>> stack and they are not consumed. Then the problem occurs. Then I kill the
>>> server. Now there are mbufs available again, but the driver doesn't know.
>>>
>>> I had the impression that these were the circumstances in which the
>>> problem
>>> showed up (mbuf allocations failing).
>>> >
>>> > Since you can reproduce it, do me a favor, in rxeof,  change the
>>> processed
>>> > value from 8 to 4 and then 1, effectively call refresh every
>>> descriptor, see if
>>> > that eliminates the issue.
>>> I will do. Need to see if I can do it remotely, since I'm not in my lab
>>> right now. Can do it tomorrow for sure.
>>>
>>> But I do not think that this solves the problem, since I did the things
>>> very slowly and you call it at least when you are leaving rxeof.
>>>
>>> Best regards
>>> Michael
>>> >
>>> > Thanks for your help,
>>> >
>>> > Jack
>>> >
>>> >
>>> > On Wed, Feb 9, 2011 at 2:36 AM, Michael Tuexen <tuexen@freebsd.org>
>>> wrote:
>>> > Hi Jack,
>>> >
>>> > I could recreate the problem. When the problem occurs, we see
>>> >
>>> > rx_nxt_check = n
>>> > rx_nxt_refresh = n + 1
>>> >
>>> > (This was also reported in a mail from Karim)
>>> >
>>> > This means that the *whole* receive ring has no buffers anymore. This
>>> can
>>> > occur if, for some amount of time, no clusters are available.
>>> >
>>> > Now outside of the driver, at some point of time, clusters are freed.
>>> > I don't think that igb_refresh_mbufs() gets called, since it only gets
>>> > called from igb_rxeof(), which gets called when a packet has been
>>> received,
>>> > which can not happen since the receive ring is empty. So how can the
>>> driver
>>> > know? I have no idea. Maybe we can periodically check for such an event
>>> > and call igb_refresh_mbufs().
>>> >
>>> > Does this make sense to you?
>>> >
>>> > Best regards
>>> > Michael
>>> >
>>> >
>>> > On Feb 9, 2011, at 8:32 AM, Jack Vogel wrote:
>>> >
>>> > > Hmmm, well so much for that theory :)
>>> > >
>>> > > Jack
>>> > >
>>> > >
>>> > > On Tue, Feb 8, 2011 at 4:06 PM, Karim Fodil-Lemelin <
>>> fodillemlinkarim@gmail.com> wrote:
>>> > >
>>> > >
>>> > > 2011/2/8 Jack Vogel <jfvogel@gmail.com>
>>> > >
>>> > >
>>> > > I have been following this, and thinking about it. I still am working
>>> from a theoretical
>>> > > standpoint, but based on a patch I got quite a long time back and
>>> never quite groked,
>>> > > I believe now that I might have a solution.
>>> > >
>>> > > The original PR and patch was kern/150516 from Beezar Liu,  I was
>>> never quite comfortable
>>> > > with the code changes, nor convinced that it was a real issue and not
>>> a misunderstanding.
>>> > > However I think now that this very report might be behind what we are
>>> seeing today. I have
>>> > > a slightly different approach to solving it, of course it remains to
>>> be seen if it handles it
>>> > > properly.
>>> > >
>>> > > Please try the patch I've attached, I'm open to further correction or
>>> polishing of the
>>> > > changes. And thanks to Beezar for his original report and changes,
>>> this is not for em,
>>> > > but if this eliminates the problem its clearly needed in all drivers.
>>> > >
>>> > > Jack
>>> > >
>>> > >
>>> > > Hi Jack,
>>> > >
>>> > > Thanks for your help. I tried your patch and it didn't work so I
>>> added a couple of printf to see if the added code was getting hit:
>>> > >
>>> > > --- a/freebsd/sys/dev/e1000/if_igb.c
>>> > > --More--(byte 1253)+++ b/freebsd/sys/dev/e1000/if_igb.c
>>> > > @@ -612,7 +612,7 @@ igb_attach(device_t dev)
>>> > >             device_get_nameunit(dev));
>>> > >
>>> > >         INIT_DEBUGOUT("igb_attach: end");
>>> > > -
>>> > > +       printf("this driver has a patch from Jack Vogel\n");
>>> > >         return (0);
>>> > >
>>> > >  err_late:
>>> > > @@ -4131,6 +4131,7 @@ igb_rxeof(struct igb_queue *que, int count, int
>>> *done)
>>> > >                 struct mbuf             *sendmp, *mh, *mp;
>>> > >                 struct igb_rx_buf       *rxbuf;
>>> > >                 u16                     hlen, plen, hdr, vtag;
>>> > > +               int                     commit;
>>> > >                 bool                    eop = FALSE;
>>> > >
>>> > >                 cur = &rxr->rx_base[i];
>>> > > @@ -4255,10 +4256,23 @@ next_desc:
>>> > >                 bus_dmamap_sync(rxr->rxdma.dma_tag,
>>> rxr->rxdma.dma_map,
>>> > >                     BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
>>> > >
>>> > > +               commit = i;     /* capture the old index */
>>> > > +
>>> > >                 /* Advance our pointers to the next descriptor. */
>>> > >                 if (++i == adapter->num_rx_desc)
>>> > >                         i = 0;
>>> > >                 /*
>>> > > +               ** Sanity test for ring full, if this
>>> > > +               ** happens we need to refresh immediately
>>> > > +               ** or refresh may deadlock.
>>> > > +               */
>>> > > +               if (i == rxr->next_to_refresh) {
>>> > > +                       igb_refresh_mbufs(rxr, commit);
>>> > > +                       printf("igb_refresh_mbufs called with commit
>>> %d\n", commit);
>>> > > +                       processed = 0;
>>> > > +               }
>>> > > +
>>> > > +               /*
>>> > >                 ** Send to the stack or LRO
>>> > >                 */
>>> > >                 if (sendmp != NULL) {
>>> > >
>>> > > Here is the results:
>>> > >
>>> > > # dmesg | grep Vogel
>>> > > this driver has a patch from Jack Vogel
>>> > > this driver has a patch from Jack Vogel
>>> > >
>>> > > # netstat -m
>>> > > 60453/52707/113160 mbufs in use (current/cache/total)
>>> > > 48416/51584/100000/100000 mbuf clusters in use
>>> (current/cache/total/max)
>>> > > 2894/690 mbuf+clusters out of packet secondary zone in use
>>> (current/cache)
>>> > > 11946/854/12800/12800 4k (page size) jumbo clusters in use
>>> (current/cache/total/max)
>>> > > 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
>>> > > 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
>>> > > 164834K/119760K/284595K bytes allocated to network
>>> (current/cache/total)
>>> > > 0/339/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>>> > > 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>>> > > 0/4/6656 sfbufs in use (current/peak/max)
>>> > > 0 requests for sfbufs denied
>>> > > 0 requests for sfbufs delayed
>>> > > 0 requests for I/O initiated by sendfile
>>> > > 0 calls to protocol drain routines
>>> > > # dmesg | grep commit
>>> > >
>>> > > At this point RX has hung.
>>> > >
>>> > > Somehow the check (i == rxr->next_to_refresh) is never true in this
>>> case. Also, I did read kern/150516 and couldn't wrap my head around the
>>> patch for the em driver that Beezar Liu suggested.
>>> > >
>>> > > Regards,
>>> > >
>>> > > Karim.
>>> > >
>>> > >
>>> >
>>> >
>>>
>>>
>>
>