From owner-freebsd-net@FreeBSD.ORG  Wed Feb  9 18:45:57 2011
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 110081065673
	for <freebsd-net@freebsd.org>; Wed,  9 Feb 2011 18:45:57 +0000 (UTC)
	(envelope-from tuexen@freebsd.org)
Received: from mail-n.franken.de (drew.ipv6.franken.de
	[IPv6:2001:638:a02:a001:20e:cff:fe4a:feaa])
	by mx1.freebsd.org (Postfix) with ESMTP id 1D6188FC0A
	for <freebsd-net@freebsd.org>; Wed,  9 Feb 2011 18:45:56 +0000 (UTC)
Received: from [192.168.1.113] (p508FA862.dip.t-dialin.net [80.143.168.98])
	(Authenticated sender: macmic)
	by mail-n.franken.de (Postfix) with ESMTP id 884611C0C0BD8;
	Wed,  9 Feb 2011 19:45:53 +0100 (CET)
Mime-Version: 1.0 (Apple Message framework v1082)
Content-Type: text/plain; charset=us-ascii
From: Michael Tuexen <tuexen@freebsd.org>
In-Reply-To: <AANLkTinMHSTMqskxTz2d3ysooadF5AwjTOGHnAbOhAj-@mail.gmail.com>
Date: Wed, 9 Feb 2011 19:45:52 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <12838373-FE96-443E-8979-AF5408705BF0@freebsd.org>
References: <AANLkTikrjkHDaBq+x6MTZhzOeqWA=xtFpqQPsthFGmuf@mail.gmail.com>
	<D70A2DA6-23B7-442D-856C-4267359D66A5@lurchi.franken.de>
	<AANLkTinLg6QZz67e3Hhda-bzTX69XWNcdEkr3EZHFmSZ@mail.gmail.com>
	<AANLkTikMuFRY=W0+VtGKdWkJcOFVbdy=OOZNe_xFUC3R@mail.gmail.com>
	<AANLkTin5DZBnr_VcXRyUmpcH2Gsr3GuaW4EsBtKJ6omd@mail.gmail.com>
	<AANLkTinaftP09MxxpXQwhLaO3dybSep2q4SWZRP4ycHB@mail.gmail.com>
	<AANLkTikaFRh-3OK0xjO8a+nY5aoPnMVFGPCnR1CGDVPk@mail.gmail.com>
	<F06CCA42-610F-41CA-897F-7029CCAE991B@freebsd.org>
	<AANLkTinMHSTMqskxTz2d3ysooadF5AwjTOGHnAbOhAj-@mail.gmail.com>
To: Jack Vogel <jfvogel@gmail.com>
X-Mailer: Apple Mail (2.1082)
Cc: Karim Fodil-Lemelin <fodillemlinkarim@gmail.com>,
	Pyun YongHyeon <pyunyh@gmail.com>, freebsd-net@freebsd.org,
	beezarliu <beezarliu@yahoo.com.cn>
Subject: Re: igb driver RX (was TX) hangs when out of mbuf clusters

On Feb 9, 2011, at 6:35 PM, Jack Vogel wrote:

> OK, but the question is why the ring gets totally consumed this way. The
> ring has 1024 descriptors; it seems unintuitive that the whole quantity can
> be used without some being recharged. Do you see the system mbuf pool being
> depleted at the same time?
That was the test case I created: I set up a server accepting connections
but not reading anything. So the driver passes the mbufs to the transport
stack and they are not consumed. Then the problem occurs. Then I kill the
server. Now there are mbufs available again, but the driver doesn't know.

I had the impression that these were the circumstances in which the problem
showed up (mbuf allocations failing).
>
> Since you can reproduce it, do me a favor: in rxeof, change the processed
> value from 8 to 4 and then 1 (effectively calling refresh on every
> descriptor) and see if that eliminates the issue.
I will. I need to see if I can do it remotely, since I'm not in my lab
right now. I can do it tomorrow for sure.

But I do not think that this solves the problem, since I did everything
very slowly and you call refresh at least once when leaving rxeof.

Best regards
Michael
>
> Thanks for your help,
>
> Jack
>
>
> On Wed, Feb 9, 2011 at 2:36 AM, Michael Tuexen <tuexen@freebsd.org> wrote:
> Hi Jack,
>
> I could recreate the problem. When the problem occurs, we see
>
> rx_nxt_check = n
> rx_nxt_refresh = n + 1
>
> (This was also reported in a mail from Karim.)
>
> This means that the *whole* receive ring has no buffers anymore. This can
> occur if, for some amount of time, no clusters are available.
>
> Now, outside of the driver, clusters are freed at some point in time.
> I don't think that igb_refresh_mbufs() gets called then, since it is only
> called from igb_rxeof(), which runs only when a packet has been received,
> and that cannot happen while the receive ring is empty. So how can the
> driver know? I have no idea. Maybe we can periodically check for such an
> event and call igb_refresh_mbufs().
>
> Does this make sense to you?
>
> Best regards
> Michael
>
>
> On Feb 9, 2011, at 8:32 AM, Jack Vogel wrote:
>
> > Hmmm, well so much for that theory :)
> >
> > Jack
> >
> >
> > On Tue, Feb 8, 2011 at 4:06 PM, Karim Fodil-Lemelin <fodillemlinkarim@gmail.com> wrote:
> >
> >
> > 2011/2/8 Jack Vogel <jfvogel@gmail.com>
> >
> >
> > I have been following this, and thinking about it. I am still working from a theoretical
> > standpoint, but based on a patch I got quite a long time back and never quite grokked,
> > I now believe that I might have a solution.
> >
> > The original PR, with a patch, was kern/150516 from Beezar Liu. I was never quite comfortable
> > with the code changes, nor convinced that it was a real issue and not a misunderstanding.
> > However, I now think that this very report might be behind what we are seeing today. I have
> > a slightly different approach to solving it; of course, it remains to be seen whether it handles it
> > properly.
> >
> > Please try the patch I've attached; I'm open to further correction or polishing of the
> > changes. And thanks to Beezar for his original report and changes. This is not for em,
> > but if it eliminates the problem, it's clearly needed in all drivers.
> >
> > Jack
> >
> >
> > Hi Jack,
> >
> > Thanks for your help. I tried your patch and it didn't work, so I added a couple of printf calls to see if the added code was getting hit:
> >
> > --- a/freebsd/sys/dev/e1000/if_igb.c
> > +++ b/freebsd/sys/dev/e1000/if_igb.c
> > @@ -612,7 +612,7 @@ igb_attach(device_t dev)
> >             device_get_nameunit(dev));
> >
> >         INIT_DEBUGOUT("igb_attach: end");
> > -
> > +       printf("this driver has a patch from Jack Vogel\n");
> >         return (0);
> >
> >  err_late:
> > @@ -4131,6 +4131,7 @@ igb_rxeof(struct igb_queue *que, int count, int *done)
> >                 struct mbuf             *sendmp, *mh, *mp;
> >                 struct igb_rx_buf       *rxbuf;
> >                 u16                     hlen, plen, hdr, vtag;
> > +               int                     commit;
> >                 bool                    eop = FALSE;
> >
> >                 cur = &rxr->rx_base[i];
> > @@ -4255,10 +4256,23 @@ next_desc:
> >                 bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
> >                     BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
> >
> > +               commit = i;     /* capture the old index */
> > +
> >                 /* Advance our pointers to the next descriptor. */
> >                 if (++i == adapter->num_rx_desc)
> >                         i = 0;
> >                 /*
> > +               ** Sanity test for ring full, if this
> > +               ** happens we need to refresh immediately
> > +               ** or refresh may deadlock.
> > +               */
> > +               if (i == rxr->next_to_refresh) {
> > +                       igb_refresh_mbufs(rxr, commit);
> > +                       printf("igb_refresh_mbufs called with commit %d\n", commit);
> > +                       processed = 0;
> > +               }
> > +
> > +               /*
> >                 ** Send to the stack or LRO
> >                 */
> >                 if (sendmp != NULL) {
> >
> > Here are the results:
> >
> > # dmesg | grep Vogel
> > this driver has a patch from Jack Vogel
> > this driver has a patch from Jack Vogel
> >
> > # netstat -m
> > 60453/52707/113160 mbufs in use (current/cache/total)
> > 48416/51584/100000/100000 mbuf clusters in use (current/cache/total/max)
> > 2894/690 mbuf+clusters out of packet secondary zone in use (current/cache)
> > 11946/854/12800/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
> > 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> > 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> > 164834K/119760K/284595K bytes allocated to network (current/cache/total)
> > 0/339/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> > 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> > 0/4/6656 sfbufs in use (current/peak/max)
> > 0 requests for sfbufs denied
> > 0 requests for sfbufs delayed
> > 0 requests for I/O initiated by sendfile
> > 0 calls to protocol drain routines
> > # dmesg | grep commit
> >
> > At this point RX has hung.
> >
> > Somehow the check (i == rxr->next_to_refresh) is never true in this case. Also, I did read kern/150516 and couldn't wrap my head around the patch for the em driver that Beezar Liu suggested.
> >
> > Regards,
> >
> > Karim.
> >
> >
>
>