Date:      Tue, 8 Feb 2011 19:06:39 -0500
From:      Karim Fodil-Lemelin <fodillemlinkarim@gmail.com>
To:        Jack Vogel <jfvogel@gmail.com>
Cc:        Pyun YongHyeon <pyunyh@gmail.com>, beezarliu <beezarliu@yahoo.com.cn>, Michael Tuexen <tuexen@freebsd.org>, freebsd-net@freebsd.org
Subject:   Re: igb driver RX (was TX) hangs when out of mbuf clusters
Message-ID:  <AANLkTinaftP09MxxpXQwhLaO3dybSep2q4SWZRP4ycHB@mail.gmail.com>
In-Reply-To: <AANLkTin5DZBnr_VcXRyUmpcH2Gsr3GuaW4EsBtKJ6omd@mail.gmail.com>
References:  <AANLkTikrjkHDaBq+x6MTZhzOeqWA=xtFpqQPsthFGmuf@mail.gmail.com> <D70A2DA6-23B7-442D-856C-4267359D66A5@lurchi.franken.de> <AANLkTinLg6QZz67e3Hhda-bzTX69XWNcdEkr3EZHFmSZ@mail.gmail.com> <AANLkTikMuFRY=W0+VtGKdWkJcOFVbdy=OOZNe_xFUC3R@mail.gmail.com> <AANLkTin5DZBnr_VcXRyUmpcH2Gsr3GuaW4EsBtKJ6omd@mail.gmail.com>

2011/2/8 Jack Vogel <jfvogel@gmail.com>

>
> I have been following this, and thinking about it. I am still working from
> a theoretical standpoint, but based on a patch I got quite a long time back
> and never quite grokked, I believe now that I might have a solution.
>
> The original PR and patch was kern/150516 from Beezar Liu. I was never
> quite comfortable with the code changes, nor convinced that it was a real
> issue and not a misunderstanding. However, I think now that this very
> report might be behind what we are seeing today. I have a slightly
> different approach to solving it; of course it remains to be seen if it
> handles it properly.
>
> Please try the patch I've attached. I'm open to further correction or
> polishing of the changes. And thanks to Beezar for his original report and
> changes; this is not for em, but if this eliminates the problem it's
> clearly needed in all drivers.
>
> Jack
>
>
Hi Jack,

Thanks for your help. I tried your patch and it didn't work, so I added a
couple of printf calls to see whether the added code was being hit:

--- a/freebsd/sys/dev/e1000/if_igb.c
+++ b/freebsd/sys/dev/e1000/if_igb.c
@@ -612,7 +612,7 @@ igb_attach(device_t dev)
            device_get_nameunit(dev));

        INIT_DEBUGOUT("igb_attach: end");
-
+       printf("this driver has a patch from Jack Vogel\n");
        return (0);

 err_late:
@@ -4131,6 +4131,7 @@ igb_rxeof(struct igb_queue *que, int count, int *done)
                struct mbuf             *sendmp, *mh, *mp;
                struct igb_rx_buf       *rxbuf;
                u16                     hlen, plen, hdr, vtag;
+               int                     commit;
                bool                    eop = FALSE;

                cur = &rxr->rx_base[i];
@@ -4255,10 +4256,23 @@ next_desc:
                bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
                    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);

+               commit = i;     /* capture the old index */
+
                /* Advance our pointers to the next descriptor. */
                if (++i == adapter->num_rx_desc)
                        i = 0;
                /*
+               ** Sanity test for ring full, if this
+               ** happens we need to refresh immediately
+               ** or refresh may deadlock.
+               */
+               if (i == rxr->next_to_refresh) {
+                       igb_refresh_mbufs(rxr, commit);
+                       printf("igb_refresh_mbufs called with commit %d\n", commit);
+                       processed = 0;
+               }
+
+               /*
                ** Send to the stack or LRO
                */
                if (sendmp != NULL) {

Here are the results:

# dmesg | grep Vogel
this driver has a patch from Jack Vogel
this driver has a patch from Jack Vogel

# netstat -m
60453/52707/113160 mbufs in use (current/cache/total)
48416/51584/100000/100000 mbuf clusters in use (current/cache/total/max)
2894/690 mbuf+clusters out of packet secondary zone in use (current/cache)
11946/854/12800/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
164834K/119760K/284595K bytes allocated to network (current/cache/total)
0/339/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/4/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
# dmesg | grep commit

At this point RX has hung.
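
As a cross-check on the netstat -m output above: both the standard cluster
zone and the 4k jumbo zone show total equal to max, which is consistent with
the 339 denied cluster requests. A minimal C sketch of that arithmetic
follows. The struct and function names are hypothetical and exist only for
this illustration; the numbers are copied from the output above.

```c
#include <stdbool.h>

/* One "current/cache/total/max" line from netstat -m (hypothetical type). */
struct zone_usage {
    long current;
    long cache;
    long total;
    long max;
};

/* Figures copied from the netstat -m output in this message. */
static const struct zone_usage mbuf_clusters = { 48416, 51584, 100000, 100000 };
static const struct zone_usage jumbo_4k      = { 11946,   854,  12800,  12800 };
static const struct zone_usage jumbo_9k      = {     0,     0,      0,   6400 };

/* The zone has grown to its configured limit and cannot grow further. */
static bool zone_at_limit(const struct zone_usage *z)
{
    return z->total >= z->max;
}

/* Sanity check: current + cache should account for the whole zone. */
static bool zone_consistent(const struct zone_usage *z)
{
    return z->current + z->cache == z->total;
}
```

With these figures, zone_at_limit() holds for the standard clusters and the
4k jumbo clusters but not for the unused 9k zone.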

Somehow the check (i == rxr->next_to_refresh) is never true in this case.
Also, I did read kern/150516 and couldn't wrap my head around the patch for
the em driver that Beezar Liu suggested.
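
One way the check could stay false, sketched as a user-space toy model and
not as driver code: if the hardware stops filling one slot short of
next_to_refresh, the consumer index stalls before it can ever wrap around to
equal it. Everything below is an assumption made for illustration (the
toy_ring type, the toy_* helpers, the tail sitting one slot behind
next_to_refresh, and every allocation failing); it may not match what
if_igb.c actually does.

```c
#include <stdbool.h>

/*
 * Toy model, not driver code. Assumptions (all hypothetical):
 *  - the hardware only fills slots in [head, tail) and stalls at head == tail;
 *  - tail sits one slot behind next_to_refresh;
 *  - every buffer allocation fails, so next_to_refresh and tail never move.
 */
struct toy_ring {
    int num_desc;
    int next_to_check;   /* consumer index, "i" in the patch above */
    int next_to_refresh; /* first slot still waiting for a new buffer */
    int head;            /* next slot the hardware would fill */
    int tail;            /* hardware stops before this slot */
};

/* Advance an index by one slot, wrapping at the end of the ring. */
static int toy_next(const struct toy_ring *r, int idx)
{
    return (idx + 1 == r->num_desc) ? 0 : idx + 1;
}

/* Build a freshly initialised ring under the assumptions above. */
static struct toy_ring toy_ring_init(int num_desc)
{
    struct toy_ring r = { num_desc, 0, 0, 0, num_desc - 1 };
    return r;
}

/* Hardware fills every slot it owns, then stalls at head == tail. */
static void toy_hw_fill(struct toy_ring *r)
{
    while (r->head != r->tail)
        r->head = toy_next(r, r->head);
}

/*
 * Drain every completed slot and report whether the
 * "i == next_to_refresh" test was ever true after advancing i.
 */
static bool toy_drain_hits_check(struct toy_ring *r)
{
    bool hit = false;
    int i = r->next_to_check;

    while (i != r->head) {      /* stands in for a descriptor-done test */
        i = toy_next(r, i);
        if (i == r->next_to_refresh)
            hit = true;
    }
    r->next_to_check = i;
    return hit;
}
```

In this model an 8-slot ring drains with the consumer index stopping at slot
7 while next_to_refresh stays at 0, so the comparison never succeeds and the
ring stalls. Whether the real driver behaves this way would need to be
confirmed against the source.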

Regards,

Karim.


