From owner-freebsd-net@FreeBSD.ORG Fri Feb 11 22:24:34 2011 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 066A51065670 for ; Fri, 11 Feb 2011 22:24:34 +0000 (UTC) (envelope-from fodillemlinkarim@gmail.com) Received: from mail-pz0-f54.google.com (mail-pz0-f54.google.com [209.85.210.54]) by mx1.freebsd.org (Postfix) with ESMTP id C24408FC1D for ; Fri, 11 Feb 2011 22:24:33 +0000 (UTC) Received: by pzk32 with SMTP id 32so543593pzk.13 for ; Fri, 11 Feb 2011 14:24:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=bL099MCNGOS2aU/b2tvspGYtFjTTkKqUg9uFBc26QBc=; b=wz+BdiSFLCMYCpSdgqQ3JhqHJDM7l14jhNWfdF8lvcWJ83+1GV2n8sCgUtV2zwv2ea XnV/U58iIgPWP96im8u4DrInIf40xc4c7kNpo4CbKLV+uJt5GkHitD7wCnXHgKwbmN1i +RLwqnu+zdcTQ06EvD3NaJbM7r6vJIwzoqLRI= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=JEZjMbFJZo0JeANyrzQAnGbYb7Bnbb/G4Ndfn+LMqd1tyTMk8FPPe55+vWGlvzr4bC W1Z2s+6v80ZN3zPFXX7rCfrtl+ltDN3G21x/2pByn2U3utLN3uTbRukIO7rF0R7enzM7 kJ9sZX59IcznxO2dhaRU59iLAbbLhEoDrq6cc= MIME-Version: 1.0 Received: by 10.142.131.17 with SMTP id e17mr755747wfd.371.1297463073215; Fri, 11 Feb 2011 14:24:33 -0800 (PST) Received: by 10.142.246.3 with HTTP; Fri, 11 Feb 2011 14:24:33 -0800 (PST) In-Reply-To: References: <12838373-FE96-443E-8979-AF5408705BF0@freebsd.org> Date: Fri, 11 Feb 2011 17:24:33 -0500 Message-ID: From: Karim Fodil-Lemelin To: Jack Vogel Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-net@freebsd.org Subject: Re: igb driver RX (was TX) hangs when out of mbuf clusters X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Feb 2011 22:24:34 -0000 This is great news! Thanks! Btw we've confirmed that your patch (from current) is indeed working for us as well. 2011/2/11 Jack Vogel > Yes, it was the fix for this, Mike has tested and confirmed it eliminates > the problem, and yes > I will MFC the relevant bits back to 8 and 7 for you asap. > > Jack > > > > On Fri, Feb 11, 2011 at 11:53 AM, Karim Fodil-Lemelin < > fodillemlinkarim@gmail.com> wrote: > >> Hi, >> >> I see a commit was made in current (r218530 | jfv | 2011-02-10 20:00:26 >> -0500 (Thu, 10 Feb 2011)). Is that commit done to address this issue? >> >> And if so Is there any MFC planned for 7.4 for this? >> >> Thanks, >> >> Karim. >> >> 2011/2/9 Michael Tuexen >> >> On Feb 9, 2011, at 6:35 PM, Jack Vogel wrote: >>> >>> > OK, but the question is why does the ring get totally consumed this >>> way, the >>> > ring has 1024 descriptors, it seems unintuitive that that whole >>> quantity can be >>> > used without some being recharged. Do you see the system mbuf pool >>> being >>> > depleted at the same time? >>> That was the test case I created: I set up a server accepting connections >>> but not reading anything. So the driver passes the mbufs to the transport >>> stack and they are not consumed. Then the problem occurs. Then I kill the >>> server. Now there are mbufs available again, but the driver doesn't know. >>> >>> I had the impression that these were the circumstances in which the >>> problem >>> showed up (mbuf allocations failing). >>> > >>> > Since you can reproduce it, do me a favor, in rxeof, change the >>> processed >>> > value from 8 to 4 and then 1, effectively call refresh every >>> descriptor, see if >>> > that eliminates the issue. >>> I will do. Need to see if I can do it remotely, since I'm not in my lab >>> right now. Can do it tomorrow for sure. >>> >>> But I do not think that this solves the problem, since I did the things >>> very slowly and you call it at least when you are leaving rxeof. >>> >>> Best regards >>> Michael >>> > >>> > Thanks for your help, >>> > >>> > Jack >>> > >>> > >>> > On Wed, Feb 9, 2011 at 2:36 AM, Michael Tuexen >>> wrote: >>> > Hi Jack, >>> > >>> > I could recreate the problem. When the problem occurs, we see >>> > >>> > rx_nxt_check = n >>> > rx_nxt_refresh = n + 1 >>> > >>> > (This was also reported in a mail from Karim) >>> > >>> > This means that the *whole* receive ring has no buffers anymore. This >>> can >>> > occur if, for some amount of time, no clusters are available. >>> > >>> > Now outside of the driver, at some point of time, clusters are freed. >>> > I don't think that igb_refresh_mbufs() gets called, since it only gets >>> > called from igb_rxeof(), which gets called when a packet has been >>> received, >>> > which can not happen since the receive ring is empty. So how can the >>> driver >>> > know? I have no idea. Maybe we can periodically check for such an event >>> > and call igb_refresh_mbufs(). >>> > >>> > Does this make sense to you? >>> > >>> > Best regards >>> > Michael >>> > >>> > >>> > On Feb 9, 2011, at 8:32 AM, Jack Vogel wrote: >>> > >>> > > Hmmm, well so much for that theory :) >>> > > >>> > > Jack >>> > > >>> > > >>> > > On Tue, Feb 8, 2011 at 4:06 PM, Karim Fodil-Lemelin < >>> fodillemlinkarim@gmail.com> wrote: >>> > > >>> > > >>> > > 2011/2/8 Jack Vogel >>> > > >>> > > >>> > > I have been following this, and thinking about it. I still am working >>> from a theoretical >>> > > standpoint, but based on a patch I got quite a long time back and >>> never quite groked, >>> > > I believe now that I might have a solution. >>> > > >>> > > The original PR and patch was kern/150516 from Beezar Liu, I was >>> never quite comfortable >>> > > with the code changes, nor convinced that it was a real issue and not >>> a misunderstanding. >>> > > However I think now that this very report might be behind what we are >>> seeing today. I have >>> > > a slightly different approach to solving it, of course it remains to >>> be seen if it handles it >>> > > properly. >>> > > >>> > > Please try the patch I've attached, I'm open to further correction or >>> polishing of the >>> > > changes. And thanks to Beezar for his original report and changes, >>> this is not for em, >>> > > but if this eliminates the problem its clearly needed in all drivers. >>> > > >>> > > Jack >>> > > >>> > > >>> > > Hi Jack, >>> > > >>> > > Thanks for your help. I tried your patch and it didn't work so I >>> added a couple of printf to see if the added code was getting hit: >>> > > >>> > > --- a/freebsd/sys/dev/e1000/if_igb.c >>> > > --More--(byte 1253)+++ b/freebsd/sys/dev/e1000/if_igb.c >>> > > @@ -612,7 +612,7 @@ igb_attach(device_t dev) >>> > > device_get_nameunit(dev)); >>> > > >>> > > INIT_DEBUGOUT("igb_attach: end"); >>> > > - >>> > > + printf("this driver has a patch from Jack Vogel\n"); >>> > > return (0); >>> > > >>> > > err_late: >>> > > @@ -4131,6 +4131,7 @@ igb_rxeof(struct igb_queue *que, int count, int >>> *done) >>> > > struct mbuf *sendmp, *mh, *mp; >>> > > struct igb_rx_buf *rxbuf; >>> > > u16 hlen, plen, hdr, vtag; >>> > > + int commit; >>> > > bool eop = FALSE; >>> > > >>> > > cur = &rxr->rx_base[i]; >>> > > @@ -4255,10 +4256,23 @@ next_desc: >>> > > bus_dmamap_sync(rxr->rxdma.dma_tag, >>> rxr->rxdma.dma_map, >>> > > BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE); >>> > > >>> > > + commit = i; /* capture the old index */ >>> > > + >>> > > /* Advance our pointers to the next descriptor. */ >>> > > if (++i == adapter->num_rx_desc) >>> > > i = 0; >>> > > /* >>> > > + ** Sanity test for ring full, if this >>> > > + ** happens we need to refresh immediately >>> > > + ** or refresh may deadlock. >>> > > + */ >>> > > + if (i == rxr->next_to_refresh) { >>> > > + igb_refresh_mbufs(rxr, commit); >>> > > + printf("igb_refresh_mbufs called with commit >>> %d\n", commit); >>> > > + processed = 0; >>> > > + } >>> > > + >>> > > + /* >>> > > ** Send to the stack or LRO >>> > > */ >>> > > if (sendmp != NULL) { >>> > > >>> > > Here is the results: >>> > > >>> > > # dmesg | grep Vogel >>> > > this driver has a patch from Jack Vogel >>> > > this driver has a patch from Jack Vogel >>> > > >>> > > # netstat -m >>> > > 60453/52707/113160 mbufs in use (current/cache/total) >>> > > 48416/51584/100000/100000 mbuf clusters in use >>> (current/cache/total/max) >>> > > 2894/690 mbuf+clusters out of packet secondary zone in use >>> (current/cache) >>> > > 11946/854/12800/12800 4k (page size) jumbo clusters in use >>> (current/cache/total/max) >>> > > 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max) >>> > > 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max) >>> > > 164834K/119760K/284595K bytes allocated to network >>> (current/cache/total) >>> > > 0/339/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters) >>> > > 0/0/0 requests for jumbo clusters denied (4k/9k/16k) >>> > > 0/4/6656 sfbufs in use (current/peak/max) >>> > > 0 requests for sfbufs denied >>> > > 0 requests for sfbufs delayed >>> > > 0 requests for I/O initiated by sendfile >>> > > 0 calls to protocol drain routines >>> > > # dmesg | grep commit >>> > > >>> > > At this point RX has hung. >>> > > >>> > > Somehow the check (i == rxr->next_to_refresh) is never true in this >>> case. Also, I did read kern/150516 and couldn't wrap my head around the >>> patch for the em driver that Beezar Liu suggested. >>> > > >>> > > Regards, >>> > > >>> > > Karim. >>> > > >>> > > >>> > >>> > >>> >>> >> >