From owner-freebsd-current@FreeBSD.ORG Thu May 14 21:21:57 2009
From: John Baldwin
To: freebsd-current@freebsd.org, d@delphij.net
Cc: Alexander Sack
Date: Thu, 14 May 2009 17:00:40 -0400
Subject: Re: Broadcom bge(4) panics while shutting down
Message-Id: <200905141700.40439.jhb@freebsd.org>
In-Reply-To: <4A0C7544.6010304@delphij.net>
References: <3c0b01820905141202w113966dp4bfbab73d84d585@mail.gmail.com> <4A0C7544.6010304@delphij.net>
List-Id: Discussions about the use of FreeBSD-current

On Thursday 14 May 2009 3:47:16 pm Xin LI wrote:
> Hi, Alexander,
>
> Alexander Sack wrote:
> > Hello:
> >
> > Under heavy traffic (100% utilization GIGE on a 2 port BGE card)
> > running BGE CURRENT driver I see panics on shutdown. The reason is
> > because bge_rxeof() while processing its RX ring of BDs drops the
> > softc lock when it hands it off to its input function. If bge_stop()
> > is waiting for it, it will then proceed to acquire lock and then
> > quiesce the hardware (resetting the card, clearing out BDs etc.). Once
> > bge_stop() releases the softc lock, then bge_rxeof() under an
> > interrupt context (no polling here) will reacquire and continue to
> > process the ring which is a bad idea. It should check to see if the
> > card is still running before continuing processing BDs (i.e. once
> > IFF_DRV_RUNNING has been reset by bge_stop(), bge_rxeof() is done, bail
> > out).
> >
> > Here is my first go around with this patch:
> >
> >
> > --- if_bge.c.CURRENT	2009-05-14 14:39:39.000000000 -0400
> > +++ if_bge.c	2009-05-14 14:39:24.000000000 -0400
> > @@ -3081,6 +3081,10 @@
> >  	uint16_t vlan_tag = 0;
> >  	int have_tag = 0;
> >
> > +	if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
> > +		return;
> > +	}
> > +
> >  #ifdef DEVICE_POLLING
> >  	if (ifp->if_capenable & IFCAP_POLLING) {
> >  		if (sc->rxcycles <= 0)
> >
> >
> > This prevents any panics during shutdown under heavy load and AS IT
> > TURNS out (I feel stupid for not looking) that em(4) already had this
> > check in its em_rxeof() function (right at the top of the loop). I'm
> > more than happy changing it to the em style but above seems reasonable
> > to me though I have to verify there isn't anything missing off the
> > loop from a hardware standpoint (I don't think so because bge_stop()
> > did all the dirty work so I believe touching any registers after that
> > from bge_rxeof() is a bad idea).
> >
> > Preliminary testing shows no more panics when starting and stopping
> > ports under heavy load (panics were almost immediate otherwise).
> >
> > Thoughts?
>
> I think this would solve the problem but I'm not sure whether this would
> increase some overhead on the RX path. It seems that there is a race
> between bge_release_resources() and bge_intr(), I mean, it might be a
> good idea to "drain" bge_intr() instead?

Usually just detach() drains the interrupt handler. However, an 'ifconfig
bge0 down' could probably provoke this as well. I would probably do the check
right after re-acquiring the lock at the bottom of the loop before touching
anything else.

-- 
John Baldwin
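
The suggestion above, rechecking the running flag right after the lock is
re-acquired at the bottom of the loop, can be sketched as a small userland
model. This is not the bge(4) code and not a patch against if_bge.c: every
fake_* name, FAKE_DRV_RUNNING, and the integer standing in for the softc
mutex are hypothetical, chosen only to show where the check sits relative to
the unlock/input/lock sequence.

```c
#include <assert.h>

#define FAKE_DRV_RUNNING 0x1	/* hypothetical stand-in for IFF_DRV_RUNNING */

/* Hypothetical model of the driver softc; not the real struct bge_softc. */
struct fake_softc {
	int locked;	/* models the softc mutex being held */
	int drv_flags;	/* models ifp->if_drv_flags */
	int ring_valid;	/* cleared once the stop path has torn the ring down */
	int pending;	/* descriptors the RX loop still believes are waiting */
	int stop_after;	/* simulate a stop during the Nth input call; 0 = never */
	int delivered;	/* packets handed to the input routine */
};

static void fake_lock(struct fake_softc *sc)
{
	assert(!sc->locked);
	sc->locked = 1;
}

static void fake_unlock(struct fake_softc *sc)
{
	assert(sc->locked);
	sc->locked = 0;
}

/* Models the stop path: takes the lock, clears the flag, tears down the ring. */
static void fake_stop(struct fake_softc *sc)
{
	fake_lock(sc);
	sc->drv_flags &= ~FAKE_DRV_RUNNING;
	sc->ring_valid = 0;	/* 'pending' is now stale on purpose */
	fake_unlock(sc);
}

/* Models the input routine: runs unlocked, so a stop can slip in here. */
static void fake_input(struct fake_softc *sc)
{
	assert(!sc->locked);
	sc->delivered++;
	if (sc->stop_after != 0 && sc->delivered == sc->stop_after)
		fake_stop(sc);
}

/* Called with the lock held; returns the number of descriptors processed. */
static int fake_rxeof(struct fake_softc *sc)
{
	int processed = 0;

	assert(sc->locked);
	while (sc->pending > 0) {
		assert(sc->ring_valid);	/* touching a torn-down ring is the bug */
		sc->pending--;
		processed++;

		fake_unlock(sc);
		fake_input(sc);
		fake_lock(sc);

		/* Recheck right after re-acquiring, before touching anything else. */
		if ((sc->drv_flags & FAKE_DRV_RUNNING) == 0)
			break;
	}
	return (processed);
}
```

In this model, a caller that takes fake_lock(), sets pending to 5 and
stop_after to 2, and then calls fake_rxeof() gets 2 back: the loop bails out
with three stale descriptors untouched instead of tripping the ring_valid
assertion. Placing the check at the bottom of the loop rather than only at
function entry covers the window in which the lock was dropped.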