From: Alexander Sack
To: freebsd-current@freebsd.org
Date: Thu, 14 May 2009 17:17:01 -0400
Subject: Re: Broadcom bge(4) panics while shutting down
Message-ID: <3c0b01820905141417h76e9104fl2800524e364d62b6@mail.gmail.com>
In-Reply-To: <3c0b01820905141301h1b08fc0ay1e6a1676b5a149d4@mail.gmail.com>
References: <3c0b01820905141202w113966dp4bfbab73d84d585@mail.gmail.com>
 <4A0C7544.6010304@delphij.net>
 <3c0b01820905141301h1b08fc0ay1e6a1676b5a149d4@mail.gmail.com>
List-Id: Discussions about the use of FreeBSD-current

On Thu, May 14, 2009 at 4:01 PM, Alexander Sack wrote:
> On Thu, May 14, 2009 at 3:47 PM, Xin LI wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hi, Alexander,
>>
>> Alexander Sack wrote:
>>> Hello:
>>>
>>> Under heavy traffic (100% utilization GIGE on a 2 port BGE card)
>>> running the BGE CURRENT driver I see panics on shutdown.  The reason
>>> is that bge_rxeof(), while processing its RX ring of BDs, drops the
>>> softc lock when it hands a packet off to its input function.  If
>>> bge_stop() is waiting for the lock, it will then proceed to acquire
>>> it and quiesce the hardware (resetting the card, clearing out BDs
>>> etc.).  Once bge_stop() releases the softc lock, bge_rxeof() under an
>>> interrupt context (no polling here) will reacquire it and continue to
>>> process the ring, which is a bad idea.  It should check to see if the
>>> card is still running before continuing to process BDs (i.e. once
>>> IFF_DRV_RUNNING has been cleared by bge_stop(), bge_rxeof() is done,
>>> bail out).
>>>
>>> Here is my first go around with this patch:
>>>
>>>
>>> --- if_bge.c.CURRENT   2009-05-14 14:39:39.000000000 -0400
>>> +++ if_bge.c  2009-05-14 14:39:24.000000000 -0400
>>> @@ -3081,6 +3081,10 @@
>>>               uint16_t                vlan_tag = 0;
>>>               int                     have_tag = 0;
>>>
>>> +             if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
>>> +                     return;
>>> +             }
>>> +
>>>  #ifdef DEVICE_POLLING
>>>               if (ifp->if_capenable & IFCAP_POLLING) {
>>>                       if (sc->rxcycles <= 0)
>>>
>>>
>>> This prevents any panics during shutdown under heavy load and AS IT
>>> TURNS OUT (I feel stupid for not looking) em(4) already has this
>>> check in its em_rxeof() function (right at the top of the loop).  I'm
>>> more than happy changing it to the em style but the above seems
>>> reasonable to me, though I have to verify there isn't anything missing
>>> off the loop from a hardware standpoint (I don't think so because
>>> bge_stop() did all the dirty work so I believe touching any registers
>>> after that from bge_rxeof() is a bad idea).
>>>
>>> Preliminary testing shows no more panics when starting and stopping
>>> ports under heavy load (panics were almost immediate otherwise).
>>>
>>> Thoughts?
>>
>> I think this would solve the problem but I'm not sure whether this would
>> increase some overhead on the RX path.  It seems that there is a race
>> between bge_release_resources() and bge_intr(), I mean, it might be a
>> good idea to "drain" bge_intr() instead?
>
> Are you talking about detach time?  Because bge_stop() gets called
> before bge_release_resources() and stops host interrupts so where is
> the race again?
> I mean at this point no more interrupts should be
> delivered to bge_intr() (I can confirm from spec since BGE has
> released it in the wild).  So why would you "drain" it at this
> point....(the hardware is down including the firmware).
>
> I agree it adds a little overhead to the standard bge_rxeof() path
> which I agree is very sensitive to change.  However, I think the check
> at the top is tolerable since the other recourse is a crash.  I mean
> it's very easy to reproduce.  Flood a Broadcom card with traffic then
> stop the card and let the race begin...it will go down in bge_rxeof()
> after bge_stop() releases the lock.
>
> I actually did not look at changing anything structurally to perhaps
> make this whole predicament better but minimally there should be a
> shield against this, no?
>
> -aps
>

http://www.freebsd.org/cgi/query-pr.cgi?pr=134548

To track...with patch (though spacing got killed, my apologies, I
moved the check into the while logic a la em).  I've tested this with
zero issues so far.

-aps
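
For anyone following the thread, the race described above can be
sketched in user space.  This is only a single-threaded model under
stated assumptions, not the driver code: struct fake_softc, fake_stop(),
fake_input() and fake_rxeof() are hypothetical names, a plain bool
stands in for the softc mutex, and the stop is simulated from inside the
input callback, since that is the window where the lock is dropped.  The
guard sits in the loop condition, as in the follow-up note.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8

/* Hypothetical stand-in for the driver softc; not the real struct bge_softc. */
struct fake_softc {
	bool   running;        /* models IFF_DRV_RUNNING */
	bool   locked;         /* models the softc mutex */
	bool   ring_valid;     /* false once the stop path has torn the ring down */
	size_t cons, prod;     /* consumer and producer indices */
	size_t delivered;      /* packets handed to the input function */
	size_t stop_after;     /* simulate a stop once this many are delivered */
	size_t stale_accesses; /* ring touches after teardown (the bug) */
};

/* Models bge_stop(): takes the lock, quiesces, clears the running flag. */
static void fake_stop(struct fake_softc *sc)
{
	assert(!sc->locked);   /* only possible while rxeof has dropped the lock */
	sc->locked = true;
	sc->ring_valid = false;
	sc->running = false;
	sc->locked = false;
}

/* Models the input handoff, which runs with the softc lock dropped. */
static void fake_input(struct fake_softc *sc)
{
	sc->delivered++;
	if (sc->delivered == sc->stop_after)
		fake_stop(sc);
}

/* Models the RX loop; returns the number of descriptors processed. */
static size_t fake_rxeof(struct fake_softc *sc)
{
	size_t handled = 0;

	assert(sc->locked);    /* caller holds the lock */
	/* Guard in the loop condition: bail out once the interface is stopped. */
	while (sc->cons != sc->prod && sc->running) {
		if (!sc->ring_valid)
			sc->stale_accesses++;
		sc->cons = (sc->cons + 1) % RING_SIZE;
		handled++;
		sc->locked = false;    /* drop the lock for the handoff */
		fake_input(sc);
		sc->locked = true;     /* reacquire before touching the ring again */
	}
	return handled;
}
```

With six descriptors pending and a stop simulated during the third
handoff, fake_rxeof() returns after three descriptors and never touches
the torn-down ring.  Dropping "&& sc->running" from the condition makes
it process all six, three of them after teardown.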