Date:      Thu, 14 May 2009 17:27:29 -0400
From:      Alexander Sack <pisymbol@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-current@freebsd.org, d@delphij.net
Subject:   Re: Broadcom bge(4) panics while shutting down
Message-ID:  <3c0b01820905141427i7b858504m1ab74fd49882716c@mail.gmail.com>
In-Reply-To: <200905141700.40439.jhb@freebsd.org>
References:  <3c0b01820905141202w113966dp4bfbab73d84d585@mail.gmail.com> <4A0C7544.6010304@delphij.net> <200905141700.40439.jhb@freebsd.org>

On Thu, May 14, 2009 at 5:00 PM, John Baldwin <jhb@freebsd.org> wrote:
> On Thursday 14 May 2009 3:47:16 pm Xin LI wrote:
>> Hi, Alexander,
>>
>> Alexander Sack wrote:
>> > Hello:
>> >
>> > Under heavy traffic (100% utilization GIGE on a 2 port BGE card)
>> > running BGE CURRENT driver I see panics on shutdown.  The reason is
>> > because bge_rxeof() while processing its RX ring of BD's drops the
>> > softc lock when it hands it off to its input function.  If bge_stop()
>> > is waiting for it, it will then proceed to acquire lock and then
>> > quiesce the hardware (resetting the card, clearing out BDs etc.).  Once
>> > bge_stop() releases the softc lock, then bge_rxeof() under an
>> > interrupt context (no polling here) will reacquire and continue to
>> > process the ring which is a bad idea.  It should check to see if the
>> > card is still running before continuing processing BDs (i.e. once
>> > IFF_DRV_RUNNING has been reset by bge_stop(), bge_rxeof() is done, bail
>> > out).
>> >
>> > Here is my first go around with this patch:
>> >
>> >
>> > --- if_bge.c.CURRENT 2009-05-14 14:39:39.000000000 -0400
>> > +++ if_bge.c        2009-05-14 14:39:24.000000000 -0400
>> > @@ -3081,6 +3081,10 @@
>> >             uint16_t                vlan_tag = 0;
>> >             int                     have_tag = 0;
>> >
>> > +           if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
>> > +                   return;
>> > +           }
>> > +
>> >  #ifdef DEVICE_POLLING
>> >             if (ifp->if_capenable & IFCAP_POLLING) {
>> >                     if (sc->rxcycles <= 0)
>> >
>> >
>> > This prevents any panics during shutdown under heavy load and AS IT
>> > TURNS out (I feel stupid for not looking) that em(4) already had this
>> > check in its em_rxeof() function (right at the top of the loop).  I'm
>> > more than happy changing it to the em style but above seems reasonable
>> > to me though I have to verify there isn't anything missing off the
>> > loop from a hardware standpoint (I don't think so because bge_stop()
>> > did all the dirty work so I believe touching any registers after that
>> > from bge_rxeof() is a bad idea).
>> >
>> > Preliminary testing shows no more panics when starting and stopping
>> > ports under heavy load (panics were almost immediate otherwise).
>> >
>> > Thoughts?
>>
>> I think this would solve the problem but I'm not sure whether this would
>> increase some overhead on the RX path.  It seems that there is a race
>> between bge_release_resources() and bge_intr(), I mean, it might be a
>> good idea to "drain" bge_intr() instead?
>
> Usually just detach() drains the interrupt handler.  However, an 'ifconfig
> bge0 down' could probably provoke this as well.  I would probably do the
> check right after re-acquiring the lock at the bottom of the loop before
> touching anything else.

Yeah John, you've got a point there.  I submitted the patch with the
check in the while condition instead, which I BELIEVE is
functionally equivalent (don't ask me which one is faster), i.e. as
soon as we reacquire the lock, check the flag since bge_stop() might
have reset it.
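
For comparison, here is a minimal user-space sketch of the two
placements.  It is not the actual if_bge.c code: struct fake_softc,
fake_input(), and both rxeof_* functions are hypothetical stand-ins.
The "running" field models IFF_DRV_RUNNING, and fake_input() models the
window where the softc lock is dropped and bge_stop() may run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8

/* Hypothetical stand-in for the driver softc; not the real struct bge_softc. */
struct fake_softc {
	bool running;     /* models IFF_DRV_RUNNING in if_drv_flags */
	int  cons;        /* RX consumer index */
	int  prod;        /* RX producer index */
	int  stop_after;  /* simulate a stop during the Nth input call (0 = never) */
	int  delivered;   /* packets handed to the input routine so far */
};

/*
 * Models handing a packet to the stack.  In the driver the softc lock is
 * dropped around this call, so a concurrent stop can clear the flag here.
 */
static void
fake_input(struct fake_softc *sc)
{
	sc->delivered++;
	if (sc->delivered == sc->stop_after)
		sc->running = false;
}

/* Variant A: the running check is folded into the while condition. */
static int
rxeof_check_in_while(struct fake_softc *sc)
{
	int processed = 0;

	assert(sc != NULL);
	while (sc->running && sc->cons != sc->prod) {
		sc->cons = (sc->cons + 1) % RING_SIZE;
		fake_input(sc);		/* lock dropped and reacquired here */
		processed++;
	}
	return (processed);
}

/* Variant B: check right after reacquiring the lock, at the loop bottom. */
static int
rxeof_check_after_relock(struct fake_softc *sc)
{
	int processed = 0;

	assert(sc != NULL);
	while (sc->cons != sc->prod) {
		sc->cons = (sc->cons + 1) % RING_SIZE;
		fake_input(sc);		/* lock dropped and reacquired here */
		processed++;
		if (!sc->running)
			break;		/* stop ran meanwhile; touch nothing else */
	}
	return (processed);
}
```

In this sketch both variants stop touching the ring as soon as the
flag is seen cleared after an input call.  The only difference is that
variant A also tests the flag once on entry, before the first
descriptor is processed.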

If you get a chance, can you look at the PR and let me know if you
think it looks good?  I really want this fixed in 7.x to be honest
since it's a real headache (I was working on another subsystem
when I ran into this).

-aps


