Date: Wed, 25 Jan 2012 07:43:33 -0700
From: PseudoCylon <moonlightakkiy@yahoo.ca>
To: Adrian Chadd <adrian@freebsd.org>, freebsd-wireless@freebsd.org
Subject: Re: net80211 race conditions seen in -HEAD
Message-ID: <CAFZ_MY%2BifiXc3iPfDEuWNHyr7JvhuG55uzp3BTmCO2Ek2G1LOg@mail.gmail.com>

> ------------------------------
>
> Message: 14
> Date: Sat, 21 Jan 2012 21:40:12 -0800
> From: Adrian Chadd <adrian@freebsd.org>
> Subject: net80211 race conditions seen in -HEAD
> To: freebsd-wireless@freebsd.org
> Message-ID:
>        <CAJ-Vmo=0Q7++oJtZ0jTPUD8q=FwkgJs5EoEXs+jt-XmMADEjtA@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi,
>
> I've noticed some kernel panics in net80211/ath in -HEAD. It in all
> instances boils down to a now-invalid ieee80211_node - either it's
> partially allocated/copied, or it's been recently freed.
>
> This became increasingly obvious when doing DFS CAC, as the kernel was now
> changing the channel quite frequently on me whilst simulating/processing
> radar events. I've since found I can mostly reproduce it in the lab (when
> surrounded by ridiculous levels of RX interference traffic, triggering all
> kinds of events) whilst creating/destroying VAPs.
>
> Now that I have debugging code in place (which as a side effect makes it
> very difficult now to cause a crash, let alone tickle the race condition)
> it's glaringly obvious what's going on.
>
> There's five contexts stuff can occur, at least in the net80211/ath case:
>
> * the swi (ie ath_intr(), ath_beacon_proc)
> * the ath taskqueue;
> * the net80211 taskqueue;
> * the ioctl() context, coming up from a userland process;
> * a callout running in the clock thread.
>
> Now, callouts should _hopefully_ be grabbing and releasing locks correctly.
> We've found a few spots where they weren't (leading to quite silly state
> races and crashes.)
>
> I'm going to ignore the obvious possible problems with multiple concurrent
> processes doing ioctl()s. I'm simply going to operate on the principle that
> the multiple-ioctl() path is fine.
>
> It seems that -obtaining- references to vap->iv_bss aren't locked. So in
> (say) ieee80211_sta_join1() the iv_bss node can be dereferenced and freed.
> If this is going on concurrently with (say) something going on in the
> net80211 taskqueue (eg a newstate call) then I _think_ it's possible for
> the ath_newstate() code to get a reference to vap->iv_bss simultaneously
> with it being freed in ieee80211_sta_join1() (or similar.) So the
> ath_newstate() code will be assigned a 'ni' that has just been freed.
>
> I've seen another crash in the net80211_ht code where it _looks_ like the
> bss node wasn't entirely setup - bsschan was 0xffff - so the kernel panicked
> hard there.
>
> This likely explains a lot of the "weird stuff" people have been reporting.
> I also think the bgscan race is related to this - I can't help but wonder
> if the bgscan callout/event is also coinciding with wpa_supplicant doing
> stuff, and a race condition ends up leaving the vap w/ the sta power save
> flag set.
>
> I don't yet have a solution to all of this - I just wanted to brain dump
> what I've seen thus far.
>

Here is my brain dump. A while ago the USB wifi drivers had a similar
issue (a race in the 80211 stack). It's worth checking this rev.

http://svnweb.freebsd.org/base?view=revision&revision=212127

AK
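
For illustration only, the iv_bss race described in the quoted message
(one context fetching the bss pointer while another replaces and frees
it) can be sketched in userland C. This is a minimal sketch, not
net80211 code: struct node, struct vap, node_alloc(), vap_bss_ref(),
node_unref() and vap_bss_replace() are all hypothetical names, and a
pthread mutex stands in for whatever lock the kernel would use. The
point is only that the pointer fetch and the reference increment happen
under the same lock that the replace path takes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the net80211 structures. */
struct node {
	int refcnt;		/* protected by the owning vap's lock */
	int chan;
};

struct vap {
	pthread_mutex_t lock;
	struct node *bss;	/* protected by lock */
};

/* Allocate a node holding one reference, owned by the caller. */
static struct node *
node_alloc(int chan)
{
	struct node *ni = malloc(sizeof(*ni));

	if (ni == NULL)
		return (NULL);
	ni->refcnt = 1;
	ni->chan = chan;
	return (ni);
}

/*
 * Fetch the current bss node and take a reference in one locked step,
 * so a concurrent replace cannot free it between the load and the
 * increment.
 */
static struct node *
vap_bss_ref(struct vap *v)
{
	struct node *ni;

	pthread_mutex_lock(&v->lock);
	ni = v->bss;
	if (ni != NULL)
		ni->refcnt++;
	pthread_mutex_unlock(&v->lock);
	return (ni);
}

/* Drop one reference; free on the last one. Returns 1 if freed. */
static int
node_unref(struct vap *v, struct node *ni)
{
	int last;

	pthread_mutex_lock(&v->lock);
	assert(ni->refcnt > 0);
	last = (--ni->refcnt == 0);
	pthread_mutex_unlock(&v->lock);
	if (last)
		free(ni);
	return (last);
}

/*
 * Install a new bss node. The vap takes over the caller's reference
 * to newbss and drops its own reference to the old node, which stays
 * alive for as long as any other context still holds a reference.
 */
static void
vap_bss_replace(struct vap *v, struct node *newbss)
{
	struct node *old;

	pthread_mutex_lock(&v->lock);
	old = v->bss;
	v->bss = newbss;
	pthread_mutex_unlock(&v->lock);
	if (old != NULL)
		(void)node_unref(v, old);
}
```

In this sketch a "newstate"-style consumer would call vap_bss_ref(),
use the node, then call node_unref(); a "join"-style path would call
vap_bss_replace(). Whether the real fix should look like this is an
open question; it only shows the shape of the problem.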