Date:      Wed, 25 Jan 2012 07:43:33 -0700
From:      PseudoCylon <moonlightakkiy@yahoo.ca>
To:        Adrian Chadd <adrian@freebsd.org>, freebsd-wireless@freebsd.org
Subject:   Re: net80211 race conditions seen in -HEAD
Message-ID:  <CAFZ_MY%2BifiXc3iPfDEuWNHyr7JvhuG55uzp3BTmCO2Ek2G1LOg@mail.gmail.com>

> ------------------------------
>
> Message: 14
> Date: Sat, 21 Jan 2012 21:40:12 -0800
> From: Adrian Chadd <adrian@freebsd.org>
> Subject: net80211 race conditions seen in -HEAD
> To: freebsd-wireless@freebsd.org
> Message-ID:
>        <CAJ-Vmo=0Q7++oJtZ0jTPUD8q=FwkgJs5EoEXs+jt-XmMADEjtA@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi,
>
> I've noticed some kernel panics in net80211/ath in -HEAD. It in all
> instances boils down to a now-invalid ieee80211_node - either it's
> partially allocated/copied, or it's been recently freed.
>
>
>
> This became increasingly obvious when doing DFS CAC, as the kernel was now
> changing the channel quite frequently on me whilst simulating/processing
> radar events. I've since found I can mostly reproduce it in the lab (when
> surrounded by ridiculous levels of RX interference traffic, triggering all
> kinds of events) whilst creating/destroying VAPs.
>
> Now that I have debugging code in place (which as a side effect makes it
> very difficult now to cause a crash, let alone tickle the race condition)
> it's glaringly obvious what's going on.
>
> There are five contexts in which stuff can occur, at least in the
> net80211/ath case:
>
> * the swi (i.e. ath_intr(), ath_beacon_proc);
> * the ath taskqueue;
> * the net80211 taskqueue;
> * the ioctl() context, coming up from a userland process;
> * a callout running in the clock thread.
>
> Now, callouts should _hopefully_ be grabbing and releasing locks correctly.
> We've found a few spots where they weren't (leading to quite silly state
> races and crashes.)
>
> I'm going to ignore the obvious possible problems with multiple concurrent
> processes doing ioctl()s. I'm simply going to operate on the principle that
> the multiple-ioctl() path is fine.
>
> It seems that -obtaining- references to vap->iv_bss isn't locked. So in
> (say) ieee80211_sta_join1() the iv_bss node can be dereferenced and freed.
> If this is going on concurrently with (say) something going on in the
> net80211 taskqueue (eg a newstate call) then I _think_ it's possible for
> the ath_newstate() code to get a reference to vap->iv_bss simultaneously
> with it being freed in ieee80211_sta_join1() (or similar.) So the
> ath_newstate() code will be assigned a 'ni' that has just been freed.
>
> I've seen another crash in the net80211_ht code where it _looks_ like the
> bss node wasn't entirely set up - bsschan was 0xffff - so the kernel
> panicked hard there.
>
> This likely explains a lot of the "weird stuff" people have been reporting.
> I also think the bgscan race is related to this - I can't help but wonder
> if the bgscan callout/event is also coinciding with wpa_supplicant doing
> stuff, and a race condition ends up leaving the vap w/ the sta power save
> flag set.
>
> I don't yet have a solution to all of this - I just wanted to brain dump
> what I've seen thus far.
>

Here is my brain dump.

A while ago the USB wifi drivers had a similar issue (a race in the
net80211 stack). It's worth checking this revision:
http://svnweb.freebsd.org/base?view=revision&revision=212127
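
For illustration only, the general pattern discussed in the quoted message
(taking a reference to a shared node pointer while holding the lock that
protects it, so a concurrent swap cannot free it underneath the reader) can
be sketched in userland C with pthreads. Every name here (struct node,
struct vap, bss_lock, node_alloc(), vap_get_bss(), vap_set_bss(),
node_release()) is a hypothetical stand-in. This is not the net80211 API and
not a description of what the revision above changed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical refcounted node; NOT the real struct ieee80211_node. */
struct node {
	int refcnt;	/* protected by bss_lock in this sketch */
	int chan;
};

/* Hypothetical vap holding one shared "bss" node pointer. */
struct vap {
	pthread_mutex_t bss_lock;	/* protects bss and node refcnts */
	struct node *bss;
};

/* Allocate a node with one reference, owned by the caller. */
static struct node *
node_alloc(int chan)
{
	struct node *n = malloc(sizeof(*n));

	if (n == NULL)
		return (NULL);
	n->refcnt = 1;
	n->chan = chan;
	return (n);
}

/* Drop one reference; free the node when the last one goes away. */
static void
node_release(struct vap *v, struct node *n)
{
	int last;

	pthread_mutex_lock(&v->bss_lock);
	assert(n->refcnt > 0);
	last = (--n->refcnt == 0);
	pthread_mutex_unlock(&v->bss_lock);
	if (last)
		free(n);
}

/*
 * Read the pointer and bump the refcount under the same lock, so the
 * node cannot be freed between the read and the reference being taken.
 */
static struct node *
vap_get_bss(struct vap *v)
{
	struct node *n;

	pthread_mutex_lock(&v->bss_lock);
	n = v->bss;
	if (n != NULL)
		n->refcnt++;
	pthread_mutex_unlock(&v->bss_lock);
	return (n);
}

/*
 * Swap in a new bss node (the "join" path in this sketch). The vap
 * takes over the caller's reference to newn and drops its reference
 * to the old node; the old node survives while other holders remain.
 */
static void
vap_set_bss(struct vap *v, struct node *newn)
{
	struct node *old;

	pthread_mutex_lock(&v->bss_lock);
	old = v->bss;
	v->bss = newn;
	pthread_mutex_unlock(&v->bss_lock);
	if (old != NULL)
		node_release(v, old);
}
```

A caller in another context (say, a state-change handler) would use
vap_get_bss(), work on the returned node, then call node_release(). Reading
v->bss without the lock and using it directly is the unlocked pattern the
quoted message is worried about.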

AK


