Date: Sat, 21 Jan 2012 21:40:12 -0800
From: Adrian Chadd <adrian@freebsd.org>
To: freebsd-wireless@freebsd.org
Subject: net80211 race conditions seen in -HEAD
Message-ID: <CAJ-Vmo=0Q7%2B%2BoJtZ0jTPUD8q=FwkgJs5EoEXs%2Bjt-XmMADEjtA@mail.gmail.com>
Hi,

I've noticed some kernel panics in net80211/ath in -HEAD. In every instance it boils down to a now-invalid ieee80211_node: either it's only partially allocated/copied, or it's recently been freed.

This became increasingly obvious when doing DFS CAC, as the kernel was now changing channel quite frequently on me whilst simulating/processing radar events. I've since found I can mostly reproduce it in the lab (when surrounded by ridiculous levels of RX interference traffic, triggering all kinds of events) whilst creating/destroying VAPs.

Now that I have debugging code in place (which, as a side effect, makes it very difficult to cause a crash, let alone tickle the race condition), it's glaringly obvious what's going on.

There are five contexts in which stuff can occur, at least in the net80211/ath case:

* the swi (i.e. ath_intr(), ath_beacon_proc);
* the ath taskqueue;
* the net80211 taskqueue;
* the ioctl() context, coming up from a userland process;
* a callout running in the clock thread.

Now, callouts should _hopefully_ be grabbing and releasing locks correctly. We've found a few spots where they weren't (leading to quite silly state races and crashes).

I'm going to ignore the obvious possible problems with multiple concurrent processes doing ioctl()s; I'm simply going to operate on the principle that the multiple-ioctl() path is fine.

It seems that -obtaining- references to vap->iv_bss isn't locked. So in (say) ieee80211_sta_join1() the iv_bss node can be dereferenced and freed. If this is going on concurrently with (say) something in the net80211 taskqueue (e.g. a newstate call), then I _think_ it's possible for the ath_newstate() code to get a reference to vap->iv_bss at the same time as it's being freed in ieee80211_sta_join1() (or similar). So the ath_newstate() code will be assigned a 'ni' that has just been freed.
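The race described here (one context reading vap->iv_bss and using the node while another context drops and frees it) is the classic "load pointer, then take reference" window. A common way to close it is to load the pointer and bump the refcount inside the same critical section the writer uses to swap the pointer, and to fully initialise a new node before it becomes visible. Below is a minimal userland sketch of that pattern using pthreads and C11 atomics. struct node, struct vap, bss_lock, node_alloc(), node_unref(), vap_get_bss() and vap_set_bss() are hypothetical stand-ins, not net80211 API, and this is an illustration of the general technique rather than a proposed fix for the driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ieee80211_node: a refcount plus one
 * field (a channel number) that a late reader could trip over. */
struct node {
	atomic_int refcnt;
	int chan;
};

/* Hypothetical stand-in for a VAP: iv_bss is only read or replaced while
 * holding bss_lock. */
struct vap {
	pthread_mutex_t bss_lock;
	struct node *iv_bss;
};

static struct node *
node_alloc(int chan)
{
	struct node *ni = calloc(1, sizeof(*ni));

	if (ni == NULL)
		return NULL;
	atomic_init(&ni->refcnt, 1);	/* reference owned by the caller */
	ni->chan = chan;		/* fully set up before anyone else sees it */
	return ni;
}

static void
node_unref(struct node *ni)
{
	int old;

	if (ni == NULL)
		return;
	old = atomic_fetch_sub(&ni->refcnt, 1);	/* returns the previous value */
	assert(old >= 1);			/* catches a double unref */
	if (old == 1)
		free(ni);			/* we dropped the last reference */
}

/* Reader side (e.g. a newstate handler): load the pointer and take a
 * reference in one critical section, so a concurrent vap_set_bss() cannot
 * free the node between the two steps. */
static struct node *
vap_get_bss(struct vap *vap)
{
	struct node *ni;

	pthread_mutex_lock(&vap->bss_lock);
	ni = vap->iv_bss;
	if (ni != NULL)
		atomic_fetch_add(&ni->refcnt, 1);
	pthread_mutex_unlock(&vap->bss_lock);
	return ni;
}

/* Writer side (e.g. a join path): publish an already-initialised node, then
 * drop the VAP's reference to the old one outside the lock. The old node is
 * freed only once every reader has dropped its reference too. */
static void
vap_set_bss(struct vap *vap, struct node *newni)
{
	struct node *old;

	pthread_mutex_lock(&vap->bss_lock);
	old = vap->iv_bss;
	vap->iv_bss = newni;	/* the VAP takes over the caller's reference */
	pthread_mutex_unlock(&vap->bss_lock);
	node_unref(old);
}
```

A reader calls vap_get_bss(), uses the node, then calls node_unref(). If a writer swaps in a new node in the meantime, the reader's node stays valid until that final unref, instead of being freed underneath it.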
I've seen another crash in the net80211_ht code where it _looks_ like the bss node wasn't entirely set up (bsschan was 0xffff), so the kernel panicked hard there.

This likely explains a lot of the "weird stuff" people have been reporting.

I also think the bgscan race is related to this. I can't help but wonder whether the bgscan callout/event is also coinciding with wpa_supplicant doing stuff, and a race condition ends up leaving the vap with the sta power save flag set.

I don't yet have a solution to all of this; I just wanted to brain-dump what I've seen thus far.

Adrian