Date:      Sun, 9 Dec 2012 00:48:54 GMT
From:      Adrian Chadd <adrian@FreeBSD.org>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   misc/174283: [net80211] panics in ieee80211_ff_age() and ieee80211_ff_flush()
Message-ID:  <201212090048.qB90ms79056782@red.freebsd.org>
Resent-Message-ID: <201212090050.qB90o0rc049075@freefall.freebsd.org>


>Number:         174283
>Category:       misc
>Synopsis:       [net80211] panics in ieee80211_ff_age() and ieee80211_ff_flush()
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Dec 09 00:50:00 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator:     Adrian Chadd
>Release:        -HEAD
>Organization:
>Environment:
>Description:
There are panics in the net80211 fast-frame queue ageing and flushing code.

It looks like the staging queue ends up empty, and the net80211 FF routines have KASSERT()s to make sure the queue isn't empty.  I'm guessing it's a sanity check: the routines shouldn't be called when the queues are empty.

However, the emptiness check is done without the comlock being held, so it's entirely plausible that there's a race or preemption between the check and the code that actually ages/empties the queue.  Another thread (running on another CPU, or one that preempted us) empties the FF AC queue first; when the original thread resumes, it finds the queue empty and panics.

kgdb analysis of a crashdump shows:

* ath_tx_processq()
* ieee80211_ff_flush()
* ieee80211_ff_age()

ieee80211_ff_flush() checks whether the queue is empty and, if it isn't, calls ieee80211_ff_age().
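
To make the window concrete, here is a minimal user-space sketch of the check-then-call pattern. It is a model only, not the net80211 code: struct stageq_model, model_drain(), model_age() and model_flush_unlocked() are all hypothetical names, and the "interleave" flag simply forces the bad ordering that a second CPU or a preempting thread could produce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one fast-frame staging queue. */
struct stageq_model {
	int depth;		/* number of staged frames */
};

/* Models another context emptying the queue (e.g. an overlapping flush). */
static void
model_drain(struct stageq_model *q)
{
	assert(q != NULL);
	q->depth = 0;
}

/*
 * Models the ageing routine.  Returns false at the point where a
 * "queue must not be empty" assertion would fire.
 */
static bool
model_age(struct stageq_model *q)
{
	assert(q != NULL);
	if (q->depth == 0)
		return (false);		/* the KASSERT() would panic here */
	q->depth = 0;			/* age out everything that was staged */
	return (true);
}

/*
 * Models the unlocked check-then-call pattern.  When 'interleave' is
 * true, another context drains the queue after the check but before
 * the call, which is the suspected race.
 * Returns true if the flush was safe, false if the assertion would fire.
 */
static bool
model_flush_unlocked(struct stageq_model *q, bool interleave)
{
	assert(q != NULL);
	if (q->depth == 0)
		return (true);		/* nothing staged, nothing to do */
	if (interleave)
		model_drain(q);		/* race window: no lock is held */
	return (model_age(q));
}
```

With interleave set to false the flush drains the queue normally; with it set to true the check passes, the queue is emptied underneath the caller, and model_age() reports the condition that would trip the assertion.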

The FF routines are called from a number of places, and these calls can and do overlap.


>How-To-Repeat:
* run 9-stable or -head with assert/witness enabled;
* run iperf TCP between FF-capable stations and wait a while; it'll eventually trigger.
>Fix:
The solutions?

* stick the ieee80211_ff_*() calls in a specific taskqueue and call them from there, rather than from the TX, RX and TX completion contexts;
* grab the comlock before checking, and make sure the function expects the comlock to be held and releases the comlock afterwards;
* just accept (and document) that the check is racy/opportunistic, and remove the "is the queue empty?" KASSERT()s in the FF code.
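
As a rough illustration of the second and third options, here is a user-space sketch that uses a pthread mutex in place of the comlock. Everything in it is hypothetical (struct stageq_locked, model_age_locked(), model_flush_locked(), model_two_callers()); it only shows the shape of the change, not a patch against net80211.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical staging queue guarded by a single lock. */
struct stageq_locked {
	pthread_mutex_t lock;	/* plays the role of the comlock */
	int depth;		/* frames currently staged */
	int flushed;		/* frames handed on so far */
};

/*
 * Third option: ageing tolerates an empty queue instead of asserting.
 * The caller is expected to hold q->lock.
 */
static void
model_age_locked(struct stageq_locked *q)
{
	assert(q != NULL);
	if (q->depth == 0)
		return;			/* someone else already drained it */
	q->flushed += q->depth;
	q->depth = 0;
}

/*
 * Second option: the emptiness check and the drain happen under the
 * same lock acquisition, so no other caller can slip in between them.
 */
static void
model_flush_locked(struct stageq_locked *q)
{
	assert(q != NULL);
	pthread_mutex_lock(&q->lock);
	if (q->depth != 0)
		model_age_locked(q);
	pthread_mutex_unlock(&q->lock);
}

/*
 * Two overlapping callers end up serialised by the lock; this runs them
 * back to back, which is the ordering the lock enforces.  Returns the
 * total number of frames flushed.
 */
static int
model_two_callers(int staged)
{
	struct stageq_locked q;
	int total;

	pthread_mutex_init(&q.lock, NULL);
	q.depth = staged;
	q.flushed = 0;
	model_flush_locked(&q);		/* first caller drains the queue */
	model_flush_locked(&q);		/* second caller finds it empty */
	total = q.flushed;
	pthread_mutex_destroy(&q.lock);
	return (total);
}
```

The second caller simply finds nothing to do, so each staged frame is flushed exactly once and nothing asserts. The taskqueue option would get the same effect by funnelling all callers through one context rather than by locking.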


>Release-Note:
>Audit-Trail:
>Unformatted:


