Date:      Mon, 30 Jul 2012 17:28:52 -0700
From:      Adrian Chadd <adrian.chadd@gmail.com>
To:        PseudoCylon <moonlightakkiy@yahoo.ca>
Cc:        freebsd-wireless@freebsd.org, Kim Culhan <w8hdkim@gmail.com>
Subject:   Re: ath lor
Message-ID:  <CAJ-Vmomk9fsCx1zXGPfXGLLPo=Bpcqvxs=EEg16gJAJE2u9BFA@mail.gmail.com>
In-Reply-To: <CAJ-VmonktzxboAcVhiJQipS58Wmr17Jyu8M3efgaBkkJRh5-XA@mail.gmail.com>
References:  <CAFZ_MYKgUkryy4parts3QahAyPA7FY9xUqC98_E7oFW%2BzarA8A@mail.gmail.com> <CAJ-VmonktzxboAcVhiJQipS58Wmr17Jyu8M3efgaBkkJRh5-XA@mail.gmail.com>

On 30 July 2012 12:11, Adrian Chadd <adrian.chadd@gmail.com> wrote:
> Well, we really need to figure out exactly the code and locking paths
> that is causing these LORs. I don't have the time to figure it out
> just at the moment - I want to focus on bringing the AR93xx series
> NICs into FreeBSD in a clean, non-hacky method.

Looking at Kim's backtrace:

Jul 21 16:09:49 foo kernel: lock order reversal:
Jul 21 16:09:49 foo kernel: 1st 0xffffff8001ad3948 ath0_scan_lock (ath0_scan_lock) @ /usr/src/sys/net80211/ieee80211_node.c:2166

This is in ieee80211_iterate_nodes()

Jul 21 16:09:49 foo kernel: 2nd 0xffffff8001ad2018 ath0_com_lock (ath0_com_lock) @ /usr/src/sys/net80211/ieee80211_node.c:2518

This is in ieee80211_node_leave(), which is sometimes called from
ieee80211_iterate_nodes() via some iterator functions (e.g.
sta_disassoc() -> ieee80211_node_leave()) and thus will be called with
the scan lock held.

So, any point where ieee80211_iterate_nodes() is called with the com
lock held (IEEE80211_LOCK()) is going to lead to a LOR.
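
To make the reversal concrete, here is a toy userland sketch of the kind
of order check that flags it. It is not the WITNESS implementation and
touches no net80211 code; toy_lock(), toy_unlock(), the two path
functions and the lock IDs are all made-up names for illustration. It
only records "A was held when B was taken" and complains when it later
sees the opposite.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock IDs standing in for the scan lock and the com lock. */
enum { SCAN_LOCK, COM_LOCK, NLOCKS };

/* seen_before[a][b] is true once lock a has been held while b was taken. */
static bool seen_before[NLOCKS][NLOCKS];
static bool held[NLOCKS];

/* Record an acquisition; return true if it reverses an order seen earlier. */
static bool toy_lock(int id)
{
    bool reversal = false;

    assert(id >= 0 && id < NLOCKS);
    for (int h = 0; h < NLOCKS; h++) {
        if (!held[h] || h == id)
            continue;
        if (seen_before[id][h])     /* earlier order was id -> h */
            reversal = true;
        seen_before[h][id] = true;  /* remember order h -> id */
    }
    held[id] = true;
    return reversal;
}

static void toy_unlock(int id)
{
    assert(id >= 0 && id < NLOCKS);
    held[id] = false;
}

/* Path A: the iterator takes the scan lock, the callback takes the com lock. */
static bool path_iterate_then_leave(void)
{
    bool r = toy_lock(SCAN_LOCK);
    r |= toy_lock(COM_LOCK);
    toy_unlock(COM_LOCK);
    toy_unlock(SCAN_LOCK);
    return r;
}

/* Path B: the caller already holds the com lock, then starts iterating. */
static bool path_comlock_then_iterate(void)
{
    bool r = toy_lock(COM_LOCK);
    r |= toy_lock(SCAN_LOCK);
    toy_unlock(SCAN_LOCK);
    toy_unlock(COM_LOCK);
    return r;
}
```

Running path A and then path B, the first reports nothing and the
second reports a reversal, since it takes the same two locks in the
opposite order. Neither path deadlocks on its own; the order only
becomes a problem once both paths exist.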

Thus, if we move the actual call of the iterator function to be
_outside_ of the iteration lock, we should be OK.

The main downside to this is that unfortunately we'll end up having
the possibility of overlapping iterations occurring. Right now that's
not possible, as the scan lock prevents two calls to
ieee80211_iterate_nodes() from overlapping. This may have some flow-on
effects.
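
A rough userland sketch of that idea, using pthread mutexes rather than
the kernel's: snapshot the node pointers (taking a reference on each)
under the iteration lock, drop the lock, then call the iterator
function. Everything here (struct toy_table, toy_iterate_nodes(),
toy_leave_cb(), the plain int refcount) is hypothetical and simplified;
it is not the net80211 code or a proposed patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_NODES 8

struct toy_node {
    int id;
    int refcnt;                 /* protected by iter_lock in this sketch */
};

struct toy_table {
    pthread_mutex_t iter_lock;  /* stands in for the scan/iterate lock */
    struct toy_node *nodes[MAX_NODES];
    size_t count;
};

typedef void toy_iter_func(void *arg, struct toy_node *ni);

static void toy_iterate_nodes(struct toy_table *nt, toy_iter_func *f, void *arg)
{
    struct toy_node *snap[MAX_NODES];
    size_t n, i;

    /* Phase 1: hold the lock only long enough to copy and reference. */
    pthread_mutex_lock(&nt->iter_lock);
    n = nt->count;
    assert(n <= MAX_NODES);
    for (i = 0; i < n; i++) {
        snap[i] = nt->nodes[i];
        snap[i]->refcnt++;
    }
    pthread_mutex_unlock(&nt->iter_lock);

    /* Phase 2: call back with the iteration lock dropped. */
    for (i = 0; i < n; i++)
        f(arg, snap[i]);

    /* Phase 3: release the references taken in phase 1. */
    pthread_mutex_lock(&nt->iter_lock);
    for (i = 0; i < n; i++)
        snap[i]->refcnt--;
    pthread_mutex_unlock(&nt->iter_lock);
}

/* Example callback: counts calls and checks the lock really is free. */
struct toy_ctx {
    struct toy_table *nt;
    int calls;
    int lock_was_free;
};

static void toy_leave_cb(void *arg, struct toy_node *ni)
{
    struct toy_ctx *ctx = arg;

    (void)ni;
    ctx->calls++;
    if (pthread_mutex_trylock(&ctx->nt->iter_lock) == 0) {
        ctx->lock_was_free++;
        pthread_mutex_unlock(&ctx->nt->iter_lock);
    }
}
```

Since nothing is held in phase 2, a second caller can start its own
snapshot while the first is still running callbacks, which is exactly
the overlap described above.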

The other place where the iterate lock is grabbed is
ieee80211_timeout_stations(), which does much of what
ieee80211_iterate_nodes() does. I don't know why it doesn't actually
use ieee80211_iterate_nodes().

OK, so there are two possible LORs that occur:

* one is with ieee80211_iterate_nodes() and itself - where one
instance is called with no comlock held, and one called with the
comlock held. The latter occurs from newstate(), where some node
iteration is done from inside the comlock.
* one is with ieee80211_iterate_nodes() with
ieee80211_timeout_stations() - again, because the former can be
called with or without the comlock held.

The timeout is a bit annoying - it'd be nice if all the callouts
actually took/held a mutex. The inactivity timer doesn't, so it can't
be atomically cancelled in any useful way. If it were modified to do
so, the comlock would end up being held across a whole lot of these
functions. That would make locking very, very delicate, as now you
risk having locks being held across calls to the drivers.

So, hm. What to do next. I'd personally like to define
ieee80211_iterate_nodes() as "can't be called with the node table OR
the com lock held" but that'd require some significant reworking.

Feedback?



Adrian

Jul 21 16:09:49 foo kernel: KDB: stack backtrace:
Jul 21 16:09:49 foo kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x37
Jul 21 16:09:49 foo kernel: kdb_backtrace() at kdb_backtrace+0x39
Jul 21 16:09:49 foo kernel: witness_checkorder() at witness_checkorder+0xca1
Jul 21 16:09:49 foo kernel: _mtx_lock_flags() at _mtx_lock_flags+0x79
Jul 21 16:09:49 foo kernel: ieee80211_node_leave() at ieee80211_node_leave+0x97
Jul 21 16:09:49 foo kernel: ieee80211_iterate_nodes() at ieee80211_iterate_nodes+0x89
Jul 21 16:09:49 foo kernel: setmlme_common() at setmlme_common+0x408
Jul 21 16:09:49 foo kernel: ieee80211_ioctl_setmlme() at ieee80211_ioctl_setmlme+0x87
Jul 21 16:09:49 foo kernel: ieee80211_ioctl_set80211() at ieee80211_ioctl_set80211+0x5b0
Jul 21 16:09:49 foo kernel: in_control() at in_control+0x234
Jul 21 16:09:49 foo kernel: ifioctl() at ifioctl+0x148c
Jul 21 16:09:49 foo kernel: kern_ioctl() at kern_ioctl+0x1dc
Jul 21 16:09:49 foo kernel: sys_ioctl() at sys_ioctl+0x12e
Jul 21 16:09:50 foo kernel: amd64_syscall() at amd64_syscall+0x25a
Jul 21 16:09:50 foo kernel: Xfast_syscall() at Xfast_syscall+0xfb
Jul 21 16:09:50 foo kernel: --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x801210dfc, rsp = 0x7fffffffda78, rbp = 0x2a ---
Jul 21 16:09:50 foo kernel: ath0: stuck beacon; resetting (bmiss count 4)

thanks


