Date: Mon, 30 Jul 2012 17:28:52 -0700 From: Adrian Chadd <adrian.chadd@gmail.com> To: PseudoCylon <moonlightakkiy@yahoo.ca> Cc: freebsd-wireless@freebsd.org, Kim Culhan <w8hdkim@gmail.com> Subject: Re: ath lor Message-ID: <CAJ-Vmomk9fsCx1zXGPfXGLLPo=Bpcqvxs=EEg16gJAJE2u9BFA@mail.gmail.com> In-Reply-To: <CAJ-VmonktzxboAcVhiJQipS58Wmr17Jyu8M3efgaBkkJRh5-XA@mail.gmail.com> References: <CAFZ_MYKgUkryy4parts3QahAyPA7FY9xUqC98_E7oFW%2BzarA8A@mail.gmail.com> <CAJ-VmonktzxboAcVhiJQipS58Wmr17Jyu8M3efgaBkkJRh5-XA@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On 30 July 2012 12:11, Adrian Chadd <adrian.chadd@gmail.com> wrote: > Well, we really need to figure out exactly the code and locking paths > that is causing these LORs. I don't have the time to figure it out > just at the moment - I want to focus on bringing the AR93xx series > NICs into FreeBSD in a clean, non-hacky method. Looking at Kim's backtrace: Jul 21 16:09:49 foo kernel: lock order reversal: Jul 21 16:09:49 foo kernel: 1st 0xffffff8001ad3948 ath0_scan_lock (ath0_scan_lock) @ /usr/src/sys/net80211/ieee8 0211_node.c:2166 This is in ieee80211_iterate_nodes() Jul 21 16:09:49 foo kernel: 2nd 0xffffff8001ad2018 ath0_com_lock (ath0_com_lock) @ /usr/src/sys/net80211/ieee802 11_node.c:2518 This is in ieee80211_node_leave() - which is sometimes called from ieee80211_iterate_node() via some iterator functions (eg sta_disassoc() -> ieee80211_node_leave()) and thus will be called with the scan lock held. So, any point where ieee80211_iterate_nodes() is called with the com lock held (IEEE80211_LOCK()) is going to lead to a LOR. Thus- if we move the actual call of the iterator function to be _outside_ of the iteration lock, we should be ok. The main downside to this is that unfortunately we'll end up having the possibility of overlapping iterations occuring. Right now that's not possible, as the scan lock prevents to calls to ieee80211_iterate_node() from overlapping. This may have some flow-on effects. The other place where the iterate lock is grabbed is ieee80211_timeout_stations(), which does much of what ieee80211_iterate_node() does. I don't know why it doesn't actually use ieee80211_iterate_node(). Ok. so there's two possible LORs that occur: * one is with ieee80211_iterate_nodes() and itself - where one instance is called with no comlock held, and one called with the comlock held. The latter occurs from newstate(), where some node iteration is done from inside the comlock. * one is with ieee80211_iterate_nodes() with ieee80211_timeout_stations() - again, becausse the former can be called with or without the comlock held. The timeout is a bit annoying - it'd be nice if all the callouts actually took/held a mutex. The inactivity timer doesn't, so it can't be atomically cancelled in any useful way. If it were modified to do so, the comlock would end up being held across a whole lot of these functions. That would make locking very, very delicate, as now you risk having locks being held across calls to the drivers. So, hm. What to do next. I'd personally like to define ieee80211_iterate_nodes() as "can't be called with the node table OR the com lock held" but that'd require some significant reworking. Feedback? Adrian Jul 21 16:09:49 foo kernel: KDB: stack backtrace: Jul 21 16:09:49 foo kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x37 Jul 21 16:09:49 foo kernel: kdb_backtrace() at kdb_backtrace+0x39 Jul 21 16:09:49 foo kernel: witness_checkorder() at witness_checkorder+0xca1 Jul 21 16:09:49 foo kernel: _mtx_lock_flags() at _mtx_lock_flags+0x79 Jul 21 16:09:49 foo kernel: ieee80211_node_leave() at ieee80211_node_leave+0x97 Jul 21 16:09:49 foo kernel: ieee80211_iterate_nodes() at ieee80211_iterate_nodes+0x89 Jul 21 16:09:49 foo kernel: setmlme_common() at setmlme_common+0x408 Jul 21 16:09:49 foo kernel: ieee80211_ioctl_setmlme() at ieee80211_ioctl_setmlme+0x87 Jul 21 16:09:49 foo kernel: ieee80211_ioctl_set80211() at ieee80211_ioctl_set80211+0x5b0 Jul 21 16:09:49 foo kernel: in_control() at in_control+0x234 Jul 21 16:09:49 foo kernel: ifioctl() at ifioctl+0x148c Jul 21 16:09:49 foo kernel: kern_ioctl() at kern_ioctl+0x1dc Jul 21 16:09:49 foo kernel: sys_ioctl() at sys_ioctl+0x12e Jul 21 16:09:50 foo kernel: amd64_syscall() at amd64_syscall+0x25a Jul 21 16:09:50 foo kernel: Xfast_syscall() at Xfast_syscall+0xfb Jul 21 16:09:50 foo kernel: --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x801210dfc, rsp = 0x7fffffffda78, rbp = 0x2a --- Jul 21 16:09:50 foo kernel: ath0: stuck beacon; resetting (bmiss count 4) thanks
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-Vmomk9fsCx1zXGPfXGLLPo=Bpcqvxs=EEg16gJAJE2u9BFA>