From owner-freebsd-wireless@FreeBSD.ORG Tue Jul 31 00:28:59 2012 Return-Path: Delivered-To: freebsd-wireless@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9CB10106566C for ; Tue, 31 Jul 2012 00:28:59 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-yw0-f54.google.com (mail-yw0-f54.google.com [209.85.213.54]) by mx1.freebsd.org (Postfix) with ESMTP id 566C78FC08 for ; Tue, 31 Jul 2012 00:28:59 +0000 (UTC) Received: by yhfs35 with SMTP id s35so6400262yhf.13 for ; Mon, 30 Jul 2012 17:28:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=cTKP6VKO6TArStFPtIkGOLYkYxGmlAd3TjjjiXdpJMA=; b=d6ITGinSYHaUtUkJA4HEPpmPyqw43h9X3D7PdrHaZswvW8CDs+K8Mhg7U2bsw5xVox upO5EfKWMYBK5T/7rcx0h5n5yJVHbb7PCqAwRz6wUcs31oSrI/dT0E3lVK2ZKZNiUp0A V+KcCy8THLJyqIIY3LlCxh+uJm3D6Dj1OiLBKF3Ys46+v7p2coBm0bV15AciG17jpuFy lynYq4yEFTvVvnRBtAc/tinKYBUrpDQROKdvAlBNaoi8DqrtWoCsrk3k5CcmSsw/iJ2l J18jeBpxBdPz8ieEhfPFkbkKsN7gvlheovtg0q9QDB2X5F970oB3CtpNpSlfUVoPT8+n 6rwg== MIME-Version: 1.0 Received: by 10.66.75.201 with SMTP id e9mr28225474paw.54.1343694532527; Mon, 30 Jul 2012 17:28:52 -0700 (PDT) Received: by 10.68.66.136 with HTTP; Mon, 30 Jul 2012 17:28:52 -0700 (PDT) In-Reply-To: References: Date: Mon, 30 Jul 2012 17:28:52 -0700 Message-ID: From: Adrian Chadd To: PseudoCylon Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-wireless@freebsd.org, Kim Culhan Subject: Re: ath lor X-BeenThere: freebsd-wireless@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of 802.11 stack, tools device driver development." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 31 Jul 2012 00:28:59 -0000 On 30 July 2012 12:11, Adrian Chadd wrote: > Well, we really need to figure out exactly the code and locking paths > that is causing these LORs. I don't have the time to figure it out > just at the moment - I want to focus on bringing the AR93xx series > NICs into FreeBSD in a clean, non-hacky method. Looking at Kim's backtrace: Jul 21 16:09:49 foo kernel: lock order reversal: Jul 21 16:09:49 foo kernel: 1st 0xffffff8001ad3948 ath0_scan_lock (ath0_scan_lock) @ /usr/src/sys/net80211/ieee8 0211_node.c:2166 This is in ieee80211_iterate_nodes() Jul 21 16:09:49 foo kernel: 2nd 0xffffff8001ad2018 ath0_com_lock (ath0_com_lock) @ /usr/src/sys/net80211/ieee802 11_node.c:2518 This is in ieee80211_node_leave() - which is sometimes called from ieee80211_iterate_node() via some iterator functions (eg sta_disassoc() -> ieee80211_node_leave()) and thus will be called with the scan lock held. So, any point where ieee80211_iterate_nodes() is called with the com lock held (IEEE80211_LOCK()) is going to lead to a LOR. Thus- if we move the actual call of the iterator function to be _outside_ of the iteration lock, we should be ok. The main downside to this is that unfortunately we'll end up having the possibility of overlapping iterations occuring. Right now that's not possible, as the scan lock prevents to calls to ieee80211_iterate_node() from overlapping. This may have some flow-on effects. The other place where the iterate lock is grabbed is ieee80211_timeout_stations(), which does much of what ieee80211_iterate_node() does. I don't know why it doesn't actually use ieee80211_iterate_node(). Ok. so there's two possible LORs that occur: * one is with ieee80211_iterate_nodes() and itself - where one instance is called with no comlock held, and one called with the comlock held. The latter occurs from newstate(), where some node iteration is done from inside the comlock. * one is with ieee80211_iterate_nodes() with ieee80211_timeout_stations() - again, becausse the former can be called with or without the comlock held. The timeout is a bit annoying - it'd be nice if all the callouts actually took/held a mutex. The inactivity timer doesn't, so it can't be atomically cancelled in any useful way. If it were modified to do so, the comlock would end up being held across a whole lot of these functions. That would make locking very, very delicate, as now you risk having locks being held across calls to the drivers. So, hm. What to do next. I'd personally like to define ieee80211_iterate_nodes() as "can't be called with the node table OR the com lock held" but that'd require some significant reworking. Feedback? Adrian Jul 21 16:09:49 foo kernel: KDB: stack backtrace: Jul 21 16:09:49 foo kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x37 Jul 21 16:09:49 foo kernel: kdb_backtrace() at kdb_backtrace+0x39 Jul 21 16:09:49 foo kernel: witness_checkorder() at witness_checkorder+0xca1 Jul 21 16:09:49 foo kernel: _mtx_lock_flags() at _mtx_lock_flags+0x79 Jul 21 16:09:49 foo kernel: ieee80211_node_leave() at ieee80211_node_leave+0x97 Jul 21 16:09:49 foo kernel: ieee80211_iterate_nodes() at ieee80211_iterate_nodes+0x89 Jul 21 16:09:49 foo kernel: setmlme_common() at setmlme_common+0x408 Jul 21 16:09:49 foo kernel: ieee80211_ioctl_setmlme() at ieee80211_ioctl_setmlme+0x87 Jul 21 16:09:49 foo kernel: ieee80211_ioctl_set80211() at ieee80211_ioctl_set80211+0x5b0 Jul 21 16:09:49 foo kernel: in_control() at in_control+0x234 Jul 21 16:09:49 foo kernel: ifioctl() at ifioctl+0x148c Jul 21 16:09:49 foo kernel: kern_ioctl() at kern_ioctl+0x1dc Jul 21 16:09:49 foo kernel: sys_ioctl() at sys_ioctl+0x12e Jul 21 16:09:50 foo kernel: amd64_syscall() at amd64_syscall+0x25a Jul 21 16:09:50 foo kernel: Xfast_syscall() at Xfast_syscall+0xfb Jul 21 16:09:50 foo kernel: --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x801210dfc, rsp = 0x7fffffffda78, rbp = 0x2a --- Jul 21 16:09:50 foo kernel: ath0: stuck beacon; resetting (bmiss count 4) thanks