From: Adrian Chadd
To: freebsd-wireless@freebsd.org
Date: Sat, 21 Jan 2012 21:40:12 -0800
Subject: net80211 race conditions seen in -HEAD

Hi,

I've noticed some kernel panics in net80211/ath in -HEAD.
In all instances it boils down to a now-invalid ieee80211_node - either it's partially allocated/copied, or it's been recently freed.

This became increasingly obvious when doing DFS CAC, as the kernel was now changing the channel quite frequently on me whilst simulating/processing radar events. I've since found I can mostly reproduce it in the lab (when surrounded by ridiculous levels of RX interference traffic, triggering all kinds of events) whilst creating/destroying VAPs.

Now that I have debugging code in place (which as a side effect makes it very difficult to cause a crash, let alone tickle the race condition) it's glaringly obvious what's going on.

There are five contexts in which stuff can occur, at least in the net80211/ath case:

* the swi (ie ath_intr(), ath_beacon_proc());
* the ath taskqueue;
* the net80211 taskqueue;
* the ioctl() context, coming up from a userland process;
* a callout running in the clock thread.

Now, callouts should _hopefully_ be grabbing and releasing locks correctly. We've found a few spots where they weren't (leading to quite silly state races and crashes.)

I'm going to ignore the obvious possible problems with multiple concurrent processes doing ioctl()s. I'm simply going to operate on the principle that the multiple-ioctl() path is fine.

It seems that -obtaining- a reference to vap->iv_bss isn't locked. So in (say) ieee80211_sta_join1() the iv_bss node can have its reference dropped and be freed. If this is going on concurrently with (say) something in the net80211 taskqueue (eg a newstate call) then I _think_ it's possible for the ath_newstate() code to get a reference to vap->iv_bss simultaneously with it being freed in ieee80211_sta_join1() (or similar.) So the ath_newstate() code will end up with a 'ni' that has just been freed.

I've seen another crash in the net80211_ht code where it _looks_ like the bss node wasn't entirely set up - bsschan was 0xffff - so the kernel panicked hard there.
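
To make the vap->iv_bss race above concrete, here is a minimal userland sketch of the pattern that would avoid it: the pointer is read and its reference count bumped under one lock, and a replacement node is fully initialised before it is published. Everything in it is an assumption for illustration. struct node, struct vap, vap_bss_ref(), vap_node_unref() and vap_bss_replace() are hypothetical stand-ins, a pthread mutex stands in for whatever kernel lock would actually be used, and none of it is the real net80211 code or a proposed patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for ieee80211_node; not the real structure. */
struct node {
	int refcnt;	/* protected by the owning vap's lock */
	int chan;	/* stand-in for per-node state such as the bss channel */
};

/* Hypothetical stand-in for ieee80211vap. */
struct vap {
	pthread_mutex_t lock;
	struct node *bss;	/* plays the role of vap->iv_bss */
};

/*
 * Read the bss pointer and take a reference in one locked step, so the
 * node cannot be freed between the read and the refcount bump.
 */
static struct node *
vap_bss_ref(struct vap *v)
{
	struct node *n;

	pthread_mutex_lock(&v->lock);
	n = v->bss;
	if (n != NULL)
		n->refcnt++;
	pthread_mutex_unlock(&v->lock);
	return (n);
}

/* Drop one reference; the last holder frees the node. */
static void
vap_node_unref(struct vap *v, struct node *n)
{
	int last;

	pthread_mutex_lock(&v->lock);
	assert(n->refcnt > 0);
	last = (--n->refcnt == 0);
	pthread_mutex_unlock(&v->lock);
	if (last)
		free(n);
}

/*
 * Swap in a new bss node, roughly what a join would do. The new node is
 * fully initialised before it becomes visible, and the old node only
 * loses the vap's own reference; other holders keep it alive.
 * Returns 0 on success, -1 if allocation fails.
 */
static int
vap_bss_replace(struct vap *v, int chan)
{
	struct node *nn, *old;

	nn = malloc(sizeof(*nn));
	if (nn == NULL)
		return (-1);
	nn->refcnt = 1;		/* the vap's own reference */
	nn->chan = chan;

	pthread_mutex_lock(&v->lock);
	old = v->bss;
	v->bss = nn;
	pthread_mutex_unlock(&v->lock);

	if (old != NULL)
		vap_node_unref(v, old);
	return (0);
}
```

In this sketch a caller in the newstate role would call vap_bss_ref(), use the node, then call vap_node_unref(); a concurrent vap_bss_replace() from the join path can then no longer free the node out from under it. Whether a single lock like this is acceptable across all five contexts listed above is a separate question that the sketch does not try to answer.
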
This likely explains a lot of the "weird stuff" people have been reporting.

I also think the bgscan race is related to this - I can't help but wonder if the bgscan callout/event is also coinciding with wpa_supplicant doing stuff, and a race condition ends up leaving the vap with the sta power save flag set.

I don't yet have a solution to all of this - I just wanted to brain dump what I've seen thus far.


Adrian