From: Bernhard Schmidt
To: Stefan Esser
Cc: Adrian Chadd, freebsd-current@freebsd.org
Date: Wed, 29 Jun 2011 12:41:16 +0200
Subject: Re: Panic in ieee80211 tx mgmt timeout
Message-Id: <201106291241.17371.bschmidt@freebsd.org>
In-Reply-To: <4E0AE815.2070502@freebsd.org>
References: <4E099EB2.7050902@freebsd.org> <4E0AE815.2070502@freebsd.org>
Reply-To: bschmidt@freebsd.org

On Wednesday, June 29, 2011 10:53:41 Stefan Esser wrote:
> On 29.06.2011 10:03, Adrian Chadd wrote:
> > On 29 June 2011 14:03, Bernhard Schmidt wrote:
> >> Its name is ieee80211_tx_mgt_timeout, used to track AUTH/ASSOC
> >> requests.
> >> Afaik there is even a similar PR about that.
>
> Sorry, I manually entered the panic message, since dumps were not
> working on my system at the time of that panic.
>
> >> Adrian, you've got an AP set up to drop either an AUTH or ASSOC
> >> response frame?
>
> I've got a number of AUTH -> SCAN transition lost messages for wlan0,
> seconds to minutes apart:
>
> Jun 28 21:16:17 kernel: wlan0: ieee80211_new_state_locked: pending AUTH
> -> SCAN transition lost
> Jun 28 21:34:46 kernel: wlan0: ieee80211_new_state_locked: pending AUTH
> -> SCAN transition lost
> Jun 28 21:36:33 kernel: wlan0: ieee80211_new_state_locked: pending AUTH
> -> SCAN transition lost
> Jun 28 21:45:14 kernel: wlan0: ieee80211_new_state_locked: pending AUTH
> -> SCAN transition lost
> Jun 28 21:45:44 kernel: wlan0: ieee80211_new_state_locked: pending AUTH
> -> SCAN transition lost
>
> The setup is easy to reproduce, my rc.conf contained:
>
> wlans_ath0="wlan0"
> ifconfig_ath0="down"
> ifconfig_wlan0="down"
> wpa_supplicant_enable="YES"

Strip the last 3 lines, don't ever fiddle around with ath0 directly.
This configuration always starts wpa_supplicant.

> This system used to be connected via ath0, but recently was moved to a
> place where Ethernet is available. The panics started only after WLAN
> was not used anymore. I might disable wpa_supplicant, since it is not
> required in the current situation, but did not try whether that helps
> prevent the panic.
>
> > Tell me how and I'll set it up.
> >
> > A panic at that point in the function indicates maybe ni is NULL?
> > or ni->vap is now NULL, maybe?
>
> I recreated the panic, this time with kernel dumps correctly configured
> (thanks for the hint, Scott).
> The panic message is:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0xffffff809c7a1000
> fault code = supervisor read data, page not present
> instruction pointer = 0x20:0xffffffff805e1851
> stack pointer = 0x28:0xffffff8000288ab0
> frame pointer = 0x28:0xffffff8000288b60
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 11 (swi4: clock)
>
> Traceback:
>
> #10 0xffffffff805e1851 in ieee80211_tx_mgt_timeout (arg=0xffffff809c7a1000)
> at ../../../net80211/ieee80211_output.c:2487
>
> This indicates that an invalid argument is passed and assigned to
> "*ni", which causes the page fault when dereferencing "ni" to obtain "*va".

The problem here seems to be wpa_supplicant. It can try to associate at
any given point in time, which results in the BSS ni being destroyed
even though it might still be referenced somewhere (in this case by the
timeout stuff, or better said ath's TX queue). Not clearing the
reference (or stopping whatever is using it) is the fault here.

Now how to figure out who the caller is? Got the complete backtrace?

--
Bernhard
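
For illustration only, here is a rough userland sketch of the failure mode described above: a pending timeout keeps a pointer to a node, so the node must not go away until the timeout is stopped and its pointer cleared. Everything in it (struct node, struct timer, timer_arm, timer_stop, timer_fire, demo_teardown) is a hypothetical stand-in and not net80211 code. It only models the ordering "stop whatever is using it, then free".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the net80211 types. */
struct node {
	int refcnt;		/* holders, including any pending timer */
	int timed_out;
};

struct timer {
	struct node *arg;	/* pointer the callback will dereference */
	int armed;
};

static int nodes_freed;		/* demo bookkeeping only */
static int timeouts_fired;

static struct node *
node_alloc(void)
{
	struct node *ni = calloc(1, sizeof(*ni));

	assert(ni != NULL);
	ni->refcnt = 1;		/* the creator's reference */
	return (ni);
}

/* Drop one reference; free the node when the last holder is gone. */
static void
node_release(struct node *ni)
{
	assert(ni->refcnt > 0);
	if (--ni->refcnt == 0) {
		free(ni);
		nodes_freed++;
	}
}

/* Arm the timeout; the timer takes its own reference on the node. */
static void
timer_arm(struct timer *t, struct node *ni)
{
	ni->refcnt++;
	t->arg = ni;
	t->armed = 1;
}

/* Cancel a pending timeout, drop its reference, clear the pointer. */
static void
timer_stop(struct timer *t)
{
	if (!t->armed)
		return;
	t->armed = 0;
	node_release(t->arg);
	t->arg = NULL;
}

/* Run the callback if still armed (roughly what the clock swi does). */
static void
timer_fire(struct timer *t)
{
	if (!t->armed)
		return;
	t->arg->timed_out = 1;	/* safe: the timer still holds a reference */
	timeouts_fired++;
	timer_stop(t);
}

/* Tear the node down while a timeout is pending; returns 1 on success. */
int
demo_teardown(void)
{
	struct timer t = { NULL, 0 };
	struct node *ni = node_alloc();

	nodes_freed = timeouts_fired = 0;
	timer_arm(&t, ni);
	timer_stop(&t);		/* stop whatever still uses ni ... */
	node_release(ni);	/* ... then drop the last reference */
	timer_fire(&t);		/* late tick: nothing left to dereference */
	return (nodes_freed == 1 && timeouts_fired == 0);
}
```

In this model, skipping timer_stop() and freeing the node directly would leave t.arg dangling, which is the kind of stale pointer the trap above suggests. In the kernel the analogous step would presumably be draining the callout (callout_drain(9)) or keeping a node reference for as long as the timeout is pending; which of the two fits here is not clear from the thread.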