Message-ID: <4E0AE815.2070502@freebsd.org>
Date: Wed, 29 Jun 2011 10:53:41 +0200
From: Stefan Esser
To: Adrian Chadd
Cc: freebsd-current@freebsd.org, bschmidt@freebsd.org
Subject: Re: Panic in ieee80211 tx mgmt timeout
References: <4E099EB2.7050902@freebsd.org> <201106290803.36647.bschmidt@freebsd.org>
List-Id: Discussions about the use of FreeBSD-current

On 29.06.2011 10:03, Adrian Chadd wrote:
> On 29 June 2011 14:03, Bernhard Schmidt wrote:
>> It's name is ieee80211_tx_mgt_timeout used to track AUTH/ASSOC
>> requests. Afaik there is even a similar PR about that.

Sorry, I entered the panic message manually, since dumps were not
working on my system at the time of that panic.

>> Adrian, you've got a AP set up to drop either a AUTH or ASSOC
>> response frame?

I've got a number of "AUTH -> SCAN transition lost" messages for wlan0,
seconds to minutes apart:

Jun 28 21:16:17 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
Jun 28 21:34:46 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
Jun 28 21:36:33 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
Jun 28 21:45:14 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
Jun 28 21:45:44 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost

The setup is easy to reproduce; my rc.conf contained:

wlans_ath0="wlan0"
ifconfig_ath0="down"
ifconfig_wlan0="down"
wpa_supplicant_enable="YES"

This system used to be connected via ath0, but was recently moved to a
place where Ethernet is available. The panics started only after the
WLAN was no longer in use. I might disable wpa_supplicant, since it is
not required in the current situation, but I have not tried whether that
helps prevent the panic.

> Tell me how and I'll set it up.
>
> A panic at that point in the function indicates maybe ni is NULL?
> or ni->vap is now NULL, maybe?

I recreated the panic, this time with kernel dumps correctly configured
(thanks for the hint, Scott).

The panic message is:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xffffff809c7a1000
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff805e1851
stack pointer           = 0x28:0xffffff8000288ab0
frame pointer           = 0x28:0xffffff8000288b60
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 11 (swi4: clock)

Traceback:

#10 0xffffffff805e1851 in ieee80211_tx_mgt_timeout (arg=0xffffff809c7a1000)
    at ../../../net80211/ieee80211_output.c:2487

The fault virtual address is identical to arg. This indicates that an
invalid argument is passed and assigned to "ni", which causes the page
fault when "ni" is dereferenced to obtain "vap". I'm afraid that the
assumption in the comment (about the timeout being safe to use) does not
really hold:

static void
ieee80211_tx_mgt_timeout(void *arg)
{
	struct ieee80211_node *ni = arg;
	struct ieee80211vap *vap = ni->ni_vap;

	if (vap->iv_state != IEEE80211_S_INIT &&
	    (vap->iv_ic->ic_flags & IEEE80211_F_SCAN) == 0) {
		/*
		 * NB: it's safe to specify a timeout as the reason here;
		 *     it'll only be used in the right state.
		 */
		ieee80211_new_state(vap, IEEE80211_S_SCAN,
			IEEE80211_SCAN_FAIL_TIMEOUT);
	}
}

If "vap" is valid during one invocation of that function, I'd expect it
to at least be a pointer to valid kernel memory after the timeout. I.e.,
the value found by dereferencing it may be stale, but the pointer itself
should at least not cause a page fault. (???)

The compressed core.txt is 27 KB, the compressed vmcore is 800 MB. I
might be able to find a place to upload the vmcore file to, but since
I'm currently on a DSL line with only 672 kbit/s upstream, it would take
me some 3 hours to upload it to a better connected server (and I'd like
to avoid doing that if it is not essential for debugging).

The core.txt is small enough to send by mail. Let me know if you think
it helps you understand the problem.

I'm willing to support debugging, e.g. by placing printfs in my kernel
in the timeout handler and at the creation and destruction of vap
structures.

After removal of the wlans_ath0="wlan0" line the system will most
probably be stable, but I did not specifically test this case (i.e. ath0
configured, but no wlan0 created). I do know that an "ifconfig down" of
ath0 and wlan0 suffices to trigger the panic; probably an
"ifconfig wlan0 down" alone would be enough.

So, I know how to avoid the panic, but I think it is still important to
find the cause.

Thank you for looking into this!

Best regards, STefan