Date: Sat, 24 Apr 2010 09:50:31 +0200
From: Bernhard Schmidt
To: Garrett Cooper
Cc: Brandon Gooch, Olivier Cochard-Labbé, freebsd-stable@freebsd.org
Subject: Re: iwn firmware instability with an up-to-date stable kernel
Message-ID: <20100424075031.GA98660@mx.techwires.net>
References: <20100424073430.GB62910@mx.techwires.net>

On Sat, Apr 24, 2010 at 12:45:14AM -0700, Garrett Cooper wrote:
> On Sat, Apr 24, 2010 at 12:34 AM, Bernhard Schmidt wrote:
> > On Fri, Apr 23, 2010 at 11:27:32PM -0700, Garrett Cooper wrote:
> >> On Fri, Apr 23, 2010 at 10:08 PM, Brandon Gooch wrote:
> >> > On Sat, Apr 24, 2010 at 4:59 AM, Garrett Cooper wrote:
> >> >> On Fri, Apr 23, 2010 at 9:42 PM, Garrett Cooper wrote:
> >> >>> On Fri, Apr 23, 2010 at 8:05 PM, Brandon Gooch wrote:
> >> >>>> 2010/4/23 Garrett Cooper:
> >> >>>>> 2010/4/23 Garrett Cooper:
> >> >>>>>> 2010/4/18 Olivier Cochard-Labbé:
> >> >>>>>>> 2010/4/18 
Bernhard Schmidt:
> >> >>>>>>>> Are you able to reproduce this on demand? As in type a few commands and
> >> >>>>>>>> the firmware error occurs?
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>> No, I'm not able to reproduce this problem on demand.
> >> >>>>>>
> >> >>>>>> I'm seeing similar issues on occasion with my Lenovo as well:
> >> >>>>>>
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: firmware error log:
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: error type      = "NMI_INTERRUPT_WDG" (0x00000004)
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: program counter = 0x0000046C
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: source line     = 0x000000D0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: error data      = 0x0000000207030000
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: branch link     = 0x00008370000004C2
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: interrupt link  = 0x000006DA000018B8
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: time            = 4287402440
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: driver status:
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  0: qid=0  cur=1   queued=0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  1: qid=1  cur=0   queued=0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  2: qid=2  cur=0   queued=0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  3: qid=3  cur=36  queued=0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  4: qid=4  cur=123 queued=0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  5: qid=5  cur=0   queued=0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  6: qid=6  cur=0   queued=0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  7: qid=7  cur=0   queued=0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  8: qid=8  cur=0   queued=0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  9: qid=9  cur=0   queued=0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx 
ring 10: qid=10 cur=0   queued=0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 11: qid=11 cur=0   queued=0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 12: qid=12 cur=0   queued=0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 13: qid=13 cur=0   queued=0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 14: qid=14 cur=0   queued=0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 15: qid=15 cur=0   queued=0
> >> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: rx ring: cur=8
> >> >>>>>>
> >> >>>>>> This may be because the system was under load (I was installing a port
> >> >>>>>> shortly before the connection dropped). I'll try poking at this
> >> >>>>>> further because it's going to be an annoying productivity loss :/.
> >> >>>>>
> >> >>>>>    Sorry... should have included more helpful details.
> >> >>>>> Thanks,
> >> >>>>> -Garrett
> >> >>>>>
> >> >>>>> dmesg:
> >> >>>>>
> >> >>>>> iwn0: mem 0xdf2fe000-0xdf2fffff irq 17 at device 0.0 on pci3
> >> >>>>> iwn0: MIMO 2T3R, MoW1, address 00:1d:e0:7d:9f:c7
> >> >>>>> iwn0: [ITHREAD]
> >> >>>>> iwn0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
> >> >>>>> iwn0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
> >> >>>>> iwn0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
> >> >>>>>
> >> >>>>> pciconf -lv snippet:
> >> >>>>>
> >> >>>>> iwn0@pci0:3:0:0:        class=0x028000 card=0x11108086 chip=0x42308086 rev=0x61 hdr=0x00
> >> >>>>>    vendor     = 'Intel Corporation'
> >> >>>>>    device     = 'Intel Wireless WiFi Link 4965AGN (Intel 4965AGN)'
> >> >>>>>    class      = network
> >> >>>>> cbb0@pci0:21:0:0:       class=0x060700 card=0x20c617aa chip=0x04761180 rev=0xba hdr=0x02
> >> >>>>>
> >> >>>>> uname -a:
> >> >>>>>
> >> >>>>> $ uname -a
> >> >>>>> FreeBSD garrcoop-fbsd.cisco.com 8.0-STABLE FreeBSD 8.0-STABLE #0
> >> >>>>> r207006: Wed Apr 21 13:18:44 PDT 2010
> >> >>>>> 
root@garrcoop-fbsd.cisco.com:/usr/obj/usr/src/sys/LAPPY_X86  i386
> >> >>>>
> >> >>>> I'm actually looking at this right now. For me, it's actually
> >> >>>> happening when my machine stays on overnight (or for long periods of
> >> >>>> time, idle).
> >> >>>>
> >> >>>> Also, it seems to be causing the kernel to panic, although I'm now
> >> >>>> wondering if the Machine Check Architecture is somehow catching this
> >> >>>> device error and causing an exception (hw.mca.enabled=1)(?) -- not
> >> >>>> possible, right ???
> >> >>>>
> >> >>>> Whatever the case, I can't seem to get the firmware error to occur
> >> >>>> with iwn(4) debugging or wlandebug options enabled, so who knows
> >> >>>> exactly what leads to this.
> >> >>>>
> >> >>>> I know Bernhard has worked hard on this driver, it's a shame that this
> >> >>>> freaky bug has bit us all now, without leaving many clues :(
> >> >>>>
> >> >>>> I've attached a textdump for posterity if nothing else :)
> >> >>>
> >> >>>    Connectivity appears to be shoddy in my neck of the woods (kind of
> >> >>> ironic... but meh). Just running buildworld, buildkernel, then doing a
> >> >>> tcpdump in parallel causes the pseudo device to go up and down a lot.
> >> >>> I assume this isn't standard behavior?
> >> >>>    Just for reference buildworld was started shortly after 19:39:05,
> >> >>> and it finished at 21:29. The interface has also gone up and down once
> >> >>> since then while the system's been basically idle.
> >> >>
> >> >>    Hmmm... I seem to be in an excellent position to reproduce this
> >> >> issue. I've reproduced it twice by merely bringing the interface up
> >> >> and down several times using:
> >> >>
> >> >> ifconfig_wlan0="WPA DHCP"
> >> >>
> >> >>    instead of my usual:
> >> >>
> >> >> ifconfig_wlan0="WPA ssid DHCP"
> >> >>
> >> >>    Maybe others who are experiencing the issue should try that? I'll
> >> >> do more testing when I get home...
> >
> > How did you do that? 
Reloading the module, or with ifconfig?
>
> /etc/rc.d/netif restart , which does the ifconfig operations (no
> module change occurred AFAIK, but wlan0 did of course do some
> device_printf's when it was associating itself with iwn(4)).

Can you do ps xa | grep wpa? Just wondering if wpa_supplicant gets
started twice.

> >> >
> >> > My rc.conf is:
> >> >
> >> > ifconfig_wlan0="WPA DHCP"
> >> >
> >> > ...as well, although I haven't tried manually taking the interface
> >> > down and bringing it back up.
> >>
> >> Hmmm... that is interesting. I wish I could do that, but it seems to
> >> be eluding my grasp right now. The driver just kind of freaks out
> >> with a bunch of SSIDs, one being my target SSID, a bunch of NUL string
> >> ones, and then finally it just croaks. I need to figure out whether or
> >> not the SSIDs are valid when I boot it up at my desk again.
> >>
> >> > Are you waiting for the device to associate and begin passing traffic
> >> > before each up/down cycle?
> >>
> >> I was, but I'm not sure whether or not the Ajax pieces in GMail were.
> >> I'll try some more rudimentary tests when I get back to work on Monday
> >> in that environment, but I need to try out other things at home as
> >> well in the meantime.
>
> Thanks,
> -Garrett

-- 
Bernhard