Date: Sat, 24 Apr 2010 00:45:14 -0700
From: Garrett Cooper
To: Bernhard Schmidt
Cc: Brandon Gooch, Olivier Cochard-Labbé,
    freebsd-stable@freebsd.org
Subject: Re: iwn firmware instability with an up-to-date stable kernel

On Sat, Apr 24, 2010 at 12:34 AM, Bernhard Schmidt wrote:
> On Fri, Apr 23, 2010 at 11:27:32PM -0700, Garrett Cooper wrote:
>> On Fri, Apr 23, 2010 at 10:08 PM, Brandon Gooch wrote:
>> > On Sat, Apr 24, 2010 at 4:59 AM, Garrett Cooper wrote:
>> >> On Fri, Apr 23, 2010 at 9:42 PM, Garrett Cooper wrote:
>> >>> On Fri, Apr 23, 2010 at 8:05 PM, Brandon Gooch wrote:
>> >>>> 2010/4/23 Garrett Cooper :
>> >>>>> 2010/4/23 Garrett Cooper :
>> >>>>>> 2010/4/18 Olivier Cochard-Labbé :
>> >>>>>>> 2010/4/18 Bernhard Schmidt :
>> >>>>>>>> Are you able to reproduce this on demand? As in, type a few commands
>> >>>>>>>> and the firmware error occurs?
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> No, I'm not able to reproduce this problem on demand.
>> >>>>>>
>> >>>>>> I'm seeing similar issues on occasion with my Lenovo as well:
>> >>>>>>
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: firmware error log:
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: error type      = "NMI_INTERRUPT_WDG" (0x00000004)
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: program counter = 0x0000046C
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: source line     = 0x000000D0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: error data      = 0x0000000207030000
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: branch link     = 0x00008370000004C2
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: interrupt link  = 0x000006DA000018B8
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: time            = 4287402440
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: driver status:
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  0: qid=0  cur=1   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  1: qid=1  cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  2: qid=2  cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  3: qid=3  cur=36  queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  4: qid=4  cur=123 queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  5: qid=5  cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  6: qid=6  cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  7: qid=7  cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  8: qid=8  cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  9: qid=9  cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 10: qid=10 cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 11: qid=11 cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 12: qid=12 cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 13: qid=13 cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 14: qid=14 cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 15: qid=15 cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: rx ring: cur=8
>> >>>>>>
>> >>>>>> This may be because the system was under load (I was installing a port
>> >>>>>> shortly before the connection dropped). I'll try poking at this
>> >>>>>> further because it's going to be an annoying productivity loss :/.
>> >>>>>
>> >>>>>    Sorry... should have included more helpful details.
>> >>>>> Thanks,
>> >>>>> -Garrett
>> >>>>>
>> >>>>> dmesg:
>> >>>>>
>> >>>>> iwn0: mem 0xdf2fe000-0xdf2fffff irq 17 at device 0.0 on pci3
>> >>>>> iwn0: MIMO 2T3R, MoW1, address 00:1d:e0:7d:9f:c7
>> >>>>> iwn0: [ITHREAD]
>> >>>>> iwn0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
>> >>>>> iwn0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
>> >>>>> iwn0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
>> >>>>>
>> >>>>> pciconf -lv snippet:
>> >>>>>
>> >>>>> iwn0@pci0:3:0:0:        class=0x028000 card=0x11108086 chip=0x42308086 rev=0x61 hdr=0x00
>> >>>>>     vendor     = 'Intel Corporation'
>> >>>>>     device     = 'Intel Wireless WiFi Link 4965AGN (Intel 4965AGN)'
>> >>>>>     class      = network
>> >>>>> cbb0@pci0:21:0:0:       class=0x060700 card=0x20c617aa chip=0x04761180 rev=0xba hdr=0x02
>> >>>>>
>> >>>>> uname -a:
>> >>>>>
>> >>>>> $ uname -a
>> >>>>> FreeBSD garrcoop-fbsd.cisco.com 8.0-STABLE FreeBSD 8.0-STABLE #0 r207006: Wed Apr 21 13:18:44 PDT 2010 root@garrcoop-fbsd.cisco.com:/usr/obj/usr/src/sys/LAPPY_X86  i386
>> >>>>
>> >>>> I'm actually looking at this right now. For me, it's actually
>> >>>> happening when my machine stays on overnight (or for long periods of
>> >>>> time, idle).
>> >>>>
>> >>>> Also, it seems to be causing the kernel to panic, although I'm now
>> >>>> wondering if the Machine Check Architecture is somehow catching this
>> >>>> device error and causing an exception (hw.mca.enabled=1)(?) -- not
>> >>>> possible, right ???
>> >>>>
>> >>>> Whatever the case, I can't seem to get the firmware error to occur
>> >>>> with iwn(4) debugging or wlandebug options enabled, so who knows
>> >>>> exactly what leads to this.
>> >>>>
>> >>>> I know Bernhard has worked hard on this driver; it's a shame that this
>> >>>> freaky bug has bitten us all now, without leaving many clues :(
>> >>>>
>> >>>> I've attached a textdump for posterity if nothing else :)
>> >>>
>> >>>    Connectivity appears to be shoddy in my neck of the woods (kind of
>> >>> ironic... but meh). Just running buildworld, buildkernel, then doing a
>> >>> tcpdump in parallel causes the pseudo device to go up and down a lot.
>> >>> I assume this isn't standard behavior?
>> >>>    Just for reference, buildworld was started shortly after 19:39:05,
>> >>> and it finished at 21:29. The interface has also gone up and down once
>> >>> since then while the system's been basically idle.
>> >>
>> >>    Hmmm... I seem to be in an excellent position to reproduce this
>> >> issue. I've reproduced it twice by merely bringing the interface up
>> >> and down several times using:
>> >>
>> >> ifconfig_wlan0="WPA DHCP"
>> >>
>> >>    instead of my usual:
>> >>
>> >> ifconfig_wlan0="WPA ssid DHCP"
>> >>
>> >>    Maybe others who are experiencing the issue should try that? I'll
>> >> do more testing when I get home...
>
> How did you do that? Reloading the module, or with ifconfig?
/etc/rc.d/netif restart, which does the ifconfig operations (no module
change occurred AFAIK, but wlan0 did, of course, emit some device_printf()
messages while associating with iwn(4)).

>> >
>> > My rc.conf is:
>> >
>> > ifconfig_wlan0="WPA DHCP"
>> >
>> > ...as well, although I haven't tried manually taking the interface
>> > down and bringing it back up.
>>
>> Hmmm... that is interesting. I wish I could do that, but it seems to
>> be eluding my grasp right now. The driver just kind of freaks out
>> with a bunch of SSIDs, one being my target SSID, a bunch of NUL string
>> ones, and then finally it just croaks. I need to figure out whether or
>> not the SSIDs are valid when I boot it up at my desk again.
>>
>> > Are you waiting for the device to associate and begin passing traffic
>> > before each up/down cycle?

I was, but I'm not sure whether or not the Ajax pieces in GMail were.
I'll try some more rudimentary tests when I get back to work on Monday
in that environment, but I need to try out other things at home as
well in the meantime.

Thanks,
-Garrett
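For anyone trying to reproduce this on demand, the restart loop discussed in the thread (running /etc/rc.d/netif restart repeatedly, and waiting for the device to associate before each cycle, as Brandon asked) can be scripted. What follows is only a minimal sketch under stated assumptions: nobody in the thread posted it, the function names and the IF, TIMEOUT, IFCONFIG and NETIF_RC variables are hypothetical, and it assumes that ifconfig(8) prints a "status: associated" line for the wlan interface once it has associated.

```shell
#!/bin/sh
# Hypothetical helper (not posted in the thread): repeat the
# "/etc/rc.d/netif restart" cycle used to trigger the iwn(4) firmware
# error, waiting for the wlan interface to associate before each restart.
# IF, TIMEOUT, IFCONFIG and NETIF_RC are assumed names; set them in the
# environment to override the defaults below.

# Poll ifconfig once per second until it reports "status: associated".
# Returns 0 on association, 1 after ${TIMEOUT:-60} unsuccessful polls.
wait_for_assoc() {
    _n=0
    while [ "$_n" -lt "${TIMEOUT:-60}" ]; do
        if "${IFCONFIG:-/sbin/ifconfig}" "${IF:-wlan0}" 2>/dev/null |
            grep -q 'status: associated'; then
            return 0
        fi
        sleep 1
        _n=$((_n + 1))
    done
    return 1
}

# Restart the network interfaces $1 times. Stops early and returns 1 if
# the interface fails to associate, which is the moment to check dmesg.
cycle_netif() {
    _c=1
    while [ "$_c" -le "$1" ]; do
        if ! wait_for_assoc; then
            echo "cycle $_c: ${IF:-wlan0} did not associate"
            return 1
        fi
        echo "cycle $_c: associated, restarting"
        "${NETIF_RC:-/etc/rc.d/netif}" restart >/dev/null 2>&1
        _c=$((_c + 1))
    done
    echo "completed $1 cycles"
}
```

As root, source the file (for example, a hypothetical ./iwn-cycle.sh) and run cycle_netif 10. If a cycle reports that wlan0 never associated, dmesg or /var/log/messages should show whether a "firmware error log" dump like the one quoted above was logged. Waiting for association before each restart keeps the test closer to Garrett's manual procedure than a blind up/down loop would.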