Date:      Sat, 24 Apr 2010 00:45:14 -0700
From:      Garrett Cooper <yanefbsd@gmail.com>
To:        Bernhard Schmidt <bschmidt@techwires.net>
Cc:        Brandon Gooch <jamesbrandongooch@gmail.com>, Olivier Cochard-Labbé <olivier@cochard.me>, freebsd-stable@freebsd.org
Subject:   Re: iwn firmware instability with an up-to-date stable kernel
Message-ID:  <g2t7d6fde3d1004240045kde5f4bbds62be7920d94e57d@mail.gmail.com>
In-Reply-To: <20100424073430.GB62910@mx.techwires.net>
References:  <w2s3131aa531004171849i12348bdbt12dfbb18c1f71bc2@mail.gmail.com> <r2q3131aa531004180456w49eea301t526d305c8e7a980a@mail.gmail.com> <v2w7d6fde3d1004231929yb5b54ac6rc3a90276014176b0@mail.gmail.com> <r2u7d6fde3d1004231932rea2235a1r2dcd8f1973fcb1b4@mail.gmail.com> <s2m179b97fb1004232005geee6fbb6q9776a119f5b477db@mail.gmail.com> <u2v7d6fde3d1004232142yb2037851ne906f38ccc8039e9@mail.gmail.com> <y2p7d6fde3d1004232159hcc0c02bcpb6ee0910d76672a7@mail.gmail.com> <i2h179b97fb1004232208za4b1dbbekef363ec0fa522e0@mail.gmail.com> <l2z7d6fde3d1004232327heb8a744alfa02b81f199876ff@mail.gmail.com> <20100424073430.GB62910@mx.techwires.net>

On Sat, Apr 24, 2010 at 12:34 AM, Bernhard Schmidt
<bschmidt@techwires.net> wrote:
> On Fri, Apr 23, 2010 at 11:27:32PM -0700, Garrett Cooper wrote:
>> On Fri, Apr 23, 2010 at 10:08 PM, Brandon Gooch
>> <jamesbrandongooch@gmail.com> wrote:
>> > On Sat, Apr 24, 2010 at 4:59 AM, Garrett Cooper <yanefbsd@gmail.com> wrote:
>> >> On Fri, Apr 23, 2010 at 9:42 PM, Garrett Cooper <yanefbsd@gmail.com> wrote:
>> >>> On Fri, Apr 23, 2010 at 8:05 PM, Brandon Gooch
>> >>> <jamesbrandongooch@gmail.com> wrote:
>> >>>> 2010/4/23 Garrett Cooper <yanefbsd@gmail.com>:
>> >>>>> 2010/4/23 Garrett Cooper <yanefbsd@gmail.com>:
>> >>>>>> 2010/4/18 Olivier Cochard-Labbé <olivier@cochard.me>:
>> >>>>>>> 2010/4/18 Bernhard Schmidt <bschmidt@techwires.net>:
>> >>>>>>>> Are you able to reproduce this on demand? As in type a few commands and
>> >>>>>>>> the firmware error occurs?
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> No, I'm not able to reproduce this problem on demand.
>> >>>>>>
>> >>>>>> I'm seeing similar issues on occasion with my Lenovo as well:
>> >>>>>>
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: firmware error log:
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: error type      = "NMI_INTERRUPT_WDG" (0x00000004)
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: program counter = 0x0000046C
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: source line     = 0x000000D0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: error data      = 0x0000000207030000
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: branch link     = 0x00008370000004C2
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: interrupt link  = 0x000006DA000018B8
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: time            = 4287402440
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: driver status:
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  0: qid=0  cur=1   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  1: qid=1  cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  2: qid=2  cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  3: qid=3  cur=36  queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  4: qid=4  cur=123 queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  5: qid=5  cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  6: qid=6  cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  7: qid=7  cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  8: qid=8  cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring  9: qid=9  cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 10: qid=10 cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 11: qid=11 cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 12: qid=12 cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 13: qid=13 cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 14: qid=14 cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: tx ring 15: qid=15 cur=0   queued=0
>> >>>>>> Apr 23 19:25:24 garrcoop-fbsd kernel: rx ring: cur=8
>> >>>>>>
>> >>>>>> This may be because the system was under load (I was installing a port
>> >>>>>> shortly before the connection dropped). I'll try poking at this
>> >>>>>> further because it's going to be an annoying productivity loss :/.
>> >>>>>
>> >>>>>    Sorry... should have included more helpful details.
>> >>>>> Thanks,
>> >>>>> -Garrett
>> >>>>>
>> >>>>> dmesg:
>> >>>>>
>> >>>>> iwn0: <Intel(R) PRO/Wireless 4965BGN> mem 0xdf2fe000-0xdf2fffff irq 17 at device 0.0 on pci3
>> >>>>> iwn0: MIMO 2T3R, MoW1, address 00:1d:e0:7d:9f:c7
>> >>>>> iwn0: [ITHREAD]
>> >>>>> iwn0: 11a rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
>> >>>>> iwn0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
>> >>>>> iwn0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
>> >>>>>
>> >>>>> pciconf -lv snippet:
>> >>>>>
>> >>>>> iwn0@pci0:3:0:0:        class=0x028000 card=0x11108086 chip=0x42308086 rev=0x61 hdr=0x00
>> >>>>>     vendor     = 'Intel Corporation'
>> >>>>>     device     = 'Intel Wireless WiFi Link 4965AGN (Intel 4965AGN)'
>> >>>>>     class      = network
>> >>>>> cbb0@pci0:21:0:0:       class=0x060700 card=0x20c617aa chip=0x04761180 rev=0xba hdr=0x02
>> >>>>>
>> >>>>> uname -a:
>> >>>>>
>> >>>>> $ uname -a
>> >>>>> FreeBSD garrcoop-fbsd.cisco.com 8.0-STABLE FreeBSD 8.0-STABLE #0
>> >>>>> r207006: Wed Apr 21 13:18:44 PDT 2010
>> >>>>> root@garrcoop-fbsd.cisco.com:/usr/obj/usr/src/sys/LAPPY_X86  i386
>> >>>>
>> >>>> I'm actually looking at this right now. For me, it's actually
>> >>>> happening when my machine stays on overnight (or for long periods of
>> >>>> time, idle).
>> >>>>
>> >>>> Also, it seems to be causing the kernel to panic, although I'm now
>> >>>> wondering if the Machine Check Architecture is somehow catching this
>> >>>> device error and causing an exception (hw.mca.enabled=1)(?) -- not
>> >>>> possible, right ???
>> >>>>
>> >>>> Whatever the case, I can't seem to get the firmware error to occur
>> >>>> with iwn(4) debugging or wlandebug options enabled, so who knows
>> >>>> exactly what leads to this.
>> >>>>
>> >>>> I know Bernhard has worked hard on this driver, it's a shame that this
>> >>>> freaky bug has bit us all now, without leaving many clues :(
>> >>>>
>> >>>> I've attached a textdump for posterity if nothing else :)
>> >>>
>> >>>    Connectivity appears to be shoddy in my neck of the woods (kind of
>> >>> ironic... but meh). Just running buildworld, buildkernel, then doing a
>> >>> tcpdump in parallel causes the pseudo device to go up and down a lot.
>> >>> I assume this isn't standard behavior?
>> >>>    Just for reference buildworld was started shortly after 19:39:05,
>> >>> and it finished at 21:29. The interface has also gone up and down once
>> >>> since then while the system's been basically idle.
>> >>
>> >>    Hmmm... I seem to be in an excellent position to reproduce this
>> >> issue. I've reproduced it twice by merely bringing the interface up
>> >> and down several times using:
>> >>
>> >> ifconfig_wlan0="WPA DHCP"
>> >>
>> >>    instead of my usual:
>> >>
>> >> ifconfig_wlan0="WPA ssid <base-station-id1> DHCP"
>> >>
>> >>    Maybe others who are experiencing the issue should try that? I'll
>> >> do more testing when I get home...
>
> How did you do that? Reloading the module, or with ifconfig?

/etc/rc.d/netif restart, which does the ifconfig operations (no
module change occurred AFAIK, but wlan0 did of course do some
device_printf's when it was associating itself with iwn(4)).
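
For anyone who wants to try the same up/down cycling, here is a rough
sketch of the loop, not a tested reproducer. The function names, the
cycle count, the settle delay and the /var/log/messages location are all
assumptions of mine; only the /etc/rc.d/netif restart step and the
"firmware error log:" marker come from this thread.

```shell
#!/bin/sh
# Sketch only: cycle the network interfaces and count iwn firmware errors.
# LOGFILE, the default cycle count and the default delay are assumptions.

LOGFILE="${LOGFILE:-/var/log/messages}"

# Print how many "firmware error log:" lines a log file contains.
# grep -c exits non-zero when the count is 0, so the status is ignored.
count_fw_errors() {
    grep -c 'firmware error log:' "$1" || true
}

# Run /etc/rc.d/netif restart N times (default 5), sleeping DELAY seconds
# (default 30) after each restart so wlan0 has a chance to associate.
cycle_netif() {
    cycles="${1:-5}"
    delay="${2:-30}"
    i=0
    while [ "$i" -lt "$cycles" ]; do
        /etc/rc.d/netif restart
        sleep "$delay"
        i=$((i + 1))
    done
}
```

Usage would be something like running count_fw_errors "$LOGFILE" before
and after cycle_netif 5 30 (as root) and comparing the two counts.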

>> >
>> > My rc.conf is:
>> >
>> > ifconfig_wlan0="WPA DHCP"
>> >
>> > ...as well, although I haven't tried manually taking the interface
>> > down and bringing it back up.
>>
>> Hmmm... that is interesting. I wish I could do that, but it seems to
>> be eluding my grasp right now. The driver just kind of freaks out
>> with a bunch of SSIDs, one being my target SSID, a bunch of NUL string
>> ones, and then finally it just croaks. I need to figure out whether or
>> not the SSIDs are valid when I boot it up at my desk again.
>>
>> > Are you waiting for the device to associate and begin passing traffic
>> > before each up/down cycle?
>>
>> I was, but I'm not sure whether or not the Ajax pieces in GMail were.
>> I'll try some more rudimentary tests when I get back to work on Monday
>> in that environment, but I need to try out other things at home as
>> well in the meantime.

Thanks,
-Garrett
