Date: Tue, 22 Oct 2013 18:56:51 +0200
From: claudiu vasadi <claudiu.vasadi@gmail.com>
To: Adrian Chadd <adrian@freebsd.org>
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>, "freebsd-wireless@freebsd.org" <freebsd-wireless@freebsd.org>
Subject: Re: 9.2-STABLE r255918 with GENERIC and iwn - core dump
Message-ID: <CAM-i3ijHDVG5XQ0LjfWHT%2BcyECja7VQRR9wXV71wEOeHr4eVDw@mail.gmail.com>
In-Reply-To: <CAJ-VmomY16hJSJgj5jAq9ywxCsY67AmmesPSSgsb1OekTGikww@mail.gmail.com>
References: <CAM-i3iiem_3-tv90k0NWeJjocx77RhT%2BCeZPHZRHZS3_AsgZkQ@mail.gmail.com> <CAJ-VmomY16hJSJgj5jAq9ywxCsY67AmmesPSSgsb1OekTGikww@mail.gmail.com>
For when the time comes, I'm all in for any tests (if needed).

On Tue, Oct 22, 2013 at 6:07 PM, Adrian Chadd <adrian@freebsd.org> wrote:

> I know what's causing this!
>
> It's because when the management frame completes, there's a callback mbuf
> tag (M_TXCB) that causes the driver to call the net80211 TX completion
> callback.
>
> Now, because some drivers call the net80211 tx completion callback from
> within their driver locks, it causes locking issues. So, someone (I don't
> know or really care who) made it so whenever a TX completion occurs, the
> net80211 code will schedule a callout to occur. This means the callout
> occurs outside of the driver locks, solving that issue.
>
> This has a bunch of problems.
>
> * Firstly, if you have multiple management frames coming in, only the most
> recent will be acknowledged. Tsk. There's only one callout, and it's per
> vap.
> * Secondly, no node reference is taken before scheduling the callout, so
> if the node is destroyed (eg because the BSS is freed during a channel scan
> or reset) and the callout still occurs, it'll dereference a bad node. This
> is the crash cause.
> * Thirdly, the cancellation occurs in the VAP state change path. It
> doesn't know about the node(s) that just received TX completions. Since the
> callback is per vap, there's no way to figure out which node needs
> dereferencing.. so things blow up.
>
> The solution is just to undo this brain damaged solution and require that
> drivers call the TX completion callback with no driver locks held. That's
> on my TODO list but it'll take a little more time. Now that 10 has branched
> I'll be happy to just flip that switch in -HEAD and deal with the locking
> fallout.
>
> Thanks,
>
> -adrian
>
> On 22 October 2013 07:28, claudiu vasadi <claudiu.vasadi@gmail.com> wrote:
>
>> Hi everyone,
>>
>> I have a Lenovo Thinkpad T420s with Intel core i7 @ 2.70GHz, 8GB RAM,
>> Intel SSD 160GB and iwn0: <Intel Centrino Ultimate-N 6300> mem
>> 0xf4200000-0xf4201fff irq 17 at device 0.0 on pci3
>>
>> Today, while connecting to different APs, I noticed at one point that I
>> was not getting an IP although the wifi card was associated. Within
>> "wifimgr", I did a "Save and Reconnect" and then got a core dump.
>>
>> Below, the bt:
>>
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>> This GDB was configured as "amd64-marcel-freebsd"...
>>
>> Unread portion of the kernel message buffer:
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id = 00
>> fault virtual address   = 0xffffff801e5f7000
>> fault code              = supervisor read data, page not present
>> instruction pointer     = 0x20:0xffffffff80a10431
>> stack pointer           = 0x28:0xffffff8000276980
>> frame pointer           = 0x28:0xffffff8000276a20
>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 12 (swi4: clock)
>> trap number             = 12
>> panic: page fault
>> cpuid = 0
>> KDB: stack backtrace:
>> #0 0xffffffff80948a06 at kdb_backtrace+0x66
>> #1 0xffffffff8090e50e at panic+0x1ce
>> #2 0xffffffff80cf3440 at trap_fatal+0x290
>> #3 0xffffffff80cf37a1 at trap_pfault+0x211
>> #4 0xffffffff80cf3d54 at trap+0x344
>> #5 0xffffffff80cdd093 at calltrap+0x8
>> #6 0xffffffff808dfddd at intr_event_execute_handlers+0xfd
>> #7 0xffffffff808e15cd at ithread_loop+0x9d
>> #8 0xffffffff808dc82f at fork_exit+0x11f
>> #9 0xffffffff80cdd5be at fork_trampoline+0xe
>> Uptime: 8h20m28s
>> Dumping 952 out of 8106 MB: (CTRL-C to abort) ..2%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>>
>> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot-mount/boot/kernel/zfs.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/zfs.ko
>> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot-mount/boot/kernel/opensolaris.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/opensolaris.ko
>> Reading symbols from /boot/kernel/geom_eli.ko...Reading symbols from /boot-mount/boot/kernel/geom_eli.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/geom_eli.ko
>> Reading symbols from /boot/kernel/crypto.ko...Reading symbols from /boot-mount/boot/kernel/crypto.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/crypto.ko
>> Reading symbols from /boot/kernel/linux.ko...Reading symbols from /boot-mount/boot/kernel/linux.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/linux.ko
>> Reading symbols from /boot/kernel/drm.ko...Reading symbols from /boot-mount/boot/kernel/drm.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/drm.ko
>> Reading symbols from /boot/modules/nvidia.ko...done.
>> Loaded symbols for /boot/modules/nvidia.ko
>> Reading symbols from /boot/kernel/mmc.ko...Reading symbols from /boot-mount/boot/kernel/mmc.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/mmc.ko
>> Reading symbols from /boot/kernel/mmcsd.ko...Reading symbols from /boot-mount/boot/kernel/mmcsd.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/mmcsd.ko
>> Reading symbols from /boot/kernel/acpi_call.ko...done.
>> Loaded symbols for /boot/kernel/acpi_call.ko
>> Reading symbols from /boot/kernel/umodem.ko...Reading symbols from /boot-mount/boot/kernel/umodem.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/umodem.ko
>> Reading symbols from /boot/modules/vboxnetflt.ko...done.
>> Loaded symbols for /boot/modules/vboxnetflt.ko
>> Reading symbols from /boot/modules/vboxdrv.ko...done.
>> Loaded symbols for /boot/modules/vboxdrv.ko
>> Reading symbols from /boot/kernel/netgraph.ko...Reading symbols from /boot-mount/boot/kernel/netgraph.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/netgraph.ko
>> Reading symbols from /boot/kernel/ng_ether.ko...Reading symbols from /boot-mount/boot/kernel/ng_ether.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/ng_ether.ko
>> Reading symbols from /boot/modules/vboxnetadp.ko...done.
>> Loaded symbols for /boot/modules/vboxnetadp.ko
>> #0 doadump (textdump=<value optimized out>) at pcpu.h:234
>> 234 pcpu.h: No such file or directory.
>> in pcpu.h
>> (kgdb) bt
>> #0 doadump (textdump=<value optimized out>) at pcpu.h:234
>> #1 0xffffffff8090dfe6 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:449
>> #2 0xffffffff8090e4e7 in panic (fmt=0x1 <Address 0x1 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:637
>> #3 0xffffffff80cf3440 in trap_fatal (frame=0xc, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:879
>> #4 0xffffffff80cf37a1 in trap_pfault (frame=0xffffff80002768d0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:795
>> #5 0xffffffff80cf3d54 in trap (frame=0xffffff80002768d0) at /usr/src/sys/amd64/amd64/trap.c:463
>> #6 0xffffffff80cdd093 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:232
>> #7 0xffffffff80a10431 in ieee80211_tx_mgt_timeout (arg=0xffffff801e5f7000) at /usr/src/sys/net80211/ieee80211_output.c:2487
>> #8 0xffffffff809246e8 in softclock (arg=<value optimized out>) at /usr/src/sys/kern/kern_timeout.c:518
>> #9 0xffffffff808dfddd in intr_event_execute_handlers (p=<value optimized out>, ie=0xfffffe0007221b00) at /usr/src/sys/kern/kern_intr.c:1272
>> #10 0xffffffff808e15cd in ithread_loop (arg=0xfffffe0007209460) at /usr/src/sys/kern/kern_intr.c:1285
>> #11 0xffffffff808dc82f in fork_exit (callout=0xffffffff808e1530 <ithread_loop>, arg=0xfffffe0007209460, frame=0xffffff8000276b00) at /usr/src/sys/kern/kern_fork.c:990
>> #12 0xffffffff80cdd5be in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:606
>> #13 0x0000000000000000 in ?? ()
>>
>> One thing to keep in mind is that since I started using geli+ZFS
>> (installed with PC-BSD 9.1 cd), I always got "Cannot reset interface
>> wlan0 - exit status 1" with "wifimgr" whichever action I did (e.g.
>> reconnect, rescan, up/down, etc.).
>>
>> I would appreciate some help in debugging this.
>>
>> --
>> Best regards,
>> Claudiu Vasadi
>> _______________________________________________
>> freebsd-hackers@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>>
>

--
Best regards,
Claudiu Vasadi
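
Adrian's second point in the quoted reply (no node reference is taken before
the callout is scheduled, so the callout can run against a freed node) can be
illustrated with a small userland sketch. This is not the net80211 code and
not the fix Adrian proposes: struct node, struct timer, node_alloc(),
node_ref(), node_unref(), timer_schedule(), timer_cancel(), timer_fire() and
mgt_timeout() are all hypothetical names invented for this illustration. It
only shows the general idea, assuming a plain integer reference count, that a
pending deferred call should hold its own reference on the object it will
touch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a net80211 node (not struct ieee80211_node). */
struct node {
    int   refcnt;    /* holders of this node, pending timers included */
    int   timeouts;  /* how many times the deferred handler ran */
    bool *freed;     /* demo only: set to true when the node is released */
};

/* Hypothetical stand-in for a callout with at most one pending call. */
struct timer {
    void (*fn)(struct node *);
    struct node *arg;  /* referenced node while pending, NULL when idle */
};

/* Allocate a node holding one reference for the caller. */
struct node *node_alloc(bool *freed_flag)
{
    struct node *n = calloc(1, sizeof(*n));
    if (n == NULL)
        return NULL;
    n->refcnt = 1;
    n->freed = freed_flag;
    return n;
}

/* Take an extra reference and return the same node. */
struct node *node_ref(struct node *n)
{
    n->refcnt++;
    return n;
}

/* Drop one reference; the last holder frees the node. */
void node_unref(struct node *n)
{
    assert(n->refcnt > 0);
    if (--n->refcnt == 0) {
        if (n->freed != NULL)
            *n->freed = true;
        free(n);
    }
}

/* Cancel a pending call and release the reference it was holding. */
bool timer_cancel(struct timer *t)
{
    if (t->arg == NULL)
        return false;
    node_unref(t->arg);
    t->arg = NULL;
    return true;
}

/* Schedule fn(n); the timer takes its own reference on n. */
void timer_schedule(struct timer *t, void (*fn)(struct node *), struct node *n)
{
    struct node *held = node_ref(n);  /* take the new ref before dropping the old */
    timer_cancel(t);
    t->fn = fn;
    t->arg = held;
}

/* Run the pending call, if any, then release the timer's reference. */
void timer_fire(struct timer *t)
{
    struct node *n = t->arg;
    if (n == NULL)
        return;
    t->arg = NULL;
    t->fn(n);
    node_unref(n);
}

/* Deferred handler: n is still valid because the timer holds a reference. */
void mgt_timeout(struct node *n)
{
    n->timeouts++;
}
```

With this arrangement, an owner that drops its reference while a call is
pending does not free the node; the node goes away only once timer_fire() or
timer_cancel() releases the timer's reference. The sketch keeps a single
pending call per timer, so it does not address the first problem in the quoted
list (only the most recent completion being handled).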