From owner-freebsd-hackers@FreeBSD.ORG Tue Oct 22 16:07:58 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id 76718E0E; Tue, 22 Oct 2013 16:07:58 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-qe0-x22a.google.com (mail-qe0-x22a.google.com [IPv6:2607:f8b0:400d:c02::22a]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 2381E24EA; Tue, 22 Oct 2013 16:07:58 +0000 (UTC) Received: by mail-qe0-f42.google.com with SMTP id gc15so4948904qeb.15 for ; Tue, 22 Oct 2013 09:07:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=TZHOiAE8hKZ6ZUK7K6mKLCodxLzm/YBOJu8ZiPZ+REM=; b=cpPPyU0CxWOx+NbkzCmgnMHC6lvIFqg69NdVOs9VxsZxvJ9eZVrD+XFQG05KXtSCII VD3c15osOXxJSifaa3Gi9RklKnijHvLfhyCD8l3/MnAnTQ0c3T6r+6EYXDhX/NX3Oc92 ogcYukQs547mQ1hImmWIMCqtk2eaJl0weLwgh2Fdb+uGRVWYTVpSQHFfQtQBTt7s4Voa GNFq2PclZqtjSbwAPb+QnkX8jwzy68uuT1DPyniPV57qiY+x4xgkTErXtGM3oCgQRVaV k18zcUBlfeqJ++Wgpo76tScAWQ9BGqHjK2hiUhihvln62uIvdjSoL2G1NxlF8aMbgS3g CPlQ== MIME-Version: 1.0 X-Received: by 10.49.127.179 with SMTP id nh19mr29993888qeb.1.1382458077212; Tue, 22 Oct 2013 09:07:57 -0700 (PDT) Sender: adrian.chadd@gmail.com Received: by 10.224.207.66 with HTTP; Tue, 22 Oct 2013 09:07:57 -0700 (PDT) In-Reply-To: References: Date: Tue, 22 Oct 2013 09:07:57 -0700 X-Google-Sender-Auth: zSqfefMoreg7Ukj_65a3ocClEv4 Message-ID: Subject: Re: 9.2-STABLE r255918 with GENERIC and iwn - core dump From: Adrian Chadd To: claudiu vasadi , "freebsd-wireless@freebsd.org" Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: "freebsd-hackers@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 22 Oct 2013 16:07:58 -0000 I know what's causing this! It's because when the management frame completes, there's a callback mbuf tag (M_TXCB) that causes the driver to call the net80211 TX completion callback. Now, because some drivers call the net80211 tx completion callback from within their driver locks, it causes locking issues. So, someone (I don't know or really care who) made it so whenever a TX completion occurs, the net80211 code will schedule a callout to occur. This means the callout occurs outside of the driver locks, solving that issue. This has a bunch of problems. * Firstly, if you have multiple management frames coming in, only the most recent will be acknowledged. Tsk. There's only one callout, and it's per vap. * Secondly, no node reference is taken before scheduling the callout, so if the node is destroyed (eg because the BSS is freed during a channel scan or reset) and the callout still occurs, it'll dereference a bad node. This is the crash cause. * Thirdly, the cancellation occurs in the VAP state change path. It doesn't know about the node(s) that just received TX completions. Since the callback is per vap, there's no way to figure out which node needs dereferencing.. so things blow up. The solution is just to undo this brain damaged solution and require that drivers call the TX completion callback with no driver locks held. That's on my TODO list but it'll take a little more time. Now that 10 has branched I'll be happy to just flip that switch in -HEAD and deal with the locking fallout. Thanks, -adrian On 22 October 2013 07:28, claudiu vasadi wrote: > Hi everyone, > > I have a Lenovo Thinkpad T420s with Intel core i7 @ 2.70GHz, 8GB RAM, Intel > SSD 160GB and iwn0: mem > 0xf4200000-0xf4201fff irq 17 at device 0.0 on pci3 > > Today, while connecting to different AP's, I noticed at one point that I > was not getting an IP although the wifi card was associated. Within > "wifimgr", I did a "Save and Reconnect" and then got a core dump. > > Bellow, the bt: > > > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you > are > welcome to change it and/or distribute copies of it under certain > conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "amd64-marcel-freebsd"... > > Unread portion of the kernel message buffer: > > > Fatal trap 12: page fault while in kernel mode > cpuid = 0; apic id = 00 > fault virtual address = 0xffffff801e5f7000 > fault code = supervisor read data, page not present > instruction pointer = 0x20:0xffffffff80a10431 > stack pointer = 0x28:0xffffff8000276980 > frame pointer = 0x28:0xffffff8000276a20 > code segment = base 0x0, limit 0xfffff, type 0x1b > = DPL 0, pres 1, long 1, def32 0, gran 1 > processor eflags = interrupt enabled, resume, IOPL = 0 > current process = 12 (swi4: clock) > trap number = 12 > panic: page fault > cpuid = 0 > KDB: stack backtrace: > #0 0xffffffff80948a06 at kdb_backtrace+0x66 > #1 0xffffffff8090e50e at panic+0x1ce > #2 0xffffffff80cf3440 at trap_fatal+0x290 > #3 0xffffffff80cf37a1 at trap_pfault+0x211 > #4 0xffffffff80cf3d54 at trap+0x344 > #5 0xffffffff80cdd093 at calltrap+0x8 > #6 0xffffffff808dfddd at intr_event_execute_handlers+0xfd > #7 0xffffffff808e15cd at ithread_loop+0x9d > #8 0xffffffff808dc82f at fork_exit+0x11f > #9 0xffffffff80cdd5be at fork_trampoline+0xe > Uptime: 8h20m28s > Dumping 952 out of 8106 MB:..2% (CTRL-C to abort) (CTRL-C to abort) > (CTRL-C to abort) (CTRL-C to abort) ..11% (CTRL-C to abort) (CTRL-C to > abort) ..21%..31% (CTRL-C to abort) (CTRL-C to abort) (CTRL-C to abort) > (CTRL-C to abort) (CTRL-C to abort) ..41% (CTRL-C to abort) (CTRL-C to > abort) (CTRL-C to abort) (CTRL-C to abort) (CTRL-C to abort) (CTRL-C to > abort) (CTRL-C to abort) (CTRL-C to abort) ..51% (CTRL-C to abort) > (CTRL-C to abort) ..61% (CTRL-C to abort) ..71% (CTRL-C to abort) > ..81%..91% > > Reading symbols from /boot/kernel/zfs.ko...Reading symbols from > /boot-mount/boot/kernel/zfs.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/zfs.ko > Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from > /boot-mount/boot/kernel/opensolaris.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/opensolaris.ko > Reading symbols from /boot/kernel/geom_eli.ko...Reading symbols from > /boot-mount/boot/kernel/geom_eli.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/geom_eli.ko > Reading symbols from /boot/kernel/crypto.ko...Reading symbols from > /boot-mount/boot/kernel/crypto.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/crypto.ko > Reading symbols from /boot/kernel/linux.ko...Reading symbols from > /boot-mount/boot/kernel/linux.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/linux.ko > Reading symbols from /boot/kernel/drm.ko...Reading symbols from > /boot-mount/boot/kernel/drm.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/drm.ko > Reading symbols from /boot/modules/nvidia.ko...done. > Loaded symbols for /boot/modules/nvidia.ko > Reading symbols from /boot/kernel/mmc.ko...Reading symbols from > /boot-mount/boot/kernel/mmc.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/mmc.ko > Reading symbols from /boot/kernel/mmcsd.ko...Reading symbols from > /boot-mount/boot/kernel/mmcsd.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/mmcsd.ko > Reading symbols from /boot/kernel/acpi_call.ko...done. > Loaded symbols for /boot/kernel/acpi_call.ko > Reading symbols from /boot/kernel/umodem.ko...Reading symbols from > /boot-mount/boot/kernel/umodem.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/umodem.ko > Reading symbols from /boot/modules/vboxnetflt.ko...done. > Loaded symbols for /boot/modules/vboxnetflt.ko > Reading symbols from /boot/modules/vboxdrv.ko...done. > Loaded symbols for /boot/modules/vboxdrv.ko > Reading symbols from /boot/kernel/netgraph.ko...Reading symbols from > /boot-mount/boot/kernel/netgraph.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/netgraph.ko > Reading symbols from /boot/kernel/ng_ether.ko...Reading symbols from > /boot-mount/boot/kernel/ng_ether.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/ng_ether.ko > Reading symbols from /boot/modules/vboxnetadp.ko...done. > Loaded symbols for /boot/modules/vboxnetadp.ko > #0 doadump (textdump=) at pcpu.h:234 > 234 pcpu.h: No such file or directory. > in pcpu.h > (kgdb) bt > #0 doadump (textdump=) at pcpu.h:234 > #1 0xffffffff8090dfe6 in kern_reboot (howto=260) at > /usr/src/sys/kern/kern_shutdown.c:449 > #2 0xffffffff8090e4e7 in panic (fmt=0x1
) at > /usr/src/sys/kern/kern_shutdown.c:637 > #3 0xffffffff80cf3440 in trap_fatal (frame=0xc, eva=) > at /usr/src/sys/amd64/amd64/trap.c:879 > #4 0xffffffff80cf37a1 in trap_pfault (frame=0xffffff80002768d0, > usermode=0) at /usr/src/sys/amd64/amd64/trap.c:795 > #5 0xffffffff80cf3d54 in trap (frame=0xffffff80002768d0) at > /usr/src/sys/amd64/amd64/trap.c:463 > #6 0xffffffff80cdd093 in calltrap () at > /usr/src/sys/amd64/amd64/exception.S:232 > #7 0xffffffff80a10431 in ieee80211_tx_mgt_timeout (arg=0xffffff801e5f7000) > at /usr/src/sys/net80211/ieee80211_output.c:2487 > #8 0xffffffff809246e8 in softclock (arg=) at > /usr/src/sys/kern/kern_timeout.c:518 > #9 0xffffffff808dfddd in intr_event_execute_handlers (p= out>, ie=0xfffffe0007221b00) > at /usr/src/sys/kern/kern_intr.c:1272 > #10 0xffffffff808e15cd in ithread_loop (arg=0xfffffe0007209460) at > /usr/src/sys/kern/kern_intr.c:1285 > #11 0xffffffff808dc82f in fork_exit (callout=0xffffffff808e1530 > , arg=0xfffffe0007209460, > frame=0xffffff8000276b00) at /usr/src/sys/kern/kern_fork.c:990 > #12 0xffffffff80cdd5be in fork_trampoline () at > /usr/src/sys/amd64/amd64/exception.S:606 > #13 0x0000000000000000 in ?? () > > > One thing to keep in mind is that since I started using geli+ZFS (installed > with PC-BSD 9.1 cd), I always got "Cannot reset interface wlan0 - exit > status 1" with "wifimgr" whichever action i did (ex: reconect, rescan, > up/down,etc). > > > I would appreciate some help in debugging this. > > > -- > Best regards, > Claudiu Vasadi > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" >