Date:      Mon, 02 Jan 2012 14:09:56 +1100
From:      Lawrence Stewart <lstewart@freebsd.org>
To:        Adrian Chadd <adrian@FreeBSD.org>
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, rwatson@freebsd.org
Subject:   Re: svn commit: r228986 - in head: share/man/man4 sys/net
Message-ID:  <4F012004.8010508@freebsd.org>
In-Reply-To: <4EFEB38F.8010709@freebsd.org>
References:  <201112300857.pBU8vxfP004914@svn.freebsd.org> <CAJ-Vmo=mj1QPYRWYhQe1XfTh9oZKjrJJ0LnQjy=QsUsPODK6mA@mail.gmail.com> <4EFEB38F.8010709@freebsd.org>

On 12/31/11 18:02, Lawrence Stewart wrote:
> On 12/31/11 11:13, Adrian Chadd wrote:
>> This just broke wlan. Please consider fixing this patch :)
>>
>>
>> Adrian
>>
>> *** Interface: wlan0: start
>> can't re-use a leaf (wlan0)!
>> panic: bpfattach tscfgoid
>> KDB: enter: panic
>> [ thread pid 166 tid 100048 ]Stopped at kdb_enter+0x4c: lui at,0x8048
>> db>
>> db> bt
>> Tracing pid 166 tid 100048 td 0x80c37900
>> db_trace_thread+30 (?,?,?,?) ra 80074cc0 sp c40075f0 sz 24
>> 80074bac+114 (0,?,ffffffff,?) ra 8007427c sp c4007608 sz 32
>> 80073ef4+388 (?,?,?,?) ra 80074400 sp c4007628 sz 168
>> db_command_loop+70 (?,?,?,?) ra 80076ac4 sp c40076d0 sz 24
>> 800769d0+f4 (?,?,?,?) ra 801b7560 sp c40076e8 sz 424
>> kdb_trap+110 (?,?,?,?) ra 8035c7ec sp c4007890 sz 48
>> trap+bf4 (?,?,?,?) ra 80354560 sp c40078c0 sz 184
>> MipsKernGenException+134 (0,4,803c089c,113) ra 801b72d8 sp c4007978 sz 200
>> kdb_enter+4c (?,?,?,?) ra 8017f88c sp c4007a40 sz 24
>> panic+11c (?,0,80706500,0) ra 802304b8 sp c4007a58 sz 40
>> bpfattach2+cc (?,?,?,?) ra 802305e0 sp c4007a80 sz 64
>> bpfattach+10 (?,?,?,?) ra 802448ec sp c4007ac0 sz 24
>> ether_ifattach+d0 (?,?,?,?) ra 8025f0ac sp c4007ad8 sz 32
>> ieee80211_vap_attach+104 (?,?,?,?) ra 8007ffe8 sp c4007af8 sz 96
>> 8007f960+688 (?,?,0,?) ra 8026a39c sp c4007b58 sz 88
>> 8026a22c+170 (?,?,?,?) ra 802430dc sp c4007bb0 sz 88
>> ifc_simple_create+80 (?,?,?,?) ra 80242b04 sp c4007c08 sz 64
>> 80242ab0+54 (?,?,?,?) ra 80242d78 sp c4007c48 sz 40
>> if_clone_create+a8 (?,?,?,?) ra 8023bdd4 sp c4007c70 sz 40
>> ifioctl+3a4 (?,?,80c7f460,80c37900) ra 801d8070 sp c4007c98 sz 144
>> soo_ioctl+3b0 (?,?,?,?) ra 801d2804 sp c4007d28 sz 40
>> kern_ioctl+248 (?,?,?,?) ra 801d29ac sp c4007d50 sz 64
>> sys_ioctl+130 (?,?,?,?) ra 8035c3ec sp c4007d90 sz 56
>> trap+7f4 (?,?,?,?) ra 8035475c sp c4007dc8 sz 184
>> MipsUserGenException+10c (?,?,?,4061d590) ra 0 sp c4007e80 sz 0
>> pid 166
>> db> reset
>
> I've managed to reproduce this on my wifi enabled laptop with r228986
> MFCed to a 9-stable kernel. I can't reproduce the panic with regular
> interfaces, pseudo interfaces (e.g. lo0 and pflog0) or vlans (which also
> have a relationship with an underlying physical interface) i.e. this is
> VAP/net80211 specific as far as I can tell.
>
> The problem is that bpfattach2() is being called twice with the same
> interface name i.e. "wlan0".
>
> I added a debug printf and call to kdb_backtrace() after the
> SYSCTL_ADD_PROC() call in bpfattach2() to see the code paths which are
> calling into the function. Here's what I see when the kernel runs on my
> laptop (I've added inline comments between "## ##"):
>
> Added tscfg OID for interface wlan0
> KDB: stack backtrace:
> #0 0xffffffff808680ce at kdb_backtrace+0x5e
> #1 0xffffffff808da5b4 at bpfattach2+0xb4
> #2 0xffffffff808fbe06 at ieee80211_vap_setup+0x266
> #3 0xffffffff80740205 at wpi_vap_create+0x95
> #4 0xffffffff809047fb at wlan_clone_create+0x16b
> #5 0xffffffff808e6079 at ifc_simple_create+0x89
> #6 0xffffffff808e5cc5 at if_clone_createif+0x65
> #7 0xffffffff808e4546 at ifioctl+0x306
> #8 0xffffffff80879755 at kern_ioctl+0x115
> #9 0xffffffff8087998d at sys_ioctl+0xfd
> #10 0xffffffff80b17d60 at amd64_syscall+0x450
> #11 0xffffffff80b03497 at Xfast_syscall+0xf7
>
> ## Here it has successfully added the net.bpf.tscfg.wlan0 sysctl entry
> for wlan0 ##
>
> can't re-use a leaf (wlan0)!
>
> ## Here SYSCTL_ADD_PROC() failed because leaf name is already used ##
>
> panic: bpfattach tscfgoid
> cpuid = 1
> KDB: stack backtrace:
> #0 0xffffffff808680ce at kdb_backtrace+0x5e
> #1 0xffffffff80832c87 at panic+0x187
> #2 0xffffffff808da6f6 at bpfattach2+0x1f6
> #3 0xffffffff808e72ee at ether_ifattach+0xae
> #4 0xffffffff808fd0a5 at ieee80211_vap_attach+0xb5
> #5 0xffffffff8074023c at wpi_vap_create+0xcc
> #6 0xffffffff809047fb at wlan_clone_create+0x16b
> #7 0xffffffff808e6079 at ifc_simple_create+0x89
> #8 0xffffffff808e5cc5 at if_clone_createif+0x65
> #9 0xffffffff808e4546 at ifioctl+0x306
> #10 0xffffffff80879755 at kern_ioctl+0x115
> #11 0xffffffff8087998d at sys_ioctl+0xfd
> #12 0xffffffff80b17d60 at amd64_syscall+0x450
> #13 0xffffffff80b03497 at Xfast_syscall+0xf7
>
>
>
> So after a bit of digging, ieee80211_vap_setup() calls
> ieee80211_radiotap_vattach(), which explicitly calls bpfattach2(), and
> then a subsequent call to ieee80211_vap_attach() indirectly calls into
> bpfattach2() via the call to ether_ifattach().
>
> This smells like a net80211 bug to me. I'm guessing, but I would suspect
> ieee80211_vap_setup() shouldn't call ieee80211_radiotap_vattach() and
> should let ieee80211_vap_attach() handle the BPF attachment via the call
> to ether_ifattach().
>
> Thoughts?

This turns out not to be a bug in net80211, but rather an oversight on 
my part. I didn't consider the case where multiple DLTs can be attached 
to BPF with the same interface name.
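
To make the collision concrete, here is a minimal userland sketch (not
kernel code) of a leaf table keyed only by interface name. The names
leaf_table, leaf_add and attach_keyed_by_name are hypothetical stand-ins
for the sysctl node and the attach path; the two DLT values are the usual
Ethernet and radiotap numbers and are used purely for illustration.

```c
#include <assert.h>
#include <string.h>

#define SK_MAX_LEAVES 8

/* Illustrative DLT numbers: Ethernet and 802.11 radiotap. */
enum { SK_DLT_EN10MB = 1, SK_DLT_IEEE802_11_RADIO = 127 };

/* Hypothetical stand-in for a sysctl node: a flat set of leaf names. */
struct leaf_table {
	const char *names[SK_MAX_LEAVES];
	int count;
};

/* Add a leaf; returns 0 on success, -1 if the name is taken or table full. */
static int
leaf_add(struct leaf_table *t, const char *name)
{
	assert(t != NULL && name != NULL);
	for (int i = 0; i < t->count; i++)
		if (strcmp(t->names[i], name) == 0)
			return (-1);	/* "can't re-use a leaf" */
	if (t->count == SK_MAX_LEAVES)
		return (-1);
	t->names[t->count++] = name;
	return (0);
}

/*
 * Models one BPF attachment of (ifname, dlt). The leaf is keyed by the
 * interface name alone, so the DLT plays no part in the lookup.
 */
static int
attach_keyed_by_name(struct leaf_table *t, const char *ifname, int dlt)
{
	(void)dlt;
	return (leaf_add(t, ifname));
}
```

Attaching "wlan0" once with the radiotap DLT and once with the Ethernet
DLT succeeds the first time and fails the second, which is the point at
which the original patch panicked.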

I reverted r228986 and have reworked the patch to instead store the 
per-interface time stamp configuration OID pointer and config variable 
in the ifnet.
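
As a rough illustration of the idea only (this is not the reworked
patch), the sketch below keeps the OID pointer and config variable in a
per-interface struct and creates the OID on the first attachment only.
The struct and field names (sk_ifnet, if_tscfgoid, if_tscfg) and the
"create only if NULL" check are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a sysctl OID. */
struct sk_oid {
	const char *leaf;
};

/* Hypothetical, heavily trimmed stand-in for struct ifnet. */
struct sk_ifnet {
	const char *if_xname;
	struct sk_oid *if_tscfgoid;	/* assumed field: per-interface OID */
	int if_tscfg;			/* assumed field: time stamp config */
	struct sk_oid oid_storage;	/* backing store for the sketch */
};

/* Counts how many OIDs the sketch has created. */
static int sk_oids_created;

/*
 * Models an attach for one (ifnet, dlt) pair. The OID belongs to the
 * interface, so a second DLT on the same interface reuses it.
 */
static void
sk_bpfattach(struct sk_ifnet *ifp, int dlt)
{
	(void)dlt;
	assert(ifp != NULL);
	if (ifp->if_tscfgoid == NULL) {
		ifp->oid_storage.leaf = ifp->if_xname;
		ifp->if_tscfgoid = &ifp->oid_storage;
		sk_oids_created++;
	}
}
```

With this shape, two attachments for the same interface with different
DLTs leave exactly one OID behind.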

In testing my new patch, I found another problem. BPF maintains a list
(bpf_iflist) of bpf_if structs, one per ifnet/dlt combo. Each call to
bpfattach2() prepends a new bpf_if struct to the list. bpfdetach() is
expected to reclaim all bpf_if structs which reference the specified
ifnet from the list when it is called, but currently only removes the
first one it finds.
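
The difference between the two behaviours can be sketched in userland C
with a plain singly linked list. This is not the kernel code or the
proposed patch; the sk_-prefixed types and functions are hypothetical and
the real list uses the queue(3) macros and locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ifnet. */
struct sk_ifp {
	const char *name;
};

/* Hypothetical stand-in for struct bpf_if: one per ifnet/dlt combo. */
struct sk_bpf_if {
	struct sk_bpf_if *next;
	struct sk_ifp *ifp;
	int dlt;
};

/* Prepend a new entry to the list. */
static void
sk_attach(struct sk_bpf_if **head, struct sk_bpf_if *bp,
    struct sk_ifp *ifp, int dlt)
{
	assert(head != NULL && bp != NULL);
	bp->ifp = ifp;
	bp->dlt = dlt;
	bp->next = *head;
	*head = bp;
}

/* Leaky variant: unlink only the first entry that references ifp. */
static int
sk_detach_first(struct sk_bpf_if **head, struct sk_ifp *ifp)
{
	for (struct sk_bpf_if **pp = head; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->ifp == ifp) {
			*pp = (*pp)->next;
			return (1);
		}
	}
	return (0);
}

/* Fixed variant: unlink every entry that references ifp. */
static int
sk_detach_all(struct sk_bpf_if **head, struct sk_ifp *ifp)
{
	int removed = 0;
	struct sk_bpf_if **pp = head;

	while (*pp != NULL) {
		if ((*pp)->ifp == ifp) {
			*pp = (*pp)->next;	/* stay put: re-check new *pp */
			removed++;
		} else
			pp = &(*pp)->next;
	}
	return (removed);
}

/* Count entries still referencing ifp. */
static int
sk_count(const struct sk_bpf_if *head, const struct sk_ifp *ifp)
{
	int n = 0;

	for (; head != NULL; head = head->next)
		if (head->ifp == ifp)
			n++;
	return (n);
}
```

After two attachments for the same interface, sk_detach_first() leaves
one stale entry pointing at it, while sk_detach_all() leaves none.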

As far as I can tell, the implementation of bpfdetach() leaks bpf_if 
references, and has done so since it was introduced in r58273. See here 
for the function added in r58273:

http://svnweb.freebsd.org/base/head/sys/net/bpf.c?revision=58273&view=markup#l1311

I would like to commit something like the following as a fix:

http://people.freebsd.org/~lstewart/patches/misc/bpfdetach_bpfif_leakfix_10.x.r229165.patch

With the above patch and my revised r228986 patch both applied, I can no 
longer produce panics on my laptop.

Robert, as you committed r58273, I'm hoping you might have some thoughts 
on this (even if it's almost 12 years later ;). Am I missing something 
subtle, or is my analysis sensible? Assuming my analysis and patch are 
sane, would you mind giving it a quick review so I can commit it?

Adrian, would you mind applying the bpf leakfix patch above, followed by
the revised sysclocksnap/tscfg patch below, and testing to make sure your
kernel no longer explodes on boot?

http://people.freebsd.org/~lstewart/patches/misc/bpf_sysclocksnap_tscfg_postleakfix_10.x.r229165.patch

Cheers,
Lawrence


