From: John Baldwin
To: clutton
Cc: freebsd-wireless@freebsd.org, freebsd-arch@freebsd.org
Subject: Re: service netif restart [iface] runs a wpa_supplicant twice
Date: Mon, 11 Nov 2013 15:44:14 -0500
Message-Id: <201311111544.15187.jhb@freebsd.org>
In-Reply-To: <1383862923.70321.87.camel@eva02.mbsd>
References: <1382572583.1862.39.camel@eva02.mbsd> <201311061159.14824.jhb@freebsd.org> <1383862923.70321.87.camel@eva02.mbsd>

On Thursday, November 07, 2013 5:22:03 pm clutton wrote:
> On Wed, 2013-11-06 at 11:59 -0500, John Baldwin wrote:
> > On Tuesday, November 05, 2013 5:17:30 pm John Baldwin wrote:
> > > On Tuesday, November 05, 2013 2:33:50 pm Bernhard Schmidt wrote:
> > > > On Tue, Nov 5, 2013 at 5:54 PM, John Baldwin wrote:
> > > > > On Sunday, November 03, 2013 12:56:08 pm Adrian Chadd wrote:
> > > > >> On 2 November 2013 12:13, clutton wrote:
> > > > >>
> > > > >> [snip]
> > > > >>
> > > > >> > What was happened? netif tries to setup wlan0 (clone, wpa, dhcp, etc),
> > > > >> > when wlan0 interface occurs, devd runs another copy of netif.
> > > > >>
> > > > >> Well, it sounds like we need to pick an architecture _and_ fix the
> > > > >> behaviour here.
> > > > >>
> > > > >> Which is:
> > > > >>
> > > > >>
> > > > >> * I think wpa-supplicant should always run if it's required in /etc/rc.conf;
> > > > >> * netif should check if devd is configured and if so, just leave the
> > > > >> configuration up to devd
> > > > >> * if it isn't running, then devd should be responsible for
> > > > >> dhclient/add-to-wpa-config
> > > > >>
> > > > >> What we first have to establish is whether add_interface and
> > > > >> remove_interface (or whatever they're called) are correctly working,
> > > > >> for ethernet and wifi driver types. Then, we need to ensure they can
> > > > >> coexist (ie, one wpa_supplicant, but with both ethernet/wifi drivers
> > > > >> loaded and active on their relevant interfaces.) _then_ we can break
> > > > >> out the "stuff devd does" out of netif and have _either_ netif (x)or
> > > > >> devd call this new script to setup/teardown the interface runtime
> > > > >> state.
> > > > >>
> > > > >> How's that sound?
> > > > >
> > > > > Note that devd just runs netif (via /etc/pccard_ether), so it's already
> > > > > just one script, and having netif bail if devd is running would make
> > > > > netif not do anything in the common case.
> > > > >
> > > > > What normally happens during boot is that '/etc/rc.d/netif start' creates
> > > > > wlan0 and runs wpa_supplicant via 'childif_create' making a nested call to
> > > > > ifn_start for wlan0. That is, childif_create autoruns /etc/rc.d/netif
> > > > > explicitly after it creates the device. Probably that is what should be
> > > > > removed. That would let devd always start wpa_supplicant via
> > > > > /etc/pccard_ether. I've just tested this by doing a stop/start on iwn0
> > > > > (parent of wlan0, so wlan0 gets destroyed and re-created) and it started
> > > > > wpa_supplicant correctly.
> > > > >
> > > > > Index: head/etc/network.subr
> > > > > ===================================================================
> > > > > --- network.subr (revision 257705)
> > > > > +++ network.subr (working copy)
> > > > > @@ -1429,9 +1429,6 @@ childif_create()
> > > > >  			fi
> > > > >  			${IFCONFIG_CMD} $i name $child && cfg=0
> > > > >  		fi
> > > > > -		if autoif $child; then
> > > > > -			ifn_start $child
> > > > > -		fi
> > > > >  	done
> > > > >
> > > > >  	# Create vlan interfaces
> > > > >
> > > > > I also tested vlans created via vlans_ and they should use the same fix as
> > > > > well. Note that this model is more consistent with how cloned_interfaces
> > > > > works where ifn_start is not explicitly run when each interface is created.
> > > > > Instead, we rely on devd kicking off pccard_ether for those as well.
> > > >
> > > > That looks sane too me.
> > > >
> > > > Just one question, I remember that devd is disabled during boot and
> > > > activated later through a sysctl (to ignore events entirely), is this
> > > > the case before or after netif is running? I guess it is activated
> > > > after netif, otherwise we would have seen this issue on booting and
> > > > not just during netif restart.
> > >
> > > Hmm, devd starts after netif, but it just worked fine for me when I booted up.
> > > I also misspoke about cloned_interfaces. We manually add the cloned_interface
> > > list to the list of interfaces /etc/rc.d/netif iterates over. What I am
> > > puzzled by is that this just worked for me during a test boot. Hmm, it looks like
> > > devctl is no longer disabled during boot and then explicitly enabled by devd.
> > > devctl is now always enabled during boot, but capped at 1000 entries to avoid
> > > leaking memory. In fact, it looks like devd tries to recreate a few interfaces
> > > after netif finishes and is generally confused. I tried again with devd_flags
> > > set to "-n" to flush the initial set of events on boot. This removed the
> > > multiple calls to netif on boot on my laptop, but somehow wpa_supplicant is
> > > still being started by devd (and I'm not sure how now).
> >
> > I've hacked devd some more and can now see what is going on. -n doesn't do what
> > I thought it does. It does not throw away pending events on startup, it just
> > makes devd not fork until it has walked the initial set of events. The kernel
> > changed (a while ago) to queue the first 1000 events until devd starts up. This
> > means that in practice devd gets arrival events for all devices in the system as
> > soon as it starts up and triggers duplicate invocations of netif after netif
> > finishes.
> > However, /etc/pccard_ether ignores attempts to start a device that
> > is already up, so this should be a no-op on bootup (if my change is reverted)
> > as the interfaces should already be configured by the time devd starts. I suspect
> > what happens in multiuser is that devd fires off pccard_ether and sees that the
> > interface isn't up before the original netif has a chance to invoke the nested
> > ifn_start. We could perhaps change it so we only invoke ifn_start if devd
> > isn't running.
> >
> > One other thought: I restart my wireless interfaces by doing
> > 'sh /etc/rc.d/netif restart wlan0', not 'iwn0'. This doesn't teardown/recreate
> > the wlan0 device, so it doesn't suffer from the issue reported by the OP.
> >
> > Here is a change I've tested that seems to do the right thing both at boot time
> > and doing a restart of either iwn0 or wlan0 at runtime. If devd is running
> > it leaves the task of starting an interface up to devd, otherwise (such as during
> > boot), it configures the new child interface synchronously.
> >
> > Note that pgrep is in /bin.
> >
> > Index: network.subr
> > ===================================================================
> > --- network.subr (revision 257747)
> > +++ network.subr (working copy)
> > @@ -1406,10 +1406,14 @@ clone_down()
> >  #
> >  childif_create()
> >  {
> > -	local cfg child child_vlans child_wlans create_args debug_flags ifn i
> > +	local cfg child child_vlans child_wlans create_args debug_flags devd \
> > +	    ifn i
> >  	cfg=1
> >  	ifn=$1
> >  
> > +	# Check if devd is running
> > +	devd=$(pgrep devd)
> > +
> >  	# Create wireless interfaces
> >  	child_wlans=`get_if_var $ifn wlans_IF`
> >  
> > @@ -1429,6 +1433,9 @@ childif_create()
> >  			fi
> >  			${IFCONFIG_CMD} $i name $child && cfg=0
> >  		fi
> > +		if [ -z "$devd" ] && autoif $child; then
> > +			ifn_start $child
> > +		fi
> >  	done
> >  
> >  	# Create vlan interfaces
> > @@ -1452,6 +1459,9 @@ childif_create()
> >  				${IFCONFIG_CMD} $i name $child && cfg=0
> >  			fi
> >  		fi
> > +		if [ -z "$devd" ] && autoif $child; then
> > +			ifn_start $child
> > +		fi
> >  	done
> >  
> >  	return ${cfg}
> >
>
> Yes, the "service netif restart wlan0" doesn't teardown/recreate the
> wlan0 device. Anyway, a "service netif restart" does.
>
> What about removing this functionality, instead? See the patch below.
>
> The pros:
> 1) creating the wlan interface by hand (by ifconfig) means that the
> further configuration is going to be in that way, by hand (by ifconfig,
> route, dhcpclient, etc).
> 2) already written down configuration (in rc.conf) means working with rc
> subsystem (netif)
> 3) I have no idea why somebody would expect from a command "ifconfig
> wlan0 create wlandev ath0" the same behaviour as from a "service netif
> start wlan0".

Eh, I work with vlans quite a bit at work and I certainly use a model where
I edit rc.conf first and then create the interface by hand to bring it up.
This is one less step than having to manually ifconfig it after creating it
and then going to write to rc.conf.

> 4) Let's remove the unexpected behaviour at all, it's prone error, it's
> not obviously at first glance, some kind of clever computer which knows
> better what do you need. I think that we have this functionality
> occasionally, no one had designed this on purpose, am I wrong?

I don't agree.

> The cons:
> I have none. You told me.
>
>
> Ξ ~ → diff -u /usr/src/etc/devd.conf /etc/devd.conf
> --- /usr/src/etc/devd.conf 2013-09-29 17:24:16.759250174 +0300
> +++ /etc/devd.conf 2013-11-07 23:43:17.833616197 +0200
> @@ -38,7 +38,7 @@
>  #
>  notify 0 {
>  	match "system" "IFNET";
> -	match "subsystem" "!usbus[0-9]+";
> +	match "subsystem" "!(usbus|wlan|vlan)[0-9]+";
>  	match "type" "ATTACH";
>  	action "/etc/pccard_ether $subsystem start";
>  };

This isn't complete at all. Now you need to exclude everything, because
what if I do 'ifconfig tap0 create' by hand? Do you now expect it not to
configure that if I have bits for it in rc.conf? The usbus example here is
because usbus isn't a real ifnet, but an abuse of the subsystem. wlanX and
vlanX are real interfaces; they are just clone interfaces similar to tap/tun,
etc. rather than "physical" interfaces like bge0, etc. Everyone expects bge0
to auto configure if you pop in a bge cardbus card or kldload the driver.

-- 
John Baldwin
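
The objection to the devd.conf change can be checked with a small sh sketch. It assumes that devd anchors the pattern to the whole subsystem name and that a leading "!" inverts the match; grep -Ex only stands in for devd's own matcher here, and would_trigger is a hypothetical helper, not part of devd or rc.d.

```shell
#!/bin/sh
# Hypothetical helper (not part of devd or rc.d): report whether an IFNET
# ATTACH event for interface $1 would still reach /etc/pccard_ether under
# the proposed devd.conf pattern "!(usbus|wlan|vlan)[0-9]+".
# Assumption: grep -Ex (whole-line extended regex match) approximates
# devd's anchored match, and the leading "!" simply inverts the result.
excluded='(usbus|wlan|vlan)[0-9]+'

would_trigger() {
	if printf '%s\n' "$1" | grep -Eqx "$excluded"; then
		echo no		# name is excluded, devd would skip it
	else
		echo yes	# name is not excluded, pccard_ether would run
	fi
}

# Compare the excluded names with other cloned and physical interfaces.
for ifn in usbus0 wlan0 vlan5 tap0 tun0 bge0; do
	printf '%s: %s\n' "$ifn" "$(would_trigger "$ifn")"
done
```

Under those assumptions usbus0, wlan0 and vlan5 are skipped, while tap0, tun0 and bge0 still trigger /etc/pccard_ether, so every further cloned interface type would need its own entry in the exclusion list.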