From: Robert Jenssen
To: Brooks Davis
Cc: net@freebsd.org
Date: Mon, 11 Feb 2008 18:51:58 +1100
Subject: Re: dhclient conflict between /sbin/devd and /etc/rc.d/netif ?

Hi Brooks and all,

On Mon, 11 Feb 2008 12:06:26 pm you wrote:
> On Mon, Feb 11, 2008 at 11:37:21AM +1100, Robert Jenssen wrote:
> > Hi,
> > Every so often I have trouble connecting rt2560 based PCI wireless network
> > card to my wireless router/access point.
> > Typically I get:
> >
> > # sudo /etc/rc.d/netif restart ral0
> > Starting wpa_supplicant.
> > ral0: no link .............. giving up
> > ral0: flags=8843 metric 0 mtu 1500
> > ether 00:11:50:63:cd:47
> > media: IEEE 802.11 Wireless Ethernet autoselect (DS/1Mbps)
> > status: no carrier
> >
> > Even though there seems to be plenty of signal power:
> >
> > # sudo ifconfig ral0 list scan
> > SSID        BSSID              CHAN RATE  S:N      INT CAPS
> > xxxxxxx...  00:xx:xx:xx:xx:xx    10  54M  -74:-95  100 EPS WPA
> >
> > Recently I noticed that sometimes, after the above "netif restart" fails, the
> > ral0 interface "automagically" comes up anyway. Then dhclient is owned
> > by /sbin/devd. The default devd.conf starts dhclient for both ethernet and
> > PCI-cardbus devices. Is it a good idea for both /sbin/devd
> > and /etc/rc.d/netif to start a dhclient on ral0 at about the same time?

In the "magical" case above, what I think is happening is that the
dhclient startup from /etc/rc.d/netif, called by rc, fails. Later,
/etc/rc.d/netif is called again from
/etc/pccard_ether:pccard_ether_start() by /sbin/devd. That call succeeds.

The rc system uses rcorder to determine the order in which to run the rc
scripts. On my system rcorder shows devd fairly early in the list. The
devd.conf file calls a number of rc scripts. So far as I can see,
/sbin/devd doesn't check that these are called in the order listed by
rcorder. Is this a problem?

I have disabled devd (and set the moused port explicitly in rc.conf) and
done some simple tests on /usr/src/sbin/dhclient/dhclient.c. In
particular, at line 365 main() allows a hard-coded maximum of 10 seconds
for the call to interface_link_status() to succeed. I changed this to 20
seconds, added a printout, and ran /etc/rc.d/netif restart a few times
with rc_debug="YES". The results were 15 15 5 5 5 5 5 15 15 5 5 5 5 5
21 (timed out!) 5 5 and 5 seconds. Presumably the (10n+5) seconds is a
magic number inside my wireless card or router. I'm going to set the
hardcoded value to 25 seconds.
Would it be possible for you to commit a similar change? Here is a patch:

*** src/sbin/dhclient/dhclient.c 2007-02-10 04:50:26.000000000 +1100
--- /usr/src/sbin/dhclient/dhclient.c 2008-02-11 18:09:25.000000000 +1100
***************
*** 360,370 ****
          fflush(stderr);
          sleep(1);
          while (!interface_link_status(ifi->name)) {
                  fprintf(stderr, ".");
                  fflush(stderr);
!                 if (++i > 10) {
                          fprintf(stderr, " giving up\n");
                          exit(1);
                  }
                  sleep(1);
          }
--- 360,370 ----
          fflush(stderr);
          sleep(1);
          while (!interface_link_status(ifi->name)) {
                  fprintf(stderr, ".");
                  fflush(stderr);
!                 if (++i > 25) {
                          fprintf(stderr, " giving up\n");
                          exit(1);
                  }
                  sleep(1);
          }

("diff -C 5" to show the sleep()s!).

Rather than dhclient.c timing 10 seconds and calling exit(), as shown
above, shouldn't the dhclient.conf "timeout" configuration item cover
this situation? I see that PR bin/98577 wants this hardcoded timeout
reduced or made adjustable via dhclient.conf.

Best regards,
Rob Jenssen
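
P.S. To illustrate what an adjustable limit of the kind PR bin/98577
asks for might look like, here is a minimal sketch. It is not the actual
dhclient code: wait_for_link(), max_tries, interval and
fake_link_status() are hypothetical names, and only the shape of the
loop follows the patch above. A real change would call
interface_link_status() and take the limit from dhclient.conf; how such
an option would be named or parsed is not shown here.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical callback type: returns nonzero once the interface has link. */
typedef int (*link_status_fn)(const char *ifname);

/*
 * Poll for link, printing a dot per failed poll as the loop in the patch
 * does, but with the retry limit passed in instead of hard-coded.
 * max_tries and interval (seconds between polls) are assumptions made
 * for this sketch. Returns true if link came up, false after giving up.
 */
static bool
wait_for_link(const char *ifname, link_status_fn status, int max_tries,
    unsigned int interval)
{
	int i = 0;

	while (!status(ifname)) {
		fprintf(stderr, ".");
		fflush(stderr);
		if (++i > max_tries) {
			fprintf(stderr, " giving up\n");
			return (false);
		}
		sleep(interval);
	}
	return (true);
}

/*
 * Stand-in for interface_link_status(), for trying the loop without
 * hardware: reports "no link" for polls_until_link polls, then "link".
 */
static int polls_until_link;

static int
fake_link_status(const char *ifname)
{
	(void)ifname;
	return (polls_until_link-- <= 0);
}
```

With polls_until_link set to 21, as in the slowest run I measured,
wait_for_link("ral0", fake_link_status, 10, 1) gives up, while a limit of
25 lets the same call succeed. Returning a status instead of calling
exit() leaves the decision about what to do next to the caller.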