Date: Tue, 31 Jul 2007 08:26:21 +1000 (EST)
From: Bruce Evans
To: "Chauncey N. Menefee"
Cc: freebsd-gnats-submit@freebsd.org, freebsd-i386@freebsd.org
Subject: Re: i386/115054: NTP errors out on startup but restart of NTP fixes problem
Message-ID: <20070731072434.F5028@besplex.bde.org>
In-Reply-To: <200707301716.l6UHG3eD020378@www.freebsd.org>
References: <200707301716.l6UHG3eD020378@www.freebsd.org>

On Mon, 30 Jul 2007, Chauncey N. Menefee wrote:

> From what we've been able to gather the NTP daemon is starting up
> before the network card and errors out.
> Restarting the NTP service afterwards clears up the problem.

Several versions of FreeBSD have annoying behaviour for network
startup, involving the network not actually being up when ifconfig
returns and various utilities then mishandling this in different ways.
I use the workaround of a couple of pings in rc.d/netif
(ping -c2 -t2 $ntpdhost or ping -c1 -t1 $ntpdhost) so that ping times
out instead of more important services. This usually works for ntpd
startup, but not for nfs startup. Nfs doesn't fail, but makes you wish
it would, by failing at first and then only retrying after about 30 or
60 seconds, so that booting takes too long.

This problem seems to get worse with each release of FreeBSD and/or
with newer NICs. I never noticed it with fxp or even ed or rl NICs.
Now it is barely noticeable with fxp and very noticeable with sk, bge
and em NICs. For bge, "ifconfig up" after "ifconfig down" takes 2
seconds to return, but the network still isn't quite back up at that
point, as shown by "route get $ntpdhost" taking another 5+ seconds to
return and the route cloning not even being quite complete when it
returns:

Under FreeBSD-~5.2:

%%%
Script started on Tue Jul 31 07:58:24 2007
ttyv1:root@besplex:/tmp> route get delplex; ifconfig bge0 down; time ifconfig bge0 up; time route get delplex; time route get delplex
   route to: delplex
destination: delplex
  interface: bge0
      flags:
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0      1500      1052
        1.90 real         0.00 user         1.90 sys
   route to: delplex
destination: 192.168.2.0
       mask: 255.255.255.0
  interface: bge0
      flags:
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0      1500        -7
        5.25 real         0.00 user         0.00 sys
   route to: delplex
destination: delplex
  interface: bge0
      flags:
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0      1500      1200
        0.00 real         0.00 user         0.00 sys
ttyv1:root@besplex:/tmp> exit

Script done on Tue Jul 31 07:58:56 2007
%%%

Maybe I should be using "route get $ntpdhost;
route get $nfshost ..." instead of the pings, since route(8) apparently
waits long enough, while waiting for the minimal amount of time is
harder to program with ping (ping -c1 $ntpdhost takes 11+ seconds where
"route get $ntpdhost" takes only 5+, and then it is unclear if ping
waited long enough since it loses the packet anyway; I avoid this 11+
second wait using -t1 or -t2, but the 1-2 second timeout is apparently
not long enough).

At boot time, the initial ifconfig seems to involve too much link
flapping. At least for bge in -current on a different machine booted
to single-user mode so that I can look at the initial state, the
interface is already up (but unused), with the message about this being
printed a couple of seconds after reaching the shell prompt (actually
in the middle of "ifconfig "). Then the initial ifconfig causes the
link to go down and up.

The behaviour of -current is quite different for the above commands --
both "ifconfig up" and "route get" return before the link is actually
up; they return in < 0.01 seconds, but the link still takes about 2
seconds to come back according to the "link state changed" message.
This is probably why I'm using the ping hack with a constant timeout --
I had forgotten some details and want to use the same rc.d/netif on all
machines.

Another difference in -current is that the second "route get" doesn't
show the cloning completed. That might be only because I had to test
on an inactive machine since bringing bge0 down breaks normal operation.

Bruce
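
The bounded wait discussed above (probe the network a limited number
of times so that the probe times out rather than ntpd or nfs) could be
sketched roughly as follows. This is only an illustration, not the
rc.d/netif change described in this message: the function name
wait_for_route, the ROUTE_PROBE override and the retry counts are all
hypothetical, and a successful "route get" is merely assumed to mean
the network is usable, which the -current behaviour described above
suggests is not always true.

```sh
#!/bin/sh
# Hypothetical helper: retry a probe command against a host a bounded
# number of times, sleeping between attempts.
#   wait_for_route HOST [TRIES] [DELAY_SECONDS]
# Returns 0 as soon as the probe succeeds, 1 if every attempt fails.
# ROUTE_PROBE is an assumed override for the probe command; it defaults
# to "route get" as used in the message.  Variables are global because
# plain POSIX sh has no "local".
wait_for_route() {
    wfr_host=$1
    wfr_tries=${2:-10}
    wfr_delay=${3:-1}
    wfr_probe=${ROUTE_PROBE:-route get}
    wfr_i=0
    while [ "$wfr_i" -lt "$wfr_tries" ]; do
        # $wfr_probe is left unquoted on purpose so that a command with
        # arguments ("route get") is split into separate words.
        if $wfr_probe "$wfr_host" >/dev/null 2>&1; then
            return 0
        fi
        wfr_i=$((wfr_i + 1))
        sleep "$wfr_delay"
    done
    return 1
}
```

It might be called as wait_for_route "$ntpdhost" 10 1 before starting
the services that need the network. Setting ROUTE_PROBE to something
like "ping -c1 -t1" would give a variant of the ping workaround with
the same bounded retry.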