From owner-freebsd-i386@FreeBSD.ORG Tue Jul 31 18:00:11 2007
Date: Tue, 31 Jul 2007 18:00:10 GMT
Message-Id: <200707311800.l6VI0AKN070144@freefall.freebsd.org>
To: freebsd-i386@FreeBSD.org
From: Bruce Evans
Subject: Re: i386/115054: NTP errors out on startup but restart of NTP fixes problem
List-Id: I386-specific issues for FreeBSD

The following reply was made to PR i386/115054; it has been noted by GNATS.

From: Bruce Evans
To: "Chauncey N. Menefee"
Cc: freebsd-gnats-submit@freebsd.org, freebsd-i386@freebsd.org
Subject: Re: i386/115054: NTP errors out on startup but restart of NTP fixes problem
Date: Tue, 31 Jul 2007 08:26:21 +1000 (EST)

On Mon, 30 Jul 2007, Chauncey N. Menefee wrote:

>> From what we've been able to gather the NTP daemon is starting up before the network card and errors out. Restarting the NTP service afterwards clears up the problem.

Several versions of FreeBSD have annoying behaviour for network startup,
involving the network not actually being up when ifconfig returns, and
various utilities then mishandling this in different ways. I use the
workaround of a couple of pings in rc.d/netif (ping -c2 -t2 $ntpdhost or
ping -c1 -t1 $ntpdhost) so that ping times out instead of more important
services. This usually works for ntpd startup, but not for nfs startup.
Nfs doesn't fail, but makes you wish it would, by failing at first and
then only retrying after about 30 or 60 seconds, so that booting takes
too long.

This problem seems to get worse with each release of FreeBSD and/or with
newer NICs. I never noticed it with fxp or even ed or rl NICs. Now it is
barely noticeable with fxp and very noticeable with sk, bge and em NICs.
For bge, "ifconfig up" after "ifconfig down" takes 2 seconds to return,
but the network still isn't quite back up at that point, as shown by
"route get $ntpdhost" taking another 5+ seconds to return and the route
cloning not even being quite complete when it returns:

Under FreeBSD-~5.2:

%%%
Script started on Tue Jul 31 07:58:24 2007
ttyv1:root@besplex:/tmp> route get delplex; ifconfig bge0 down; time ifconfig bge0 up; time route get delplex; time route get delplex
   route to: delplex
destination: delplex
  interface: bge0
      flags:
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0      1500      1052
        1.90 real         0.00 user         1.90 sys
   route to: delplex
destination: 192.168.2.0
       mask: 255.255.255.0
  interface: bge0
      flags:
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0      1500        -7
        5.25 real         0.00 user         0.00 sys
   route to: delplex
destination: delplex
  interface: bge0
      flags:
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0      1500      1200
        0.00 real         0.00 user         0.00 sys
ttyv1:root@besplex:/tmp> exit

Script done on Tue Jul 31 07:58:56 2007
%%%

Maybe I should be using "route get $ntpdhost; route get $nfshost ..."
instead of the pings, since route(8) apparently waits long enough, while
waiting for the minimal amount of time is harder to program with ping
(ping -c1 $ntpdhost takes 11+ seconds where "route get $ntpdhost" takes
only 5+, and then it is unclear if ping waited long enough since it loses
the packet anyway; I avoid this 11+ second wait using -t1 or -t2, but the
1-2 second timeout is apparently not long enough).

At boot time, the initial ifconfig seems to involve too much link
flapping. At least for bge in -current on a different machine booted to
single-user mode so that I can look at the initial state, the interface
is already up (but unused), with the message about this being printed a
couple of seconds after reaching the shell prompt (actually in the middle
of "ifconfig "). Then the initial ifconfig causes the link to go down and
up.

The behaviour of -current is quite different for the above commands --
both "ifconfig up" and "route get" return before the link is actually up;
they return in < 0.01 seconds, but the link still takes about 2 seconds
to come back according to the "link state changed" message. This is
probably why I'm using the ping hack with a constant timeout -- I had
forgotten some details and want to use the same rc.d/netif on all
machines.

Another difference in -current is that the second "route get" doesn't
show the cloning completed. That might be only because I had to test on
an inactive machine since bringing bge0 down breaks normal operation.

Bruce
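
Editorial sketch, not part of the original message: one way to wait for
the network without guessing a single fixed timeout is to retry a short
probe a bounded number of times. This is a different technique from the
single ping or "route get" described above. The function name wait_for
and its arguments are hypothetical, and which probe actually reflects
link state on a given release is an assumption; the message notes that
in -current "route get" returns before the link is up, so it would not
be a useful probe there.

```shell
#!/bin/sh
# Hypothetical helper (not part of rc.d/netif): run a probe command until it
# succeeds, giving up after a fixed number of attempts.
# usage: wait_for <attempts> <delay-seconds> <command> [args...]
# Note: plain sh has no "local", so attempts and delay are global variables.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        # The probe's own exit status decides whether we stop waiting.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Do not sleep after the final failed attempt.
        [ "$attempts" -gt 0 ] && sleep "$delay"
    done
    return 1
}
```

For example, using the short ping from the workaround above as the probe:
wait_for 10 1 ping -c1 -t1 "$ntpdhost" || echo "network still not up" >&2
The attempt count and delay bound the total wait, so a dead network still
lets booting continue.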