Date: Tue, 31 Jul 2007 18:00:10 GMT
From: Bruce Evans <brde@optusnet.com.au>
To: freebsd-i386@FreeBSD.org
Subject: Re: i386/115054: NTP errors out on startup but restart of NTP fixes problem
Message-ID: <200707311800.l6VI0AKN070144@freefall.freebsd.org>
The following reply was made to PR i386/115054; it has been noted by GNATS.

From: Bruce Evans <brde@optusnet.com.au>
To: "Chauncey N. Menefee" <cmenefee@prism-grp.com>
Cc: freebsd-gnats-submit@freebsd.org, freebsd-i386@freebsd.org
Subject: Re: i386/115054: NTP errors out on startup but restart of NTP fixes problem
Date: Tue, 31 Jul 2007 08:26:21 +1000 (EST)

On Mon, 30 Jul 2007, Chauncey N. Menefee wrote:

>> From what we've been able to gather the NTP daemon is starting up before the network card and errors out. Restarting the NTP service afterwards clears up the problem.

Several versions of FreeBSD have annoying behaviour for network startup: the network is not actually up when ifconfig returns, and various utilities then mishandle this in different ways. I use the workaround of a couple of pings in rc.d/netif (ping -c2 -t2 $ntpdhost or ping -c1 -t1 $ntpdhost) so that ping times out instead of more important services. This usually works for ntpd startup, but not for nfs startup. Nfs doesn't fail, but makes you wish it would, by failing at first and then only retrying after about 30 or 60 seconds, so that booting takes too long.

This problem seems to get worse with each release of FreeBSD and/or with newer NICs. I never noticed it with fxp or even ed or rl NICs. Now it is barely noticeable with fxp and very noticeable with sk, bge and em NICs.
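
The ping workaround described above can be sketched as a small bounded-retry helper. This is only an illustration under stated assumptions, not the actual change made to rc.d/netif: the function name wait_until, the retry count and the delay are hypothetical, and ntpdhost is assumed to be set elsewhere. The ping flags in the usage comment are the ones quoted above (on FreeBSD, -t is a timeout in seconds).

```shell
#!/bin/sh
# Hypothetical helper (not part of the stock rc.d/netif): run a probe
# command up to $1 times, sleeping $2 seconds between failed attempts.
# Returns 0 as soon as the probe succeeds, 1 if every attempt fails.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        # Discard the probe's output; only its exit status matters here.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        # Sleep only if another attempt will follow.
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Possible use after the interfaces have been configured
# (ntpdhost is an assumed variable):
#   wait_until 5 1 ping -c1 -t1 "$ntpdhost"
```

The point of bounding the retries is the same as in the workaround: the probe absorbs the delay and gives up after a known time, so a dead network cannot hang the boot indefinitely.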
For bge, "ifconfig up" after "ifconfig down" takes 2 seconds to return, but the network still isn't quite back up at that point, as shown by "route get $ntpdhost" taking another 5+ seconds to return and the route cloning not even being quite complete when it returns:

Under FreeBSD-~5.2:

%%%
Script started on Tue Jul 31 07:58:24 2007
ttyv1:root@besplex:/tmp> route get delplex; ifconfig bge0 down; time ifconfig bge0 up; time route get delplex; time route get delplex
   route to: delplex
destination: delplex
  interface: bge0
      flags: <UP,HOST,DONE,LLINFO,WASCLONED>
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0      1500      1052
        1.90 real         0.00 user         1.90 sys
   route to: delplex
destination: 192.168.2.0
       mask: 255.255.255.0
  interface: bge0
      flags: <UP,DONE,CLONING>
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0      1500        -7
        5.25 real         0.00 user         0.00 sys
   route to: delplex
destination: delplex
  interface: bge0
      flags: <UP,HOST,DONE,LLINFO,WASCLONED>
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0      1500      1200
        0.00 real         0.00 user         0.00 sys
ttyv1:root@besplex:/tmp> exit

Script done on Tue Jul 31 07:58:56 2007
%%%

Maybe I should be using "route get $ntpdhost; route get $nfshost ..." instead of the pings, since route(8) apparently waits long enough, while waiting for the minimal amount of time is harder to program with ping (ping -c1 $ntpdhost takes 11+ seconds where "route get $ntpdhost" takes only 5+, and then it is unclear if ping waited long enough since it loses the packet anyway; I avoid this 11+ second wait using -t1 or -t2, but the 1-2 second timeout is apparently not long enough).

At boot time, the initial ifconfig seems to involve too much link flapping.
At least for bge in -current on a different machine booted to single-user mode so that I can look at the initial state, the interface is already up (but unused), with the message about this being printed a couple of seconds after reaching the shell prompt (actually in the middle of "ifconfig <no options>"). Then the initial ifconfig causes the link to go down and up.

The behaviour of -current is quite different for the above commands -- both "ifconfig up" and "route get" return before the link is actually up; they return in < 0.01 seconds, but the link still takes about 2 seconds to come back according to the "link state changed" message. This is probably why I'm using the ping hack with a constant timeout -- I had forgotten some details and want to use the same rc.d/netif on all machines.

Another difference in -current is that the second "route get" doesn't show the cloning completed. That might be only because I had to test on an inactive machine since bringing bge0 down breaks normal operation.

Bruce