Date: Tue, 31 Jul 2007 08:26:21 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: "Chauncey N. Menefee" <cmenefee@prism-grp.com>
Cc: freebsd-gnats-submit@freebsd.org, freebsd-i386@freebsd.org
Subject: Re: i386/115054: NTP errors out on startup but restart of NTP fixes problem
Message-ID: <20070731072434.F5028@besplex.bde.org>
In-Reply-To: <200707301716.l6UHG3eD020378@www.freebsd.org>
References: <200707301716.l6UHG3eD020378@www.freebsd.org>
On Mon, 30 Jul 2007, Chauncey N. Menefee wrote:
>> From what we've been able to gather the NTP daemon is starting up before the network card and errors out. Restarting the NTP service afterwards clears up the problem.
Several versions of FreeBSD have annoying behaviour for network
startup, involving the network not actually being up when ifconfig
returns and subsequent different mishandling of this by various
utilities.  I use the workaround of a couple of pings in rc.d/netif
(ping -c2 -t2 $ntpdhost or ping -c1 -t1 $ntpdhost) so that ping times
out instead of more important services.  This usually works for ntpd
startup, but not for nfs startup.  Nfs doesn't fail, but makes you
wish it would, by failing at first and then only retrying after about
30 or 60 seconds, so that booting takes too long.
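
A minimal sketch of how such a bounded ping could be wrapped. The helper name retry_quietly and the variable ntpdhost are hypothetical, and nothing here is taken from the actual rc.d/netif. The ping flags in the usage comment are the ones quoted above (FreeBSD ping's -c count and -t timeout).

```sh
#!/bin/sh
# retry_quietly TRIES DELAY CMD [ARGS...]
# Hypothetical helper: run CMD up to TRIES times, sleeping DELAY seconds
# between attempts, so that this loop absorbs the wait instead of a more
# important service such as ntpd.  Output of CMD is discarded.
retry_quietly() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Possible use near the end of network startup (not executed here):
#   retry_quietly 2 0 ping -c1 -t1 "$ntpdhost"
```

The return status is deliberately easy to ignore: the point of the workaround is only to spend the settling time here, not to make startup fail when the ping is lost.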
This problem seems to get worse with each release of FreeBSD and/or
with newer NICs.  I never noticed it with fxp or even ed or rl NICs.  Now it
is barely noticeable with fxp and very noticeable with sk, bge and em
NICs.  For bge, "ifconfig up" after "ifconfig down" takes 2 seconds
to return, but the network still isn't quite back up at that point,
as shown by "route get $ntpdhost" taking another 5+ seconds to return
and the route cloning not even being quite complete when it returns:
Under FreeBSD-~5.2:
%%%
Script started on Tue Jul 31 07:58:24 2007
ttyv1:root@besplex:/tmp> route get delplex; ifconfig bge0 down; time ifconfig bge0 up; time route get delplex; time route get delplex
    route to: delplex
destination: delplex
   interface: bge0
       flags: <UP,HOST,DONE,LLINFO,WASCLONED>
  recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
        0         0         0         0         0         0      1500      1052
         1.90 real         0.00 user         1.90 sys
    route to: delplex
destination: 192.168.2.0
        mask: 255.255.255.0
   interface: bge0
       flags: <UP,DONE,CLONING>
  recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
        0         0         0         0         0         0      1500        -7
         5.25 real         0.00 user         0.00 sys
    route to: delplex
destination: delplex
   interface: bge0
       flags: <UP,HOST,DONE,LLINFO,WASCLONED>
  recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
        0         0         0         0         0         0      1500      1200
         0.00 real         0.00 user         0.00 sys
ttyv1:root@besplex:/tmp> exit
Script done on Tue Jul 31 07:58:56 2007
%%%
Maybe I should be using "route get $ntpdhost; route get $nfshost ..."
instead of the pings, since route(8) apparently waits long enough,
while waiting for the minimal amount of time is harder to program with
ping (ping -c1 $ntpdhost takes 11+ seconds where "route get $ntpdhost"
takes only 5+, and then it is unclear if ping waited long enough since
it loses the packet anyway; I avoid this 11+ second wait using -t1 or
-t2, but the 1-2 second timeout is apparently not long enough).
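
A sketch of what the "route get" variant might look like, assuming route(8) blocks until it can answer as in the FreeBSD-~5.2 transcript above. The helper name wait_for_routes and the host variables in the usage comment are hypothetical.

```sh
#!/bin/sh
# wait_for_routes HOST...
# Hypothetical helper: run "route get" once per host and rely on route(8)
# itself waiting until it can answer.  Returns non-zero if any lookup
# fails, after trying all of the hosts.
wait_for_routes() {
    status=0
    for host in "$@"; do
        if ! route get "$host" >/dev/null 2>&1; then
            echo "route get $host failed" >&2
            status=1
        fi
    done
    return "$status"
}

# Possible use (not executed here):
#   wait_for_routes "$ntpdhost" "$nfshost"
```

Unlike the ping version there is no timeout to tune here; the wait is whatever route(8) decides it needs.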
At boot time, the initial ifconfig seems to involve too much link
flapping.  At least for bge in -current on a different machine booted
to single-user mode so that I can look at the initial state, the
interface is already up (but unused), with the message about this being
printed a couple of seconds after reaching the shell prompt (actually
in the middle of "ifconfig <no options>").  Then the initial ifconfig
causes the link to go down and up.
The behaviour of -current is quite different for the above commands
-- both "ifconfig up" and "route get" return before the link is actually
up; they return in < 0.01 seconds, but the link still takes about 2
seconds to come back according to the "link state changed" message.
This is probably why I'm using the ping hack with a constant timeout --
I had forgotten some details and want to use the same rc.d/netif on all
machines.  Another difference in -current is that the second "route get"
doesn't show the cloning completed.  That might be only because I had
to test on an inactive machine since bringing bge0 down breaks normal
operation.
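
Where neither "ifconfig up" nor "route get" waits for the link, one alternative to a constant timeout would be to poll the link state that ifconfig reports. This is not the method described above, only a sketch. It assumes the driver reports media status, so that ifconfig prints a "status: active" line once the link is up; the helper name wait_for_link and the poll limit are hypothetical.

```sh
#!/bin/sh
# wait_for_link IFACE TRIES
# Hypothetical helper: poll about once a second until ifconfig reports
# "status: active" for IFACE, giving up after TRIES polls.
wait_for_link() {
    iface=$1
    tries=$2
    while [ "$tries" -gt 0 ]; do
        if ifconfig "$iface" 2>/dev/null | grep -q 'status: active'; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}

# Possible use (not executed here):
#   wait_for_link bge0 10
```

An active link does not guarantee that route cloning has completed, so this could at most replace the fixed delay, not the check that the remote host is reachable.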
Bruce
