Date: Thu, 2 Aug 2007 06:31:29 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Dag-Erling Smørgrav <des@des.no>
Cc: "Chauncey N. Menefee" <cmenefee@prism-grp.com>, freebsd-gnats-submit@freebsd.org, freebsd-i386@freebsd.org
Subject: Re: i386/115054: NTP errors out on startup but restart of NTP fixes problem
Message-ID: <20070802060947.O76862@delplex.bde.org>
In-Reply-To: <86odhrlb18.fsf@ds4.des.no>
References: <200707301716.l6UHG3eD020378@www.freebsd.org> <20070731072434.F5028@besplex.bde.org> <86odhrlb18.fsf@ds4.des.no>
On Wed, 1 Aug 2007, Dag-Erling Smørgrav wrote:
> Bruce Evans <brde@optusnet.com.au> writes:
>> Several versions of FreeBSD have annoying behaviour for network
>> startup, involving the network not actually being up when ifconfig
>> returns and subsequent different mishandling of this by various
>> utilities. [...]
>> This problem seems to get worse with each release of FreeBSD and/or
>> with newer NICs. I never noticed fxp or even ed or rl NICs. Now it
>> is barely noticeable with fxp and very noticeable with sk, bge and em
>> NICs.
>
> I have never seen this with any of the cards I've used (xl, fxp, rl, re,
> sis, bge, sk, msk and probably others, in no particular order).
>
> Perhaps there is a hardware issue involved? Does the problem occur if
> you hardcode the link speed instead of relying on autonegotiation?
No difference. I thought it might be the cheap switch, but going
direct makes no difference except to break hard-coding the link speed
for bge. The following is with bge (1Gbps capable but reduced to
100baseTX full-duplex by autonegotiation) under -current, connected
to fxp (100baseTX full-duplex by autonegotiation or hard-coded) under
FreeBSD-~5.2:
%%%
ttyv0:root@besplex:~> ifconfig bge0 down; time ifconfig bge0 up; time ping -c1 delplex; time route get delplex; time route get delplex
0.48 real 0.00 user 0.47 sys
PING delplex.bde.org (192.168.2.4): 56 data bytes
Aug 2 05:57:49 besplex kernel: bge0: link state changed to DOWN
Aug 2 05:57:51 besplex kernel: bge0: link state changed to UP
--- delplex.bde.org ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss
11.01 real 0.00 user 0.00 sys
route to: delplex
destination: delplex
interface: bge0
flags: <UP,HOST,DONE,LLINFO,WASCLONED>
recvpipe sendpipe ssthresh rtt,msec rttvar hopcount mtu expire
0 0 0 0 0 0 1500 1191
0.00 real 0.00 user 0.00 sys
route to: delplex
destination: delplex
interface: bge0
flags: <UP,HOST,DONE,LLINFO,WASCLONED>
recvpipe sendpipe ssthresh rtt,msec rttvar hopcount mtu expire
0 0 0 0 0 0 1500 1191
0.00 real 0.00 user 0.00 sys
%%%
Under -current, the differences are:
o ifconfig returns after 0.48 seconds instead of after 2+ seconds. The
"link state changed to UP" message still takes 2+ seconds altogether.
o The message is now printed to a different unwanted place (using tprintf()
I think, instead of using printf(), but I want it in stderr). The above
output was captured using vidcontrol.
o The timestamps on the messages made by syslogd are almost precise enough
to show the 2 second delay.
o ping still returns after 11+ seconds, but now it starts about 1.5 seconds
earlier relative to the UP message, so the 11 seconds may be just ping's
timeout and not related to UPness.
%%%
ttyv0:root@besplex:~> ifconfig bge0 down; time ifconfig bge0 up; time route get delplex; time route get delplex
0.48 real 0.00 user 0.47 sys
route to: delplex
Aug 2 05:58:25 besplex kernel: bge0: link state changed to DOWN
Aug 2 05:58:27 besplex kernel: bge0: link state changed to UP
destination: 192.168.2.0
mask: 255.255.255.0
interface: bge0
flags: <UP,DONE,CLONING>
recvpipe sendpipe ssthresh rtt,msec rttvar hopcount mtu expire
0 0 0 0 0 0 1500 -7
5.26 real 0.00 user 0.00 sys
route to: delplex
destination: delplex
interface: bge0
flags: <UP,HOST,DONE,LLINFO,WASCLONED>
recvpipe sendpipe ssthresh rtt,msec rttvar hopcount mtu expire
0 0 0 0 0 0 1500 1196
0.00 real 0.00 user 0.00 sys
%%%
The first "route get" still returns after 5+ seconds, but now it starts
about 1.5 seconds earlier relative to the UP message, so the 5 seconds
may be just route's timeout and not related to UPness.
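Since ifconfig returns well before the link is reported UP, a startup
script could poll for the link instead of assuming it. The following is
only a rough sketch, not what rc.d or ntpd does. It assumes that ifconfig
prints "status: active" for the interface once the link is up. The names
link_active, wait_for_link, IFCONFIG and POLL_INTERVAL are made up for
illustration.

```shell
#!/bin/sh
# Sketch only: poll until an interface reports an active link.
# Assumption: ifconfig prints a "status: active" line once the link is up.
# IFCONFIG and POLL_INTERVAL are hypothetical knobs, not standard variables.

# link_active IFACE: succeed if the ifconfig output for IFACE
# contains "status: active".
link_active() {
    ${IFCONFIG:-ifconfig} "$1" 2>/dev/null | grep -q 'status: active'
}

# wait_for_link IFACE [TRIES]: check up to TRIES times (default 10),
# sleeping POLL_INTERVAL seconds (default 1) after each failed check.
# Returns 0 as soon as the link is active, 1 if it never becomes active.
wait_for_link() {
    iface=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        link_active "$iface" && return 0
        tries=$((tries - 1))
        sleep "${POLL_INTERVAL:-1}"
    done
    return 1
}
```

With these defined, something like "ifconfig bge0 up && wait_for_link bge0
&& ping -c1 delplex" would not start ping until the link is reported
active. Whether that removes the 11 second ping delay is not something the
tests above show.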
The -current bge driver is acting identically to the ~5.2 bge driver.
Userland is ~5.2 in all tests. One reason I didn't report this earlier is
that it might be due to the ~5.2 userland and I don't have time to test
with a full -current userland, but ifconfig and route(8) seem to be portable
enough to mostly work with both kernels. route(8) has a known problem
concerning the base for the expire time (it was broken for a long time
in -current due to the change to mono-time, but this causes few problems).
Bruce
