Date:      Wed, 18 Jan 1995 14:54:22 +0100
From:      Andras Olah <olah@cs.utwente.nl>
To:        davidg@Root.COM
Cc:        hackers@FreeBSD.org
Subject:   Re: Netinet internals (Was: Patching a running kernel) 
Message-ID:  <3909.790437262@utis156.cs.utwente.nl>
In-Reply-To: Your message of Tue, 17 Jan 1995 21:12:23 PST

[ .. ]

>    These are obviously both bugs. I didn't notice the echo and the ack
> occurring separately when I analyzed the packet stream after making the
> change...so this is a surprise to me. On the other hand, now that you mention
> it, it does appear that this is what the code is actually going to do. Hmmm.
> The second is an oversight and it should certainly account for any options that
> may reduce the length - provided that the sender takes this into account when
> deciding whether or not to send. The problem that was originally 'solved' by
> these changes was one where interactive usage over ether or other high speed
> network connection was 'choppy' because of the 200ms delays inserted into echo
> and other short packets (vi was especially bad).

IMO, these changes (setting ACKNOW if the segment is shorter than the
MSS) aren't necessary, because the echo packets aren't delayed for
200ms.  I've compiled out the code in question, and typing a single
character generates the following traffic:

09:30:05.610948 localhost.1025 > localhost.telnet: P 1:2(1) ack 2 win 16384 <nop,nop,ts 1198:1198> [tos 0x10]
09:30:05.613837 localhost.telnet > localhost.1025: P 2:3(1) ack 2 win 16384 <nop,nop,ts 1198:1198> [tos 0x10]
09:30:05.770186 localhost.1025 > localhost.telnet: . ack 3 win 16384 <nop,nop,ts 1198:1198> [tos 0x10]

The first packet carries the character, the second acks it and
carries the echo, and the third acks the echo.  Only the ack of the
echo is triggered by the delack timer, which is normal, but that
doesn't affect the responsiveness of vi (or anything else).  The
following fragment from tcp_output ensures that the echo isn't
delayed:

	/* Send now if the connection was idle (or Nagle is disabled)
	 * and this transmission would empty the send buffer. */
	if ((idle || tp->t_flags & TF_NODELAY) &&
	    len + off >= so->so_snd.sb_cc)
		goto send;

Therefore, I'd suggest that we change our tcp_input back to the
original 4.4 version with respect to delayed acks.

[ ... ]

>    Indeed, rfc1122 does say "SHOULD"...which makes this behavior not required.
> The problem with acking this often is that on high speed, half-duplex networks
> like ethernet, the collision rate caused by acks this frequently can consume a
> large amount of the bandwidth (measured 10-20%).

That's an interesting point, I'll check it out.  I'd appreciate it
if you could share traces or other descriptions of such behavior.

>    In the 4.4-lite TCP code, the acks every 2 packets was indirectly caused by
> limiting the window to 4K. This had exceptionally bad side effects on long,
> slow, high latency connections that are typical on the internet and usually
> resulted in connection thrashing (my terminology) - i.e. the connection becomes
> bursty and unable to stream.

I totally agree that small windows kill throughput on long-delay,
reasonably fast links, so decreasing the window size isn't the way
to go.

The reason I think delaying acks for more than two segments may be a
problem is that it may adversely affect the congestion control
algorithms.  Slow start increases cwnd by maxseg for each ack in the
exponential phase, so less frequent acks result in a slower
slow-start.  There's something about this in a paper by Lawrence
Brakmo (ftp.cs.arizona.edu:xkernel/Papers/tcp_problems.ps).  I'll
run some tests to see exactly what goes on.

Anyhow, this seems to be a rare problem because most of the time the
application reads data from the socket in a tight loop.  In such
cases, tcp_input wakes the user process, and the PRU_RCVD usrreq
call from soreceive calls tcp_output, which ensures that every
second segment is acked.  Acks are delayed for more than two
segments only when the application does other things between two
reads and the windows are large.  In that case, acks to in-sequence
segments are delayed until the app finally reads the data or the
delack timer goes off.

I was able to demonstrate this on our network (Sun - router - FBSD)
by running ttcp with 32K buffers and 50ms delays between the read
calls.  ACKs were sometimes generated for only every 10th segment.
And this is a feature of every BSD implementation (at least SunOS
4.1.1 does the same).

> -DG

Andras


