From owner-freebsd-hackers Sun Jan 22 02:12:39 1995
Return-Path: hackers-owner
Received: (from root@localhost) by freefall.cdrom.com (8.6.9/8.6.6) id CAA01758 for hackers-outgoing; Sun, 22 Jan 1995 02:12:39 -0800
Received: from Root.COM (implode.Root.COM [198.145.90.1]) by freefall.cdrom.com (8.6.9/8.6.6) with ESMTP id CAA01752 for ; Sun, 22 Jan 1995 02:12:27 -0800
Received: from corbin.Root.COM (corbin.Root.COM [198.145.90.18]) by Root.COM (8.6.8/8.6.5) with ESMTP id CAA22016; Sun, 22 Jan 1995 02:12:07 -0800
Received: from localhost (localhost [127.0.0.1]) by corbin.Root.COM (8.6.9/8.6.5) with SMTP id CAA00270; Sun, 22 Jan 1995 02:12:07 -0800
Message-Id: <199501221012.CAA00270@corbin.Root.COM>
X-Authentication-Warning: corbin.Root.COM: Host localhost didn't use HELO protocol
To: Andras Olah
cc: hackers@FreeBSD.org
Subject: Re: Netinet internals (Was: Patching a running kernel)
In-reply-to: Your message of "Wed, 18 Jan 95 14:54:22 +0100." <3909.790437262@utis156.cs.utwente.nl>
From: David Greenman
Reply-To: davidg@Root.COM
Date: Sun, 22 Jan 1995 02:12:05 -0800
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

>> These are obviously both bugs. I didn't notice the echo and the ack
>> occurring separately when I analyzed the packet stream after making the
>> change...so this is a surprise to me. On the other hand, now that you
>> mention it, it does appear that this is what the code is actually going
>> to do. Hmmm. The second is an oversight and it should certainly account
>> for any options that may reduce the length - provided that the sender
>> takes this into account when deciding whether or not to send. The
>> problem that was originally 'solved' by these changes was one where
>> interactive usage over ether or other high-speed network connections
>> was 'choppy' because of the 200ms delays inserted into echo and other
>> short packets (vi was especially bad).
>IMO, these changes (setting ACKNOW if segment is shorter than MSS)
>aren't necessary because the echo packets aren't delayed for 200ms.
>I've compiled out the code in question and hitting a single
>character generates the following traffic:
>
>09:30:05.610948 localhost.1025 > localhost.telnet: P 1:2(1) ack 2 win 16384 [tos 0x10]
>09:30:05.613837 localhost.telnet > localhost.1025: P 2:3(1) ack 2 win 16384 [tos 0x10]
>09:30:05.770186 localhost.1025 > localhost.telnet: . ack 3 win 16384 [tos 0x10]
>
>The first packet carries the character, the second acks it and
>carries the echo and the third acks the echo. Only the ack of the
>echo is triggered by the delack timer which is normal, but that
>doesn't affect the responsiveness of vi (or anything else). The
>following fragment from tcp_output assures that the echo isn't
>delayed:

   Only for the first character. If you type faster than 5 chars/sec
the situation changes. This can easily happen if you start using keys
that generate multiple bytes (like arrow keys).

>	if ((idle || tp->t_flags & TF_NODELAY) &&
>	    len + off >= so->so_snd.sb_cc)
>		goto send;
>
>Therefore, I'd suggest that we change our tcp_input back to the
>original 4.4 version with respect to delayed acks.

   The problem shows itself when using tcsh and some other shells that
handle prompts (output) and input in special ways. The prompt is output
in multiple short packets. The first packet is short (4 bytes in my
case) and subsequent packets make up the rest of the prompt. The result
is that, after the first short packet, the receiver wouldn't (normally)
ack it, and this results in the sender not sending the rest until the
fast timeout occurs - about 200ms later. Similar situations occur at
other times, especially while editing. I personally find the choppiness
over ethernet annoying.
   Choppiness aside, this also fixes another problem where some 4.3BSD
based hosts would often send less than the expected mss during a normal
bulk transfer on an idle network.
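The tcp_output condition quoted above can be modeled in a few lines to show why only the *first* echo escapes the delay. This is a toy sketch, not the 4.4BSD code: the struct and names (`conn`, `tf_nodelay`, `snd_buf_cc`, `send_now`) are illustrative stand-ins for the kernel's tcpcb/sockbuf fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the relevant tcpcb/sockbuf state. */
struct conn {
    bool idle;        /* no unacknowledged data outstanding */
    bool tf_nodelay;  /* TCP_NODELAY set on the socket */
    long snd_buf_cc;  /* bytes queued in the send buffer */
};

/* Mirrors the quoted test:
 *   (idle || tp->t_flags & TF_NODELAY) && len + off >= so->so_snd.sb_cc
 * A segment of 'len' bytes at offset 'off' goes out immediately only
 * if it empties the send buffer AND the connection is idle (or Nagle
 * is disabled).  Once one byte is in flight unacked, idle is false,
 * so subsequent keystrokes wait for an ACK - hence the ">5 chars/sec"
 * behavior change. */
static bool send_now(const struct conn *tp, long off, long len)
{
    return (tp->idle || tp->tf_nodelay) && len + off >= tp->snd_buf_cc;
}
```

With this model, the first echo on an idle connection is sent at once, but a second character typed before the first is acked is held back unless TCP_NODELAY is set - which is exactly the fast-typist case described above.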
The short packet would be randomly mixed into the data stream - probably
because of bugs in the socket->tcp buffer transition. This happens often
enough to cause *serious* performance penalties. In the worst case, I've
seen transfer rates that should have been 800-900k/sec reduced to a slow
crawl of 20k/sec. If you're a slow typist, you may never notice these
problems. :-)
   I really don't much like the Nagle algorithm. I think something that
is more adaptive to different latency situations is needed. We know the
average latency and should take advantage of this.
   On another related topic, I think the 'fast' timeout handling is
bogus. The actual delay varies between 0-200ms. I suppose that this is
done for 'performance', but having a single [200ms] timeout to transmit
all queued DELACK packets seems wrong to me. I don't know how this
affects performance on a typical LAN, but all the effects I've imagined
aren't good (I'm imagining cases where a few dozen TCP connections are
active - large bursts of short ACK packets become the norm).

>> Indeed, rfc1122 does say "SHOULD"...which makes this behavior not
>> required. The problem with acking this often is that on high speed,
>> half-duplex networks like ethernet, the collision rate caused by acks
>> this frequently can consume a large amount of the bandwidth (measured
>> 10-20%).
>
>That's an interesting point, I'll check it out. I'd appreciate if
>you have traces or other descriptions of such behavior.

   When my workload lightens a bit, I may be able to do some experiments
with the old code and provide some packet traces. I may also have some
old email around detailing the problem that I could send you...I'll look
around. The easy way to see the effects of this problem is via a simple
"netstat -i" - look at the total collisions before and after a bulk
transfer. ...and of course it also shows itself in the raw performance
numbers.
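The 0-200ms variance complained about above follows directly from using one shared periodic timer rather than a per-segment timer. A minimal sketch of that arithmetic (the function name and the fixed 200ms tick are illustrative; the real kernel runs tcp_fasttimo periodically):

```c
#include <assert.h>

#define TICK_MS 200  /* period of the shared fast-timeout tick */

/* A deferred ACK waits until the NEXT shared tick, so its actual
 * delay depends only on where in the timer cycle the segment
 * happened to arrive: anywhere from nearly 0 up to the full 200ms.
 * Worse, every connection holding a deferred ACK fires on the SAME
 * tick, producing the bursts of short ACK packets described above. */
static int delack_delay_ms(int arrival_offset_ms)
{
    return TICK_MS - (arrival_offset_ms % TICK_MS);
}
```

A segment arriving just after a tick waits the full 200ms; one arriving just before the next tick waits almost nothing - there is no per-connection control over the delay at all.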
>The reason I think delayed acks for more than two segments may be a
>problem is that it may adversely affect the congestion control
>algorithms. Slow start increases cwnd by maxseg for each ack in the
>exponential phase, thus less frequent acks result in slower
>slow-start. There's something about it in a paper of Lawrence
>Brakmo (ftp.cs.arizona.edu:xkernel/Papers/tcp_problems.ps). I'll
>try to make some tests to see exactly what goes on.

   That's an interesting point. The future way of the world seems to be
in the direction of full-duplex communications [in LANs], so perhaps the
problems with 802.3 ethernet shouldn't be optimized for. On the other
hand, the requirements of LANs are usually quite different from those of
long-haul networks; perhaps there is a sort of compromise that can be
achieved.
   I should say that I don't consider myself a TCP expert. I know enough
to get into trouble, but not enough to fully understand the
ramifications.
   Sorry for the delay in responding; I was at Usenix and having a lot
of trouble replying to email remotely due to some recently introduced
kernel bugs.

-DG
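The slow-start interaction quoted at the top of this message can be made concrete with a toy model (a sketch under the stated "cwnd grows by maxseg per ACK" rule; `cwnd_after_round` and its parameters are illustrative, not kernel code):

```c
#include <assert.h>

/* Toy model of one round trip of slow start's exponential phase:
 * cwnd grows by one maxseg per ACK received, so acking every segment
 * doubles cwnd per RTT, while acking only every Nth segment (as
 * aggressive delayed acks do) grows it more slowly. */
static long cwnd_after_round(long cwnd, long maxseg, int ack_every_n)
{
    long segs_sent = cwnd / maxseg;       /* one window's worth of segments */
    long acks = segs_sent / ack_every_n;  /* delayed acks coalesce replies */
    return cwnd + acks * maxseg;          /* +maxseg per ACK received */
}
```

With maxseg = 512 and cwnd = 4096 (eight segments in flight), acking every segment doubles cwnd to 8192 in one round trip, while acking every other segment only reaches 6144 - which is the slower-slow-start effect the Brakmo paper is said to discuss.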