Date:      Sat, 25 Apr 1998 18:34:10 -0400 (EDT)
From:      Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
To:        Bill Trost <trost@cloud.rain.com>
Cc:        current@FreeBSD.ORG
Subject:   Re: Bandwidth throttling etc. 
Message-ID:  <199804252234.SAA16963@khavrinen.lcs.mit.edu>
In-Reply-To: <908.893476488@cloud.rain.com>
References:  <199804242126.RAA10941@khavrinen.lcs.mit.edu> <Pine.GSO.3.96.980424165606.18437A-100000@echonyc.com> <199804241932.VAA22011@labinfo.iet.unipi.it> <908.893476488@cloud.rain.com>

<<On Fri, 24 Apr 1998 20:54:48 -0700, Bill Trost <trost@cloud.rain.com> said:

> Before anyone spends oodles of time doing this sort of work, has anyone
> taken the radical (-: step of actually profiling the current network
> stack(s) to see where the time is being eaten?

A large part of the problem is not so much the code itself being
inefficient, but rather the fact that it can be fairly mismatched with
different applications' needs.  The way this has been dealt with in
the past has been to introduce additional options (kluges, really)
which select among various behaviors desired by different
applications.  Two examples, opposite sides of the same coin:

1) Tinygram avoidance.  In most situations, it makes sense to buffer
small transmissions in the hope that they can be combined together,
thus amortizing the header overhead across a larger data space.
Whoops!  It turns out that some very common applications are sending
tinygrams intentionally, because interactive performance requires it.
``Solution'': TCP_NODELAY.

2) Write-boundary preservation.  It is required that sender-TCPs set
the PSH bit when they are done with a transmission, in case the
receiver-TCP is configured to wait for PSH before passing data to
higher layers.  To put it in UNIX terms, it means that TCP tries to
preserve write boundaries when writes are ``large enough''.  Whoops!
It turns out that most applications which write ``enough'' at a time
are simply doing internal buffering of a bulk transfer (like FTP or
HTTP or the DATA phase of SMTP), and it is in fact wasteful to
preserve write boundaries in such transfers, since the write size is
rarely an exact multiple of the path MTU, so you end up sending a
partial packet at the end of each write just so you can transmit the
useless PSH bit.  ``Solution'': TCP_NOPUSH.
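
For concreteness, here is a minimal sketch of how an application selects
between the two behaviors described above. The helper names
(set_tcp_flag, set_interactive, set_bulk, tcp_flag_is_set) and the
SKETCH_NOPUSH macro are hypothetical, invented for this illustration.
TCP_NOPUSH is a BSD option; falling back to TCP_CORK where TCP_NOPUSH is
not defined is an assumption made only so the sketch compiles on Linux,
and the two options are similar but not identical in behavior.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Assumption: use TCP_CORK as a rough stand-in where TCP_NOPUSH is absent. */
#if defined(TCP_NOPUSH)
#define SKETCH_NOPUSH TCP_NOPUSH
#elif defined(TCP_CORK)
#define SKETCH_NOPUSH TCP_CORK
#endif

/* Set or clear one boolean TCP-level socket option. */
static int
set_tcp_flag(int sock, int opt, int enable)
{
	int val = enable ? 1 : 0;

	assert(sock >= 0);
	return setsockopt(sock, IPPROTO_TCP, opt, &val, sizeof(val));
}

/* Interactive traffic: turn tinygram avoidance off with TCP_NODELAY. */
int
set_interactive(int sock, int enable)
{
	return set_tcp_flag(sock, TCP_NODELAY, enable);
}

/* Bulk traffic: stop sending a partial packet at the end of each write. */
int
set_bulk(int sock, int enable)
{
#ifdef SKETCH_NOPUSH
	return set_tcp_flag(sock, SKETCH_NOPUSH, enable);
#else
	(void)sock;
	(void)enable;
	errno = ENOPROTOOPT;	/* neither option exists on this platform */
	return -1;
#endif
}

/* Read an option back: 1 if set, 0 if clear, -1 on error. */
int
tcp_flag_is_set(int sock, int opt)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(sock, IPPROTO_TCP, opt, &val, &len) == -1)
		return -1;
	return val != 0;
}
```

A telnet-like client would call set_interactive(sock, 1) once after
creating the socket; an FTP-like sender would call set_bulk(sock, 1)
before the transfer and clear it at the end so the final partial packet
goes out.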

The way you really want a TCP to behave is thus:

1) Get explicit indication of the desire for a PSH operation from the
application.  (We could do this with MSG_EOR if only it were available
by the time TCP gets control.)

2) Keep at most one packet's worth of buffering.  (And while you're at
it, set up the structure which contains that data to be shaped precisely
like a TCP packet, which eliminates a number of state variables which
would have to be stored in the TCB otherwise.)  When you are ready to
send it, you can just do so.

3) Use only TCP's internal transmit flow control (subject to reasonable
resource limits, of course).  Only when TCP says you can send, should
you copy in the user's data---and do it in MSS-sized chunks.  If
there's any left over, and the user hasn't specified a PSH, put it in
the coalesce buffer.

4) Keep a reference to all the packets you have sent -- this is your
retransmission queue and SACK scoreboard all in one.  (This may
require cooperation from the network interface, depending on how its
packet memory is allocated.)

5) Make sure that information about preferred alignments, buffer
sizes, and whatnot is exported to the application, so that it can
assist in this process if desired.
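
Points 1 to 3 above can be sketched as a small user-space model. This is
an illustration under stated assumptions, not kernel code: the names
(struct coalesce, emit_fn, tcp_sketch_write, struct tally, tally_emit)
are hypothetical, SKETCH_MSS is fixed at an assumed 1460 bytes, and the
flow-control check of point 3 is left out, so the model behaves as if
TCP always says you can send.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MSS 1460		/* assumed segment size for the sketch */

/* Point 2: at most one packet's worth of buffering. */
struct coalesce {
	unsigned char buf[SKETCH_MSS];
	size_t len;		/* bytes currently held, always < SKETCH_MSS */
};

/* Callback standing in for "hand this segment to the output path". */
typedef void (*emit_fn)(const unsigned char *seg, size_t len, int psh,
    void *arg);

/*
 * Accept one user write.  Full MSS-sized segments are emitted at once;
 * a remainder stays in the coalesce buffer unless the caller asked for
 * a push (point 1).  Returns the number of segments emitted.
 */
size_t
tcp_sketch_write(struct coalesce *c, const unsigned char *data, size_t n,
    int push, emit_fn emit, void *arg)
{
	size_t segments = 0;

	assert(c->len < SKETCH_MSS);
	while (n > 0) {
		size_t room = SKETCH_MSS - c->len;
		size_t take = n < room ? n : room;

		memcpy(c->buf + c->len, data, take);
		c->len += take;
		data += take;
		n -= take;
		if (c->len == SKETCH_MSS) {
			/* PSH only on the last segment of a pushed write. */
			emit(c->buf, c->len, push && n == 0, arg);
			c->len = 0;
			segments++;
		}
	}
	if (push && c->len > 0) {
		/* Explicit push: flush the partial segment with PSH set. */
		emit(c->buf, c->len, 1, arg);
		c->len = 0;
		segments++;
	}
	return segments;
}

/* Example sink that only counts what was emitted. */
struct tally {
	size_t segments, bytes, pushes;
};

void
tally_emit(const unsigned char *seg, size_t len, int psh, void *arg)
{
	struct tally *t = arg;

	(void)seg;
	t->segments++;
	t->bytes += len;
	if (psh)
		t->pushes++;
}
```

With the assumed 1460-byte segment size, a 3000-byte write without a
push emits two full segments and leaves 80 bytes in the coalesce buffer;
nothing partial goes on the wire until the application asks for a push.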

-GAWollman

--
Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
wollman@lcs.mit.edu  | O Siem / The fires of freedom 
Opinions not those of| Dance in the burning flame
MIT, LCS, CRS, or NSA|                     - Susan Aglukark and Chad Irschick

