From owner-freebsd-current Sat Apr 25 15:34:31 1998
Date: Sat, 25 Apr 1998 18:34:10 -0400 (EDT)
From: Garrett Wollman
Message-Id: <199804252234.SAA16963@khavrinen.lcs.mit.edu>
To: Bill Trost
Cc: current@FreeBSD.ORG
Subject: Re: Bandwidth throttling etc.
In-Reply-To: <908.893476488@cloud.rain.com>
References: <199804242126.RAA10941@khavrinen.lcs.mit.edu>
    <199804241932.VAA22011@labinfo.iet.unipi.it>
    <908.893476488@cloud.rain.com>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Bill Trost said:

> Before anyone spends oodles of time doing this sort of work, has anyone
> taken the radical (-: step of actually profiling the current network
> stack(s) to see where the time is being eaten?

A large part of the problem is not so much the code itself being
inefficient, but rather the fact that it can be fairly mismatched with
different applications' needs.  The way this has been dealt with in the
past has been to introduce additional options (kluges, really) which
select among various behaviors desired by different applications.

Two examples, opposite sides of the same coin:

1) Tinygram avoidance.  In most situations, it makes sense to buffer
   small transmissions in the hope that they can be combined together,
   thus amortizing the header overhead across a larger data space.
   Whoops!
   It turns out that some very common applications are sending
   tinygrams intentionally, because interactive performance requires
   it.  ``Solution'': TCP_NODELAY.

2) Write-boundary preservation.  It is required that sender-TCPs set
   the PSH bit when they are done with a transmission, in case the
   receiver-TCP is configured to wait for PSH before passing data to
   higher layers.  To put it in UNIX terms, it means that TCP tries to
   preserve write boundaries when writes are ``large enough''.
   Whoops!  It turns out that most applications which write ``enough''
   at a time are simply doing internal buffering of a bulk transfer
   (like FTP or HTTP or the DATA phase of SMTP), and it is in fact
   wasteful to preserve write boundaries in such transfers, since the
   write size is rarely an exact multiple of the path MTU, so you end
   up sending a partial packet at the end of each write just so you
   can transmit the useless PSH bit.  ``Solution'': TCP_NOPUSH.

The way you really want a TCP to behave is thus:

1) Get explicit indication of the desire for a PSH operation from the
   application.  (We could do this with MSG_EOR if only it were
   available by the time TCP gets control.)
2) Keep at most one packet's worth of buffering.  (And while you're at
   it, set up the structure which contains that data to be shaped
   precisely like a TCP packet, which eliminates a number of state
   variables which would have to be stored in the TCB otherwise.)
   When you are ready to send it, you can just do so.
3) Use only TCP's internal transmit flow control (subject to
   reasonable resource limits, of course).  Only when TCP says you can
   send should you copy in the user's data---and do it in MSS-sized
   chunks.  If there's any left over, and the user hasn't specified a
   PSH, put it in the coalesce buffer.
4) Keep a reference to all the packets you have sent -- this is your
   retransmission queue and SACK scoreboard all in one.  (This may
   require cooperation from the network interface, depending on how
   its packet memory is allocated.)
5) Make sure that information about preferred alignments, buffer
   sizes, and whatnot is exported to the application, so that it can
   assist in this process if desired.

-GAWollman

--
Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
wollman@lcs.mit.edu  | O Siem / The fires of freedom
Opinions not those of| Dance in the burning flame
MIT, LCS, CRS, or NSA| - Susan Aglukark and Chad Irschick
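
Points 1) through 3) of the list above can be sketched as a toy model.
This is only an illustration under stated assumptions, not FreeBSD code:
the class name CoalescingSender, its write(data, push) method and the
mss parameter are all invented for the example, and the model ignores
flow control, retransmission and the network itself.  It shows only the
buffering rule: full MSS-sized segments go out at once, a short
remainder waits in the coalesce buffer, and a short segment is sent
only when the application explicitly asks for a push.

```python
class CoalescingSender:
    """Toy model of a sender that keeps less than one packet's worth of
    buffered data and emits a short segment only on an explicit push.

    All names here are hypothetical; this is not a real TCP.
    """

    def __init__(self, mss):
        if mss <= 0:
            raise ValueError("mss must be positive")
        self.mss = mss
        self.coalesce = b""   # leftover data, always shorter than one MSS
        self.sent = []        # (payload, psh_flag) for each segment sent

    def write(self, data, push=False):
        """Accept application data; push=True is the explicit PSH request."""
        pending = self.coalesce + data
        # Emit every full MSS-sized chunk right away.
        while len(pending) >= self.mss:
            chunk, pending = pending[:self.mss], pending[self.mss:]
            # PSH is set only on the final segment of a pushed write.
            self.sent.append((chunk, push and not pending))
        if push and pending:
            # Explicit push: flush the short remainder with PSH set.
            self.sent.append((pending, True))
            pending = b""
        # Without a push, the remainder waits for later writes.
        self.coalesce = pending


s = CoalescingSender(mss=4)
s.write(b"abcdef")            # sends b"abcd", holds b"ef"
s.write(b"gh")                # b"ef" + b"gh" fills a segment, so it goes out
s.write(b"ijk", push=True)    # short segment goes out because of the push
for payload, psh in s.sent:
    print(payload, "PSH" if psh else "")
```

With a bulk writer that never pushes, no partial packet is sent at the
end of each write, which is the waste described under write-boundary
preservation; an interactive writer that pushes every write gets its
small segments out immediately, without a separate option to turn
coalescing off.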