From owner-freebsd-hackers Wed Jun 25 16:13:19 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.5/8.8.5) id QAA19580 for hackers-outgoing; Wed, 25 Jun 1997 16:13:19 -0700 (PDT)
Received: from alpha.xerox.com (alpha.Xerox.COM [13.1.64.93]) by hub.freebsd.org (8.8.5/8.8.5) with SMTP id QAA19572 for ; Wed, 25 Jun 1997 16:13:17 -0700 (PDT)
Received: from crevenia.parc.xerox.com ([13.2.116.11]) by alpha.xerox.com with SMTP id <36409(3)>; Wed, 25 Jun 1997 13:13:57 PDT
Received: from localhost by crevenia.parc.xerox.com with SMTP id <177513>; Wed, 25 Jun 1997 13:13:48 -0700
To: Kenjiro Cho
cc: Chris Csanady , hackers@freebsd.org
Subject: Re: TCP/IP bug? Unnecessary fragmentation...
In-reply-to: Your message of "Fri, 30 May 97 00:01:00 PDT." <199705300701.QAA12323@hotaka.csl.sony.co.jp>
Date: Wed, 25 Jun 1997 13:13:43 PDT
From: Bill Fenner
Message-Id: <97Jun25.131348pdt.177513@crevenia.parc.xerox.com>
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Kenjiro Cho wrote:
>I think, considering the wide use of TCP, the socket layer should try
>to call tcp_usr_send all at once when possible.

That would be going backwards, to some extent.  Van Jacobson wrote in
a 1988 message about upping TCP stack performance:

|The biggest single effect was a change to sosend (the routine
|between the user "write" syscall and tcp_output).  Its loop
|looked something like:
|
|    while there is user data & space in the socket buffer
|        copy from user space to socket
|        call the protocol "send" routine
|
|After hooking a scope to our ethernet cable & looking at the
|packet spacings, I changed this to
|
|    while there is user data & space in the socket buffer
|        copy up to 1K (one cluster's worth) from user space to socket
|        call the protocol "send" routine
|
|and the throughput jumped from 380 to 456 KB/s (+20%).  There's
|one school of thought that says the first loop was better
|because it minimized the "boundary crossings", the fixed costs
|of routine calls and context changes.  This same school is
|always lobbying for "bigger": bigger packets, bigger windows,
|bigger buffers, for essentially the same reason: the bigger
|chunks are, the fewer boundary crossings you pay for.  The
|correct school, mine :-), says there's always a fixed cost and a
|variable cost (e.g., the cost of maintaining tcp state and
|tacking a tcp packet header on the front of some data is
|independent of the amount of data; the cost of filling in the
|checksum field in that header scales linearly with the amount of
|data).  If the size is large enough to make the fixed cost small
|compared to the variable cost, making things bigger LOWERS
|throughput because you throw away opportunities for parallelism.

It's clear that there's a mismatch here, but the fix is probably to
make sosend() allocate mbufs differently if needed.  (Of course, the
costs that Van is talking about have clearly changed since the days of
the Sun 3/60 that he ran those particular tests on, too.)

  Bill