From owner-freebsd-hackers Fri May 30 00:01:32 1997
Message-Id: <199705300701.QAA12323@hotaka.csl.sony.co.jp>
To: Chris Csanady
cc: hackers@FreeBSD.ORG
Subject: Re: TCP/IP bug? Unnecessary fragmentation...
In-reply-to: Your message of "Thu, 29 May 1997 15:44:30 EST."
             <199705292044.PAA04544@friley01.res.iastate.edu>
Date: Fri, 30 May 1997 16:01:00 +0900
From: Kenjiro Cho
Sender: owner-hackers@FreeBSD.ORG

Chris Csanady said:
>> Another oddity is that if you close the connection immediately, for the
>> same data sizes that it does this odd fragmentation (100 < S < 208),
>> the FIN is piggybacked on the 2nd data segment.  This doesn't happen
>> with any other data size.

>> I have some graphs, and I believe that it does it at many other sizes too.
>> The performance hit is _much_ more than just mbuf/cluster handling should
>> impose, it would seem.  It seems to do it up around 2000, 2100(?), 4000...

>> Is this a generic 4.4BSD problem, or is it FreeBSD specific?

It is not a bug; it is caused by a mismatch between the mbuf allocation
strategy and TCP, and it is common to *BSD systems.

The mbuf allocation algorithm is: if the data fits within 2 small mbufs
(100 bytes for the first one because of the packet header, 108 bytes for
each one after that), allocate small mbufs; if the data is larger than
that, allocate a 2KB mbuf cluster.

So, 2 small mbufs will be allocated when you send:

	100 < len <= 208
or
	n * 2K + 108 < len <= n * 2K + 216

tcp_usr_send is called for each mbuf.  TCP's Nagle algorithm prevents
the sender from sending a small packet while outstanding packets have
not yet been acknowledged, so the 2nd packet won't be sent until the ack
of the 1st packet comes back.  To make matters worse, that ack will be
delayed up to 200ms on the receiver side by the delayed-ack mechanism.

Considering how widely TCP is used, I think the socket layer should try
to hand the data to tcp_usr_send all at once when possible.
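To illustrate, here is a rough user-space sketch of the carving rule
described above.  The constants and the carve() helper are mine, taken
from the numbers in this message (100/108-byte small mbufs, 2KB
clusters), not lifted from the real sosend() code, so take it only as
an approximation of the decision the socket layer makes:

#include <stdio.h>

#define MHLEN    100        /* data bytes in the first (pkthdr) mbuf */
#define MLEN     108        /* data bytes in each subsequent mbuf */
#define MCLBYTES 2048       /* data bytes in an mbuf cluster */

/* Show how a send of `len' bytes gets carved up, chunk by chunk. */
static void
carve(size_t len)
{
	size_t resid = len;
	int first = 1, mbufs = 0, clusters = 0;

	while (resid > 0) {
		size_t small = first ? MHLEN : MLEN;
		size_t n;

		if (resid > small + MLEN) {
			/* Too big for two small mbufs: use a cluster. */
			n = resid < MCLBYTES ? resid : MCLBYTES;
			clusters++;
		} else {
			/* Fits in at most two small mbufs. */
			n = resid < small ? resid : small;
			mbufs++;
		}
		resid -= n;
		first = 0;
	}
	printf("len %4lu -> %d cluster(s) + %d small mbuf(s)\n",
	    (unsigned long)len, clusters, mbufs);
}

int
main(void)
{
	/* Sizes straddling the boundaries discussed above. */
	size_t sizes[] = { 100, 101, 208, 209, 2156, 2157, 2264, 2265 };
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		carve(sizes[i]);
	return (0);
}

Every size that lands in a 2-small-mbuf range hands TCP two separate
sends, and the second one is exactly what Nagle plus the receiver's
200ms delayed ack hold up.

--kj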