From: Matthew Dillon
Date: Sun, 2 Dec 2001 21:36:38 -0800 (PST)
Message-Id: <200112030536.fB35ac395075@apollo.backplane.com>
To: Luigi Rizzo
Cc: Richard Sharpe, freebsd-hackers@FreeBSD.ORG
Subject: Re: Patch #3 (TCP / Linux / Performance)

:curious, as the loopback's MTU is normally 16384.
:Also, any idea on where does the 4096 limit (1460*2+1176) come from ?
:
:	cheers
:	luigi

    It comes from the size of an mbuf, which is 2K.  If you are trying to
    send 4100 bytes of data what winds up happening is this:

	* construct 2048 byte mbuf and queue (TF_MORETOCOME set)
	  1460 byte packet gets pushed out
	* construct 2048 byte mbuf and queue (TF_MORETOCOME set)
	  1460 byte packet gets pushed out (1176 bytes left over in mbuf)
	<<--- ack is received (semi synchronous)
	  1176 bytes in transmit buffer are pushed out due to the ack
	* construct 4 byte mbuf and queue (TF_MORETOCOME clear)
	  4 bytes are pushed out due to TCP_NODELAY being set.
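
    The sequence above can be checked with a small toy model.  This is only
    a sketch, not kernel code: the function name and parameters are made up
    for illustration, and it assumes the receiver acks every second full
    sized segment (delayed ack) and that the ack lands between sub-writes.

```python
def simulate_write(total, mss=1460, chunk=2048, ack_every=2):
    """Toy model: segment sizes emitted for one write() of `total` bytes.

    chunk     -- bytes queued per sub-write (the 2K mbuf in the text)
    ack_every -- ASSUMPTION: an ack comes back after this many full segments
    """
    segments = []
    queued = 0            # bytes sitting in the transmit buffer
    full_since_ack = 0    # full-sized segments sent since the last ack
    remaining = total
    while remaining > 0:
        n = min(chunk, remaining)
        remaining -= n
        queued += n
        more_to_come = remaining > 0      # stands in for TF_MORETOCOME
        ack_seen = False
        # Full-sized segments always go out right away.
        while queued >= mss:
            segments.append(mss)
            queued -= mss
            full_since_ack += 1
            if full_since_ack == ack_every:
                full_since_ack = 0
                ack_seen = True
        if queued > 0 and not more_to_come:
            # Last sub-write: flag is clear, TCP_NODELAY pushes the tail.
            segments.append(queued)
            queued = 0
        elif queued > 0 and ack_seen:
            # The ack slips in between sub-writes and flushes the leftover.
            segments.append(queued)
            queued = 0
    return segments


print(simulate_write(4100))
```

    For a 4100 byte write the model emits 1460, 1460, 1176 and 4 byte
    segments, which is where the 1460*2+1176 = 4096 figure comes from.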

    There are two localhost MTUs.  If you use 'localhost' the MTU is 16384.
    If you use the IP address of an ethernet interface on the machine the
    MTU winds up being 1500 even though it is effectively a localhost
    connection.  An MTU of 1500 generates the 1460 byte push-outs.

    However, even with an MTU of 16384 you still have the same problem when
    sending, say, 16384+2052 bytes of data.  After it pushes out a 16384
    byte segment it winds up with 2048 bytes queued in the mbuf, and a
    received ack (again, semi synchronous because this is localhost) will
    cause it to push out the 2048 bytes prematurely, before the last 4 bytes
    can get queued.

    What we need is a mechanism in the tcp_input() code to NOT call
    tcp_output() when an ACK is received, under certain circumstances.
    I was thinking of taking the TF_MORETOCOME flag and causing it to be
    left set for the duration of the write (except for the last sub-write).
    At the moment it is set and cleared for each sub-write and the ack
    wiggles its way in while it happens to be clear.  In any case, this
    would allow tcp_input() to skip calling tcp_output() prematurely.  But
    it isn't so easy to implement since the TF_ flags are in the 'tp'
    structure, not the 'so' socket structure, and higher levels do not have
    direct access to the tcp-specific 'tp' structure.

					-Matt
					Matthew Dillon
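
    To illustrate the effect of leaving TF_MORETOCOME set for the whole
    write, here is a rough toy model.  It is a sketch only, not the kernel
    code path: the function name and the hold_flag parameter are
    hypothetical, and it assumes an ack arrives after every second full
    sized segment, in the gap between sub-writes.

```python
def segments_for_write(total, mss=1460, chunk=2048, ack_every=2,
                       hold_flag=False):
    """Toy model of one write(); returns the list of segment sizes sent.

    hold_flag=False -- flag is cleared between sub-writes (current behaviour)
    hold_flag=True  -- flag stays set until the last sub-write (the proposal)
    ack_every       -- ASSUMPTION: ack returns after this many full segments
    """
    segments, queued, full_since_ack = [], 0, 0
    remaining = total
    while remaining > 0:
        n = min(chunk, remaining)
        remaining -= n
        queued += n
        last_sub_write = remaining == 0
        ack_seen = False
        # Full-sized segments are sent as soon as enough data is queued.
        while queued >= mss:
            segments.append(mss)
            queued -= mss
            full_since_ack += 1
            if full_since_ack == ack_every:
                full_since_ack = 0
                ack_seen = True
        # A partial segment goes out on the last sub-write, or when an ack
        # arrives and nothing tells the input path to hold off.
        if queued > 0 and (last_sub_write or (ack_seen and not hold_flag)):
            segments.append(queued)
            queued = 0
    return segments


print(segments_for_write(4100))                  # flag toggled per sub-write
print(segments_for_write(4100, hold_flag=True))  # flag held for whole write
```

    In this model, holding the flag lets the trailing 4 bytes ride along
    with the 1176 byte leftover, so the write goes out as three segments
    instead of four.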