Date: Sun, 2 Dec 2001 21:36:38 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Luigi Rizzo <rizzo@aciri.org>
Cc: Richard Sharpe <sharpe@ns.aus.com>, freebsd-hackers@FreeBSD.ORG
Subject: Re: Patch #3 (TCP / Linux / Performance)
Message-ID: <200112030536.fB35ac395075@apollo.backplane.com>
References: <20011128153817.T61580@monorchid.lemis.com>
    <15364.38174.938500.946169@caddis.yogotech.com>
    <20011128104629.A43642@walton.maths.tcd.ie>
    <5.1.0.14.1.20011130181236.00a80160@postamt1.charite.de>
    <200111302047.fAUKlT811090@apollo.backplane.com>
    <200111302130.fAULUU324648@apollo.backplane.com>
    <3C08CF9D.2030109@ns.aus.com>
    <200112012138.fB1LcG837063@apollo.backplane.com>
    <200112020810.fB28Arr77757@apollo.backplane.com>
    <20011202204702.A54149@iguana.aciri.org>

:curious, as the loopback's MTU is normally 16384.
:Also, any idea on where does the 4096 limit (1460*2+1176) come from ?
:
:    cheers
:    luigi

    It comes from the size of an mbuf, which is 2K.  If you are trying to
    send 4100 bytes of data, what winds up happening is this:

    * construct 2048 byte mbuf and queue (TF_MORETOCOME set)
        1460 byte packet gets pushed out
    * construct 2048 byte mbuf and queue (TF_MORETOCOME set)
        1460 byte packet gets pushed out (1176 bytes left over in mbuf)

        <<--- ack is received (semi synchronous)
        1176 bytes in transmit buffer are pushed out due to the ack

    * construct 4 byte mbuf and queue (TF_MORETOCOME clear)
        4 bytes are pushed out due to TCP_NOWAIT being set.

    There are two localhost MTUs.  If you use 'localhost' the MTU is 16384.
    If you use the IP address of an ethernet interface on the machine, the
    MTU winds up being 1500 even though it is effectively a localhost
    connection.  An MTU of 1500 generates the 1460 byte push-outs.

    However, even with an MTU of 16384 you still have the same problem when
    sending, say, 16384+2052 bytes of data.  After it pushes out a 16384
    byte segment it winds up with 2048 bytes queued in the mbuf, and a
    received ack (again, semi synchronous because this is localhost) will
    cause it to push out the 2048 bytes prematurely, before the last 4
    bytes can get queued.

    What we need is a mechanism in the tcp_input() code to NOT call
    tcp_output() when an ACK is received, under certain circumstances.  I
    was thinking of taking the TF_MORETOCOME flag and causing it to be left
    set for the duration of the write (except for the last sub-write).  At
    the moment it is set and cleared for each sub-write, and the ack wiggles
    its way in while it happens to be clear.  In any case, this would allow
    tcp_input() to skip calling tcp_output() prematurely.  But it isn't so
    easy to implement since the TF_ flags are in the 'tp' structure, not the
    'so' socket structure, and higher levels do not have direct access to
    the tcp-specific 'tp' structure.
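
    To make the arithmetic above concrete, here is a rough user-space
    model of the sequence, not kernel code.  Everything in it is an
    assumption made for illustration: the names simulate_write,
    MBUF_CHUNK, ack_every and hold_moretocome are hypothetical, the write
    is assumed to be split into 2048 byte sub-writes, full-sized segments
    are assumed to go out immediately, and an ack is assumed to arrive
    between sub-writes once two full-sized segments have been sent.  With
    hold_moretocome set it sketches the idea of leaving TF_MORETOCOME set
    until the last sub-write.

```python
MBUF_CHUNK = 2048  # assumed sub-write size in bytes (the 2K mbuf from the text)


def simulate_write(total, mss, ack_every=2, hold_moretocome=False):
    """Return the sizes of the segments pushed out for one write().

    Simplified model, not the real tcp_output() logic:
      * data is queued in sub-writes of MBUF_CHUNK bytes
      * every full-sized (mss) segment is sent as soon as it is available
      * after `ack_every` full-sized segments an ack arrives between
        sub-writes and flushes the partial data, unless hold_moretocome
        keeps the "more to come" state set for the whole write
      * the last sub-write flushes whatever is still queued
    """
    segments = []
    queued = 0          # bytes sitting in the transmit buffer
    unacked_full = 0    # full-sized segments sent since the last ack
    remaining = total

    while remaining > 0:
        chunk = min(MBUF_CHUNK, remaining)
        remaining -= chunk
        queued += chunk
        last_subwrite = remaining == 0

        # Full-sized segments go out even while more data is coming.
        while queued >= mss:
            segments.append(mss)
            queued -= mss
            unacked_full += 1

        if last_subwrite:
            # Nothing more is coming, so the tail is pushed out.
            if queued:
                segments.append(queued)
                queued = 0
            break

        # An ack sneaks in between sub-writes (semi synchronous localhost).
        if unacked_full >= ack_every:
            unacked_full = 0
            if queued and not hold_moretocome:
                segments.append(queued)  # premature push of the partial data
                queued = 0

    return segments


if __name__ == "__main__":
    print(simulate_write(4100, 1460))
    print(simulate_write(4100, 1460, hold_moretocome=True))
```

    Under these assumptions the first call yields segments of 1460, 1460,
    1176 and 4 bytes, matching the 1460*2+1176 breakdown in the quoted
    question, while the second call folds the trailing 4 bytes into the
    leftover data and yields 1460, 1460 and 1180.  The model says nothing
    about how the flag would actually be reached from the socket layer,
    which is the hard part described above.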

                    -Matt
                    Matthew Dillon
                    <dillon@backplane.com>

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message