Date: Sun, 2 Dec 2001 21:36:38 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Luigi Rizzo <rizzo@aciri.org>
Cc: Richard Sharpe <sharpe@ns.aus.com>, freebsd-hackers@FreeBSD.ORG
Subject: Re: Patch #3 (TCP / Linux / Performance)
Message-ID: <200112030536.fB35ac395075@apollo.backplane.com>
References: <20011128153817.T61580@monorchid.lemis.com> <15364.38174.938500.946169@caddis.yogotech.com> <20011128104629.A43642@walton.maths.tcd.ie> <5.1.0.14.1.20011130181236.00a80160@postamt1.charite.de> <200111302047.fAUKlT811090@apollo.backplane.com> <200111302130.fAULUU324648@apollo.backplane.com> <3C08CF9D.2030109@ns.aus.com> <200112012138.fB1LcG837063@apollo.backplane.com> <200112020810.fB28Arr77757@apollo.backplane.com> <20011202204702.A54149@iguana.aciri.org>
:curious, as the loopback's MTU is normally 16384.
:Also, any idea on where does the 4096 limit (1460*2+1176) come from ?
:
: cheers
: luigi
It comes from the size of an mbuf cluster, which is 2K. If you are trying to
send 4100 bytes of data, what winds up happening is this:
    * construct 2048 byte mbuf and queue (TF_MORETOCOME set)
        1460 byte packet gets pushed out
    * construct 2048 byte mbuf and queue (TF_MORETOCOME set)
        1460 byte packet gets pushed out
        (1176 bytes left over in mbuf)
    <<--- ack is received (semi synchronous)
        1176 bytes in transmit buffer are pushed out due to the ack
    * construct 4 byte mbuf and queue (TF_MORETOCOME clear)
        4 bytes are pushed out due to TCP_NODELAY being set.
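
The sequence above can be reproduced with a small model. This is only a
sketch in Python, not the kernel code: trace_write and its parameters are
hypothetical names, the 2048-byte chunk stands in for the mbuf built by each
sub-write, and the point at which the ack arrives (ack_after_chunk) is an
assumption chosen to match the trace.

```python
def trace_write(total, mss=1460, chunk=2048, ack_after_chunk=1):
    """Return the sizes of the segments sent for one write of `total` bytes.

    Hypothetical model: each sub-write queues up to `chunk` bytes, and an
    ack is assumed to arrive right after sub-write number `ack_after_chunk`.
    """
    segments = []
    queued = 0      # bytes sitting in the transmit buffer
    copied = 0      # bytes copied from the caller so far
    index = 0       # which sub-write we are on
    while copied < total:
        n = min(chunk, total - copied)
        copied += n
        queued += n
        more_to_come = copied < total   # models TF_MORETOCOME for this sub-write

        # Output from the send path: full-sized segments always go out.
        while queued >= mss:
            segments.append(mss)
            queued -= mss
        # A short remainder is held back only while more data is coming.
        if queued and not more_to_come:
            segments.append(queued)
            queued = 0

        # Between sub-writes the flag is clear, so an ack arriving here
        # lets the input path push out whatever is still queued.
        if index == ack_after_chunk and queued:
            segments.append(queued)
            queued = 0
        index += 1
    return segments


print(trace_write(4100))   # [1460, 1460, 1176, 4]
```

The last two entries are the problem: the leftover bytes and the 4-byte tail
go out as separate small packets instead of one.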
There are two localhost MTUs. If you use 'localhost' the MTU is 16384.
If you use the IP address of an ethernet interface on the machine the
MTU winds up being 1500 even though it is effectively a localhost
connection. An MTU of 1500 generates the 1460 byte push-outs.
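
For reference, the 1460 figure is just the MTU minus the headers. A minimal
sketch, assuming a 20-byte IPv4 header and a 20-byte TCP header with no
options (mss_for_mtu is a hypothetical helper, not a kernel function):

```python
IP_HEADER = 20    # bytes, IPv4 header without options (assumed)
TCP_HEADER = 20   # bytes, TCP header without options (assumed)


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER


print(mss_for_mtu(1500))    # 1460
print(mss_for_mtu(16384))   # 16344
```

With TCP options in use the payload per segment would be smaller still.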
However, even with an MTU of 16384 you still have the same problem when
sending, say, 16384+2052 bytes of data. After it pushes out a 16384 byte
segment it winds up with 2048 bytes queued in the mbuf and a
received ack (again, semi synchronous because this is localhost) will
cause it to push out the 2048 bytes prematurely, before the last 4 bytes
can get queued.
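
Both cases follow the same arithmetic. The sketch below is a hypothetical
helper (premature_push is not a kernel function) and assumes 2048-byte
sub-writes, with the ack arriving just before the final short sub-write. It
returns the bytes stranded in the transmit buffer at that moment and the size
of the tail that ends up in a packet of its own.

```python
def premature_push(total, segment, chunk=2048):
    """Return (stranded, tail) for a write of `total` bytes.

    stranded: bytes left queued after all full chunks have been written
              and all full-sized segments have been sent.
    tail:     bytes in the final short sub-write.
    Returns (0, 0) when the write is an exact multiple of `chunk`.
    """
    full = (total // chunk) * chunk       # bytes written in full chunks
    tail = total - full                   # final short sub-write
    stranded = full % segment if tail else 0
    return stranded, tail


print(premature_push(4100, 1460))            # (1176, 4)
print(premature_push(16384 + 2052, 16384))   # (2048, 4)
```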
What we need is a mechanism in the tcp_input() code to NOT call
tcp_output() when an ACK is received, under certain circumstances.
I was thinking of taking the TF_MORETOCOME flag and causing it to be
left set for the duration of the write (except for the last sub-write).
At the moment it is set and cleared for each sub-write and the ack wiggles
its way in while it happens to be clear. In any case, this would allow
tcp_input() to skip calling tcp_output() prematurely. But it isn't so
easy to implement since the TF_ flags are in the 'tp' structure, not
the 'so' socket structure, and higher levels do not have direct access
to the tcp-specific 'tp' structure.
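
The effect of the idea can be illustrated with a toy comparison. This is a
sketch under stated assumptions, not a patch: segments_sent and hold_flag are
hypothetical names, sub-writes are 2048 bytes, and an ack is assumed to arrive
after every sub-write, which is the worst case for the current behaviour.

```python
def segments_sent(total, hold_flag, mss=1460, chunk=2048):
    """Segment sizes for one write, with an ack assumed after every sub-write.

    hold_flag=False: the flag is clear between sub-writes (current behaviour),
                     so each ack pushes out the queued remainder.
    hold_flag=True:  the flag stays set until the last sub-write (the idea
                     described above), so the ack does not trigger output.
    """
    segments, queued, copied = [], 0, 0
    while copied < total:
        n = min(chunk, total - copied)
        copied += n
        queued += n
        last = copied == total

        # Full-sized segments go out regardless of the flag.
        while queued >= mss:
            segments.append(mss)
            queued -= mss

        # Would the flag still be set when the ack (or the final push) happens?
        flag_set = hold_flag and not last
        if queued and not flag_set:
            segments.append(queued)
            queued = 0
    return segments


print(segments_sent(4100, hold_flag=False))   # [1460, 588, 1460, 588, 4]
print(segments_sent(4100, hold_flag=True))    # [1460, 1460, 1180]
```

With the flag held, the leftover bytes and the 4-byte tail leave together in
one final packet.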
-Matt
Matthew Dillon
<dillon@backplane.com>
