Date: Mon, 3 Dec 2001 02:07:27 -0500 (EST)
From: Mike Silbersack <silby@silby.com>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Luigi Rizzo <rizzo@aciri.org>, Richard Sharpe <sharpe@ns.aus.com>, <freebsd-hackers@FreeBSD.ORG>
Subject: Re: Patch #3 (TCP / Linux / Performance)
Message-ID: <Pine.BSF.4.30.0112030205080.46337-100000@niwun.pair.com>
In-Reply-To: <200112030536.fB35ac395075@apollo.backplane.com>

This part of the thread sounds really familiar.  I recall someone coming
up with a patch for this a few weeks ago, possibly committing it to
-current.  I'm too tired and it's too late, though; I'll look for it
tomorrow if Matt doesn't find the thread in the archives first.

Mike "Silby" Silbersack

On Sun, 2 Dec 2001, Matthew Dillon wrote:

>
> :curious, as the loopback's MTU is normally 16384.
> :Also, any idea on where does the 4096 limit (1460*2+1176) come from ?
> :
> :	cheers
> :	luigi
>
>     It comes from the size of an mbuf, which is 2K.  If you are trying to
>     send 4100 bytes of data what winds up happening is this:
>
>     * construct 2048 byte mbuf and queue (TF_MORETOCOME set)
>       1460 byte packet gets pushed out
>     * construct 2048 byte mbuf and queue (TF_MORETOCOME set)
>       1460 byte packet gets pushed out
>       (1172 bytes left over in mbuf)
>       <<--- ack is received (semi synchronous)
>       1172 bytes in transmit buffer are pushed out due to the ack
>     * construct 4 byte mbuf and queue (TF_MORETOCOME clear)
>       4 bytes is pushed out due to TCP_NOWAIT being set.
>
>     There are two localhost MTUs.  If you use 'localhost' the MTU is 16384.
>     If you use the IP address of an ethernet interface on the machine the
>     MTU winds up being 1500 even though it is effectively a localhost
>     connection.  An MTU of 1500 generates the 1460 byte push-outs.
>
>     However, even with an MTU of 16384 you still have the same problem when
>     sending, say, 16384+2052 bytes of data.  After it pushed out a 16384 byte
>     segment it winds up with 2048 bytes queued in the mbuf and a
>     received ack (again, semi synchronous because this is localhost) will
>     cause it to push out the 2048 bytes prematurely, before the last 4 bytes
>     can get queued.
>
>     What we need is a mechanism in the tcp_input() code to NOT call
>     tcp_output() when an ACK is received, under certain circumstances.
>     I was thinking of taking the TF_MORETOCOME flag and causing it to be
>     left set for the duration of the write (except for the last sub-write).
>     At the moment it is set and cleared for each sub-write and the ack wiggles
>     its way in while it happens to be clear.  In any case, this would allow
>     tcp_input() to skip calling tcp_output() prematurely.  But it isn't so
>     easy to implement since the TF_ flags are in the 'tp' structure, not
>     the 'so' socket structure, and higher levels do not have direct access
>     to the tcp-specific 'tp' structure.
>
>						-Matt
>						Matthew Dillon
>						<dillon@backplane.com>
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message
>
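
The sequence Matt describes can be sketched as a small user-space model. This is only an illustration of the arithmetic, not kernel code: the function name `simulate_write`, the `ack_after` parameter (the 0-based sub-writes after which an ACK is assumed to arrive), and the `hold_flag` switch (standing in for the proposed "leave TF_MORETOCOME set for the whole write") are all hypothetical names invented here. It assumes 2048-byte sub-writes, that only full-sized segments go out while more data is coming, and that an ACK arriving between sub-writes flushes whatever is queued. Under these assumptions the leftover after the second segment comes out as 1176 bytes, the figure in Luigi's 1460*2+1176 question.

```python
def simulate_write(total, mss, ack_after, chunk=2048, hold_flag=False):
    """Return the sizes of the segments pushed out for one write of `total` bytes.

    Simplified model of the behaviour described in the quoted message:
      total     -- bytes passed to a single write
      mss       -- largest segment payload (e.g. 1460 for an MTU of 1500)
      ack_after -- set of 0-based sub-write indices after which an ACK arrives
      chunk     -- size of each sub-write (2048 in the message)
      hold_flag -- hypothetical fix: ACKs do not trigger output mid-write
    """
    segments = []
    queued = 0          # bytes sitting in the transmit buffer
    remaining = total   # bytes not yet handed to the stack
    index = 0
    while remaining > 0:
        n = min(chunk, remaining)
        remaining -= n
        queued += n
        more_to_come = remaining > 0

        # Full-sized segments are always sent as soon as they are available.
        while queued >= mss:
            segments.append(mss)
            queued -= mss

        # Last sub-write: nothing more is coming, so the tail is pushed out.
        if not more_to_come and queued > 0:
            segments.append(queued)
            queued = 0

        # An ACK slipping in between sub-writes flushes the partial buffer,
        # unless the flag is held for the duration of the write.
        if more_to_come and index in ack_after and not hold_flag and queued > 0:
            segments.append(queued)
            queued = 0

        index += 1
    return segments


if __name__ == "__main__":
    # 4100 bytes over an MTU-1500 path, ACK arriving after the second sub-write.
    print(simulate_write(4100, 1460, {1}))                  # [1460, 1460, 1176, 4]
    print(simulate_write(4100, 1460, {1}, hold_flag=True))  # [1460, 1460, 1180]
    # 16384+2052 bytes with a 16384-byte segment size, ACK after the ninth sub-write.
    print(simulate_write(16384 + 2052, 16384, {8}))         # [16384, 2048, 4]
```

With the flag held, the trailing 4 bytes ride along with the leftover data instead of going out as a separate tiny segment, which is the effect the proposal is after. The model says nothing about where such a flag would have to live ('tp' versus 'so'), which is the actual implementation difficulty raised above.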