Date:      Mon, 11 Oct 1999 03:59:49 -0700 (PDT)
From:      Alfred Perlstein <bright@wintelcom.net>
To:        Mohit Aron <aron@cs.rice.edu>
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   work in progress, (was Re: sbappend() is not scalable)
Message-ID:  <Pine.BSF.4.05.9910110331040.8080-100000@fw.wintelcom.net>
In-Reply-To: <199910082051.PAA25028@cs.rice.edu>


On Fri, 8 Oct 1999, Mohit Aron wrote:

> Hi,
> 	I recently did some experiments with TCP over a high b/w-delay path
> and found a scalability problem in sbappend(). The experimental setup
> consisted of a 100Mbps network with a round-trip delay of 100ms. Under this
> situation, FreeBSD's TCP version is incapable of attaining more than 65 Mbps
> on a 300MHz Pentium II - even without slow-start.
> 
> I tracked down the problem to sbappend() - the routine that appends user data
> into the socket buffers for network transmission. Every time a TCP ACK 
> acknowledges some data, space is created in the socket buffer that permits
> more data to be appended. Unfortunately, the implementation does not maintain
> a pointer to the end of the list of mbufs in the socket buffer. Thus each 
> time any data is added, the whole list of mbufs is traversed to reach the 
> very end where the data is added. Since the b/w-delay product is large, there
> can be about 600 mbufs in the socket buffer waiting to be acknowledged. Thus
> upon every ACK, about 600 mbufs are traversed causing the TCP sender to run 
> out of CPU.
> 
> The problem is not limited only to high b/w networks - it is also present in
> long latency paths (satellite links). Thus a server transferring a large file
> over a satellite link can spend lot of CPU due to the above problem.
> 
> Hope the problem shall be fixed in future releases,
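
For reference, the tail-pointer idea described above can be sketched in
user space as follows. This is only a rough model, not the kernel code and
not what is in my diff: struct mbuf and struct sockbuf are cut-down
stand-ins, sb_mbtail is a hypothetical new field, and the function names
are made up for illustration. (Assuming 2 KB clusters, 100 Mbit/s x 100 ms
is roughly 1.25 MB in flight, which is where "about 600 mbufs" comes from.)

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures (assumption, not real). */
struct mbuf {
	struct mbuf *m_next;	/* next mbuf in the chain */
	int m_len;		/* bytes of data in this mbuf */
};

struct sockbuf {
	struct mbuf *sb_mb;	/* head of the mbuf chain */
	struct mbuf *sb_mbtail;	/* hypothetical: last mbuf in the chain */
	int sb_cc;		/* total bytes in the buffer */
};

/*
 * Current behaviour: walk the whole chain to find the end.
 * Returns the number of mbufs visited, so the cost is visible.
 */
static int
sb_append_walk(struct sockbuf *sb, struct mbuf *m)
{
	struct mbuf **mp = &sb->sb_mb;
	int visited = 0;

	while (*mp != NULL) {
		mp = &(*mp)->m_next;
		visited++;
	}
	m->m_next = NULL;
	*mp = m;
	sb->sb_mbtail = m;	/* keep the tail valid for mixed use */
	sb->sb_cc += m->m_len;
	return (visited);
}

/* With a tail pointer: append a single mbuf in constant time. */
static void
sb_append_tail(struct sockbuf *sb, struct mbuf *m)
{
	m->m_next = NULL;
	if (sb->sb_mb == NULL) {
		sb->sb_mb = m;
	} else {
		assert(sb->sb_mbtail != NULL);
		assert(sb->sb_mbtail->m_next == NULL);
		sb->sb_mbtail->m_next = m;
	}
	sb->sb_mbtail = m;
	sb->sb_cc += m->m_len;
}

/*
 * Unlink the first mbuf, roughly what happens when an ACK frees data.
 * The tail pointer has to be cleared when the chain becomes empty,
 * otherwise it dangles.
 */
static struct mbuf *
sb_drop_head(struct sockbuf *sb)
{
	struct mbuf *m = sb->sb_mb;

	if (m == NULL)
		return (NULL);
	sb->sb_mb = m->m_next;
	if (sb->sb_mb == NULL)
		sb->sb_mbtail = NULL;
	sb->sb_cc -= m->m_len;
	m->m_next = NULL;
	return (m);
}
```

The part that is easy to get wrong is every other place that edits the
chain: anything that removes or merges mbufs has to keep the tail pointer
in sync, as sb_drop_head() does here.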

I've started work on this. Admittedly I'm pretty new to the uipc code,
but right now I have some work done towards it:

http://www.freebsd.org/~alfred/sockbuf3.diff

(the diff predates green's socket buffer limiting stuff)

However, it panics the box if you send a lot of data; a good way
to make it blow up is to run "ls -lR /" through telnet.  It's also
pretty verbose with debug printfs.

It panics when tcp_output does an mcopy with invalid parameters. It
seems that sb_mb is getting set to NULL somehow (my new sbcompress
may be the culprit).

The reason I'm posting it is that I'm tired and hoping to wake
up to an email saying "here, just fix line xxx of zzz" :)

The patches also address (or try to address) a flaw in the sbcompress()
function: right now it always tries to copy mbufs 'backwards'; my patch
tries to do a copy forward if it can.

Personally I don't like sbcompress. I'm interested in what people think
about making it 'lazy'. The algorithm would work like so:

On sbcompress:
walk the mbuf list freeing empty mbufs (already done);
note any places where a copy would work to compress, but instead
of compressing, just update a counter in the sockbuf;
if sbcompress notices that the amount of "fragmentation"
has exceeded a certain level, it walks the entire
socket buffer, compresses it, and resets the counter.
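
A rough user-space sketch of the lazy scheme above, to make the idea
concrete. Everything here is an assumption for illustration: the lz_
names, the fixed 128-byte data area, the limit of 4, and the sb_frag
counter are all made up, and the real mbuf layout (m_data offsets,
clusters, packet headers) is ignored. It also ignores the question of
whether someone else holds a reference to an mbuf already in the buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define LZ_MLEN		128	/* data bytes per mbuf (made-up size) */
#define LZ_FRAG_LIMIT	4	/* compact once the counter exceeds this */

struct lz_mbuf {
	struct lz_mbuf *m_next;
	int m_len;		/* bytes used in m_dat */
	char m_dat[LZ_MLEN];
};

struct lz_sockbuf {
	struct lz_mbuf *sb_mb;	/* head of chain */
	struct lz_mbuf *sb_tail;	/* last mbuf in chain */
	int sb_frag;		/* missed chances to coalesce */
};

/* Allocate an mbuf holding len copies of fill; NULL on failure. */
static struct lz_mbuf *
lz_new(int len, int fill)
{
	struct lz_mbuf *m;

	assert(len >= 0 && len <= LZ_MLEN);
	m = malloc(sizeof(*m));
	if (m == NULL)
		return (NULL);
	m->m_next = NULL;
	m->m_len = len;
	memset(m->m_dat, fill, (size_t)len);
	return (m);
}

/* Full pass: merge each mbuf into its predecessor while it fits. */
static void
lz_compact(struct lz_sockbuf *sb)
{
	struct lz_mbuf *m = sb->sb_mb;

	while (m != NULL) {
		struct lz_mbuf *n = m->m_next;

		if (n != NULL && m->m_len + n->m_len <= LZ_MLEN) {
			memcpy(m->m_dat + m->m_len, n->m_dat, (size_t)n->m_len);
			m->m_len += n->m_len;
			m->m_next = n->m_next;
			free(n);
			continue;	/* m may absorb the next one too */
		}
		if (n == NULL)
			sb->sb_tail = m;
		m = n;
	}
	if (sb->sb_mb == NULL)
		sb->sb_tail = NULL;
	sb->sb_frag = 0;
}

/* Lazy append: drop empty mbufs, count missed merges, compact rarely. */
static void
lz_append(struct lz_sockbuf *sb, struct lz_mbuf *m)
{
	if (m->m_len == 0) {
		free(m);
		return;
	}
	m->m_next = NULL;
	if (sb->sb_mb == NULL) {
		sb->sb_mb = m;
	} else {
		/* A copy would have fit here; just note it for now. */
		if (sb->sb_tail->m_len + m->m_len <= LZ_MLEN)
			sb->sb_frag++;
		sb->sb_tail->m_next = m;
	}
	sb->sb_tail = m;
	if (sb->sb_frag > LZ_FRAG_LIMIT)
		lz_compact(sb);
}

/* Number of mbufs currently in the chain. */
static int
lz_chain_len(const struct lz_sockbuf *sb)
{
	const struct lz_mbuf *m;
	int n = 0;

	for (m = sb->sb_mb; m != NULL; m = m->m_next)
		n++;
	return (n);
}
```

With these numbers, five 10-byte appends just link mbufs and leave the
counter at 4; the sixth pushes it past the limit and triggers one full
pass that folds everything into a single mbuf. A socket that closes
before reaching the limit never pays for a copy at all.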

It would also be interesting to vary how sbcompress works based
on the number of free mbufs in the system (using phk's
green/yellow/red state to determine what to do).

Either way, this would _really_ help with short-lived sockets
that are transmitting small amounts of data.

The only problem is that I'm not sure if it's ok to mess with
the mbufs after they've been put into the socket buffer, because
someone else may be holding a reference to them.

comments?

The patch also adds a whole lot of comments, removes some
useless casts, and changes a lot of m = 0 to m = NULL.

And if anyone made it this far, :) do you happen to know what
the #ifdef notyet along with mcopypack stuff is for?

-Alfred
