From owner-freebsd-hackers Mon Oct 11 8:15:20 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20])
    by hub.freebsd.org (Postfix) with ESMTP id 80C9B151F5;
    Mon, 11 Oct 1999 08:15:18 -0700 (PDT)
    (envelope-from bright@wintelcom.net)
Received: from localhost (bright@localhost)
    by fw.wintelcom.net (8.9.3/8.9.3) with ESMTP id IAA10976;
    Mon, 11 Oct 1999 08:35:00 -0700 (PDT)
Date: Mon, 11 Oct 1999 08:35:00 -0700 (PDT)
From: Alfred Perlstein
To: Mohit Aron
Cc: freebsd-hackers@FreeBSD.ORG
Subject: testable! Re: work in progress, (was Re: sbappend() is not scalable)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

replying to my own message. :)

I have a newer version of the patch. It doesn't look like it's
panic'ing, but it really needs some serious testing.

http://www.freebsd.org/~alfred/sockbuf4.diff

If you are running -current and know how to get a crashdump, please
give this a whirl and tell me what it does for you.

thanks,
-Alfred Perlstein - [bright@rush.net|alfred@freebsd.org]
Wintelcom systems administrator and programmer
                   - http://www.wintelcom.net/ [bright@wintelcom.net]

On Mon, 11 Oct 1999, Alfred Perlstein wrote:

>
> On Fri, 8 Oct 1999, Mohit Aron wrote:
>
> > Hi,
> >     I recently did some experiments with TCP over a high b/w-delay path
> > and found a scalability problem in sbappend(). The experimental setup
> > consisted of a 100Mbps network with a round-trip delay of 100ms. Under this
> > situation, FreeBSD's TCP version is incapable of attaining more than 65 Mbps
> > on a 300MHz Pentium II - even without slow-start.
> >
> > I tracked down the problem to sbappend() - the routine that appends user data
> > into the socket buffers for network transmission. Every time a TCP ACK
> > acknowledges some data, space is created in the socket buffer that permits
> > more data to be appended.
> > Unfortunately, the implementation does not maintain
> > a pointer to the end of the list of mbufs in the socket buffer. Thus each
> > time any data is added, the whole list of mbufs is traversed to reach the
> > very end where the data is added. Since the b/w-delay product is large, there
> > can be about 600 mbufs in the socket buffer waiting to be acknowledged. Thus
> > upon every ACK, about 600 mbufs are traversed, causing the TCP sender to run
> > out of CPU.
> >
> > The problem is not limited only to high b/w networks - it is also present in
> > long latency paths (satellite links). Thus a server transferring a large file
> > over a satellite link can spend a lot of CPU due to the above problem.
> >
> > Hope the problem shall be fixed in future releases,
>
> I started work on this. Admittedly I'm pretty new to the uipc code,
> and right now I have some work done towards this:
>
> http://www.freebsd.org/~alfred/sockbuf3.diff
>
> (pre green's socketbuf limiting stuff)
>
> However, it panics the box if you send a lot of data. A good way
> to have it blow up is to "ls -lR /" through telnet. It's also
> pretty verbose with debug printfs.
>
> It panics when tcp_output does an mcopy with invalid parameters; it
> seems that sb_mb is getting set to NULL somehow (my new sbcompress
> may be the culprit).
>
> The reason I'm posting it is that I'm tired and hoping to wake
> up with an email saying "here just fix line xxx of zzz" :)
>
> The patches also address (or try to address) a flaw in the sbcompress()
> function: right now it always tries to copy mbuf 'backwards'; my patch
> tries to do a copy forward if it can.
>
> Personally I don't like sbcompress. I'm interested in what people think
> about making it 'lazy'. The algorithm would work like so:
>
> on sbcompress,
>   walk the mbuf list free'ing empty bufs (already done)
>   note any places where a copy would work to compress, but instead
>     of compressing, just update a counter in the socketbuf.
>   if sbcompress notices that the amount of "fragmentation"
>     has exceeded a certain level then it will walk the entire
>     socket compressing it and reset the counters.
>
> It would also be interesting to vary how sbcompress works based
> on the amount of free mbufs in the system (using phk's
> green/yellow/red state to determine what to do).
>
> Either way this would _really_ help with short lived sockets
> that are transmitting small amounts of data.
>
> The only problem is that I'm not sure if it's ok to mess with
> the mbufs after they've been put into the socketbuffer, because
> someone else may be holding a reference to them.
>
> comments?
>
> The patch also adds a whole lot of comments, removes some
> useless casts, and changes a lot of m = 0 to m = NULL.
>
> And if anyone made it this far, :) do you happen to know what
> the #ifdef notyet along with mcopypack stuff is for?
>
> -Alfred
>
>

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
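
For anyone who wants to play with the two ideas from this thread outside
the kernel, here is a minimal userland sketch: a tail pointer so an append
does not walk the whole chain, and the 'lazy' compression counter that
defers merging until fragmentation passes a limit. This is not the code in
sockbuf3.diff or sockbuf4.diff and it does not use the real mbuf or sockbuf
structures. Every name in it (toy_buf, toy_sockbuf, toy_append,
toy_compress, BUF_CAP, FRAG_LIMIT) is made up for illustration, and it
ignores the open question above about other holders of references to the
buffers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUF_CAP    16   /* toy per-buffer capacity (assumption) */
#define FRAG_LIMIT 4    /* compress once more than this many merges are pending (assumption) */

/* Hypothetical stand-in for an mbuf: a small fixed-size data buffer. */
struct toy_buf {
    struct toy_buf *next;
    size_t len;
    char data[BUF_CAP];
};

/* Hypothetical stand-in for a socket buffer. */
struct toy_sockbuf {
    struct toy_buf *head;
    struct toy_buf *tail;   /* last buffer, so append does not walk the chain */
    size_t nbufs;           /* number of buffers currently in the chain */
    size_t frag;            /* merges that were noted but deferred */
};

/* Walk the whole chain once, merging each buffer into its predecessor when it fits. */
static void toy_compress(struct toy_sockbuf *sb)
{
    struct toy_buf *m;

    assert(sb != NULL);
    m = sb->head;
    while (m != NULL && m->next != NULL) {
        struct toy_buf *n = m->next;

        if (m->len + n->len <= BUF_CAP) {
            memcpy(m->data + m->len, n->data, n->len);
            m->len += n->len;
            m->next = n->next;
            free(n);
            sb->nbufs--;
        } else {
            m = n;
        }
    }
    sb->tail = m;   /* m is now the last buffer, or NULL for an empty chain */
    sb->frag = 0;   /* reset the counter after a full pass */
}

/* Append len bytes (len <= BUF_CAP) using the tail pointer. Returns 0 on success. */
static int toy_append(struct toy_sockbuf *sb, const char *p, size_t len)
{
    struct toy_buf *n;

    assert(sb != NULL);
    if (len == 0)
        return 0;           /* empty buffers are never queued */
    if (len > BUF_CAP)
        return -1;
    n = calloc(1, sizeof(*n));
    if (n == NULL)
        return -1;
    memcpy(n->data, p, len);
    n->len = len;

    if (sb->tail == NULL) {
        sb->head = n;
    } else {
        /* A copy into the tail would have worked; just count it for now. */
        if (sb->tail->len + len <= BUF_CAP)
            sb->frag++;
        sb->tail->next = n;
    }
    sb->tail = n;
    sb->nbufs++;

    /* Only pay for a full walk once fragmentation passes the limit. */
    if (sb->frag > FRAG_LIMIT)
        toy_compress(sb);
    return 0;
}
```

With these toy constants, five 4-byte appends to an empty toy_sockbuf each
run without touching anything but the tail and leave five buffers queued.
The sixth pushes the counter past FRAG_LIMIT, and a single pass merges the
chain into two buffers of 16 and 8 bytes. The counter is only a rough
estimate of fragmentation, since it counts merge opportunities at the tail
and nothing else.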