From owner-freebsd-net@FreeBSD.ORG Wed Aug 14 16:05:12 2013
Date: Thu, 15 Aug 2013 00:05:00 +0800
From: Julian Elischer <julian@freebsd.org>
To: Lawrence Stewart
Cc: FreeBSD Net <freebsd-net@freebsd.org>
Subject: Re: TSO and FreeBSD vs Linux
Message-ID: <520BAAAC.8070707@freebsd.org>
In-Reply-To: <520B24A0.4000706@freebsd.org>

On 8/14/13 2:33 PM, Julian Elischer wrote:
> On 8/14/13 11:39 AM, Lawrence Stewart wrote:
>> There's a thing controlled by ethtool called GRO (generic receive
>> offload) which appears to be enabled by default on at least Ubuntu,
>> and I guess other Linuxes too. It's responsible for aggregating ACKs
>> and data to batch them up the stack if the driver doesn't provide a
>> hardware offload implementation. Try rerunning your experiments with
>> the ACK batching disabled on the Linux host to get an additional
>> comparison point.
> I will try that as soon as I get back to the machines in question.

Turning GRO on and off seems to make no difference, either in overall
throughput or at the low-level, packet-by-packet view (according to
tcptrace).

>>> For two examples look at:
>>>
>>> http://www.freebsd.org/~julian/LvsF-tcp-start.tiff
>>> and
>>> http://www.freebsd.org/~julian/LvsF-tcp.tiff
>>>
>>> In each case, FreeBSD is on the left and Linux is on the right.
>>>
>>> The first trace shows the sessions as they start, and the second
>>> shows some distance later (when the sequence numbers wrap around..
>>> no particular reason to use that point, it was just fun to see).
>>> In both cases you can see that each Linux packet (white), once they
>>> have got going, is responding to multiple bumps in the send window
>>> sequence number (green and yellow lines, representing the arrival
>>> of several ACKs), while FreeBSD produces a whole bunch of smaller
>>> packets, slavishly following exactly the size of each incoming ACK.
>>> This gives us quite a performance debt.
>> Again, please s/performance/what-you-really-mean/ here.
> OK, in my tests this makes FreeBSD data transfers much slower, by as
> much as 60%.
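To make the ACK-batching idea concrete, here is a minimal sketch of what
GRO- or software-LRO-style coalescing of pure ACKs amounts to: several
back-to-back ACKs that only advance the cumulative ACK point are folded
into one, together with a count of how many were merged. This is purely
illustrative; the structures and names are hypothetical and are not taken
from the Linux GRO or FreeBSD LRO code, and real GRO also merges data
segments, which is ignored here.

/*
 * Minimal sketch of GRO/LRO-style coalescing of pure ACKs.
 * Hypothetical structures and names; not the Linux GRO or
 * FreeBSD LRO code.
 */
#include <stdint.h>
#include <stdio.h>

struct ack {
	uint32_t th_ack;	/* cumulative ACK number */
	uint16_t th_win;	/* advertised receive window */
};

struct coalesced_ack {
	struct ack ack;		/* newest ACK folded into the batch */
	int nmerged;		/* how many ACKs the batch represents */
};

/*
 * Fold a newly arrived pure ACK into the pending batch.  Only ACKs
 * that advance (or repeat) the cumulative ACK point are merged; a real
 * implementation would flush the batch for anything else (data, SACK,
 * window updates, timestamps that must be preserved, ...).
 */
static int
coalesce_ack(struct coalesced_ack *pend, const struct ack *in)
{
	if (pend->nmerged == 0 ||
	    (int32_t)(in->th_ack - pend->ack.th_ack) >= 0) {
		pend->ack = *in;	/* keep the newest cumulative ACK */
		pend->nmerged++;
		return (1);		/* merged; nothing delivered yet */
	}
	return (0);			/* would be flushed separately */
}

int
main(void)
{
	/* Four back-to-back ACKs, each covering one 1448-byte segment. */
	struct ack acks[] = {
		{ 1449, 512 }, { 2897, 512 }, { 4345, 512 }, { 5793, 512 }
	};
	struct coalesced_ack pend = { { 0, 0 }, 0 };
	size_t i;

	for (i = 0; i < sizeof(acks) / sizeof(acks[0]); i++)
		coalesce_ack(&pend, &acks[i]);

	/* One ACK goes up the stack instead of four. */
	printf("deliver ack=%u (merged %d ACKs)\n",
	    (unsigned int)pend.ack.th_ack, pend.nmerged);
	return (0);
}

Whether delivering that single aggregated ACK actually helps then depends
on how the TCP stack credits it, which is the abc_l_var issue discussed
further down.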
>>> Notice that this behaviour in Linux seems to be modal.. it seems
>>> to 'switch on' a little bit into the 'starting' trace.
>>>
>>> In addition, you can also see that Linux gets going faster even at
>>> the beginning, where TSO isn't yet in play, by sending a lot more
>>> packets up-front (of course the wisdom of this can be argued).
>> They switched to using an initial window of 10 segments some time
>> ago. FreeBSD starts with 3, or more recently 10 if you're running a
>> recent 9-STABLE or 10-CURRENT.
> I tried setting the initial values as shown:
> net.inet.tcp.local_slowstart_flightsize: 10
> net.inet.tcp.slowstart_flightsize: 10
> It didn't seem to make much difference, but I will redo the test.
>
>>> Has anyone done any work on aggregating ACKs, or delaying
>>> responding to them?
>> As noted by Navdeep, we already have the code to aggregate ACKs in
>> our software LRO implementation. The bigger problem is that
>> appropriate byte counting places a default 2*MSS limit on the amount
>> of ACKed data the window can grow by, i.e. if an ACK for 64k of data
>> comes up the stack, we'll grow the window by 2 segments' worth of
>> data in response. That needs to be addressed - we could send the ACK
>> count up with the aggregated single ACK, or just ignore abc_l_var
>> when LRO is in use for a connection.
> So, does "Software LRO" mean that LRO on the NIC should be ON or OFF
> to see this?
>
>> Cheers,
>> Lawrence
>
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
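The abc_l_var interaction Lawrence describes is easy to quantify. The toy
program below compares how much the congestion window is credited in slow
start when 64 kB of data is acknowledged by one aggregated ACK versus
segment by segment, assuming an MSS of 1448 bytes and the default
abc_l_var of 2. It is purely illustrative; the function name and constants
are mine, not the actual FreeBSD tcp_input() code.

/*
 * Toy calculation of the Appropriate Byte Counting (ABC) cap discussed
 * above.  Purely illustrative; not the FreeBSD tcp_input() code.
 * MSS of 1448 is an assumption for the example.
 */
#include <stdio.h>

#define MSS		1448	/* segment size used in the example */
#define ABC_L_VAR	2	/* default net.inet.tcp.abc_l_var */

/* Slow-start cwnd increase credited for one incoming ACK under ABC. */
static unsigned int
ss_cwnd_incr(unsigned int bytes_acked)
{
	unsigned int cap = ABC_L_VAR * MSS;

	return (bytes_acked < cap ? bytes_acked : cap);
}

int
main(void)
{
	unsigned int acked = 64 * 1024;		/* one aggregated ACK */
	unsigned int remaining = acked;
	unsigned int incr_individual = 0;

	/* The same 64 kB acknowledged one segment at a time. */
	while (remaining > 0) {
		unsigned int chunk = remaining < MSS ? remaining : MSS;

		incr_individual += ss_cwnd_incr(chunk);
		remaining -= chunk;
	}

	printf("one aggregated ACK:  cwnd grows by %u bytes\n",
	    ss_cwnd_incr(acked));
	printf("segment-by-segment:  cwnd grows by %u bytes\n",
	    incr_individual);
	return (0);
}

With those assumptions the aggregated ACK is credited 2 * 1448 = 2896
bytes, while the same data ACKed per segment is credited the full 64 kB,
which is the gap the proposed ACK-count or abc_l_var bypass would close.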