From: Julian Elischer
To: FreeBSD Net
Date: Wed, 14 Aug 2013 01:29:43 +0800
Subject: TSO and FreeBSD vs Linux
Message-ID: <520A6D07.5080106@freebsd.org>

I have been tracking down a performance embarrassment on Amazon EC2 and I think I have found it. Our OS cousins over in Linux land have implemented some interesting behaviour when TSO is in use. They seem to aggregate ACKs when there is a lot of traffic so that they can create the largest possible TSO packet.
We, on the other hand, respond to each and every returning ACK as it arrives, and thus generally fall into the behaviour of sending a bunch of small packets, each the size of the ACK that triggered it.

For two examples, look at:
http://www.freebsd.org/~julian/LvsF-tcp-start.tiff
and
http://www.freebsd.org/~julian/LvsF-tcp.tiff

In each case, FreeBSD is on the left and Linux is on the right.

The first trace shows the sessions as they start, and the second shows them some distance later (when the sequence numbers wrap around; no particular reason to use that, it was just fun to see). In both cases you can see that each Linux packet (white), once they have got going, is responding to multiple bumps in the send window sequence number (green and yellow lines), representing the arrival of several ACKs, while FreeBSD produces a whole bunch of smaller packets, slavishly following exactly the size of each incoming ACK. This gives us quite a performance debt.

Notice that this behaviour in Linux seems to be modal: it seems to 'switch on' a little way into the 'starting' trace.

In addition, you can also see that Linux gets going faster even at the beginning, where TSO isn't in play, by sending a lot more packets up front (of course, the wisdom of this can be argued).

Has anyone done any work on aggregating ACKs, or delaying responding to them?

Julian
(Who's suspecting he's about to find out more about TSO and the send path than he ever wanted to.)
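
To make the difference concrete, here is a toy sketch of the two send policies described above. It is not taken from either the FreeBSD or the Linux stack, and it is not a claim about how Linux actually decides when to send. The function names, the 1448-byte MSS, the 65535-byte TSO limit, and the fixed batch of 8 ACKs are all assumptions chosen for illustration. It only models "how many bytes go out per send call" given a stream of ACKs that each open the window by some amount.

```python
MSS = 1448        # assumed TCP payload bytes per segment
TSO_MAX = 65535   # assumed upper bound on a single TSO send


def split_tso(credit):
    """Split `credit` bytes of open window into sends of at most TSO_MAX."""
    sends = []
    while credit > 0:
        chunk = min(credit, TSO_MAX)
        sends.append(chunk)
        credit -= chunk
    return sends


def send_per_ack(acked_bytes):
    """Send immediately on every ACK, sized to what that ACK opened."""
    sends = []
    for n in acked_bytes:
        sends.extend(split_tso(n))
    return sends


def send_coalesced(acked_bytes, batch):
    """Hypothetical policy: accumulate window credit over `batch` ACKs,
    then hand one large send to the NIC for segmentation."""
    sends = []
    credit = 0
    for i, n in enumerate(acked_bytes, start=1):
        credit += n
        if i % batch == 0:
            sends.extend(split_tso(credit))
            credit = 0
    sends.extend(split_tso(credit))  # flush whatever is left over
    return sends


# 16 ACKs, each acknowledging two full segments (typical with delayed ACKs).
acks = [2 * MSS] * 16
per_ack = send_per_ack(acks)
coalesced = send_coalesced(acks, batch=8)

print("per-ACK:   %d sends, largest %d bytes" % (len(per_ack), max(per_ack)))
print("coalesced: %d sends, largest %d bytes" % (len(coalesced), max(coalesced)))
```

Both policies move the same number of bytes; the difference is only in how many times the send path runs and how large each chunk handed to the hardware is, which is the effect visible in the traces.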