From owner-freebsd-net@FreeBSD.ORG Mon Oct 21 15:15:37 2013
Message-ID: <5265450C.1060601@freebsd.org>
Date: Mon, 21 Oct 2013 17:15:24 +0200
From: Andre Oppermann
To: Colin Percival, Julian Elischer, freebsd-net@freebsd.org
Subject: Re: LRO causing stretch ACK violations interacts badly with delayed ACKing
References: <52605EC9.6090406@freebsd.org> <526478D0.1000601@freebsd.org> <5264869E.4000308@freebsd.org>
In-Reply-To: <5264869E.4000308@freebsd.org>

On 21.10.2013 03:42, Colin Percival wrote:
> On 10/20/13 17:44, Julian Elischer wrote:
>> On 10/18/13 6:03 AM, Colin Percival wrote:
>>> I know {TSO, LRO, ACKing policy} has been discussed here recently, and I don't
>>> want to rehash everything, but I'm
>>> seeing some very bad misbehaviour with LRO
>>> and delayed ACKing turned on.
>>>
>>> Running 'fetch -o /dev/null https://www.amazon.com/' on an EC2 instance running
>> [...]
>> is this just for -current?
>
> Good question. Turns out that it isn't -- on 9.2 I see a 95.5 ms delayed ACK:
>
>> 00:00:00.000000 IP 10.142.129.245.59172 > 176.32.98.166.443: Flags [S], seq 3310207763, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 292712 ecr 0], length 0
>> 00:00:00.001031 IP 176.32.98.166.443 > 10.142.129.245.59172: Flags [S.], seq 3504196464, ack 3310207764, win 8190, options [mss 1460,nop,wscale 6], length 0
>> 00:00:00.001139 IP 10.142.129.245.59172 > 176.32.98.166.443: Flags [.], ack 1, win 1026, length 0
>> 00:00:00.002269 IP 176.32.98.166.443 > 10.142.129.245.59172: Flags [.], ack 1, win 127, length 0
>> 00:00:00.002938 IP 10.142.129.245.59172 > 176.32.98.166.443: Flags [P.], seq 1:140, ack 1, win 1026, length 139
>> 00:00:00.003815 IP 176.32.98.166.443 > 10.142.129.245.59172: Flags [.], seq 1:4097, ack 140, win 108, length 4096
>> 00:00:00.099328 IP 10.142.129.245.59172 > 176.32.98.166.443: Flags [.], ack 4097, win 1026, length 0
>
> but not on 9.1...
> although that might just be that LRO isn't happening:
>> 00:00:00.000000 IP 10.148.177.92.48728 > 176.32.98.166.443: Flags [S], seq 2729946716, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 64564 ecr 0], length 0
>> 00:00:00.000722 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [S.], seq 595247561, ack 2729946717, win 8190, options [mss 1460,nop,wscale 6], length 0
>> 00:00:00.000820 IP 10.148.177.92.48728 > 176.32.98.166.443: Flags [.], ack 1, win 1026, length 0
>> 00:00:00.001998 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [.], ack 1, win 127, length 0
>> 00:00:00.002716 IP 10.148.177.92.48728 > 176.32.98.166.443: Flags [P.], seq 1:140, ack 1, win 1026, length 139
>> 00:00:00.003527 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [.], ack 140, win 108, length 0
>> 00:00:00.003834 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [.], seq 1:1461, ack 140, win 108, length 1460
>> 00:00:00.003850 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [.], seq 1461:2921, ack 140, win 108, length 1460
>> 00:00:00.003870 IP 10.148.177.92.48728 > 176.32.98.166.443: Flags [.], ack 2921, win 981, length 0
>> 00:00:00.003888 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [P.], seq 2921:4097, ack 140, win 108, length 1176
>> 00:00:00.003973 IP 10.148.177.92.48728 > 176.32.98.166.443: Flags [.], ack 4097, win 1026, length 0
>
> I can't find any changes in netfront.c or tcp_lro.c to explain why 9.1 and
> 9.2 are behaving differently -- anyone have any ideas?

The last time I looked, our soft-LRO had a few remaining issues. One of them
was that, in certain situations, reordering may happen with segments that
can't be aggregated into an LRO state. The other was that the driver is
responsible for flushing LRO states that haven't seen updates in some time.
Most drivers likely don't do that correctly, for the simple reason that,
IIRC, there has never been a description of how to do it correctly. This
may explain why there is so much latency.
Normally an LRO state should not wait more than 5-10 ms before being flushed.
Also, the total amount of time over which it can aggregate segments is not
limited, which can be bad too.

IIRC Navdeep made a couple of changes to the Chelsio driver to work around
some of these problems.

Taking a closer look at tcp_lro.c and fixing these issues is on my todo list,
but I haven't gotten to it yet.

-- 
Andre