From owner-freebsd-net@FreeBSD.ORG  Mon Oct 21 15:15:37 2013
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTP id 890393FC
 for <freebsd-net@freebsd.org>; Mon, 21 Oct 2013 15:15:37 +0000 (UTC)
 (envelope-from andre@freebsd.org)
Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id E825E2E6F
 for <freebsd-net@freebsd.org>; Mon, 21 Oct 2013 15:15:36 +0000 (UTC)
Received: (qmail 42324 invoked from network); 21 Oct 2013 15:47:36 -0000
Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2])
 (envelope-sender <andre@freebsd.org>)
 by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP
 for <cperciva@freebsd.org>; 21 Oct 2013 15:47:36 -0000
Message-ID: <5265450C.1060601@freebsd.org>
Date: Mon, 21 Oct 2013 17:15:24 +0200
From: Andre Oppermann <andre@freebsd.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:24.0) Gecko/20100101 Thunderbird/24.0.1
MIME-Version: 1.0
To: Colin Percival <cperciva@freebsd.org>, 
 Julian Elischer <julian@freebsd.org>, freebsd-net@freebsd.org
Subject: Re: LRO causing stretch ACK violations interacts badly with delayed
 ACKing
References: <52605EC9.6090406@freebsd.org> <526478D0.1000601@freebsd.org>
 <5264869E.4000308@freebsd.org>
In-Reply-To: <5264869E.4000308@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
X-List-Received-Date: Mon, 21 Oct 2013 15:15:37 -0000

On 21.10.2013 03:42, Colin Percival wrote:
> On 10/20/13 17:44, Julian Elischer wrote:
>> On 10/18/13 6:03 AM, Colin Percival wrote:
>>> I know {TSO, LRO, ACKing policy} has been discussed here recently, and I don't
>>> want to rehash everything, but I'm seeing some very bad misbehaviour with LRO
>>> and delayed ACKing turned on.
>>>
>>> Running 'fetch -o /dev/null https://www.amazon.com/' on an EC2 instance running
>> [...]
>> is this just for -current?
>
> Good question.  Turns out that it isn't -- on 9.2 I see a 95.5 ms delayed ACK:
>
>> 00:00:00.000000 IP 10.142.129.245.59172 > 176.32.98.166.443: Flags [S], seq 3310207763, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 292712 ecr 0], length 0
>> 00:00:00.001031 IP 176.32.98.166.443 > 10.142.129.245.59172: Flags [S.], seq 3504196464, ack 3310207764, win 8190, options [mss 1460,nop,wscale 6], length 0
>> 00:00:00.001139 IP 10.142.129.245.59172 > 176.32.98.166.443: Flags [.], ack 1, win 1026, length 0
>> 00:00:00.002269 IP 176.32.98.166.443 > 10.142.129.245.59172: Flags [.], ack 1, win 127, length 0
>> 00:00:00.002938 IP 10.142.129.245.59172 > 176.32.98.166.443: Flags [P.], seq 1:140, ack 1, win 1026, length 139
>> 00:00:00.003815 IP 176.32.98.166.443 > 10.142.129.245.59172: Flags [.], seq 1:4097, ack 140, win 108, length 4096
>> 00:00:00.099328 IP 10.142.129.245.59172 > 176.32.98.166.443: Flags [.], ack 4097, win 1026, length 0
>
> but not on 9.1... although that might just be that LRO isn't happening:
>> 00:00:00.000000 IP 10.148.177.92.48728 > 176.32.98.166.443: Flags [S], seq 2729946716, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 64564 ecr 0], length 0
>> 00:00:00.000722 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [S.], seq 595247561, ack 2729946717, win 8190, options [mss 1460,nop,wscale 6], length 0
>> 00:00:00.000820 IP 10.148.177.92.48728 > 176.32.98.166.443: Flags [.], ack 1, win 1026, length 0
>> 00:00:00.001998 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [.], ack 1, win 127, length 0
>> 00:00:00.002716 IP 10.148.177.92.48728 > 176.32.98.166.443: Flags [P.], seq 1:140, ack 1, win 1026, length 139
>> 00:00:00.003527 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [.], ack 140, win 108, length 0
>> 00:00:00.003834 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [.], seq 1:1461, ack 140, win 108, length 1460
>> 00:00:00.003850 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [.], seq 1461:2921, ack 140, win 108, length 1460
>> 00:00:00.003870 IP 10.148.177.92.48728 > 176.32.98.166.443: Flags [.], ack 2921, win 981, length 0
>> 00:00:00.003888 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [P.], seq 2921:4097, ack 140, win 108, length 1176
>> 00:00:00.003973 IP 10.148.177.92.48728 > 176.32.98.166.443: Flags [.], ack 4097, win 1026, length 0
>
> I can't find any changes in netfront.c or tcp_lro.c to explain why 9.1 and
> 9.2 are behaving differently -- anyone have any ideas?

The last time I looked, our soft-LRO had a few remaining issues.  One of
them was that in certain situations, segments that can't be aggregated
into an LRO state may be reordered.  The other was that the driver is
responsible for flushing LRO states that haven't seen updates in some
time.  Most drivers likely don't do that correctly, for the simple reason
that, IIRC, there has never been a description of how to do it
correctly.  This may explain why there is so much latency.  Normally an
LRO state should not wait more than 5-10 ms before being flushed.  Also,
the total amount of time it can aggregate segments is not limited, which
can be bad too.  IIRC Navdeep made a couple of changes to the Chelsio
driver to work around some of these problems.

Taking a closer look at tcp_lro.c and fixing these issues is on my todo
list, but I haven't gotten around to it yet.

-- 
Andre