Date:      Mon, 21 Oct 2013 17:15:24 +0200
From:      Andre Oppermann <andre@freebsd.org>
To:        Colin Percival <cperciva@freebsd.org>,  Julian Elischer <julian@freebsd.org>, freebsd-net@freebsd.org
Subject:   Re: LRO causing stretch ACK violations interacts badly with delayed ACKing
Message-ID:  <5265450C.1060601@freebsd.org>
In-Reply-To: <5264869E.4000308@freebsd.org>
References:  <52605EC9.6090406@freebsd.org> <526478D0.1000601@freebsd.org> <5264869E.4000308@freebsd.org>

On 21.10.2013 03:42, Colin Percival wrote:
> On 10/20/13 17:44, Julian Elischer wrote:
>> On 10/18/13 6:03 AM, Colin Percival wrote:
>>> I know {TSO, LRO, ACKing policy} has been discussed here recently, and I don't
>>> want to rehash everything, but I'm seeing some very bad misbehaviour with LRO
>>> and delayed ACKing turned on.
>>>
>>> Running 'fetch -o /dev/null https://www.amazon.com/' on an EC2 instance running
>> [...]
>> is this just for -current?
>
> Good question.  Turns out that it isn't -- on 9.2 I see a 95.5 ms delayed ACK:
>
>> 00:00:00.000000 IP 10.142.129.245.59172 > 176.32.98.166.443: Flags [S], seq 3310207763, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 292712 ecr 0], length 0
>> 00:00:00.001031 IP 176.32.98.166.443 > 10.142.129.245.59172: Flags [S.], seq 3504196464, ack 3310207764, win 8190, options [mss 1460,nop,wscale 6], length 0
>> 00:00:00.001139 IP 10.142.129.245.59172 > 176.32.98.166.443: Flags [.], ack 1, win 1026, length 0
>> 00:00:00.002269 IP 176.32.98.166.443 > 10.142.129.245.59172: Flags [.], ack 1, win 127, length 0
>> 00:00:00.002938 IP 10.142.129.245.59172 > 176.32.98.166.443: Flags [P.], seq 1:140, ack 1, win 1026, length 139
>> 00:00:00.003815 IP 176.32.98.166.443 > 10.142.129.245.59172: Flags [.], seq 1:4097, ack 140, win 108, length 4096
>> 00:00:00.099328 IP 10.142.129.245.59172 > 176.32.98.166.443: Flags [.], ack 4097, win 1026, length 0
>
> but not on 9.1... although that might just be that LRO isn't happening:
>> 00:00:00.000000 IP 10.148.177.92.48728 > 176.32.98.166.443: Flags [S], seq 2729946716, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 64564 ecr 0], length 0
>> 00:00:00.000722 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [S.], seq 595247561, ack 2729946717, win 8190, options [mss 1460,nop,wscale 6], length 0
>> 00:00:00.000820 IP 10.148.177.92.48728 > 176.32.98.166.443: Flags [.], ack 1, win 1026, length 0
>> 00:00:00.001998 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [.], ack 1, win 127, length 0
>> 00:00:00.002716 IP 10.148.177.92.48728 > 176.32.98.166.443: Flags [P.], seq 1:140, ack 1, win 1026, length 139
>> 00:00:00.003527 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [.], ack 140, win 108, length 0
>> 00:00:00.003834 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [.], seq 1:1461, ack 140, win 108, length 1460
>> 00:00:00.003850 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [.], seq 1461:2921, ack 140, win 108, length 1460
>> 00:00:00.003870 IP 10.148.177.92.48728 > 176.32.98.166.443: Flags [.], ack 2921, win 981, length 0
>> 00:00:00.003888 IP 176.32.98.166.443 > 10.148.177.92.48728: Flags [P.], seq 2921:4097, ack 140, win 108, length 1176
>> 00:00:00.003973 IP 10.148.177.92.48728 > 176.32.98.166.443: Flags [.], ack 4097, win 1026, length 0
>
> I can't find any changes in netfront.c or tcp_lro.c to explain why 9.1 and
> 9.2 are behaving differently -- anyone have any ideas?

The last time I looked, our soft-LRO had a few remaining issues.  One of
them was that in certain situations reordering may happen with segments
that can't be aggregated into an LRO state.  The other was that the driver
is responsible for flushing LRO states that haven't seen updates in some
time.  Most drivers likely don't do that correctly, for the simple reason
that IIRC there has never been a description of how to do it correctly.
This may explain why there is so much latency.  Normally an LRO state
should not wait more than 5-10 ms before being flushed.  Also, the total
amount of time a state can keep aggregating segments is not limited, which
can be bad too.  IIRC Navdeep made a couple of changes to the Chelsio
driver to work around some of these problems.
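
To make the two timing rules above concrete, here is a minimal sketch of
the aging a driver could apply to its LRO states.  It is not the code in
tcp_lro.c and uses none of its API: the struct, the function names and
both limits are hypothetical, with the limits picked from the 5-10 ms
range mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits for illustration; not taken from tcp_lro.c. */
#define LRO_IDLE_LIMIT_US    5000	/* flush after 5 ms without a new segment */
#define LRO_TOTAL_LIMIT_US  10000	/* flush after 10 ms of total aggregation */

/* Hypothetical per-flow aggregation state. */
struct lro_state {
	bool     active;	/* currently holding aggregated data */
	uint64_t first_us;	/* arrival time of the first segment */
	uint64_t last_us;	/* arrival time of the latest segment */
	uint32_t bytes;		/* payload bytes aggregated so far */
};

/* Record one more in-order segment in the state. */
void
lro_state_append(struct lro_state *e, uint64_t now_us, uint32_t len)
{
	assert(e != NULL);
	if (!e->active) {
		e->active = true;
		e->first_us = now_us;
		e->bytes = 0;
	}
	e->last_us = now_us;
	e->bytes += len;
}

/* True if the state was idle too long or has aggregated for too long. */
bool
lro_state_expired(const struct lro_state *e, uint64_t now_us)
{
	assert(e != NULL);
	if (!e->active)
		return (false);
	return (now_us - e->last_us >= LRO_IDLE_LIMIT_US ||
	    now_us - e->first_us >= LRO_TOTAL_LIMIT_US);
}

/*
 * Walk the table and flush every expired state.  A real driver would
 * hand the aggregated packet to the stack at the marked point; here we
 * only reset the state.  Returns the number of states flushed.
 */
size_t
lro_flush_expired(struct lro_state *tbl, size_t n, uint64_t now_us)
{
	size_t flushed = 0;

	for (size_t i = 0; i < n; i++) {
		if (lro_state_expired(&tbl[i], now_us)) {
			/* deliver aggregated packet to the stack here */
			tbl[i].active = false;
			tbl[i].bytes = 0;
			flushed++;
		}
	}
	return (flushed);
}
```

A driver would call something like lro_flush_expired() at the end of
each RX interrupt or poll pass, and also from a timer so that a state
does not sit unflushed when no further interrupts arrive.  The second
check is what bounds the total aggregation time even while segments keep
arriving back to back.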

Taking a closer look at tcp_lro.c and fixing these issues is on my todo
list, but I haven't gotten to it yet.

-- 
Andre



