Date:      Mon, 13 Jan 2020 23:39:54 +0000
From:      "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>
To:        "freebsd-transport@freebsd.org" <freebsd-transport@freebsd.org>
Cc:        "Cui, Cheng" <Cheng.Cui@netapp.com>, Christoph Paasch <cpaasch@apple.com>,  Vidhi Goel <vidhi_goel@apple.com>, 'Michael Tuexen' <tuexen@freebsd.org>
Subject:   SACK + RTO interaction
Message-ID:  <SN4PR0601MB3728AC5A0466DBBEB1FA53B486350@SN4PR0601MB3728.namprd06.prod.outlook.com>

Hi guys,

I believe Cheng has uncovered another long-lurking bug, this time in the
interaction between RTO and SACK.

Apparently, tcp_sack_partialack has stopped the RTO timer ever since its
inception, which doesn't seem right.

What he observed is this: if you run into a lost retransmission twice during
the same SACK loss recovery episode (a lost retransmission is currently only
recoverable by RTO), the first loss is recovered by the RTO (unless a partial
ACK disabled the timer before it fired), and the second lost retransmission
is at the mercy of any other TCP timer that is hopefully still active
(keepalive, persist, ...).

https://reviews.freebsd.org/source/src/browse/head/sys/netinet/tcp_sack.c#782

I strongly suspect that this code should never have cancelled the RTO timer,
but should restart it after a partial ACK. At least that would be more
logical: restart the timeout when you are making some forward progress,
rather than stop it completely because one (of possibly many) retransmissions
went through. If SACK loss recovery doesn't complete within an RTO (which is
many more RTTs than the single RTT a SACK loss recovery should take), it
would be prudent to give up and fall back to RTO, no?
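
To put rough numbers on "many more RTTs", here is a minimal sketch of the RTO computation from RFC 6298 (SRTT + 4 * RTTVAR, with alpha = 1/8 and beta = 1/4). The names rto_est, rto_sample and rto_value are made up for illustration and are not FreeBSD identifiers; the clock granularity term is ignored, and the lower bound is a parameter because stacks differ in the floor they apply.

```c
#include <assert.h>

/* Hypothetical estimator state, loosely following RFC 6298. */
struct rto_est {
    int    have_sample; /* 0 until the first RTT measurement */
    double srtt;        /* smoothed RTT, seconds */
    double rttvar;      /* RTT variation, seconds */
};

/* Feed one RTT measurement r (seconds) into the estimator. */
static void rto_sample(struct rto_est *e, double r)
{
    assert(r > 0.0);
    if (!e->have_sample) {
        /* First measurement: SRTT = R, RTTVAR = R / 2. */
        e->srtt = r;
        e->rttvar = r / 2.0;
        e->have_sample = 1;
    } else {
        double diff = e->srtt > r ? e->srtt - r : r - e->srtt;
        /* RTTVAR is updated before SRTT, using the old SRTT. */
        e->rttvar = 0.75 * e->rttvar + 0.25 * diff;
        e->srtt = 0.875 * e->srtt + 0.125 * r;
    }
}

/* RTO = SRTT + 4 * RTTVAR, clamped to a caller-supplied minimum. */
static double rto_value(const struct rto_est *e, double min_rto)
{
    double rto = e->srtt + 4.0 * e->rttvar;
    return rto < min_rto ? min_rto : rto;
}
```

With a single 100 ms sample the unclamped value is about 300 ms, and with the 1 second minimum that RFC 6298 recommends it becomes ten RTTs, so a recovery episode that outlasts the RTO has had plenty of round trips to finish.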

This effect may also explain some of the other sporadic, very lengthy SACK
recoveries we couldn't really pin down so far...

The patch should be easy enough:
tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur);
instead of
tcp_timer_activate(tp, TT_REXMT, 0);
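
The difference between the two calls can be sketched with a toy user-space model. This is not kernel code: toy_rexmt, toy_timer_activate, toy_partial_ack and toy_rescue_tick are hypothetical names, time is counted in abstract ticks, and the model assumes (as the two lines above imply) that a delta of 0 stops the timer while a nonzero delta re-arms it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the retransmit timer; not the FreeBSD implementation. */
struct toy_rexmt {
    bool armed;      /* is the timer currently running? */
    int  expires_at; /* absolute tick at which it fires, valid if armed */
};

/* Assumed convention: delta == 0 stops the timer, delta > 0 (re)arms it. */
static void toy_timer_activate(struct toy_rexmt *t, int now, int delta)
{
    assert(delta >= 0);
    if (delta == 0) {
        t->armed = false;
    } else {
        t->armed = true;
        t->expires_at = now + delta;
    }
}

enum partialack_policy {
    STOP_TIMER,  /* current behaviour described in the message */
    REARM_TIMER  /* proposed behaviour */
};

/* What happens to the timer when a partial ACK arrives at tick `now`. */
static void toy_partial_ack(struct toy_rexmt *t, int now, int rxtcur,
                            enum partialack_policy policy)
{
    toy_timer_activate(t, now, policy == REARM_TIMER ? rxtcur : 0);
}

/* Tick at which an RTO could rescue a lost retransmission, or -1 if none. */
static int toy_rescue_tick(const struct toy_rexmt *t)
{
    return t->armed ? t->expires_at : -1;
}
```

With the timer armed at tick 0 for 30 ticks and a partial ACK arriving at tick 10, STOP_TIMER leaves no pending RTO (toy_rescue_tick returns -1), whereas REARM_TIMER moves the expiry to tick 40, so a retransmission lost after that partial ACK is still covered.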

Any comments?

BTW, found the same in Darwin.



Richard Scheffenegger
Consulting Solution Architect
NAS & Networking

NetApp
+43 1 3676 811 3157 Direct Phone
+43 664 8866 1857 Mobile Phone
Richard.Scheffenegger@netapp.com





