Date:      Thu, 4 May 2023 14:47:00 -0400
From:      Randall Stewart <rrs@netflix.com>
To:        "Rodney W. Grimes" <freebsd-rwg@gndrsh.dnsmgr.net>
Cc:        Chen Shuo <chenshuo@chenshuo.com>, freebsd-net <freebsd-net@freebsd.org>, freebsd-transport@freebsd.org
Subject:   Re: Cwnd grows slowly during slow-start due to LRO of the receiver side.
Message-ID:  <56338AD8-60B6-4B6B-AE1D-B48ED8D28909@netflix.com>
In-Reply-To: <202305021355.342DtKWj021076@gndrsh.dnsmgr.net>
References:  <202305021355.342DtKWj021076@gndrsh.dnsmgr.net>


--Apple-Mail=_1B855FBF-EC26-4490-9A2C-6AA3C5357C0D
Content-Type: multipart/alternative;
	boundary="Apple-Mail=_FB2B8B55-52E3-4FD9-9253-EDCFAB8012E6"


--Apple-Mail=_FB2B8B55-52E3-4FD9-9253-EDCFAB8012E6
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=utf-8

Rodney/Chen

This is a real issue on the Internet, and it's not just LRO/TSO making this
all happen. Cable modem technology will batch up ACKs and keep only the
most recent one, thus aggregating some number of ACKs (I have seen up to
10 ACKs eaten this way, each of those covering 2 segments).

Other middleboxes do similar things, and then there is the channel access
technology that at least gives you all the ACKs; the only issue is that it
stores them up and releases them all at once, so forget getting nice
ACK clocking coming out of the stack.

The only way to deal with it is to generally raise abc_l_var to a much larger
value. That way, as you get an aggregated ACK, your cwnd will open. The
downside is that this lets you be more bursty. Pacing can help here, but only
the BBR and RACK stacks pace in FreeBSD.
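
A minimal sketch, in Python, of why a larger abc_l_var helps with an aggregated ACK. It applies the slow-start rule quoted later in this thread, cwnd += min(bytes_acked, abc_l_var * SMSS), to one ACK standing in for 10 swallowed ACKs of 2 segments each. The function name, the 1448-byte SMSS (taken from the quoted trace) and the sample abc_l_var values are assumptions for illustration, not kernel code.

```python
SMSS = 1448  # bytes per segment; assumed, matches the trace quoted in this thread


def slow_start_increment(bytes_acked: int, abc_l_var: int, smss: int = SMSS) -> int:
    """cwnd growth credited for a single ACK during slow start."""
    return min(bytes_acked, abc_l_var * smss)


# One surviving ACK that covers what 10 ACKs of 2 segments each would have covered.
aggregated = 10 * 2 * SMSS

for l_var in (2, 8, 20, 44):
    inc = slow_start_increment(aggregated, l_var)
    print(f"abc_l_var={l_var:2d}: cwnd += {inc // SMSS} segments")
```

The same increment is also the extra data the sender may release at once when that ACK arrives, which is the burstiness trade-off mentioned above.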

R

> On May 2, 2023, at 9:55 AM, Rodney W. Grimes <freebsd-rwg@gndrsh.dnsmgr.net> wrote:
>
> Second attempt, first one failed due to not being a member
> of the list :-(.
>
>> Adding freebsd-transport@freebsd.org to get that specific group's
>> eyes on this issue.
>>
>> Rod
>>
>>> As per newreno_ack_received() in sys/netinet/cc/cc_newreno.c, the
>>> FreeBSD TCP sender strictly follows RFC 5681 with the RFC 3465 extension.
>>> That is, during slow-start, when receiving an ACK of 'bytes_acked':
>>>
>>>    cwnd += min(bytes_acked, abc_l_var * SMSS);  // abc_l_var = 2 dflt
>>>
>>> As discussed in sec3.2 of RFC 3465, L=2*SMSS bytes exactly balances
>>> the negative impact of the delayed ACK algorithm.  RFC 5681 also
>>> requires that a receiver SHOULD generate an ACK for at least every
>>> second full-sized segment, so bytes_acked per ACK is at most 2 * SMSS.
>>> If both sender and receiver follow it, cwnd should grow exponentially
>>> during slow-start:
>>>
>>>    cwnd *= 2    (per RTT)
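
The doubling can be checked with a small Python model of one round trip. This is a sketch under stated assumptions: no loss, the whole window is sent and acknowledged each RTT, cwnd is counted in whole segments, and the helper name one_rtt is hypothetical.

```python
def one_rtt(cwnd_segs: int, segs_per_ack: int, abc_l_var: int = 2) -> int:
    """Return cwnd (in segments) after one loss-free round trip of slow start.

    The receiver sends one ACK per `segs_per_ack` segments; the last ACK of
    the window may cover fewer.
    """
    remaining = cwnd_segs
    growth = 0
    while remaining > 0:
        acked = min(segs_per_ack, remaining)
        growth += min(acked, abc_l_var)  # cwnd += min(bytes_acked, L * SMSS)
        remaining -= acked
    return cwnd_segs + growth


cwnd = 10
history = [cwnd]
for _ in range(4):
    cwnd = one_rtt(cwnd, segs_per_ack=2)
    history.append(cwnd)
print(history)  # [10, 20, 40, 80, 160]
```

With one ACK per two segments every acknowledged byte is credited, so the window doubles. Calling one_rtt(10, segs_per_ack=10) instead credits only 2 of the 10 segments.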
>>>
>>> However, LRO and TSO are widely used today, so the receiver may generate
>>> far fewer ACKs than it used to.  As I observed, both FreeBSD and
>>> Linux generate at most one ACK per segment assembled by LRO/GRO.
>>> The worst case is one ACK per 45 MSS, as 45 * 1448 = 65160 < 65535.
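
The worst-case arithmetic, spelled out in Python. The 1448-byte payload is the 1460-byte MSS minus 12 bytes of TCP timestamp option; the 65535-byte ceiling is simply the bound used in the message, and the variable names are illustrative.

```python
MSS_PAYLOAD = 1448     # 1460-byte MSS minus 12 bytes of timestamp option
MAX_AGGREGATE = 65535  # upper bound on an aggregated segment, as used above

segs = MAX_AGGREGATE // MSS_PAYLOAD
print(segs, segs * MSS_PAYLOAD)  # 45 65160

# With the default abc_l_var = 2, an ACK covering `segs` segments credits only 2.
credited = min(segs, 2)
print(f"{credited}/{segs} of the acknowledged segments count toward cwnd")
```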
>>>
>>> Sending 1MB over a link of 100ms delay from FreeBSD 13.2:
>>>
>>> 0.000 IP sender > sink: Flags [S], seq 205083268, win 65535, options [mss 1460,nop,wscale 10,sackOK,TS val 495212525 ecr 0], length 0
>>> 0.100 IP sink > sender: Flags [S.], seq 708257395, ack 205083269, win 65160, options [mss 1460,sackOK,TS val 563185696 ecr 495212525,nop,wscale 7], length 0
>>> 0.100 IP sender > sink: Flags [.], ack 1, win 65, options [nop,nop,TS val 495212626 ecr 563185696], length 0
>>> // TSopt omitted below for brevity.
>>>
>>> // cwnd = 10 * MSS, sent 10 * MSS
>>> 0.101 IP sender > sink: Flags [.], seq 1:14481, ack 1, win 65, length 14480
>>>
>>> // got one ACK for 10 * MSS, cwnd += 2 * MSS, sent 12 * MSS
>>> 0.201 IP sink > sender: Flags [.], ack 14481, win 427, length 0
>>> 0.201 IP sender > sink: Flags [.], seq 14481:31857, ack 1, win 65, length 17376
>>>
>>> // got ACK of 12*MSS above, cwnd += 2 * MSS, sent 14 * MSS
>>> 0.301 IP sink > sender: Flags [.], ack 31857, win 411, length 0
>>> 0.301 IP sender > sink: Flags [.], seq 31857:52129, ack 1, win 65, length 20272
>>>
>>> // got ACK of 14*MSS above, cwnd += 2 * MSS, sent 16 * MSS
>>> 0.402 IP sink > sender: Flags [.], ack 52129, win 395, length 0
>>> 0.402 IP sender > sink: Flags [P.], seq 52129:73629, ack 1, win 65, length 21500
>>> 0.402 IP sender > sink: Flags [.], seq 73629:75077, ack 1, win 65, length 1448
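
The trace can be replayed against the ABC rule in a few lines of Python. The flight sizes are read off the trace; treating each round trip as exactly one ACK for the whole flight is what the trace shows, while the loop itself is only a model and not the kernel's code path.

```python
SMSS = 1448
ABC_L_VAR = 2

# Bytes sent in each round trip, read off the trace
# (the last flight is the 21500-byte and 1448-byte segments combined).
observed_flights = [14480, 17376, 20272, 21500 + 1448]

cwnd = 10 * SMSS  # initial window, as annotated in the trace
for flight in observed_flights:
    print(f"cwnd={cwnd:6d} ({cwnd / SMSS:.1f} MSS)  sent={flight:6d}")
    assert flight <= cwnd  # the sender never exceeds the modelled cwnd
    bytes_acked = flight   # the receiver answers the flight with a single ACK
    cwnd += min(bytes_acked, ABC_L_VAR * SMSS)
```

The first three flights fill the modelled window exactly; the fourth is slightly under 16 * MSS.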
>>>
>>> As a consequence, instead of growing exponentially, cwnd grows
>>> more-or-less quadratically during slow-start, unless abc_l_var is
>>> set to a sufficiently large value.
>>>
>>> NewReno took more than 20 seconds to ramp up throughput to 100Mbps
>>> over an emulated 100ms delay link, while Linux took ~2 seconds.
>>> I can provide the pcap file if anyone is interested.
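
A rough Python estimate of how many round trips loss-free slow start needs before cwnd covers the bandwidth-delay product of this 100Mbps, 100ms path. It is a toy model (fixed RTT, whole-segment accounting, one ACK per aggregate of at most 45 segments, hypothetical helper name) and is not expected to reproduce the measured 20-second and 2-second figures; it only illustrates the ordering of the three cases.

```python
import math

SMSS = 1448
RTT_S = 0.100
TARGET_BPS = 100e6


def rtts_to_fill_pipe(segs_per_ack: int, abc_l_var: int, iw_segs: int = 10) -> int:
    """Round trips of loss-free slow start until cwnd covers the BDP."""
    bdp_segs = math.ceil(TARGET_BPS / 8 * RTT_S / SMSS)
    cwnd, rtts = iw_segs, 0
    while cwnd < bdp_segs:
        # Full-sized ACKs plus one ACK for the leftover tail of the window.
        full_acks, tail = divmod(cwnd, segs_per_ack)
        cwnd += full_acks * min(segs_per_ack, abc_l_var) + min(tail, abc_l_var)
        rtts += 1
    return rtts


for label, per_ack, l_var in [
    ("ACK every 2 segments, abc_l_var=2", 2, 2),
    ("ACK per 45-segment aggregate, abc_l_var=2", 45, 2),
    ("ACK per 45-segment aggregate, abc_l_var=45", 45, 45),
]:
    n = rtts_to_fill_pipe(per_ack, l_var)
    print(f"{label}: {n} RTTs (~{n * RTT_S:.1f} s)")
```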
>>>
>>> Switching to CUBIC won't help, because it uses the logic in NewReno
>>> ack_received() for slow start.
>>>
>>> Is this a well-known issue, and is abc_l_var the only cure for it?
>>> https://www.google.com/url?q=https://calomel.org/freebsd_network_tuning.html&source=gmail-imap&ust=1683640529000000&usg=AOvVaw0MoyDmFAOg9MlB5yX3FzJP
>>>
>>> Thank you!
>>>
>>> Best,
>>> Shuo Chen
>>>
>>>
>>
>> --
>> Rod Grimes                                                 rgrimes@freebsd.org
>>
>>
>
> --
> Rod Grimes                                                 rgrimes@freebsd.org
>

------
Randall Stewart
rrs@netflix.com





--Apple-Mail=_FB2B8B55-52E3-4FD9-9253-EDCFAB8012E6--

--Apple-Mail=_1B855FBF-EC26-4490-9A2C-6AA3C5357C0D--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?56338AD8-60B6-4B6B-AE1D-B48ED8D28909>