Date: Thu, 4 May 2023 14:47:00 -0400
From: Randall Stewart <rrs@netflix.com>
To: "Rodney W. Grimes" <freebsd-rwg@gndrsh.dnsmgr.net>
Cc: Chen Shuo <chenshuo@chenshuo.com>, freebsd-net <freebsd-net@freebsd.org>, freebsd-transport@freebsd.org
Subject: Re: Cwnd grows slowly during slow-start due to LRO of the receiver side.
Message-ID: <56338AD8-60B6-4B6B-AE1D-B48ED8D28909@netflix.com>
In-Reply-To: <202305021355.342DtKWj021076@gndrsh.dnsmgr.net>
References: <202305021355.342DtKWj021076@gndrsh.dnsmgr.net>

Rodney/Chen,

This is a real issue on the Internet, and it's not just LRO/TSO causing
it. Cable modem technology will batch up ACKs and keep only the most
recent one, thus aggregating some number of ACKs (I have seen up to
10 ACKs eaten this way, each of those covering 2 segments).

Other middleboxes do similar things. Then there is channel access
technology that at least delivers all the ACKs; the only issue is that
it stores them up and releases them all at once, so forget getting nice
ACK clocking out of the stack.

The only way to deal with it is generally to raise abc_l_var to a much
larger value. That way, as you get an aggregated ACK, your cwnd will
open. The downside is that this lets you be more bursty. Pacing can help
here, but only the BBR and RACK stacks pace in FreeBSD.

R

> On May 2, 2023, at 9:55 AM, Rodney W. Grimes <freebsd-rwg@gndrsh.dnsmgr.net> wrote:
>
> Second attempt, first one failed due to not being a member
> of the list :-(.
>
>> Adding freebsd-transport@freebsd.org to get that specific group's
>> eyes on this issue.
>>
>> Rod
>>
>>> As per newreno_ack_received() in sys/netinet/cc/cc_newreno.c, the
>>> FreeBSD TCP sender strictly follows RFC 5681 with the RFC 3465
>>> extension. That is, during slow-start, when receiving an ACK of
>>> 'bytes_acked':
>>>
>>>     cwnd += min(bytes_acked, abc_l_var * SMSS);  // abc_l_var = 2 dflt
>>>
>>> As discussed in Section 3.2 of RFC 3465, L=2*SMSS bytes exactly
>>> balances the negative impact of the delayed ACK algorithm. RFC 5681
>>> also requires that a receiver SHOULD generate an ACK for at least
>>> every second full-sized segment, so bytes_acked per ACK is at most
>>> 2 * SMSS. If both sender and receiver follow it, cwnd should grow
>>> exponentially during slow-start:
>>>
>>>     cwnd *= 2  (per RTT)
>>>
>>> However, LRO and TSO are widely used today, so the receiver may
>>> generate far fewer ACKs than it used to. As I observed, both FreeBSD
>>> and Linux generate at most one ACK per segment assembled by LRO/GRO.
>>> The worst case is one ACK per 45 MSS, as 45 * 1448 = 65160 < 65535.
>>>
>>> Sending 1MB over a link of 100ms delay from FreeBSD 13.2:
>>>
>>> 0.000 IP sender > sink: Flags [S], seq 205083268, win 65535, options [mss 1460,nop,wscale 10,sackOK,TS val 495212525 ecr 0], length 0
>>> 0.100 IP sink > sender: Flags [S.], seq 708257395, ack 205083269, win 65160, options [mss 1460,sackOK,TS val 563185696 ecr 495212525,nop,wscale 7], length 0
>>> 0.100 IP sender > sink: Flags [.], ack 1, win 65, options [nop,nop,TS val 495212626 ecr 563185696], length 0
>>> // TSopt omitted below for brevity.
>>>
>>> // cwnd = 10 * MSS, sent 10 * MSS
>>> 0.101 IP sender > sink: Flags [.], seq 1:14481, ack 1, win 65, length 14480
>>>
>>> // got one ACK for 10 * MSS, cwnd += 2 * MSS, sent 12 * MSS
>>> 0.201 IP sink > sender: Flags [.], ack 14481, win 427, length 0
>>> 0.201 IP sender > sink: Flags [.], seq 14481:31857, ack 1, win 65, length 17376
>>>
>>> // got ACK of 12*MSS above, cwnd += 2 * MSS, sent 14 * MSS
>>> 0.301 IP sink > sender: Flags [.], ack 31857, win 411, length 0
>>> 0.301 IP sender > sink: Flags [.], seq 31857:52129, ack 1, win 65, length 20272
>>>
>>> // got ACK of 14*MSS above, cwnd += 2 * MSS, sent 16 * MSS
>>> 0.402 IP sink > sender: Flags [.], ack 52129, win 395, length 0
>>> 0.402 IP sender > sink: Flags [P.], seq 52129:73629, ack 1, win 65, length 21500
>>> 0.402 IP sender > sink: Flags [.], seq 73629:75077, ack 1, win 65, length 1448
>>>
>>> As a consequence, instead of growing exponentially, cwnd grows
>>> more-or-less quadratically during slow-start, unless abc_l_var is
>>> set to a sufficiently large value.
>>>
>>> NewReno took more than 20 seconds to ramp up throughput to 100Mbps
>>> over an emulated 100ms delay link, while Linux took ~2 seconds.
>>> I can provide the pcap file if anyone is interested.
>>>
>>> Switching to CUBIC won't help, because it uses the logic in NewReno
>>> ack_received() for slow start.
>>>
>>> Is this a well-known issue, and is abc_l_var the only cure for it?
>>> https://www.google.com/url?q=https://calomel.org/freebsd_network_tuning.html&source=gmail-imap&ust=1683640529000000&usg=AOvVaw0MoyDmFAOg9MlB5yX3FzJP
>>>
>>> Thank you!
>>>
>>> Best,
>>> Shuo Chen
>>>
>>
>> --
>> Rod Grimes                                     rgrimes@freebsd.org
>>
>
> --
> Rod Grimes                                     rgrimes@freebsd.org
>

------
Randall Stewart
rrs@netflix.com
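
To make the thread's arithmetic concrete, here is a minimal sketch of the slow-start rule quoted above, cwnd += min(bytes_acked, abc_l_var * SMSS), under different degrees of ACK aggregation. It is a toy model, not FreeBSD code: the function name slow_start_cwnd and its parameters are hypothetical, and it assumes no loss, no pacing, no receive-window limit, a full cwnd sent every RTT, and one cumulative ACK per segs_per_ack segments. The value abc_l_var = 45 is used only to mirror the 45-MSS worst case mentioned in the thread, not as a tuning recommendation.

```python
SMSS = 1448  # payload bytes per segment with TCP timestamps, as in the trace


def slow_start_cwnd(rtts, segs_per_ack, abc_l_var, init_segs=10):
    """Return cwnd in bytes at the start of each RTT during slow start.

    Toy model (an assumption, not the kernel's logic): the sender emits a
    full cwnd per RTT and the receiver returns one cumulative ACK for every
    `segs_per_ack` segments, plus one ACK for any remainder. Each ACK grows
    cwnd by min(bytes_acked, abc_l_var * SMSS).
    """
    cwnd = init_segs * SMSS
    history = [cwnd]
    for _ in range(rtts):
        remaining = cwnd
        growth = 0
        while remaining > 0:
            bytes_acked = min(remaining, segs_per_ack * SMSS)
            growth += min(bytes_acked, abc_l_var * SMSS)
            remaining -= bytes_acked
        cwnd += growth
        history.append(cwnd)
    return history


def in_segments(history):
    """Convert a cwnd history in bytes to whole segments for display."""
    return [c // SMSS for c in history]


# Delayed ACKs (one ACK per 2 segments) with the default L = 2.
print("delayed ACK, L=2 :", in_segments(slow_start_cwnd(3, 2, 2)))
# One ACK per LRO-aggregated burst of up to 45 segments, default L = 2.
print("aggregated,  L=2 :", in_segments(slow_start_cwnd(3, 45, 2)))
# Same aggregation with abc_l_var raised to 45 (illustrative value only).
print("aggregated,  L=45:", in_segments(slow_start_cwnd(3, 45, 45)))
```

In this model the delayed-ACK case doubles each RTT (10, 20, 40, 80 segments), the aggregated case with L = 2 reproduces the 10, 12, 14, 16 progression visible in the trace, and raising abc_l_var restores the doubling at the cost of the burstiness noted in the reply.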