Date:      Fri, 5 May 2023 06:35:03 -0700
From:      Chen Shuo <chenshuo@chenshuo.com>
To:        Randall Stewart <rrs@netflix.com>, "Rodney W. Grimes" <freebsd-rwg@gndrsh.dnsmgr.net>
Cc:        freebsd-net <freebsd-net@freebsd.org>, freebsd-transport@freebsd.org
Subject:   Re: Cwnd grows slowly during slow-start due to LRO of the receiver side.
Message-ID:  <CAKZ7KuKFDzRJ-GEC=667%2ButSvBMM_yo%2B%2B0yDakbqRN8jf=rRAg@mail.gmail.com>
In-Reply-To: <56338AD8-60B6-4B6B-AE1D-B48ED8D28909@netflix.com>
References:  <202305021355.342DtKWj021076@gndrsh.dnsmgr.net> <56338AD8-60B6-4B6B-AE1D-B48ED8D28909@netflix.com>

Hi Rodney,
Thanks for bringing this to the correct mailing list.

Hi Randall,

Thanks for the information; I didn't know that middleboxes could do such things.

Linux effectively sets abc_l_var to +inf and opens cwnd more quickly for
aggregated ACKs.
Its receiver also enters "quickack" mode after a connection is established to
"accelerate slow-start".
So its cwnd grows much more aggressively.

My puzzle has been solved.

Regards,
Shuo

On Thu, May 4, 2023 at 11:47 AM Randall Stewart <rrs@netflix.com> wrote:
>
> Rodney/Chen
>
> This is a real issue on the Internet… and it's not just LRO/TSO making this
> all happen. You have cable modem technology that will batch up ACKs and keep
> only the most recent one, thus aggregating some number of ACKs (I have seen up
> to 10 ACKs eaten this way, each of those for 2 segments).
>
> You have other middleboxes doing similar things as well, and then there is the
> channel access technology that at least gives you all the ACKs; the only issue
> is that it stores them up and releases them all at once, so forget about getting
> nice ACK clocking out of the stack.
>
> The only way to deal with it is to generally raise abc_l_var to a much larger
> value. That way, when you get an aggregated ACK, your cwnd will open. The
> downside is that this lets you be more bursty… pacing can help here, but only
> the BBR and RACK stacks pace in FreeBSD…
>
> R
>
> On May 2, 2023, at 9:55 AM, Rodney W. Grimes <freebsd-rwg@gndrsh.dnsmgr.net> wrote:
>
> Second attempt; the first one failed because I was not a member
> of the list :-(.
>
> Adding freebsd-transport@freebsd.org to get that specific group's
> eyes on this issue.
>
> Rod
>
> As per newreno_ack_received() in sys/netinet/cc/cc_newreno.c, the
> FreeBSD TCP sender strictly follows RFC 5681 with the RFC 3465 extension.
> That is, during slow start, when it receives an ACK for 'bytes_acked' bytes:
>
>    cwnd += min(bytes_acked, abc_l_var * SMSS);  // abc_l_var = 2 by default
>
> As discussed in section 3.2 of RFC 3465, L = 2*SMSS bytes exactly balances
> the negative impact of the delayed ACK algorithm.  RFC 5681 also
> specifies that a receiver SHOULD generate an ACK for at least every
> second full-sized segment, so bytes_acked per ACK is at most 2 * SMSS.
> If both sender and receiver follow these rules, cwnd should grow
> exponentially during slow start:
>
>    cwnd *= 2    (per RTT)
>
> However, LRO and TSO are widely used today, so the receiver may generate
> far fewer ACKs than it used to.  As I observed, both FreeBSD and
> Linux generate at most one ACK per segment assembled by LRO/GRO.
> The worst case is one ACK per 45 MSS, as 45 * 1448 = 65160 < 65535.
>
> Sending 1MB over a link of 100ms delay from FreeBSD 13.2:
>
> 0.000 IP sender > sink: Flags [S], seq 205083268, win 65535, options [mss 1460,nop,wscale 10,sackOK,TS val 495212525 ecr 0], length 0
> 0.100 IP sink > sender: Flags [S.], seq 708257395, ack 205083269, win 65160, options [mss 1460,sackOK,TS val 563185696 ecr 495212525,nop,wscale 7], length 0
> 0.100 IP sender > sink: Flags [.], ack 1, win 65, options [nop,nop,TS val 495212626 ecr 563185696], length 0
> // TSopt omitted below for brevity.
>
> // cwnd = 10 * MSS, sent 10 * MSS
> 0.101 IP sender > sink: Flags [.], seq 1:14481, ack 1, win 65, length 14480
>
> // got one ACK for 10 * MSS, cwnd += 2 * MSS, sent 12 * MSS
> 0.201 IP sink > sender: Flags [.], ack 14481, win 427, length 0
> 0.201 IP sender > sink: Flags [.], seq 14481:31857, ack 1, win 65, length 17376
>
> // got ACK of 12*MSS above, cwnd += 2 * MSS, sent 14 * MSS
> 0.301 IP sink > sender: Flags [.], ack 31857, win 411, length 0
> 0.301 IP sender > sink: Flags [.], seq 31857:52129, ack 1, win 65, length 20272
>
> // got ACK of 14*MSS above, cwnd += 2 * MSS, sent 16 * MSS
> 0.402 IP sink > sender: Flags [.], ack 52129, win 395, length 0
> 0.402 IP sender > sink: Flags [P.], seq 52129:73629, ack 1, win 65, length 21500
> 0.402 IP sender > sink: Flags [.], seq 73629:75077, ack 1, win 65, length 1448
>
> As a consequence, instead of growing exponentially, cwnd grows
> more-or-less quadratically during slow-start, unless abc_l_var is
> set to a sufficiently large value.
>
> NewReno took more than 20 seconds to ramp up throughput to 100 Mbps
> over an emulated link with 100 ms delay, while Linux took ~2 seconds.
> I can provide the pcap file if anyone is interested.
>
> Switching to CUBIC won't help, because it uses the logic in NewReno
> ack_received() for slow start.
>
> Is this a well-known issue, and is abc_l_var the only cure for it?
> https://calomel.org/freebsd_network_tuning.html
>
> Thank you!
>
> Best,
> Shuo Chen
>
>
>
> --
> Rod Grimes                                                 rgrimes@freebsd.org
>
>
>
> ------
> Randall Stewart
> rrs@netflix.com
>
>
>


