Date:      Thu, 12 Sep 2019 15:41:19 -0400
From:      Randall Stewart <rrs@netflix.com>
To:        "Scheffenegger, Richard" <Richard.Scheffenegger@netapp.com>
Cc:        Michael Tuexen <tuexen@FreeBSD.org>, Lawrence Stewart <lstewart@netflix.com>, Jonathan Looney <jtl@netflix.com>, "freebsd-transport@freebsd.org" <freebsd-transport@freebsd.org>, "Cui, Cheng" <Cheng.Cui@netapp.com>, Tom Jones <thj@freebsd.org>, "bz@freebsd.org" <bz@freebsd.org>, "Eggert, Lars" <lars@netapp.com>
Subject:   Re: reno cwnd growth while app limited...
Message-ID:  <1E528E13-B087-4318-A9CA-F08957B3F03A@netflix.com>
In-Reply-To: <SN4PR0601MB37282B52C38205880B02A81D86B00@SN4PR0601MB3728.namprd06.prod.outlook.com>
References:  <CY4PR0601MB3715561F17C9C213ACFBB61586B10@CY4PR0601MB3715.namprd06.prod.outlook.com> <E86A08A0-CFB6-4292-ADBD-AEB2E1DAF64C@netflix.com> <SN4PR0601MB37282B52C38205880B02A81D86B00@SN4PR0601MB3728.namprd06.prod.outlook.com>

Richard,

Yes, that is something one could do, i.e. NewCWV... We had patches for it at one time, but they were largely untested.

Pacing, of course, changes all of this as well... BBR, for example, does not worry about this since it is pacing the packets, so you never get a burst. We will soon be committing an updated RACK stack that also has this ability.

R

> On Sep 12, 2019, at 2:49 PM, Scheffenegger, Richard <Richard.Scheffenegger@netapp.com> wrote:
>
> Michael,
>
> Thanks a lot for pointing out the uperf utility - this could be configured exactly in the way I wanted to demonstrate this...
>
> In the graphs below, I traced the evolution of cwnd for a flow across the loopback in a VM.
>
> The application does 3060 writes of 10 kB each, pausing 10 ms between them (well below the 230 ms minimum TCP idle period), and once that phase is over, it floods the session with 8 writes of 10 MB each.
>
> Currently, the stack will initially grow cwnd up to the limit set by the receiver's window (set to 1.2 MB) during the low-bandwidth-rate phase, where no loss occurs...
>
> Thus the application can send out a massive burst of data in a single RTT (or at line rate) when it chooses to do so...
>
>
> Using the guidance given by NewCWV (RFC 7661), and growing cwnd only when flightsize is larger than half of cwnd, the congestion window remains in a more reasonable range during the application-limited phase, thus limiting the maximum burst size.
>
> Growth of cwnd in SS or CA is otherwise normal, but the inverse case (an application transitioning from high throughput to low) is not addressed; I wonder if a reduction could be achieved without the timer infrastructure described in RFC 7661 (e.g. reducing cwnd by 1 MSS when flightsize is < ½ cwnd, while not in recovery)...
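[The RFC 7661-style growth gate and the timer-less decay proposed in the paragraph above could be sketched roughly as follows; the struct and function names are made up for illustration and are not the FreeBSD cc(4) module API:]

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative sketch: grow cwnd only while the flight size validates
 * it (flightsize > cwnd/2), and otherwise decay cwnd gently by one MSS
 * per ACK when application limited and not in loss recovery.
 */
struct cc_state {
	uint32_t cwnd;		/* congestion window, bytes */
	uint32_t mss;		/* maximum segment size, bytes */
	uint32_t flightsize;	/* bytes currently in flight */
	bool	 in_recovery;	/* in loss recovery? */
};

/* Return the cwnd increase (bytes) permitted for this ACK, if any. */
uint32_t
cwnd_growth_allowed(const struct cc_state *cc, uint32_t increase)
{
	/* Grow only while the window is actually being used. */
	if (cc->flightsize > cc->cwnd / 2)
		return (increase);
	return (0);
}

/* Timer-less reduction while application limited, 1 MSS per call. */
uint32_t
cwnd_app_limited_decay(const struct cc_state *cc)
{
	if (!cc->in_recovery && cc->flightsize < cc->cwnd / 2 &&
	    cc->cwnd > 4 * cc->mss)	/* keep a sane floor of 4 MSS */
		return (cc->cwnd - cc->mss);
	return (cc->cwnd);
}
```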
>
> <image004.png>
> Unlimited ssthresh:
> <image005.png>
>
> <image009.png>
>
> Richard Scheffenegger
> Consulting Solution Architect
> NAS & Networking
>
> NetApp
> +43 1 3676 811 3157 Direct Phone
> +43 664 8866 1857 Mobile Phone
> Richard.Scheffenegger@netapp.com
>
> https://ts.la/richard49892
>
> -----Original Message-----
> From: Randall Stewart <rrs@netflix.com>
> Sent: Mittwoch, 11. September 2019 14:18
> To: Scheffenegger, Richard <Richard.Scheffenegger@netapp.com>
> Cc: Lawrence Stewart <lstewart@netflix.com>; Michael Tuexen <tuexen@FreeBSD.org>; Jonathan Looney <jtl@netflix.com>; freebsd-transport@freebsd.org; Cui, Cheng <Cheng.Cui@netapp.com>; Tom Jones <thj@freebsd.org>; bz@freebsd.org; Eggert, Lars <lars@netapp.com>
> Subject: Re: reno cwnd growth while app limited...
>
> NetApp Security WARNING: This is an external email. Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
>
> Interesting graph :)
>
>
> I know that years ago I had a discussion along these lines (talking about burst limits) with Kacheong Poon and Mark Allman. IIRC, Kacheong said that, at the time, Sun limited the cwnd to something like 4 MSS more than the flight size (I could have that mixed up, though, and it might have been Mark proposing that... it's been a while; Sun was still a company then :D).
>
> On the other hand, I am not sure that such a tight limit takes into account all of the ACK artifacts that seem to be rampant in the Internet now... BBR took the approach of limiting its cwnd to 2x BDP (or at least what it thought was the BDP), which is more along the lines of your 0.5 if I am reading you right.
>
> It might be something worth looking into, but I would want to contemplate it for a while :)
>
> R
>
> > On Sep 11, 2019, at 8:04 AM, Scheffenegger, Richard <Richard.Scheffenegger@netapp.com> wrote:
> >
> > Hi,
> >
> > I was just looking at some graph data from running two parallel DCTCP flows against a CUBIC receiver (some internal validation) with traditional ECN feedback.
> >
> > <image002.jpg>
> >
> >
> > Now, in the beginning, a single flow cannot overutilize the link capacity and never runs into any loss/mark... but snd_cwnd grows unbounded (since DCTCP uses the newreno "cc_ack_received" mechanism).
> >
> > However, newreno_ack_received only grows snd_cwnd when CCF_CWND_LIMITED is set, which remains set as long as snd_cwnd < snd_wnd (the receiver-signaled receive window).
> >
> > But is this still* the correct behavior?
> >
> > Say the data flow rate is application limited (every n milliseconds, a few kB), and the receiver has signaled a large window - cwnd will grow until it matches the receiver's window. If the application then chooses to no longer restrict itself, it could burst out significantly more data than the queuing of the path can handle...
> >
> > So, shouldn't there be a second condition for cwnd growth, e.g. that pipe (flightsize) is close to cwnd (factor 0.5 during slow start, and say 0.85 during congestion avoidance), to prevent sudden large bursts when a flow comes out of being application limited? The intention here would be to restrict the worst-case burst that could be sent out (which is dealt with differently in other stacks) to ideally still fit into the path's queues...
> >
> > RFC 5681 is silent on application-limited flows, though (but one could think of application limiting a flow as another form of congestion, during which cwnd shouldn't grow...)
> >
> > In the example above, growing cwnd up to about 500 kB and then remaining there would be approximately the expected setting - based on the average of two competing flows hovering at around 200-250 kB...
> >
> > *) I'm referring to the much higher likelihood nowadays that the application's own pacing and transfer volume violate the design principle of TCP, where the implicit assumption was that the sender has unlimited data to send, with timing controlled at the full discretion of TCP.
> >
> >
> > Richard Scheffenegger
> > Consulting Solution Architect
> > NAS & Networking
> >
> > NetApp
> > +43 1 3676 811 3157 Direct Phone
> > +43 664 8866 1857 Mobile Phone
> > Richard.Scheffenegger@netapp.com
> >
> >
> > <image004.jpg>
> >
> > <image006.jpg> <image012.jpg>
> >  #DataDriven
> >
> > https://ts.la/richard49892
>
> ------
> Randall Stewart
> rrs@netflix.com

------
Randall Stewart
rrs@netflix.com
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1E528E13-B087-4318-A9CA-F08957B3F03A>