Date:      Thu, 18 Jul 2024 22:00:38 +0900
From:      Junho Choi <junho.choi@gmail.com>
To:        tuexen@freebsd.org
Cc:        Alan Somers <asomers@freebsd.org>, FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: TCP Success Story (was Re: TCP_RACK, TCP_BBR, and firewalls)
Message-ID:  <CAJ5e+HAvNbazCkd_G_E=QojqknQe23khCimyKWk=TTyzHr2j0Q@mail.gmail.com>
In-Reply-To: <B86DCBA6-542F-4951-A726-3A66D3D640D6@freebsd.org>
References:  <CAOtMX2iLv5OW4jQiBOHqMvcqkQSznTyO-eWMrOcHWbpeyaeRsg@mail.gmail.com> <C7467BCD-7232-4C6C-873E-EEC2482214A7@freebsd.org> <CAOtMX2hGYfm0U0L25-vHSX0iOyKCbZydaAzye6Y6U59mQeF7rA@mail.gmail.com> <B86DCBA6-542F-4951-A726-3A66D3D640D6@freebsd.org>


Alan - this is a great result to see. Thanks for experimenting.

Just curious: why don't bbr and rack co-exist? They are two separate things.
Is it a current bug, or by design?

BR,

On Thu, Jul 18, 2024 at 5:27 AM <tuexen@freebsd.org> wrote:

> > On 17. Jul 2024, at 22:00, Alan Somers <asomers@freebsd.org> wrote:
> >
> > On Sat, Jul 13, 2024 at 1:50 AM <tuexen@freebsd.org> wrote:
> >>
> >>> On 13. Jul 2024, at 01:43, Alan Somers <asomers@FreeBSD.org> wrote:
> >>>
> >>> I've been experimenting with RACK and BBR.  In my environment, they
> >>> can dramatically improve single-stream TCP performance, which is
> >>> awesome.  But pf interferes.  I have to disable pf in order for them
> >>> to work at all.
> >>>
> >>> Is this a known limitation?  If not, I will experiment some more to
> >>> determine exactly what aspect of my pf configuration is responsible.
> >>> If so, can anybody suggest what changes would have to happen to make
> >>> the two compatible?
> >> A problem with the same symptoms was already reported and fixed in
> >> https://reviews.freebsd.org/D43769
> >>
> >> Which version are you using?
> >>
> >> Best regards
> >> Michael
> >>>
> >>> -Alan
> >
> > TL;DR: tcp_rack is good, cc_chd is better, and tcp_bbr is best.
> >
> > I want to follow up with the list to post my conclusions.  Firstly
> > tuexen@ helped me solve my problem: in FreeBSD 14.0 there is a 3-way
> > incompatibility between (tcp_bbr || tcp_rack) && lro && pf.  I can
> > confirm that tcp_bbr works for me if I either disable LRO, disable PF,
> > or switch to a 14.1 server.
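[For readers hitting the same 14.0 incompatibility, the two workarounds mentioned above look roughly like this; a sketch only, and the interface name "ix0" is a placeholder:]

```shell
# Workaround sketch for the FreeBSD 14.0 (tcp_bbr || tcp_rack) && lro && pf issue.
# "ix0" is a placeholder; substitute your actual interface name.

# Option 1: turn off LRO on the receiving interface
ifconfig ix0 -lro

# Option 2: disable pf entirely (this drops all firewall protection!)
pfctl -d
```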
> >
> > Here's the real problem: on multiple production servers, downloading
> > large files (or ZFS send/recv streams) was slow.  After ruling out
> > many possible causes, wireshark revealed that the connection was
> > suffering about 0.05% packet loss.  I don't know the source of that
> > packet loss, but I don't believe it to be congestion-related.  Along
> > with a 54ms RTT, that's a fatal combination for the throughput of
> > loss-based congestion control algorithms.  According to the Mathis
> > Formula [1], I could only expect 1.1 MBps over such a connection.
> > That's actually worse than what I saw.  With default settings
> > (cc_cubic), I averaged 5.6 MBps.  Probably Mathis's assumptions are
> > outdated, but that's still pretty close for such a simple formula
> > that's 27 years old.
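[The Mathis bound referenced above is easy to check numerically. A minimal sketch in Python; the 1460-byte MSS and the simplified constant C = 1 are assumptions, which is why the result only roughly matches the 1.1 MBps figure quoted in the thread:]

```python
import math

def mathis_throughput(mss_bytes, rtt_s, loss_rate, c=1.0):
    """Mathis et al. steady-state bound on loss-limited TCP throughput,
    in bytes/second: BW <= (MSS / RTT) * (C / sqrt(p))."""
    return (mss_bytes / rtt_s) * (c / math.sqrt(loss_rate))

# 54 ms RTT and 0.05% loss, as observed in the thread; MSS assumed 1460.
bw = mathis_throughput(mss_bytes=1460, rtt_s=0.054, loss_rate=0.0005)
print(f"{bw / 1e6:.2f} MBps")  # roughly 1.2 MBps
```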
> >
> > So I benchmarked all available congestion control algorithms for
> > single download streams.  The results are summarized in the table
> > below.
> >
> > Algo    Packet Loss Rate    Average Throughput
> > vegas   0.05%               2.0 MBps
> > newreno 0.05%               3.2 MBps
> > cubic   0.05%               5.6 MBps
> > hd      0.05%               8.6 MBps
> > cdg     0.05%               13.5 MBps
> > rack    0.04%               14 MBps
> > htcp    0.05%               15 MBps
> > dctcp   0.05%               15 MBps
> > chd     0.05%               17.3 MBps
> > bbr     0.05%               29.2 MBps
> > cubic   10%                 159 kBps
> > chd     10%                 208 kBps
> > bbr     10%                 5.7 MBps
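[For anyone reproducing the table: on FreeBSD the loss-based algorithms are cc(4) modules selected via sysctl, while rack and bbr are alternate TCP stacks selected separately. A sketch, assuming the stock 14.x module names:]

```shell
# cc(4) modules (newreno, cubic, htcp, chd, ...): load, then select via sysctl
kldload cc_chd
sysctl net.inet.tcp.cc.algorithm=chd

# rack and bbr are whole TCP stacks, not cc(4) modules
# (on older kernels they additionally require TCP HPTS support)
kldload tcp_bbr
sysctl net.inet.tcp.functions_default=bbr
```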
> >
> > RACK seemed to achieve about the same maximum bandwidth as BBR, though
> > it took a lot longer to get there.  Also, with RACK, wireshark
> > reported about 10x as many retransmissions as dropped packets, which
> > is suspicious.
> >
> > At one point, something went haywire and packet loss briefly spiked to
> > the neighborhood of 10%.  I took advantage of the chaos to repeat my
> > measurements.  As the table shows, all algorithms sucked under those
> > conditions, but BBR sucked impressively less than the others.
> >
> > Disclaimer: there was significant run-to-run variation; the presented
> > results are averages.  And I did not attempt to measure packet loss
> > exactly for most runs; 0.05% is merely an average of a few selected
> > runs.  These measurements were taken on a production server running a
> > real workload, which introduces noise.  Soon I hope to have the
> > opportunity to repeat the experiment on an idle server in the same
> > environment.
> >
> > In conclusion, while we'd like to use BBR, we really can't until we
> > upgrade to 14.1, which hopefully will be soon.  So in the meantime
> > we've switched all relevant servers from cubic to chd, and we'll
> > reevaluate BBR after the upgrade.
> Hi Alan,
>
> just to be clear: the version of BBR currently implemented is
> BBR version 1, which is known to be unfair in certain scenarios.
> Google is still working on BBR to address this problem and improve
> it in other aspects. But there is no RFC yet and the updates haven't
> been implemented yet in FreeBSD.
>
> Best regards
> Michael
> >
> > [1]: https://www.slac.stanford.edu/comp/net/wan-mon/thru-vs-loss.html
> >
> > -Alan
>
>
>

-- 
Junho Choi <junho dot choi at gmail.com> | https://saturnsoft.net



