Date: Thu, 18 Jul 2024 22:00:38 +0900
From: Junho Choi <junho.choi@gmail.com>
To: tuexen@freebsd.org
Cc: Alan Somers <asomers@freebsd.org>, FreeBSD Net <freebsd-net@freebsd.org>
Subject: Re: TCP Success Story (was Re: TCP_RACK, TCP_BBR, and firewalls)
Message-ID: <CAJ5e%2BHAvNbazCkd_G_E=QojqknQe23khCimyKWk=TTyzHr2j0Q@mail.gmail.com>
In-Reply-To: <B86DCBA6-542F-4951-A726-3A66D3D640D6@freebsd.org>
References: <CAOtMX2iLv5OW4jQiBOHqMvcqkQSznTyO-eWMrOcHWbpeyaeRsg@mail.gmail.com>
  <C7467BCD-7232-4C6C-873E-EEC2482214A7@freebsd.org>
  <CAOtMX2hGYfm0U0L25-vHSX0iOyKCbZydaAzye6Y6U59mQeF7rA@mail.gmail.com>
  <B86DCBA6-542F-4951-A726-3A66D3D640D6@freebsd.org>

Alan - this is a great result to see. Thanks for experimenting.

Just curious why bbr and rack don't co-exist? Those are two separate things.
Is it a current bug or by design?

BR,

On Thu, Jul 18, 2024 at 5:27 AM <tuexen@freebsd.org> wrote:

> > On 17. Jul 2024, at 22:00, Alan Somers <asomers@freebsd.org> wrote:
> >
> > On Sat, Jul 13, 2024 at 1:50 AM <tuexen@freebsd.org> wrote:
> >>
> >>> On 13. Jul 2024, at 01:43, Alan Somers <asomers@FreeBSD.org> wrote:
> >>>
> >>> I've been experimenting with RACK and BBR.  In my environment, they
> >>> can dramatically improve single-stream TCP performance, which is
> >>> awesome.  But pf interferes.  I have to disable pf in order for them
> >>> to work at all.
> >>>
> >>> Is this a known limitation?  If not, I will experiment some more to
> >>> determine exactly what aspect of my pf configuration is responsible.
> >>> If so, can anybody suggest what changes would have to happen to make
> >>> the two compatible?
> >> A problem with same symptoms was already reported and fixed in
> >> https://reviews.freebsd.org/D43769
> >>
> >> Which version are you using?
> >>
> >> Best regards
> >> Michael
> >>>
> >>> -Alan
> >
> > TLDR; tcp_rack is good, cc_chd is better, and tcp_bbr is best
> >
> > I want to follow up with the list to post my conclusions.  Firstly
> > tuexen@ helped me solve my problem: in FreeBSD 14.0 there is a 3-way
> > incompatibility between (tcp_bbr || tcp_rack) && lro && pf.  I can
> > confirm that tcp_bbr works for me if I either disable LRO, disable PF,
> > or switch to a 14.1 server.
> >
> > Here's the real problem: on multiple production servers, downloading
> > large files (or ZFS send/recv streams) was slow.  After ruling out
> > many possible causes, wireshark revealed that the connection was
> > suffering about 0.05% packet loss.  I don't know the source of that
> > packet loss, but I don't believe it to be congestion-related.  Along
> > with a 54ms RTT, that's a fatal combination for the throughput of
> > loss-based congestion control algorithms.  According to the Mathis
> > Formula [1], I could only expect 1.1 MBps over such a connection.
> > That's actually worse than what I saw.  With default settings
> > (cc_cubic), I averaged 5.6 MBps.  Probably Mathis's assumptions are
> > outdated, but that's still pretty close for such a simple formula
> > that's 27 years old.
> >
> > So I benchmarked all available congestion control algorithms for
> > single download streams.  The results are summarized in the table
> > below.
> >
> > Algo     Packet Loss Rate    Average Throughput
> > vegas    0.05%               2.0 MBps
> > newreno  0.05%               3.2 MBps
> > cubic    0.05%               5.6 MBps
> > hd       0.05%               8.6 MBps
> > cdg      0.05%               13.5 MBps
> > rack     0.04%               14 MBps
> > htcp     0.05%               15 MBps
> > dctcp    0.05%               15 MBps
> > chd      0.05%               17.3 MBps
> > bbr      0.05%               29.2 MBps
> > cubic    10%                 159 kBps
> > chd      10%                 208 kBps
> > bbr      10%                 5.7 MBps
> >
> > RACK seemed to achieve about the same maximum bandwidth as BBR, though
> > it took a lot longer to get there.  Also, with RACK, wireshark
> > reported about 10x as many retransmissions as dropped packets, which
> > is suspicious.
> >
> > At one point, something went haywire and packet loss briefly spiked to
> > the neighborhood of 10%.  I took advantage of the chaos to repeat my
> > measurements.  As the table shows, all algorithms sucked under those
> > conditions, but BBR sucked impressively less than the others.
> >
> > Disclaimer: there was significant run-to-run variation; the presented
> > results are averages.  And I did not attempt to measure packet loss
> > exactly for most runs; 0.05% is merely an average of a few selected
> > runs.  These measurements were taken on a production server running a
> > real workload, which introduces noise.  Soon I hope to have the
> > opportunity to repeat the experiment on an idle server in the same
> > environment.
> >
> > In conclusion, while we'd like to use BBR, we really can't until we
> > upgrade to 14.1, which hopefully will be soon.  So in the meantime
> > we've switched all relevant servers from cubic to chd, and we'll
> > reevaluate BBR after the upgrade.
> Hi Alan,
>
> just to be clear: the version of BBR currently implemented is
> BBR version 1, which is known to be unfair in certain scenarios.
> Google is still working on BBR to address this problem and improve
> it in other aspects. But there is no RFC yet and the updates haven't
> been implemented yet in FreeBSD.
>
> Best regards
> Michael
> >
> > [1]: https://www.slac.stanford.edu/comp/net/wan-mon/thru-vs-loss.html
> >
> > -Alan
>
>

-- 
Junho Choi <junho dot choi at gmail.com> | https://saturnsoft.net
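
As a rough check of the Mathis estimate mentioned in the quoted message (54 ms RTT, about 0.05% packet loss), here is a minimal Python sketch of the simplified bound, rate <= C * MSS / (RTT * sqrt(p)). The thread does not state the segment size or the constant, so the MSS of 1460 bytes and C = 1 below are assumptions, and the function name mathis_throughput is made up for illustration.

```python
import math


def mathis_throughput(mss_bytes, rtt_s, loss_rate, c=1.0):
    """Approximate upper bound on steady-state TCP throughput in bytes/second.

    Simplified Mathis et al. model: rate <= c * MSS / (RTT * sqrt(p)).
    The constant c depends on ACK strategy and loss model; c = 1.0 is an
    assumption made here, not a value taken from the thread.
    """
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    if rtt_s <= 0 or mss_bytes <= 0:
        raise ValueError("rtt_s and mss_bytes must be positive")
    return c * mss_bytes / (rtt_s * math.sqrt(loss_rate))


if __name__ == "__main__":
    MSS = 1460    # bytes; assumed typical Ethernet MSS, not stated in the thread
    RTT = 0.054   # seconds; the 54 ms RTT from the message
    for loss in (0.0005, 0.10):  # the 0.05% and ~10% loss rates from the message
        rate = mathis_throughput(MSS, RTT, loss)
        print(f"loss={loss:.2%}  bound ~ {rate / 1e6:.2f} MB/s")
```

Under these assumptions the 0.05% case comes out at roughly 1.2 MB/s, the same order as the 1.1 MBps quoted in the message; a different MSS or constant would shift the figure. The bound models loss-based congestion control only, so it says nothing about BBR.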