Date:        Fri, 25 Oct 2024 08:13:24 -0400
From:        Cheng Cui <cc@freebsd.org>
To:          void <void@f-m.fm>
Cc:          freebsd-net@freebsd.org
Subject:     Re: Performance test for CUBIC in stable/14
Message-ID:  <CAGaXuiKaUurcPwwB-gJYMU2ce3i3cn7mT0JFjZCkM_dU5gBNXg@mail.gmail.com>
In-Reply-To: <Zxlt-dQHwz_Gl_Sz@int21h>
References:  <ZxJe8e8sRU9NCHv4@vm2>
 <CAGaXuiKD-b4PGrqfxy9zk-BRxU==HMc9KshqyJGzH8saeOLf1A@mail.gmail.com>
 <ZxaccxFblDt0UQWR@int21h>
 <CAGaXui+Q7wCM1dAKis+vNaNJ5uODeiC765hoXWT4OBtT7npprw@mail.gmail.com>
 <ZxfFRg3tYtdQt0hM@vm2>
 <CAGaXuiJvC2i5yxEaaDmHMoadPzkk3oqQOzg1yiqBuhTR+=R9Sg@mail.gmail.com>
 <ZxgHOlRaCR6Joqdv@vm2>
 <CAGaXui+EYmRhOdwOqFRbVsboCkrrWmnHnWRSqsSAgzbn5ug6bg@mail.gmail.com>
 <ZxkQiHWlDz28az-N@vm2>
 <CAGaXuiLuuXW_gFMq=--1Z2rKZ4ZZUiV52BnW10FxDYZc6vBZ-Q@mail.gmail.com>
 <Zxlt-dQHwz_Gl_Sz@int21h>
Here is my example. I am using two 6-core/12-thread desktops as my bhyve
servers.
CPU: AMD Ryzen 5 5560U with Radeon Graphics (2295.75-MHz K8-class CPU)

You can find the VM test results on my wiki:
https://wiki.freebsd.org/chengcui/testD46046

All the CPU utilization results there are low, especially for the runs with
throughput over 900 Mb/s.

cc

On Wed, Oct 23, 2024 at 5:43 PM void <void@f-m.fm> wrote:
> On Wed, Oct 23, 2024 at 03:14:08PM -0400, Cheng Cui wrote:
> > I see. The result of `newreno` vs. `cubic` shows non-constant/infrequent
> > packet retransmission, so TCP congestion control has little impact on
> > improving the performance.
> >
> > The performance bottleneck may come from somewhere else. For example,
> > the sender CPU shows 97.7% utilization. Would there be any way to
> > reduce CPU usage?
>
> There are 11 VMs running on the bhyve server. None of them are very busy,
> but the server shows:
>
> % uptime
>  9:54p.m.  up 8 days,  6:08, 22 users, load averages: 0.82, 1.25, 1.74
>
> The test VM, vm4-fbsd14s:
>
> % uptime
>  9:55PM  up 2 days,  3:12, 5 users, load averages: 0.35, 0.31, 0.21
>
> It has
>
> % sysctl hw.ncpu
> hw.ncpu: 8
>
> and
>
> avail memory = 66843062272 (63746 MB)
>
> so it's not short of resources.
>
> A test just now gave these results:
>
> - - - - - - - - - - - - - - - - - - - - - - - - -
> Test Complete. Summary Results:
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-20.04  sec  1.31 GBytes   563 Mbits/sec    0        sender
> [  5]   0.00-20.06  sec  1.31 GBytes   563 Mbits/sec             receiver
> CPU Utilization: local/sender 94.1% (0.1%u/94.1%s), remote/receiver 15.5%
> (1.5%u/13.9%s)
> snd_tcp_congestion cubic
> rcv_tcp_congestion cubic
>
> iperf Done.
>
> So I'm not sure how that utilization figure was synthesised, unless it's
> derived from something like 'top', where 1.00 is 100%. Load when running
> the test got to 0.83 as observed in 'top' in another terminal. Five
> minutes after the test, load in the VM is 0.32, 0.31, 0.26 and on the
> bhyve host 0.39, 0.61, 1.11.
>
> Before we began testing, I was looking at the speed issue as being caused
> by something to do with interrupts and/or polling, and/or HZ, something
> that Linux handles differently, giving better results on the same bhyve
> host. Maybe rebuilding the kernel with a different scheduler on both the
> host and the FreeBSD VMs will give a better result for FreeBSD, if
> tweaking sysctls doesn't make much of a difference.
>
> In terms of real-world bandwidth, I found that the combination of your
> modified cc_cubic + rack gave the best overall throughput in a speedtest
> context, although it's slower to reach its maximum throughput than cubic
> alone. I'm still testing in a webdav/rsync context (cubic against
> cubic+rack).
>
> The next lot of testing, after changing the scheduler, will be on a KVM
> host, with various *BSDs as guests.
>
> There may be a tradeoff of stability against speed, I guess.

--
Best Regards,
Cheng Cui
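[Editor's note: for readers who want to reproduce this kind of comparison,
the knobs discussed in the thread can be exercised roughly as follows on a
FreeBSD 14 system. This is only a sketch: the receiver placeholder and the
exact iperf3 flags are assumptions for illustration, not the commands
actually used above.]

  # The RACK TCP stack is a separate module (present when the system was
  # built with the extra TCP stacks); CUBIC and NewReno are built in on 14.x.
  kldload tcp_rack

  # Pick the congestion control algorithm for new connections.
  sysctl net.inet.tcp.cc.algorithm=cubic        # or: newreno

  # Optionally make RACK the default TCP stack (the "cubic + rack" case).
  sysctl net.inet.tcp.functions_default=rack    # stock stack is "freebsd"

  # On the receiving VM:
  iperf3 -s

  # On the sending VM (<receiver> is a placeholder): -t 20 matches the
  # 20-second runs above, -C requests a congestion control algorithm, and
  # -V adds the CPU utilization and snd/rcv_tcp_congestion summary lines.
  iperf3 -c <receiver> -t 20 -C cubic -V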
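[Editor's note: the scheduler experiment mentioned above would amount to
building a kernel with SCHED_4BSD instead of the default SCHED_ULE on the
host and the guests. Again only a sketch, assuming an amd64 system with
sources in /usr/src and a hypothetical config name TEST4BSD:]

  # A hypothetical kernel config, /usr/src/sys/amd64/conf/TEST4BSD,
  # that swaps the scheduler:
  #
  #     include   GENERIC
  #     ident     TEST4BSD
  #     nooptions SCHED_ULE
  #     options   SCHED_4BSD

  cd /usr/src
  make -j8 buildkernel KERNCONF=TEST4BSD
  make installkernel KERNCONF=TEST4BSD
  shutdown -r now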