Date:      Fri, 25 Oct 2024 08:13:24 -0400
From:      Cheng Cui <cc@freebsd.org>
To:        void <void@f-m.fm>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Performance test for CUBIC in stable/14
Message-ID:  <CAGaXuiKaUurcPwwB-gJYMU2ce3i3cn7mT0JFjZCkM_dU5gBNXg@mail.gmail.com>
In-Reply-To: <Zxlt-dQHwz_Gl_Sz@int21h>
References:  <ZxJe8e8sRU9NCHv4@vm2> <CAGaXuiKD-b4PGrqfxy9zk-BRxU==HMc9KshqyJGzH8saeOLf1A@mail.gmail.com> <ZxaccxFblDt0UQWR@int21h> <CAGaXui%2BQ7wCM1dAKis%2BvNaNJ5uODeiC765hoXWT4OBtT7npprw@mail.gmail.com> <ZxfFRg3tYtdQt0hM@vm2> <CAGaXuiJvC2i5yxEaaDmHMoadPzkk3oqQOzg1yiqBuhTR%2B=R9Sg@mail.gmail.com> <ZxgHOlRaCR6Joqdv@vm2> <CAGaXui%2BEYmRhOdwOqFRbVsboCkrrWmnHnWRSqsSAgzbn5ug6bg@mail.gmail.com> <ZxkQiHWlDz28az-N@vm2> <CAGaXuiLuuXW_gFMq=--1Z2rKZ4ZZUiV52BnW10FxDYZc6vBZ-Q@mail.gmail.com> <Zxlt-dQHwz_Gl_Sz@int21h>

Here is my example. I am using two 6-core/12-thread desktops for my bhyve
servers.
CPU: AMD Ryzen 5 5560U with Radeon Graphics (2295.75-MHz K8-class CPU)

You can find test results on VMs from my wiki:
https://wiki.freebsd.org/chengcui/testD46046

All the CPU utilization results are low, especially for the runs with
throughput over 900 Mb/s.
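
For reference, a minimal sketch of the kind of iperf3 invocation behind
numbers like these (the server name and 20-second duration are placeholders;
I am assuming the -V and -C options, since those are what produce the CPU
utilization and snd/rcv_tcp_congestion lines shown in the results quoted
below):

% iperf3 -c <server> -t 20 -V -C cubic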

cc

On Wed, Oct 23, 2024 at 5:43 PM void <void@f-m.fm> wrote:

> On Wed, Oct 23, 2024 at 03:14:08PM -0400, Cheng Cui wrote:
> >I see. The result of `newreno` vs. `cubic` shows non-constant/infrequent
> >packet retransmission. So TCP congestion control has little impact on
> >improving the performance.
> >
> >The performance bottleneck may come from somewhere else. For example, the
> >sender CPU shows 97.7% utilization. Would there be any way to reduce CPU
> >usage?
>
> There are 11 VMs running on the bhyve server. None of them are very busy,
> but the server shows
> % uptime
>   9:54p.m.  up 8 days,  6:08, 22 users, load averages: 0.82, 1.25, 1.74
>
> The test vm vm4-fbsd14s:
> % uptime
>   9:55PM  up 2 days,  3:12, 5 users, load averages: 0.35, 0.31, 0.21
>
> It has
> % sysctl hw.ncpu
> hw.ncpu: 8
>
> and
> avail memory = 66843062272 (63746 MB)
>
> so it's not short of resources.
>
> A test just now gave these results:
> - - - - - - - - - - - - - - - - - - - - - - - - -
> Test Complete. Summary Results:
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-20.04  sec  1.31 GBytes   563 Mbits/sec    0             sender
> [  5]   0.00-20.06  sec  1.31 GBytes   563 Mbits/sec                  receiver
> CPU Utilization: local/sender 94.1% (0.1%u/94.1%s), remote/receiver 15.5% (1.5%u/13.9%s)
> snd_tcp_congestion cubic
> rcv_tcp_congestion cubic
>
> iperf Done.
>
> so I'm not sure how the utilization figure was synthesised, unless it's
> derived from something like 'top' where 1.00 is 100%. Load when running
> the test got to 0.83 as observed in 'top' in another terminal. Five mins
> after the test, load in the vm is: 0.32, 0.31, 0.26
> on the bhyve host: 0.39, 0.61, 1.11
>
> Before we began testing, I was looking at the speed issue as being caused
> by something to do with interrupts and/or polling, and/or HZ, something
> that Linux handles differently and gives better results on the same bhyve
> host. Maybe rebuilding the kernel with a different scheduler on both the
> host and the FreeBSD VMs will give a better result for FreeBSD if tweaking
> sysctls doesn't make much of a difference.
>
> In terms of real-world bandwidth, I found that the combination of your
> modified cc_cubic + rack gave the best results in terms of overall
> throughput in a speedtest context, although it's slower to get to its max
> throughput than cubic alone. I'm still testing in a webdav/rsync context
> (cubic against cubic+rack).
>
> The next lot of testing after changing the scheduler will be on a KVM host,
> with various *BSDs as guests.
>
> There may be a tradeoff of stability against speed, I guess.
> --
>
>
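
On the cc_cubic + rack combination mentioned above: for completeness, the
knobs involved are roughly the following (a sketch only; whether the modules
are already compiled in or loaded depends on the particular stable/14 build):

% kldload cc_cubic                          # skip if cubic is built in or already loaded
% kldload tcp_rack                          # RACK TCP stack, if built for this kernel
% sysctl net.inet.tcp.cc.algorithm=cubic
% sysctl net.inet.tcp.functions_default=rack
% sysctl net.inet.tcp.functions_available   # confirm which stacks are registered

For the scheduler/HZ angle, `sysctl kern.hz` and `sysctl kern.clockrate` show
what the guest is currently running with, and switching schedulers means
building a kernel with options SCHED_4BSD in place of the default SCHED_ULE.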

-- 
Best Regards,
Cheng Cui
