Date:      Thu, 29 Feb 2024 00:02:45 +0300
From:      Vitaliy Gusev <gusev.vitaliy@gmail.com>
To:        Matthew Grooms <mgrooms@shrew.net>
Cc:        virtualization@freebsd.org
Subject:   Re: bhyve disk performance issue
Message-ID:  <3850080E-EBD1-4414-9C4E-DD89611C9F58@gmail.com>
In-Reply-To: <b353b39a-56d3-4757-a607-3c612944b509@shrew.net>
References:  <6a128904-a4c1-41ec-a83d-56da56871ceb@shrew.net> <28ea168c-1211-4104-b8b4-daed0e60950d@app.fastmail.com> <0ff6f30a-b53a-4d0f-ac21-eaf701d35d00@shrew.net> <6f6b71ac-2349-4045-9eaf-5c50d42b89be@shrew.net> <50614ea4-f0f9-44a2-b5e6-ebb33cfffbc4@shrew.net> <6a4e7e1d-cca5-45d4-a268-1805a15d9819@shrew.net> <f01a9bca-7023-40c0-93f2-8cdbe4cd8078@tubnor.net> <edb80fff-561b-4dc5-95ee-204e0c6d95df@shrew.net> <a07d070b-4dc1-40c9-bc80-163cd59a5bfc@Duedinghausen.eu> <e45c95df-4858-48aa-a274-ba1bf8e599d5@shrew.net> <BE794E98-7B69-4626-BB66-B56F23D6A67E@gmail.com> <25ddf43d-f700-4cb5-af2a-1fe669d1e24b@shrew.net> <1DAEB435-A613-4A04-B63F-D7AF7A0B7C0A@gmail.com> <b353b39a-56d3-4757-a607-3c612944b509@shrew.net>

> On 28 Feb 2024, at 23:03, Matthew Grooms <mgrooms@shrew.net> wrote:
>
> ...
> The virtual disks were provisioned with either a 128G disk image or a 1TB raw partition, so I don't think space was an issue.
> Trim is definitely not an issue. I'm using a tiny fraction of the 32TB array and have tried both heavily under-provisioned HW RAID10 and SW RAID10 using GEOM. The latter was tested after sending full trim resets to all drives individually.
>
It could be that when TRIM/UNMAP is not used, the zvol (for instance) becomes full after a while. ZFS then considers all of its blocks to be in use, and write operations can run into trouble. I believe this was recently fixed.
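
A quick way to check on the host is to compare the zvol's allocated space to its volume size; a sketch, assuming a hypothetical dataset name:

    ~# zfs get volsize,used,referenced,refreservation zroot/vmdisk0

If "referenced" stays close to "volsize" even though the guest filesystem is mostly empty, freed blocks are not being returned to ZFS.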

Also consider this path:

    GuestFS->UNMAP->bhyve->Host-FS->PhysicalDisk

The problem with UNMAP is that it can cause an unpredictable slowdown at any time, so I would suggest comparing results with UNMAP enabled and disabled in the guest.
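
Inside a Linux guest you can check whether discard is actually available, and exercise it manually rather than continuously; a sketch (the device and mount point are placeholders):

   ~# lsblk --discard /dev/nvme0n2    # non-zero DISC-GRAN/DISC-MAX means discard is supported
   ~# mount | grep discard            # is the continuous "discard" mount option in use?
   ~# fstrim -v /                     # issue a one-shot trim instead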

> I will try to incorporate the rest of your feedback into my next round of testing. If I can find a benchmark tool that works with a raw block device, that would be ideal.
>
>
Use "dd" as a first step for read testing:

   ~# dd if=/dev/nvme0n2 of=/dev/null bs=1M status=progress iflag=direct
   ~# dd if=/dev/nvme0n2 of=/dev/null bs=1M status=progress

Compare the results with and without direct I/O.
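
Before the second (buffered) run, drop the guest page cache so the two numbers are comparable:

   ~# echo 3 > /proc/sys/vm/drop_caches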

Then use the "fio" tool.

  1) write prepare:

       ~# fio --name=prep --rw=write --verify=crc32 --loops=1 --numjobs=2 --time_based --thread --bs=1M --iodepth=32 --ioengine=libaio --direct=1 --group_reporting --size=20G --filename=/dev/nvme0n2


  2) read test:

      ~# fio --name=readtest --rw=read --loops=30 --numjobs=2 --time_based --thread --bs=256K --iodepth=32 --ioengine=libaio --direct=1 --group_reporting --size=20G --filename=/dev/nvme0n2
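
A random-read variant of the same test may also be worth running, since sequential numbers alone can hide the drop; a sketch using the same placeholder device (--runtime added so the time-based run terminates):

      ~# fio --name=randtest --rw=randread --loops=30 --numjobs=2 --time_based --runtime=60 --thread --bs=4K --iodepth=32 --ioengine=libaio --direct=1 --group_reporting --size=20G --filename=/dev/nvme0n2
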
--
Vitaliy
> Thanks,
>
> -Matthew
>
>
>
>> --
>> Vitaliy
>>
>>> On 28 Feb 2024, at 21:29, Matthew Grooms <mgrooms@shrew.net> wrote:
>>>
>>> On 2/27/24 04:21, Vitaliy Gusev wrote:
>>>> Hi,
>>>>
>>>>
>>>>> On 23 Feb 2024, at 18:37, Matthew Grooms <mgrooms@shrew.net> wrote:
>>>>>
>>>>>> ...
>>>>> The problem occurs when an image file is used on either ZFS or UFS. The problem also occurs when the virtual disk is backed by a raw disk partition or a ZVOL. This issue isn't related to a specific underlying filesystem.
>>>>>
>>>>
>>>> Do I understand correctly that you ran the tests inside the guest VM on an ext4 filesystem? If so, you should be aware of the additional overhead compared to running the tests directly on the host.
>>>>
>>> Hi Vitaliy,
>>>
>>> I appreciate you providing the feedback and suggestions. I spent over a week trying as many combinations of host and guest options as possible to narrow this issue down to a specific host storage or guest device model option. Unfortunately the problem occurred with every combination I tested while running Linux as the guest. Note, I only tested RHEL8 & RHEL9 compatible distributions ( Alma & Rocky ). The problem did not occur when I ran FreeBSD as the guest. The problem did not occur when I ran KVM on the host and Linux as the guest.
>>>
>>>> I would suggest running fio (or even dd) on the raw disk device inside the VM, i.e. without a filesystem at all. Just do not forget to do "echo 3 > /proc/sys/vm/drop_caches" in the Linux guest VM before you run the tests.
>>> The two servers I was using to test with are no longer available. However, I'll have two more identical servers arriving in the next week or so. I'll try to run additional tests and report back here. I used bonnie++ as that was easily installed from the package repos on all the systems I tested.
>>>
>>>>
>>>> Could you also give more information about:
>>>>
>>>>  1. What results did you get (decode bonnie++ output)?
>>> If you look back at this email thread, there are many examples of running bonnie++ on the guest. I first ran the tests on the host system using Linux + ext4 and FreeBSD 14 + UFS & ZFS to get a baseline of performance. Then I ran bonnie++ tests using bhyve as the hypervisor and Linux & FreeBSD as the guest. The combination of host and guest storage options included ...
>>>
>>> 1) block device + virtio blk
>>> 2) block device + nvme
>>> 3) UFS disk image + virtio blk
>>> 4) UFS disk image + nvme
>>> 5) ZFS disk image + virtio blk
>>> 6) ZFS disk image + nvme
>>> 7) ZVOL + virtio blk
>>> 8) ZVOL + nvme
>>>
>>> In every instance, I observed the Linux guest disk IO often perform very well for some time after the guest was first booted. Then the performance of the guest would drop to a fraction of the original performance. The benchmark test was run every 5 or 10 minutes in a cron job. Sometimes the guest would perform well for up to an hour before performance would drop off. Most of the time it would only perform well for a few cycles ( 10 - 30 mins ) before performance would drop off. The only way to restore the performance was to reboot the guest. Once I determined that the problem was not specific to a particular host or guest storage option, I switched my testing to only use a block device as backing storage on the host to avoid hitting any system disk caches.
>>>
>>> Here is the test script I used in the cron job ...
>>>
>>> #!/bin/sh
>>> FNAME='output.txt'
>>>
>>> echo ================================================================================ >> $FNAME
>>> echo Begin @ `/usr/bin/date` >> $FNAME
>>> echo >> $FNAME
>>> /usr/sbin/bonnie++ 2>&1 | /usr/bin/grep -v 'done\|,' >> $FNAME
>>> echo >> $FNAME
>>> echo End @ `/usr/bin/date` >> $FNAME
>>>
>>> As you can see, I'm calling bonnie++ with the system defaults. That uses a data set size that's 2x the guest RAM in an attempt to minimize the effect of filesystem cache on results. Here is an example of the output that bonnie++ produces ...
>>>
>>> Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
>>>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>>> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>> linux-blk    63640M  694k  99  1.6g  99  737m  76  985k  99  1.3g  69 +++++ +++
>>> Latency             11579us     535us   11889us    8597us   21819us    8238us
>>> Version  2.00       ------Sequential Create------ --------Random Create--------
>>> linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>>                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
>>> Latency              7620us     126us    1648us     151us      15us     633us
>>>
>>> --------------------------------- speed drop ---------------------------------
>>>
>>> Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
>>>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>>> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>> linux-blk    63640M  676k  99  451m  99  314m  93  951k  99  402m  99 15167 530
>>> Latency             11902us    8959us   24711us   10185us   20884us    5831us
>>> Version  2.00       ------Sequential Create------ --------Random Create--------
>>> linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>>                  16     0  96 +++++ +++ +++++ +++     0  96 +++++ +++     0  75
>>> Latency               343us     165us    1636us     113us      55us    1836us
>>>
>>> In the example above, the benchmark test repeated about 20 times with results that were similar to the performance shown above the dotted line ( ~ 1.6g/s seq write and 1.3g/s seq read ). After that, the performance dropped to what's shown below the dotted line, which is less than 1/4 the original speed ( ~ 451m/s seq write and 402m/s seq read ).
>>>
>>>>  2. What results were you expecting?
>>>>
>>> What I expect is that, when I perform the same test with the same parameters, the results would stay more or less consistent over time. This is true when KVM is used as the hypervisor on the same hardware and guest options. That said, I'm not worried about bhyve being consistently slower than KVM or a FreeBSD guest being consistently slower than a Linux guest. I'm concerned that the performance drop over time is indicative of an issue with how bhyve interacts with non-FreeBSD guests.
>>>
>>>>  3. VM configuration, virtio-blk disk size, etc.
>>>>  4. Full command for tests (including size of test-set), bhyve, etc.
>>> I believe this was answered above. Please let me know if you have additional questions.
>>>
>>>>
>>>>  5. Did you pass virtio-blk as 512 or 4K? If 512, you should probably try 4K.
>>>>
>>> The testing performed was not exclusively with virtio-blk.
>>>
>>>
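For the virtio-blk runs, the sector size can be set in the bhyve block-device options; a sketch (the slot number and backing path are placeholders):

    -s 4,virtio-blk,/dev/da0,sectorsize=4096
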
>>>>  6. Linux has several read-ahead options for the IO scheduler, and they could be related too.
>>>>
>>> I suppose it's possible that bhyve could be somehow causing the disk scheduler in the Linux guest to act differently. I'll see if I can figure out how to disable that in future tests.
>>>
>>>
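Those knobs are exposed via sysfs inside the guest; for example (the device name is a placeholder):

    ~# cat /sys/block/nvme0n2/queue/read_ahead_kb
    ~# cat /sys/block/nvme0n2/queue/scheduler
    ~# echo none > /sys/block/nvme0n2/queue/scheduler
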
>>>> Additionally, could you also play with the "sync=disabled" volume/zvol option? Of course, it only matters for write testing.
>>> The testing performed was not exclusively with zvols.
>>>
>>>
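For the zvol-backed runs, that option can be toggled on the host; a sketch (the dataset name is a placeholder):

    ~# zfs set sync=disabled zroot/vmdisk0
    ~# zfs inherit sync zroot/vmdisk0    # restore the inherited default afterwards
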
>>> Once I have more hardware available, I'll try to report back with more testing. It may be interesting to also see how a Windows guest performs compared to Linux & FreeBSD. I suspect that this issue may only be triggered when a fast disk array is in use on the host. My tests use a 16x SSD RAID 10 array. It's also quite possible that the disk IO slowdown is only a symptom of another issue that's triggered by the disk IO test ( please see the end of my last post related to scheduler priority observations ). All I can say for sure is that ...
>>>
>>> 1) There is a problem and it's reproducible across multiple hosts
>>> 2) It affects RHEL8 & RHEL9 guests but not FreeBSD guests
>>> 3) It is not specific to any host or guest storage option
>>>
>>> Thanks,
>>>
>>> -Matthew
>>>




