From nobody Wed Feb 28 19:31:06 2024
From: Vitaliy Gusev <gusev.vitaliy@gmail.com>
Message-Id: <1DAEB435-A613-4A04-B63F-D7AF7A0B7C0A@gmail.com>
Subject: Re: bhyve disk performance issue
Date: Wed, 28 Feb 2024 22:31:06 +0300
List-Archive: https://lists.freebsd.org/archives/freebsd-virtualization
In-Reply-To: <25ddf43d-f700-4cb5-af2a-1fe669d1e24b@shrew.net>
Cc: virtualization@freebsd.org
To: Matthew Grooms <mgrooms@shrew.net>

Hi, Matthew.

I still do not know what command line was used for bhyve; I couldn't find it in the thread, sorry. I also couldn't find the virtual disk size that you used.

Could you please simplify the bonnie++ output (it is hard to decode due to the alignment) and give exact numbers for:

READ seq  - I see you had 1.6 GB/s during the good periods and ~500 MB/s at the worst.
WRITE seq - ...

If you have slow results for both the read and write operations, you should probably test only READs first and not do anything else until the READs are fine.

Again, if you see slow performance for an ext4 filesystem in the guest VM placed on the passed disk image, you should try to test against the raw disk image, i.e. without ext4, because the filesystem could be a factor.
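
For instance, a read-only fio pass straight against the guest's disk device takes ext4 out of the picture entirely. A minimal sketch, assuming the guest sees the disk as /dev/vda:

fio --name=rawread --filename=/dev/vda --readonly --rw=read --bs=1M \
    --direct=1 --ioengine=libaio --iodepth=16 --runtime=60 --time_based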

If you run the test inside the VM on a filesystem, you may have to deal with filesystem bottlenecks, bugs, fragmentation, etc. Do you want to fix them all? I don't think so.

For example, if you pass a 40G disk image and create an ext4 filesystem on it, and during testing the filesystem becomes more than 80% full, I/O may not perform so well.
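
A quick way to check for that condition between runs, assuming the test filesystem is mounted at /mnt/test:

df -h /mnt/test          # fill level of the test filesystem
e4defrag -c /mnt/test    # report current ext4 fragmentation (from e2fsprogs)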

You should probably rule out that kind of guest filesystem behaviour when you hit the IO performance slowdown.

Also, please look at TRIM operations when you perform WRITE testing. They could also be related to the slow write I/O.
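
Inside a Linux guest, something along these lines shows whether discard/TRIM is being offered and lets you trigger it by hand (device name and mount point are assumptions):

lsblk --discard /dev/vda    # non-zero DISC-GRAN/DISC-MAX means TRIM is supported
fstrim -v /mnt/test         # issue TRIM for the mounted filesystem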

--
Vitaliy

On 28 Feb 2024, at 21:29, Matthew Grooms <mgrooms@shrew.net> wrote:

On 2/27/24 04:21, Vitaliy Gusev wrote:
Hi,


On 23 Feb 2024, at 18:37, Matthew Grooms <mgrooms@shrew.net> wrote:

...
The problem occurs when an image file is used on either ZFS or UFS. The problem also occurs when the virtual disk is backed by a raw disk partition or a ZVOL. This issue isn't related to a specific underlying filesystem.


Do I understand correctly that you ran the tests inside the guest VM, on an ext4 filesystem? If so, you should be aware of the additional overhead compared to when you were running the tests on the host.

Hi Vitaliy,

I appreciate you providing the feedback and suggestions. I spent over a week trying as many combinations of host and guest options as possible to narrow this issue down to a specific host storage or guest device model option. Unfortunately, the problem occurred with every combination I tested while running Linux as the guest. Note that I only tested RHEL8 & RHEL9 compatible distributions ( Alma & Rocky ). The problem did not occur when I ran FreeBSD as the guest. The problem did not occur when I ran KVM on the host and Linux as the guest.

I would suggest running fio (or even dd) on the raw disk device inside the VM, i.e. without a filesystem at all. Just do not forget to do "echo 3 > /proc/sys/vm/drop_caches" in the Linux guest VM before you run the tests.
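
A minimal sketch of that kind of raw-device check, assuming the guest sees the disk as /dev/vda:

sync
echo 3 > /proc/sys/vm/drop_caches
dd if=/dev/vda of=/dev/null bs=1M count=16384 iflag=direct status=progress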

The two servers I was using to test with are no longer available. However, I'll have two more identical servers arriving in the next week or so. I'll try to run additional tests and report back here. I used bonnie++ because it was easily installed from the package repos on all the systems I tested.


Could you also give more information about:

 1. What results did you get (decoded bonnie++ output)?

If you look back at this email thread, there are many examples of running bonnie++ on the guest. I first ran the tests on the host system using Linux + ext4 and FreeBSD 14 + UFS & ZFS to get a baseline of performance. Then I ran bonnie++ tests using bhyve as the hypervisor and Linux & FreeBSD as the guest. The combination of host and guest storage options included ...

1) block device + virtio blk
2) block device + nvme
3) UFS disk image + virtio blk
4) UFS disk image + nvme
5) ZFS disk image + virtio blk
6) ZFS disk image + nvme
7) ZVOL + virtio blk
8) ZVOL + nvme
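
For reference, those guest device models correspond to bhyve slot options roughly like this; the slot numbers and backing paths are assumptions, since the actual bhyve command line was not posted in the thread:

-s 3,virtio-blk,/dev/da0p1        # block device + virtio blk
-s 3,nvme,/dev/da0p1              # block device + nvme
-s 3,virtio-blk,/vms/disk0.img    # file-backed image + virtio blk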

In every instance, I observed that the Linux guest disk IO often performed very well for some time after the guest was first booted. Then the performance of the guest would drop to a fraction of the original performance. The benchmark test was run every 5 or 10 minutes in a cron job. Sometimes the guest would perform well for up to an hour before performance dropped off. Most of the time it would only perform well for a few cycles ( 10 - 30 mins ) before the drop occurred. The only way to restore the performance was to reboot the guest. Once I determined that the problem was not specific to a particular host or guest storage option, I switched my testing to use only a block device as backing storage on the host, to avoid hitting any system disk caches.

Here is the test script I used in the cron job ...

#!/bin/sh
# Append a timestamped run of bonnie++ (default options) to output.txt
FNAME='output.txt'

echo ================================================================================ >> $FNAME
echo Begin @ `/usr/bin/date` >> $FNAME
echo >> $FNAME
# grep strips the bonnie++ progress lines and the trailing CSV summary line
/usr/sbin/bonnie++ 2>&1 | /usr/bin/grep -v 'done\|,' >> $FNAME
echo >> $FNAME
echo End @ `/usr/bin/date` >> $FNAME
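
A crontab entry along these lines would reproduce the 10-minute cycle described above (the script path is an assumption):

*/10 * * * * /root/disk-bench.sh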


As you can see, I'm calling bonnie++ with the system defaults. That uses a data set size that's 2x the guest RAM in an attempt to minimize the effect of filesystem cache on results. Here is an example of the output that bonnie++ produces ...

Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linux-blk    63640M  694k  99  1.6g  99  737m  76  985k  99  1.3g  69 +++++ +++
Latency              11579us     535us   11889us    8597us   21819us    8238us
Version  2.00       ------Sequential Create------ --------Random Create--------
linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               7620us     126us    1648us     151us      15us     633us

--------------------------------- speed drop ---------------------------------

Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linux-blk    63640M  676k  99  451m  99  314m  93  951k  99  402m  99 15167 530
Latency              11902us    8959us   24711us   10185us   20884us    5831us
Version  2.00       ------Sequential Create------ --------Random Create--------
linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16     0  96 +++++ +++ +++++ +++     0  96 +++++ +++     0  75
Latency                343us     165us    1636us     113us      55us    1836us

In the example above, the benchmark test was repeated about 20 times with results similar to the performance shown above the dotted line ( ~1.6g/s seq write and 1.3g/s seq read ). After that, the performance dropped to what's shown below the dotted line, which is less than 1/4 of the original speed ( ~451m/s seq write and 402m/s seq read ).

 2. What results were you expecting?

What I expect is that, when I perform the same test with the same parameters, the results will stay more or less consistent over time. This is true when KVM is used as the hypervisor on the same hardware with the same guest options. That said, I'm not worried about bhyve being consistently slower than KVM or a FreeBSD guest being consistently slower than a Linux guest. I'm concerned that the performance drop over time is indicative of an issue with how bhyve interacts with non-FreeBSD guests.

 3. VM configuration, virtio-blk disk size, etc.
 4. Full command for tests (including size of test set), bhyve, etc.

I believe this was answered above. Please let me know if you have additional questions.

 5. Did you pass virtio-blk as 512 or 4K? If 512, you should probably try 4K.
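
For the virtio-blk runs, the 4K variant would be expressed as a block-device option on the bhyve slot, something along these lines (slot number and backing path are assumptions):

-s 3,virtio-blk,/dev/da0p1,sectorsize=4096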

The testing performed was not exclusively with virtio-blk.

 6. Linux has several read-ahead options for the IO scheduler, and they could be related too.

I suppose it's possible that bhyve could be somehow causing the disk scheduler in the Linux guest to act differently. I'll see if I can figure out how to disable that in future tests.
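
In a RHEL-family guest, the scheduler and read-ahead can be inspected and changed per device, for example (the device name vda is an assumption):

cat /sys/block/vda/queue/scheduler    # e.g. [mq-deadline] kyber bfq none
echo none > /sys/block/vda/queue/scheduler
blockdev --getra /dev/vda             # current read-ahead, in 512-byte sectors
blockdev --setra 0 /dev/vda           # disable read-ahead for a test run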

Additionally, could you also play with the "sync=disabled" volume/zvol option? Of course, it is only for write testing.

The testing performed was not exclusively with zvols.
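
For the zvol-backed runs, that setting can be toggled on the host and reverted afterwards (the dataset name is an assumption):

zfs set sync=disabled tank/vmdisk0    # for write testing only
zfs inherit sync tank/vmdisk0         # revert to the inherited default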

Once I have more hardware available, I'll try to report back with more testing. It may be interesting to also see how a Windows guest performs compared to Linux & FreeBSD. I suspect that this issue may only be triggered when a fast disk array is in use on the host. My tests use a 16x SSD RAID 10 array. It's also quite possible that the disk IO slowdown is only a symptom of another issue that's triggered by the disk IO test (please see the end of my last post regarding the scheduler priority observations). All I can say for sure is that ...

1) There is a problem and it's reproducible across multiple = hosts
2) It affects RHEL8 & RHEL9 guests but not FreeBSD guests
3) It is not specific to any host or guest storage option

Thanks,

-Matthew

