Date:      Wed, 28 Feb 2024 12:29:44 -0600
From:      Matthew Grooms <mgrooms@shrew.net>
To:        virtualization@freebsd.org
Subject:   Re: bhyve disk performance issue
Message-ID:  <25ddf43d-f700-4cb5-af2a-1fe669d1e24b@shrew.net>
In-Reply-To: <BE794E98-7B69-4626-BB66-B56F23D6A67E@gmail.com>
References:  <6a128904-a4c1-41ec-a83d-56da56871ceb@shrew.net> <28ea168c-1211-4104-b8b4-daed0e60950d@app.fastmail.com> <0ff6f30a-b53a-4d0f-ac21-eaf701d35d00@shrew.net> <6f6b71ac-2349-4045-9eaf-5c50d42b89be@shrew.net> <50614ea4-f0f9-44a2-b5e6-ebb33cfffbc4@shrew.net> <6a4e7e1d-cca5-45d4-a268-1805a15d9819@shrew.net> <f01a9bca-7023-40c0-93f2-8cdbe4cd8078@tubnor.net> <edb80fff-561b-4dc5-95ee-204e0c6d95df@shrew.net> <a07d070b-4dc1-40c9-bc80-163cd59a5bfc@Duedinghausen.eu> <e45c95df-4858-48aa-a274-ba1bf8e599d5@shrew.net> <BE794E98-7B69-4626-BB66-B56F23D6A67E@gmail.com>


On 2/27/24 04:21, Vitaliy Gusev wrote:
> Hi,
>
>
>> On 23 Feb 2024, at 18:37, Matthew Grooms <mgrooms@shrew.net> wrote:
>>
>>> ...
>> The problem occurs when an image file is used on either ZFS or UFS. 
>> The problem also occurs when the virtual disk is backed by a raw disk 
>> partition or a ZVOL. This issue isn't related to a specific 
>> underlying filesystem.
>>
>
> Do I understand correctly that you ran the tests inside the guest VM on
> an ext4 filesystem? If so, you should be aware of the additional overhead
> compared with running the tests directly on the host.
>
Hi Vitaliy,

I appreciate the feedback and suggestions. I spent over a week trying as 
many combinations of host and guest options as possible to narrow this 
issue down to a specific host storage or guest device model option. 
Unfortunately, the problem occurred with every combination I tested while 
running Linux as the guest. Note that I only tested RHEL8 & RHEL9 
compatible distributions ( Alma & Rocky ). The problem did not occur when 
I ran FreeBSD as the guest, nor when I ran KVM on the host with Linux as 
the guest.

> I would suggest running fio (or even dd) on the raw disk device inside 
> the VM, i.e. without a filesystem at all. Just do not forget to run 
> “echo 3 > /proc/sys/vm/drop_caches” in the Linux guest VM before you 
> run the tests.

The two servers I was using to test with are no longer available. 
However, I'll have two more identical servers arriving in the next week 
or so. I'll try to run additional tests and report back here. I used 
bonnie++ because it was easy to install from the package repos on all the 
systems I tested.
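
When the new servers arrive I can also try fio directly against the raw 
device as you suggest. I'm assuming a run roughly like the following 
( the device name and job parameters are my assumptions, not something I 
have run yet ) ...

# flush and drop the guest page cache first, per your suggestion
sync; echo 3 > /proc/sys/vm/drop_caches
# sequential read straight from the raw virtio disk, bypassing the
# guest filesystem entirely ( assuming the disk appears as /dev/vda )
fio --name=seqread --filename=/dev/vda --rw=read --bs=1m --direct=1 \
    --ioengine=libaio --runtime=60 --size=16g --group_reporting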

>
> Could you also give more information about:
>
>  1. What results did you get (decode bonnie++ output)?

If you look back at this email thread, there are many examples of 
running bonnie++ in the guest. I first ran the tests on the host system 
using Linux + ext4 and FreeBSD 14 + UFS & ZFS to get a baseline of 
performance. Then I ran bonnie++ tests using bhyve as the hypervisor and 
Linux & FreeBSD as the guest. The combinations of host storage and guest 
device model options included the following ( rough examples of the 
corresponding bhyve device strings are sketched after the list ) ...

1) block device + virtio blk
2) block device + nvme
3) UFS disk image + virtio blk
4) UFS disk image + nvme
5) ZFS disk image + virtio blk
6) ZFS disk image + nvme
7) ZVOL + virtio blk
8) ZVOL + nvme
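
For reference, the backing storage was attached with bhyve device 
strings roughly like these ( slot numbers and paths are illustrative, 
from memory rather than my exact command lines ) ...

# block device or disk image + virtio blk
bhyve ... -s 4,virtio-blk,/dev/da0p1 ... vmname
# block device or disk image + nvme
bhyve ... -s 4,nvme,/dev/da0p1 ... vmname
# the ZVOL cases use the zvol device node as the path instead
bhyve ... -s 4,virtio-blk,/dev/zvol/tank/vm0 ... vmname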

In every instance, I observed that the Linux guest's disk IO would often 
perform very well for some time after the guest was first booted. Then 
the performance of the guest would drop to a fraction of the original 
numbers. The benchmark ran every 5 or 10 minutes from a cron job. 
Sometimes the guest would perform well for up to an hour before 
performance dropped off; most of the time it only performed well for a 
few cycles ( 10 - 30 mins ). The only way to restore the performance was 
to reboot the guest. Once I determined that the problem was not specific 
to a particular host or guest storage option, I switched my testing to 
use only a block device as backing storage on the host, to avoid hitting 
any system disk caches.

Here is the test script I used in the cron job ...

#!/bin/sh
FNAME='output.txt'

echo ================================================================================ >> $FNAME
echo Begin @ `/usr/bin/date` >> $FNAME
echo >> $FNAME
/usr/sbin/bonnie++ 2>&1 | /usr/bin/grep -v 'done\|,' >> $FNAME
echo >> $FNAME
echo End @ `/usr/bin/date` >> $FNAME
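
The script was driven by a crontab entry along these lines ( the path is 
illustrative ) ...

# run the benchmark every 10 minutes
*/10 * * * * /root/bonnie_test.sh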

As you can see, I'm calling bonnie++ with the system defaults. That uses 
a data set size of 2x the guest RAM ( hence the 63640M below, i.e. twice 
the roughly 31G of guest RAM ) in an attempt to minimize the effect of 
filesystem cache on the results. Here is an example of the output that 
bonnie++ produces ...

Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linux-blk    63640M  694k  99  1.6g  99  737m  76  985k  99  1.3g  69 +++++ +++
Latency             11579us     535us   11889us    8597us   21819us    8238us
Version  2.00       ------Sequential Create------ --------Random Create--------
linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency              7620us     126us    1648us     151us      15us     633us

--------------------------------- speed drop ---------------------------------

Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linux-blk    63640M  676k  99  451m  99  314m  93  951k  99  402m  99 15167 530
Latency             11902us    8959us   24711us   10185us   20884us    5831us
Version  2.00       ------Sequential Create------ --------Random Create--------
linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16     0  96 +++++ +++ +++++ +++     0  96 +++++ +++     0  75
Latency               343us     165us    1636us     113us      55us    1836us

In the example above, the benchmark repeated about 20 times with results 
similar to those shown above the dotted line ( ~ 1.6g/s seq write and 
1.3g/s seq read ). After that, the performance dropped to what's shown 
below the dotted line, roughly a quarter to a third of the original 
speed ( ~ 451m/s seq write and 402m/s seq read ).

>  2. What results were you expecting?
>
What I expect is that, when I perform the same test with the same 
parameters, the results will stay more or less consistent over time. 
This is true when KVM is used as the hypervisor on the same hardware 
with the same guest options. That said, I'm not worried about bhyve 
being consistently slower than KVM, or a FreeBSD guest being 
consistently slower than a Linux guest. I'm concerned that the 
performance drop over time is indicative of an issue with how bhyve 
interacts with non-FreeBSD guests.

>  3. VM configuration, virtio-blk disk size, etc.
>  4. Full command for tests (including size of test-set), bhyve, etc.

I believe this was answered above. Please let me know if you have 
additional questions.

>
>  5. Did you pass virtio-blk as 512 or 4K? If 512, you should probably
> try 4K.
>
The testing performed was not exclusively with virtio-blk.
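
That said, it's worth trying on the next round. If I read bhyve(8) 
correctly, a 4K logical sector size can be requested with the sectorsize 
block-device option, along these lines ( slot and path illustrative ) ...

# present the virtio-blk disk to the guest with 4K logical sectors
bhyve ... -s 4,virtio-blk,/dev/da0p1,sectorsize=4096 ... vmname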

>  6. Linux has several read-ahead and IO scheduler options, and they
> could be related too.
>
I suppose it's possible that bhyve could be somehow causing the disk 
scheduler in the Linux guest to act differently. I'll see if I can 
figure out how to disable that in future tests.
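
If I get that far, I'd probably start with something like the following 
inside the guest ( assuming the disk shows up as /dev/vda; these are the 
standard Linux block-layer sysfs knobs ) ...

# show the available and current IO scheduler for the virtio disk
cat /sys/block/vda/queue/scheduler
# switch to the no-op scheduler ( 'noop' on older, non-multiqueue kernels )
echo none > /sys/block/vda/queue/scheduler
# check and then disable read-ahead
cat /sys/block/vda/queue/read_ahead_kb
echo 0 > /sys/block/vda/queue/read_ahead_kb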

> Additionally, could you also play with the “sync=disabled” volume/zvol 
> option? Of course, it only applies to write testing.

The testing performed was not exclusively with zvols.
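
If I revisit the ZVOL case, I assume the knob you mention would be set 
on the host like this ( dataset name is illustrative ) ...

# disable synchronous write semantics on the backing zvol
zfs set sync=disabled tank/vm0
# revert to the inherited default when done
zfs inherit sync tank/vm0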

Once I have more hardware available, I'll try to report back with more 
testing. It may also be interesting to see how a Windows guest performs 
compared to Linux & FreeBSD. I suspect that this issue may only be 
triggered when a fast disk array is in use on the host; my tests used a 
16x SSD RAID 10 array. It's also quite possible that the disk IO 
slowdown is only a symptom of another issue that's triggered by the disk 
IO test ( please see the end of my last post regarding scheduler 
priority observations ). All I can say for sure is that ...

1) There is a problem and it's reproducible across multiple hosts
2) It affects RHEL8 & RHEL9 guests but not FreeBSD guests
3) It is not specific to any host or guest storage option

Thanks,

-Matthew
