Date:      Wed, 28 Feb 2024 14:03:03 -0600
From:      Matthew Grooms <mgrooms@shrew.net>
To:        Vitaliy Gusev <gusev.vitaliy@gmail.com>
Cc:        virtualization@freebsd.org
Subject:   Re: bhyve disk performance issue
Message-ID:  <b353b39a-56d3-4757-a607-3c612944b509@shrew.net>
In-Reply-To: <1DAEB435-A613-4A04-B63F-D7AF7A0B7C0A@gmail.com>
References:  <6a128904-a4c1-41ec-a83d-56da56871ceb@shrew.net> <28ea168c-1211-4104-b8b4-daed0e60950d@app.fastmail.com> <0ff6f30a-b53a-4d0f-ac21-eaf701d35d00@shrew.net> <6f6b71ac-2349-4045-9eaf-5c50d42b89be@shrew.net> <50614ea4-f0f9-44a2-b5e6-ebb33cfffbc4@shrew.net> <6a4e7e1d-cca5-45d4-a268-1805a15d9819@shrew.net> <f01a9bca-7023-40c0-93f2-8cdbe4cd8078@tubnor.net> <edb80fff-561b-4dc5-95ee-204e0c6d95df@shrew.net> <a07d070b-4dc1-40c9-bc80-163cd59a5bfc@Duedinghausen.eu> <e45c95df-4858-48aa-a274-ba1bf8e599d5@shrew.net> <BE794E98-7B69-4626-BB66-B56F23D6A67E@gmail.com> <25ddf43d-f700-4cb5-af2a-1fe669d1e24b@shrew.net> <1DAEB435-A613-4A04-B63F-D7AF7A0B7C0A@gmail.com>

On 2/28/24 13:31, Vitaliy Gusev wrote:
> Hi,  Matthew.
>
Hi Vitaliy,

Thanks for the pointers.

> I still do not know what command line was used for bhyve. I couldn't 
> find it in the thread, sorry. And I couldn't find the virtual disk 
> size that you used.
>
Sorry about that. I'll try to get you the exact command line invocation 
used to launch the guest process once I have test hardware again.

>
> Could you please simplify the bonnie++ output? It is hard to decode 
> due to the alignment. Please use exact numbers for:
>
> READ seq  - I see you had 1.6GB/s in the good case and ~500MB/s in 
> the worst.
> WRITE seq  - ...
>
I summarized the output for you. Here it is again:

Fast: ~ 1.6g/s seq write and 1.3g/s seq read
Slow: ~ 451m/s seq write and 402m/s seq read

> If you have slow results for both the read and write operations, you 
> should probably test _only_ READs and not do anything else until the 
> READs are fine.
>
> Again, if you have slow performance for the Ext4 filesystem in the 
> guest VM placed on the passed disk image, you should try testing on 
> the raw disk image, i.e. without Ext4, because it could be related.
>
> If you run the test inside the VM on a filesystem, you may be dealing 
> with filesystem bottlenecks, bugs, fragmentation, etc. Do you want to 
> fix them all? I don’t think so.
>
> For example, if you pass a 40G disk image and create an Ext4 
> filesystem, and during testing the filesystem becomes more than 80% 
> full, I/O may not perform well.
>
> You should probably rule out that guest filesystem behaviour when you 
> see an I/O performance slowdown.
>
> Also, please look at TRIM operations when you perform WRITE testing. 
> They could also be related to the slow write I/O.
>
The virtual disks were provisioned with either a 128G disk image or a 
1TB raw partition, so I don't think space was an issue.

TRIM is definitely not an issue. I'm using a tiny fraction of the 32TB 
array and have tried both a heavily under-provisioned HW RAID10 and a 
SW RAID10 using GEOM. The latter was tested after sending full TRIM 
resets to all drives individually.
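
As an aside, whether discard/TRIM is even reaching the virtual disk can 
be sanity-checked from inside the Linux guest; a quick look, where the 
device name is just an example:

lsblk --discard /dev/vdb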

I will try to incorporate the rest of your feedback into my next round 
of testing. If I can find a benchmark tool that works with a raw block 
device, that would be ideal.
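
For what it's worth, fio can run against a raw block device directly, 
so that may be the route I take. A sketch of the sort of sequential 
read test I have in mind, where the device path, size, and runtime are 
placeholders rather than anything I've actually run yet:

fio --name=seqread --filename=/dev/vdb --rw=read --bs=1M \
    --ioengine=libaio --iodepth=16 --direct=1 \
    --runtime=60 --time_based --group_reporting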

Thanks,

-Matthew


> ——
> Vitaliy
>
>> On 28 Feb 2024, at 21:29, Matthew Grooms <mgrooms@shrew.net> wrote:
>>
>> On 2/27/24 04:21, Vitaliy Gusev wrote:
>>> Hi,
>>>
>>>
>>>> On 23 Feb 2024, at 18:37, Matthew Grooms <mgrooms@shrew.net> wrote:
>>>>
>>>>> ...
>>>> The problem occurs when an image file is used on either ZFS or UFS. 
>>>> The problem also occurs when the virtual disk is backed by a raw 
>>>> disk partition or a ZVOL. This issue isn't related to a specific 
>>>> underlying filesystem.
>>>>
>>>
>>> Do I understand correctly that you ran the testing inside the guest 
>>> VM on an ext4 filesystem? If so, you should be aware of the 
>>> additional overhead compared to running the tests on the host.
>>>
>> Hi Vitaliy,
>>
>> I appreciate you providing the feedback and suggestions. I spent over 
>> a week trying as many combinations of host and guest options as 
>> possible to narrow this issue down to a specific host storage or a 
>> guest device model option. Unfortunately the problem occurred with 
>> every combination I tested while running Linux as the guest. Note, I 
>> only tested RHEL8 & RHEL9 compatible distributions ( Alma & Rocky ). 
>> The problem did not occur when I ran FreeBSD as the guest. The 
>> problem did not occur when I ran KVM in the host and Linux as the guest.
>>
>>> I would suggest running fio (or even dd) on the raw disk device 
>>> inside the VM, i.e. without a filesystem at all. Just do not forget 
>>> to do “echo 3 > /proc/sys/vm/drop_caches” in the Linux guest VM 
>>> before you run the tests.
>>
>> The two servers I was using to test with are no longer available. 
>> However, I'll have two more identical servers arriving in the next 
>> week or so. I'll try to run additional tests and report back here. I 
>> used bonnie++ as it was easily installed from the package repos on 
>> all the systems I tested.
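>>
>> For reference, a raw-device run along the lines you suggest might 
>> look roughly like this inside the guest (device path and size are 
>> placeholders, not what I actually used):
>>
>> # flush the guest page cache so reads hit the virtual disk
>> sync; echo 3 > /proc/sys/vm/drop_caches
>> # sequential read straight from the raw block device, no filesystem
>> dd if=/dev/vdb of=/dev/null bs=1M count=16384 iflag=direct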
>>
>>>
>>> Could you also give more information about:
>>>
>>>  1. What results did you get (decode bonnie++ output)?
>>
>> If you look back at this email thread, there are many examples of 
>> running bonnie++ on the guest. I first ran the tests on the host 
>> system using Linux + ext4 and FreeBSD 14 + UFS & ZFS to get a 
>> baseline of performance. Then I ran bonnie++ tests using bhyve as the 
>> hypervisor and Linux & FreeBSD as the guest. The combination of host 
>> and guest storage options included ...
>>
>> 1) block device + virtio blk
>> 2) block device + nvme
>> 3) UFS disk image + virtio blk
>> 4) UFS disk image + nvme
>> 5) ZFS disk image + virtio blk
>> 6) ZFS disk image + nvme
>> 7) ZVOL + virtio blk
>> 8) ZVOL + nvme
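>>
>> (For reference, the virtio blk and nvme cases differ only in the 
>> emulation name passed to bhyve's -s option; the slot number and 
>> backing path below are placeholders, not the exact invocation used:)
>>
>> -s 4,virtio-blk,/path/to/backing/store
>> -s 4,nvme,/path/to/backing/store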
>>
>> In every instance, I observed the Linux guest disk IO often perform 
>> very well for some time after the guest was first booted. Then the 
>> performance of the guest would drop to a fraction of the original 
>> performance. The benchmark test was run every 5 or 10 minutes in a 
>> cron job. Sometimes the guest would perform well for up to an hour 
>> before performance would drop off. Most of the time it would only 
>> perform well for a few cycles ( 10 - 30 mins ) before performance 
>> would drop off. The only way to restore the performance was to reboot 
>> the guest. Once I determined that the problem was not specific to a 
>> particular host or guest storage option, I switched my testing to 
>> only use a block device as backing storage on the host to avoid 
>> hitting any system disk caches.
>>
>> Here is the test script I used in the cron job ...
>>
>> #!/bin/sh
>> FNAME='output.txt'
>>
>> echo ================================================================================ >> $FNAME
>> echo Begin @ `/usr/bin/date` >> $FNAME
>> echo >> $FNAME
>> /usr/sbin/bonnie++ 2>&1 | /usr/bin/grep -v 'done\|,' >> $FNAME
>> echo >> $FNAME
>> echo End @ `/usr/bin/date` >> $FNAME
>>
>> As you can see, I'm calling bonnie++ with the system defaults. That 
>> uses a data set size that's 2x the guest RAM in an attempt to 
>> minimize the effect of filesystem cache on results. Here is an 
>> example of the output that bonnie++ produces ...
>>
>> Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
>>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>> linux-blk    63640M  694k  99  1.6g  99  737m  76  985k  99  1.3g  69 +++++ +++
>> Latency             11579us     535us   11889us    8597us   21819us    8238us
>> Version  2.00       ------Sequential Create------ --------Random Create--------
>> linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
>> Latency              7620us     126us    1648us     151us      15us     633us
>>
>> --------------------------------- speed drop ---------------------------------
>>
>> Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
>>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>> linux-blk    63640M  676k  99  451m  99  314m  93  951k  99  402m  99 15167 530
>> Latency             11902us    8959us   24711us   10185us   20884us    5831us
>> Version  2.00       ------Sequential Create------ --------Random Create--------
>> linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>                  16     0  96 +++++ +++ +++++ +++     0  96 +++++ +++     0  75
>> Latency               343us     165us    1636us     113us      55us    1836us
>>
>> In the example above, the benchmark test repeated about 20 times with 
>> results that were similar to the performance shown above the dotted 
>> line ( ~ 1.6g/s seq write and 1.3g/s seq read ). After that, the 
>> performance dropped to what's shown below the dotted line which is 
>> less than 1/4 the original speed ( ~ 451m/s seq write and 402m/s seq 
>> read ).
>>
>>>  2. What results were you expecting?
>>>
>> What I expect is that, when I perform the same test with the same 
>> parameters, the results will stay more or less consistent over 
>> time. This is true when KVM is used as the hypervisor on the same 
>> hardware with the same guest options. That said, I'm not worried 
>> about bhyve being consistently slower than KVM or a FreeBSD guest 
>> being consistently slower than a Linux guest. I'm concerned that the 
>> performance drop over time is indicative of an issue with how bhyve 
>> interacts with non-FreeBSD guests.
>>
>>>  3. VM configuration, virtio-blk disk size, etc.
>>>  4. Full command for tests (including size of test-set), bhyve, etc.
>>
>> I believe this was answered above. Please let me know if you have 
>> additional questions.
>>
>>>
>>>  5. Did you pass virtio-blk as 512 or 4K? If 512, you should 
>>> probably try 4K.
>>>
>> The testing performed was not exclusively with virtio-blk.
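>>
>> (If a 4K retest would help, my understanding is that the bhyve block 
>> device backend accepts a sector size option along these lines; the 
>> slot and path are placeholders and the exact syntax should be checked 
>> against bhyve(8):)
>>
>> -s 4,virtio-blk,/path/to/backing/store,sectorsize=4096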
>>
>>>  6. Linux has several read-ahead options for the I/O scheduler, and 
>>> they could be related too.
>>>
>> I suppose it's possible that bhyve could be somehow causing the disk 
>> scheduler in the Linux guest to act differently. I'll see if I can 
>> figure out how to disable that in future tests.
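>>
>> (If it helps, the guest-side scheduler and read-ahead settings can be 
>> inspected and pinned via sysfs; vdb is an example device name:)
>>
>> cat /sys/block/vdb/queue/scheduler
>> echo none > /sys/block/vdb/queue/scheduler
>> cat /sys/block/vdb/queue/read_ahead_kb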
>>
>>> Additionally, could you also play with the “sync=disabled” 
>>> volume/zvol option? Of course, that only matters for write testing.
>>
>> The testing performed was not exclusively with zvols.
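>>
>> (For the ZVOL runs, I take it that knob would be set on the host with 
>> something like the following, dataset name being a placeholder, and 
>> only for the duration of the write tests:)
>>
>> zfs set sync=disabled zroot/vm/linux-disk0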
>>
>> Once I have more hardware available, I'll try to report back with 
>> more testing. It may be interesting to also see how a Windows guest 
>> performs compared to Linux & FreeBSD. I suspect that this issue may 
>> only be triggered when a fast disk array is in use on the host. My 
>> tests use a 16x SSD RAID 10 array. It's also quite possible that the 
>> disk IO slowdown is only a symptom of another issue that's triggered 
>> by the disk IO test ( please see end of my last post related to 
>> scheduler priority observations ). All I can say for sure is that ...
>>
>> 1) There is a problem and it's reproducible across multiple hosts
>> 2) It affects RHEL8 & RHEL9 guests but not FreeBSD guests
>> 3) It is not specific to any host or guest storage option
>>
>> Thanks,
>>
>> -Matthew
>>
>


