Date:      Fri, 16 Feb 2024 11:19:40 -0600
From:      Matthew Grooms <mgrooms@shrew.net>
To:        FreeBSD virtualization <freebsd-virtualization@freebsd.org>
Subject:   bhyve disk performance issue
Message-ID:  <6a128904-a4c1-41ec-a83d-56da56871ceb@shrew.net>


Hi All,

I'm in the middle of a project that involves building out a handful of 
servers to host virtual Linux instances. Part of that includes testing 
bhyve to see how it performs. The intent is to compare host storage 
options such as raw vs zvol block devices and ufs vs zfs disk images, 
using hardware RAID vs zfs-managed disks. It would also involve testing 
different guest options such as nvme vs virtio block storage. 
Unfortunately, I hit a roadblock due to a performance issue that I can't 
explain and would like to bounce it off the list. Here are the hardware 
specs for the systems ...

Intel Xeon 6338 CPU ( 32c/64t )
256G 2400 ECC RAM
16x 4TB Samsung SATA3 SSDs
Avago 9361-16i ( mrsas - HW RAID10 )
Avago 9305-16i ( mpr - zpool RAID10 )

I started by running some bonnie++ benchmarks on the host system 
under AlmaLinux 9.3 and FreeBSD 14 to get a baseline using HW RAID10. 
The disk controllers are PCIe v3 x8, but that should be adequate 
considering the 6Gbit/s disk interfaces ...
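
For reference, each baseline run was a stock bonnie++ invocation against 
a mount point on the array, looped so I could watch for drift over time. 
A minimal sketch ( the mount point is a placeholder, not the actual path; 
-s is left at its default of twice RAM, which is where the ~502G working 
set below comes from ):

  # run bonnie++ repeatedly against the array mount point
  # /mnt/test is illustrative only
  while true; do
      bonnie++ -d /mnt/test -u root
  done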

RHEL9 + EXT4
----------------------------------------------------------------------------------------------------
Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
localhost.loca 502G 2224k  99  2.4g  96  967m  33 3929k  93  1.6g  33 +++++ +++
Latency              4403us   30844us   69444us   27015us   22675us    8754us
Version  2.00       ------Sequential Create------ --------Random Create--------
localhost.localdoma -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               118us     108us     829us     101us       6us     393us

FreeBSD14 + UFS
----------------------------------------------------------------------------------------------------
Version  1.98       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
test.shrew. 523440M  759k  99  2.0g  99  1.1g  61 1945k  99  1.3g  42 264.8  99
Latency             11106us   31930us     423ms    4824us    321ms   12881us
Version  1.98       ------Sequential Create------ --------Random Create--------
test.shrew.lab      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 13095.927203  20 +++++ +++ 25358.227072  24 13573.129095  19 +++++ +++ 25354.222712  23
Latency              4382us      13us      99us    3125us       5us      67us

Good enough. The next thing I tried was running the same benchmark in a 
RHEL9 guest to test the different storage config options, but that's 
when I started having difficulty getting repeatable, consistent results. 
At first the results appeared somewhat random but, after a few days of 
trial and error, I started to identify a pattern. The guest would 
sometimes perform well for a while, usually after a restart, and then 
hit a sharp drop-off in performance over time. For example:

Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linux-blk    63640M  694k  99  1.6g  99  737m  76  985k  99  1.3g  69 +++++ +++
Latency             11579us     535us   11889us    8597us   21819us    8238us
Version  2.00       ------Sequential Create------ --------Random Create--------
linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency              7620us     126us    1648us     151us      15us     633us

--------------------------------- speed drop ---------------------------------

Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linux-blk    63640M  676k  99  451m  99  314m  93  951k  99  402m  99 15167 530
Latency             11902us    8959us   24711us   10185us   20884us    5831us
Version  2.00       ------Sequential Create------ --------Random Create--------
linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16     0  96 +++++ +++ +++++ +++     0  96 +++++ +++     0  75
Latency               343us     165us    1636us     113us      55us    1836us

The above test ran 6 times over roughly 20 minutes, producing the 
higher-speed results before slowing to the lower-speed result. The time 
to complete the benchmark also increased from about 2.5 minutes to about 
8 minutes. To ensure I didn't miss something in my baseline, I repeated 
that benchmark on the host system in a loop for about an hour, but the 
output was consistent with my original testing. To ensure performance 
didn't bounce back after it slowed, I repeated the benchmark in a loop 
on the guest for about 4 hours, but the output was also consistent. I 
then tried switching between a raw block device, an img on ufs, an img 
on a zfs dataset, and zvols, as well as switching between virtio block 
and nvme in the guest. All of these options appeared to suffer from the 
same problem, albeit with slightly different performance numbers. I also 
tried swapping out the storage controller and running some benchmarks 
using a zpool over the disk array to see if that was any better. Same 
issue. I also tried pinning the guest CPUs to specific cores ( -p x:y ). 
No improvement.
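
In case it's useful to see concretely, the storage variants amounted to 
swapping the disk slot in the bhyve invocation between lines like the 
following ( the slot number, pool name, and paths are illustrative, not 
my exact config ):

  # raw or zvol block device behind the virtio-blk emulation
  -s 4,virtio-blk,/dev/zvol/tank/linux-blk
  # disk image file on ufs or on a zfs dataset, same emulation
  -s 4,virtio-blk,/vm/linux/disk.img
  # identical backings behind the nvme emulation instead
  -s 4,nvme,/dev/zvol/tank/linux-blk

The pinning runs just added one -p vcpu:hostcpu option per guest CPU on 
top of whichever variant was loaded.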

Here is a list of a few other things I'd like to try:

1) Wiring guest memory ( unlikely to help as it's 32G of 256G; see the sketch after this list )
2) Downgrading the host to 13.2-RELEASE
3) Test guest OSs other than RHEL8 & RHEL9
4) Test a different model of RAID/SAS controller
5) Test xen vs bhyve disk performance
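
For item 1, wiring guest memory should just be a matter of adding -S to 
the existing bhyve invocation, e.g. something like ( -c and -m are shown 
only as placeholders matching the 32G guest ):

  # -S wires guest memory so the host can't page it out
  bhyve -S -c 16 -m 32G ... linux-guest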

At this point I thought it prudent to post here for some help. Does 
anyone have an idea of what might cause this issue? Does anyone have 
experience testing bhyve with an SSD disk array of this size or larger? 
I'm happy to provide more data points on request.

Thanks,

-Matthew



