From: Matthew Grooms <mgrooms@shrew.net>
To: FreeBSD virtualization <freebsd-virtualization@freebsd.org>
Date: Fri, 16 Feb 2024 11:19:40 -0600
Subject: bhyve disk performance issue

Hi All,

I'm in the middle of a project that involves building out a handful of servers to host virtual Linux instances. Part of that includes testing bhyve to see how it performs. The intent is to compare host storage options such as raw vs zvol block devices and ufs vs zfs disk images, using hardware RAID vs zfs-managed disks. It would also involve testing different guest options such as nvme vs virtio block storage. Unfortunately I hit a roadblock due to a performance issue that I can't explain and would like to bounce it off the list. Here are the hardware specs for the systems ...
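
For reference, two of the backing-store variants under comparison can be set up along these lines (pool name and paths are hypothetical; the zfs line is shown as a comment since it requires a live pool):

```shell
# Two of the backing-store variants being compared (names hypothetical).
# A zvol block device would come from:  zfs create -V 64G tank/linux0
# A raw disk image on ufs or a zfs dataset is just a sparse file:
truncate -s 64G disk.img
ls -lh disk.img
```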

Intel Xeon 6338 CPU ( 32c/64t )
256G 2400 ECC RAM
16x 4TB Samsung SATA3 SSDs
Avago 9361-16i ( mrsas - HW RAID10 )
Avago 9305-16i ( mpr - zpool RAID10 )

I started by performing some bonnie++ benchmarks on the host system running AlmaLinux 9.3 and FreeBSD 14 to get a baseline using HW RAID10. The disk controllers are PCIe v3 x8, which should be adequate considering the 6Gbit/s disk interfaces ...
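
A baseline run of the sort behind the numbers below looks roughly like this (the mount point is hypothetical, and -s is sized near 2x RAM so the page cache can't absorb the working set):

```shell
# Roughly the kind of bonnie++ invocation used for the host baselines.
# -d: test directory, -s: file size (~2x RAM), -n: create-phase file count,
# -u: user to run as when invoked as root. The path is an assumption.
BENCH='bonnie++ -d /mnt/raid10 -s 512g -n 16 -u root'
echo "$BENCH"
```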

RHEL9 + EXT4
----------------------------------------------------------------------------------------------------
Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
localhost.loca 502G 2224k  99  2.4g  96  967m  33 3929k  93  1.6g  33 +++++ +++
Latency              4403us   30844us   69444us   27015us   22675us    8754us
Version  2.00       ------Sequential Create------ --------Random Create--------
localhost.localdoma -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               118us     108us     829us     101us       6us     393us

FreeBSD14 + UFS
----------------------------------------------------------------------------------------------------
Version  1.98       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
test.shrew. 523440M  759k  99  2.0g  99  1.1g  61 1945k  99  1.3g  42 264.8  99
Latency             11106us   31930us     423ms    4824us     321ms   12881us
Version  1.98       ------Sequential Create------ --------Random Create--------
test.shrew.lab      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 13095.927203  20 +++++ +++ 25358.227072  24 13573.129095  19 +++++ +++ 25354.222712  23
Latency              4382us      13us      99us    3125us       5us      67us


Good enough. The next thing I tried was running the same benchmark in a RHEL9 guest to test the different storage config options, but that's when I began having difficulty producing consistent, repeatable results. At first the results appeared somewhat random but, after a few days of trial and error, I started to identify a pattern. The guest would sometimes perform well for a while, usually after a restart, then hit a sharp drop-off in performance over time. For example:

Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linux-blk    63640M  694k  99  1.6g  99  737m  76  985k  99  1.3g  69 +++++ +++
Latency             11579us     535us   11889us    8597us   21819us    8238us
Version  2.00       ------Sequential Create------ --------Random Create--------
linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency              7620us     126us    1648us     151us      15us     633us

--------------------------------- speed drop ---------------------------------

Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linux-blk    63640M  676k  99  451m  99  314m  93  951k  99  402m  99 15167 530
Latency             11902us    8959us   24711us   10185us   20884us    5831us
Version  2.00       ------Sequential Create------ --------Random Create--------
linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16     0  96 +++++ +++ +++++ +++     0  96 +++++ +++     0  75
Latency               343us     165us    1636us     113us      55us    1836us


The above test ran 6 times over roughly 20 minutes, producing the higher-speed results before slowing to the lower-speed result. The time to complete the benchmark also increased from about 2.5 minutes to about 8 minutes.

To ensure I didn't miss something in my baseline, I repeated the benchmark on the host system in a loop for about an hour; the output was consistent with my original testing. To ensure performance didn't bounce back after it slowed, I repeated the benchmark in a loop on the guest for about 4 hours; that output was also consistent.

I then tried switching between a raw block device, an img on ufs, an img on a zfs dataset, and zvols, as well as switching between virtio block and nvme in the guest. All of these options appeared to suffer from the same problem, albeit with slightly different performance numbers. I also tried swapping out the storage controller and running some benchmarks using a zpool over the disk array to see if that was any better. Same issue. I also tried pinning the guest CPUs to specific host cores ( -p x:y ). No improvement.
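
The looped runs described above can be scripted along these lines (a sketch: run_bench stands in for the actual in-guest bonnie++ command, and the path and size are assumptions):

```shell
#!/bin/sh
# Loop the benchmark and log wall-clock time per run, so the point where
# throughput drops off is easy to spot. run_bench is a stand-in for the
# real in-guest bonnie++ invocation (hypothetical path and size).
run_bench() { bonnie++ -d /mnt/bench -s 64g -u root >/dev/null 2>&1; }
: > bench.log
for i in 1 2 3 4 5 6; do
    start=$(date +%s)
    run_bench || true      # keep looping even if a run fails
    end=$(date +%s)
    echo "run $i: $((end - start))s" >> bench.log
done
cat bench.log
```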

Here is a list of a few other things I'd like to try:

1) Wiring guest memory ( unlikely to help, as the guest uses 32G of 256G )
2) Downgrading the host to 13.2-RELEASE
3) Test guest OSs other than RHEL8 & RHEL9
4) Test a different model of RAID/SAS controller
5) Test xen vs bhyve disk performance
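
For item 1 and the device comparisons above, the relevant bhyve flags can be sketched as follows (VM name, paths, and slot numbers are hypothetical): -S wires guest memory, -p 0:4 pins vCPU 0 to host CPU 4, and slot 4 shows the two disk emulations being compared.

```shell
# Sketch of the bhyve flags involved (names and paths are assumptions).
DISK_VIRTIO='-s 4,virtio-blk,/dev/zvol/tank/linux0'   # zvol + virtio-blk
DISK_NVME='-s 4,nvme,/vm/linux0/disk.img'             # image file + nvme
CMD="bhyve -c 8 -m 32G -S -p 0:4 -H -A \
 -s 0,hostbridge $DISK_VIRTIO -s 31,lpc -l com1,stdio linux0"
echo "$CMD"
```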

At this point I thought it prudent to post here for some help. Does anyone have an idea of what might cause this issue? Does anyone have experience testing bhyve with an SSD disk array of this size or larger? I'm happy to provide more data points on request.

Thanks,

-Matthew
