Date: Fri, 16 Feb 2024 11:19:40 -0600
From: Matthew Grooms <mgrooms@shrew.net>
To: FreeBSD virtualization <freebsd-virtualization@freebsd.org>
Subject: bhyve disk performance issue
Message-ID: <6a128904-a4c1-41ec-a83d-56da56871ceb@shrew.net>
Hi All,

I'm in the middle of a project that involves building out a handful of
servers to host virtual Linux instances. Part of that includes testing
bhyve to see how it performs. The intent is to compare host storage
options such as raw vs zvol block devices and ufs vs zfs disk images
using hardware raid vs zfs managed disks. It would also involve testing
different guest options such as nvme vs virtio block storage.
Unfortunately I hit a road block due to a performance issue that I
can't explain and would like to bounce it off the list. Here are the
hardware specs for the systems ...

Intel Xeon 6338 CPU ( 32c/64t )
256G 2400 ECC RAM
16x 4TB Samsung SATA3 SSDs
Avago 9361-16i ( mrsas - HW RAID10 )
Avago 9305-16i ( mpr - zpool RAID10 )

I started by performing some bonnie++ benchmarks on the host system
running AlmaLinux 9.3 and FreeBSD 14 to get a baseline using HW RAID10.
The disk controllers are PCIe v3 x8, but that should be adequate
considering the 6Gbit/s disk interfaces ...
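( For reference, a typical bonnie++ baseline invocation looks like the
following; the target directory here is a placeholder, and bonnie++
defaults to a data set of roughly 2x RAM, which lines up with the ~502G
sizes below ... )

  # repeat the run 4 times against the filesystem under test
  bonnie++ -d /mnt/bench -u root -x 4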
RHEL9 + EXT4
----------------------------------------------------------------------------------------------------
Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
localhost.loca 502G 2224k  99  2.4g  96  967m  33 3929k  93  1.6g  33 +++++ +++
Latency              4403us   30844us   69444us   27015us   22675us    8754us
Version  2.00       ------Sequential Create------ --------Random Create--------
localhost.localdoma -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               118us     108us     829us     101us       6us     393us

FreeBSD14 + UFS
----------------------------------------------------------------------------------------------------
Version  1.98       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
test.shrew. 523440M  759k  99  2.0g  99  1.1g  61 1945k  99  1.3g  42 264.8  99
Latency             11106us   31930us     423ms    4824us     321ms   12881us
Version  1.98       ------Sequential Create------ --------Random Create--------
test.shrew.lab      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 13095.927203 20 +++++ +++ 25358.227072 24 13573.129095 19 +++++ +++ 25354.222712 23
Latency              4382us      13us      99us    3125us       5us      67us

Good enough. The next thing I tried was running the same benchmark in a
RHEL9 guest to test the different storage config options, but that's
when I started to encounter difficulty repeating tests that produced
consistent results. At first the results appeared somewhat random but,
after a few days of trial and error, I started to identify a pattern.
The guest would sometimes perform well for a while, usually after a
restart, and then hit a sharp drop off in performance over time. For
example:

Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linux-blk    63640M  694k  99  1.6g  99  737m  76  985k  99  1.3g  69 +++++ +++
Latency             11579us     535us   11889us    8597us   21819us    8238us
Version  2.00       ------Sequential Create------ --------Random Create--------
linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency              7620us     126us    1648us     151us      15us     633us

--------------------------------- speed drop ---------------------------------

Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
linux-blk    63640M  676k  99  451m  99  314m  93  951k  99  402m  99 15167 530
Latency             11902us    8959us   24711us   10185us   20884us    5831us
Version  2.00       ------Sequential Create------ --------Random Create--------
linux-blk           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16     0  96 +++++ +++ +++++ +++     0  96 +++++ +++     0  75
Latency               343us     165us    1636us     113us      55us    1836us

The above test ran 6 times over roughly 20 mins, producing the higher
speed results, before slowing to the lower speed result. The time to
complete the benchmark also increased from about 2.5 minutes to about
8 minutes. To ensure I didn't miss something in my baseline, I repeated
the benchmark on the host system in a loop for about an hour, and the
output was consistent with my original testing. To ensure performance
didn't bounce back after it slowed, I repeated the benchmark in a loop
on the guest for about 4 hours; that output was also consistent.

I then tried switching between a block device, an img on ufs, an img on
a zfs dataset, and zvols, as well as switching between virtio block and
nvme in the guest. All of these options appeared to suffer from the
same problem, albeit with slightly different performance numbers. I
also tried swapping out the storage controller and running some
benchmarks using a zpool over the disk array to see if that was any
better. Same issue. I also tried pinning the guest CPU to specific
cores ( -p x:y ). No improvement.
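( For concreteness, switching between these backends is just a matter
of changing the bhyve slot argument; the device paths below are
placeholders for the raw device, zvol and file-backed image cases ... )

  # virtio block backed by a raw disk device or a zvol
  -s 3,virtio-blk,/dev/da0
  -s 3,virtio-blk,/dev/zvol/tank/rhel9-disk0
  # emulated nvme backed by a disk image on ufs or zfs
  -s 3,nvme,/vm/rhel9/disk0.img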
Here is a list of a few other things I'd like to try:

1) Wiring guest memory ( unlikely as it's 32G of 256G; see the sketch
   after this list )
2) Downgrading the host to 13.2-RELEASE
3) Testing guest OSs other than RHEL8 & RHEL9
4) Testing a different model of RAID/SAS controller
5) Testing xen vs bhyve disk performance
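( Item 1 should just be a matter of adding -S to the bhyve command line
to wire guest memory; the rest of this invocation is illustrative ... )

  # -S wires guest memory so the host can't page it out
  bhyve -c 8 -m 32G -S -H \
      -s 0,hostbridge \
      -s 3,virtio-blk,/dev/zvol/tank/rhel9-disk0 \
      -s 31,lpc -l com1,stdio \
      -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
      rhel9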
At this point I thought it prudent to post here for some help. Does
anyone have an idea of what might cause this issue? Does anyone have
experience testing bhyve with an SSD disk array of this size or larger?
I'm happy to provide more data points on request.

Thanks,

-Matthew