From: Harry Schmalzbauer <freebsd@omnilan.de>
To: freebsd-virtualization@freebsd.org
Subject: Re: bhyve win-guest benchmark comparing
Date: Sat, 27 Oct 2018 20:03:42 +0200
Organization: OmniLAN

On 22.10.2018 at 13:26, Harry Schmalzbauer wrote:
…
> Test runs:
> Each hypervisor had only the one bench guest running; no other
> tasks/guests were running besides the system's native standard
> processes.
> Since the time between powering up the guest and finishing logon
> differed notably (~5s vs. ~20s) from one host to the other, I did a
> quick synthetic I/O test beforehand.
> I'm using IOmeter, since heise.de published a great test pattern called
> IOmix – about 18 years ago, I guess.  This access pattern has always
> reflected system performance for interactive, non-calculation-centric
> applications very well, and it is still my favourite, even though
> throughput and latency have changed by some orders of magnitude during
> the last decade.  (I have also defined an "fio" job which mimics IOmix
> and shows reasonable relative results, but I still prefer IOmeter for
> homogeneous I/O benchmarking.)
>
> The result differs by about a factor of 7 :-(
> ~3800 iops & 69 MB/s (guest CPU usage: 42% IOmeter + 12% irq)
>                 vs.
> ~29000 iops & 530 MB/s (guest CPU usage: 11% IOmeter + 19% irq)
>
>     [With a debug kernel and debug malloc the numbers are
>      3000 iops & 56 MB/s; virtio-blk instead of ahci,hd: gives
>      5660 iops & 104 MB/s with a non-debug kernel – much better, but
>      with even higher CPU load and still a factor of 4 slower.]
>
> What I don't understand is why the IOmeter process differs that much
> in CPU utilization!?  It's the same binary on the same OS (guest)
> with the same OS driver and the same underlying hardware – "just" the
> AHCI emulation and the vmm differ...
>
> Unfortunately, the picture for virtio-net vs. vmxnet3 is similarly sad.
> Copying a single 5 GB file from a CIFS share to the DB SSD results in
> 100% guest CPU usage, of which 40% are irqs, and the throughput maxes
> out at ~40 MB/s.
> When copying the same file from the same source with the same guest on
> the same host, but with the host booted into ESXi, there is 20% guest
> CPU usage while transferring 111 MB/s – the GbE uplink limit.
>
> These synthetic benchmarks explain the "feelable" difference between
> using a guest on the two hypervisors very well, but …
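For reference, the two bhyve disk attachments compared above differ only
in the device line passed to bhyve(8).  A rough sketch (slot numbers,
the disk device, the tap interface, the bootrom path and the VM name are
placeholders, not my exact command line):

   # AHCI emulation, i.e. the slower configuration measured above; for
   # the virtio-blk runs the disk line becomes "-s 3,virtio-blk,/dev/da0".
   bhyve -c 4 -m 8G -H -w \
       -s 0,hostbridge -s 31,lpc -l com1,stdio \
       -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
       -s 3,ahci,hd:/dev/da0 \
       -s 5,virtio-net,tap0 \
       winguest
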
To add an additional and, at least for me, rather surprising result:
VirtualBox provides

   VBoxManage internalcommands createrawvmdk -filename "testbench_da0.vmdk" -rawdisk /dev/da0

so I could use exactly the same test setup as for ESXi and bhyve.

FreeBSD-VirtualBox (running on the same host installation as bhyve)
performed quite well, although it doesn't survive an IOmix benchmark run
when the "testbench_da0.vmdk" (the "raw" SSD RAID-0 array) is hooked up
to the SATA controller.  Connected to the emulated SAS controller
(LSI1068), however, it runs without problems and yields
9600 iops @ 185 MB/s with 1% IOmeter + 7% irq CPU utilization (yes, 1%
vs. 42% for the IOmeter load).  Still far away from what ESXi provides,
but almost double the performance of virtio-blk with bhyve, and most
importantly, with much less load (host and guest show exactly the same
low values, as opposed to the very high loads shown on host and guest
with bhyve:virtio-blk).
The HDTune random access benchmark also shows the factor of 2, linearly
across all block sizes.
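For completeness, after creating the raw-disk vmdk as shown above,
attaching it to the SAS controller boils down to something like this
(the VM name "winbench" and the port/device numbers are placeholders):

   # Add an LSI SAS controller and attach the raw-disk vmdk to it,
   # instead of the SATA controller (which did not survive the IOmix run).
   VBoxManage storagectl winbench --name "SAS" --add sas --controller LSILogicSAS
   VBoxManage storageattach winbench --storagectl "SAS" \
       --port 0 --device 0 --type hdd --medium testbench_da0.vmdk
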
VirtualBox's virtio-net setup gives ~100 MB/s, with peaks at 111 MB/s,
and ~40% CPU load.  The guest uses the same driver as with
bhyve:virtio-net, while the backend of virtualbox:virtio-net is
vboxnetflt (utilizing netgraph and vboxnetadp.ko) vs. tap(4) for bhyve.
So not only is the disk I/O efficiency remarkably better (lower
throughput, but also much lower CPU utilization), but so is the network
performance.  Even low-bandwidth RDP sessions via GbE LAN suffer from
micro-hangs under bhyve and virtio-net.  And 40 MB/s transfers cause
100% CPU load on bhyve – both runs had exactly the same Windows
virtio-net driver in use (RedHat 141).

Conclusion: VirtualBox vs. ESXi shows an efficiency factor of roughly
0.5, while bhyve vs. ESXi shows an overall efficiency factor of roughly
0.25.

I tried to provide a test environment with the shortest hardware paths
possible.  At least the benchmarks ran 100% reproducibly with the same
binaries.

So I'm really interested if …
> Are these (emulation-related (only?), I guess) performance issues well
> known?  I mean, does somebody know what needs to be done in which
> area in order to catch up with the other results?  So is it just a
> matter of time/resources?
> Or are these results surprising, and does extensive analysis have to
> be done before anybody can tell how to fix the I/O limitations?
>
> Is the root cause for the problematically low virtio-net throughput
> probably the same as for the disk I/O limits?  Both really hurt in my
> use case, and the host is not idling in proportion, but is even
> showing higher load with lower results.  So even if the lower
> user-experience performance were considered tolerable, the
> guest-per-host density would only be half as high.

Thanks,

-harry