From: Jason Tubnor <jason@tubnor.net>
Date: Fri, 23 Feb 2024 17:25:39 +1100
To: Matthew Grooms, virtualization@freebsd.org
Subject: Re: bhyve disk performance issue

On 23/02/2024 4:51 pm, Matthew Grooms wrote:
> On 2/18/24 09:21, Matthew Grooms wrote:
>> On 2/17/24 15:53, Matthew Grooms wrote:
>>>
>>> Unfortunately same story with 13.2. I'm going to try and downgrade
>>> to 12.4 and see if it gets any better.
>>>
>>> ================================================================================
>>> Begin @ Sat Feb 17 11:00:01 CST 2024
>>>
>>> Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
>>>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>>> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>> localhost.lo 63640M  690k  99  1.5g  97  727m  78  950k  99  1.3g  68 +++++ +++
>>> Latency             11759us   29114us    8098us    8649us   23413us    4540us
>>> Version  2.00       ------Sequential Create------ --------Random Create--------
>>> localhost.localdoma -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>>                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
>>> Latency              7791us     131us    1671us     464us      15us    1811us
>>>
>>> End @ Sat Feb 17 11:03:13 CST 2024
>>> ================================================================================
>>> Begin @ Sat Feb 17 11:10:01 CST 2024
>>>
>>> Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
>>>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>>> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>> localhost.lo 63640M  667k  99  449m  99  313m  94  940k  99  398m  99 16204 563
>>> Latency             12147us    1079us   24470us    8795us   17853us    4384us
>>> Version  2.00       ------Sequential Create------ --------Random Create--------
>>> localhost.localdoma -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>>                  16     0  93 +++++ +++ +++++ +++     0  96 +++++ +++ +++++ +++
>>> Latency               118us     159us     269us     164us      54us     844us
>>>
>>> End @ Sat Feb 17 11:18:43 CST 2024
>>>
>> I wasn't able to get a working 12.4 system in place due to lack of
>> packages. However, I did fire up a FreeBSD 14 VM and let it run
>> overnight on the same SSD array. It consistently ran at a much higher
>> speed for 50+ runs @ 10m intervals ...
>>
>> ================================================================================
>> Begin @ Sun Feb 18 15:00:00 UTC 2024
>>
>> Version  1.98       ------Sequential Output------ --Sequential Input- --Random-
>>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
>> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>> freebsd.shrew.l 64G  628k  99  1.6g  98  831m  60 1278k  99  1.1g  42 +++++ +++
>> Latency             13447us   68490us     207ms    7187us     195ms   17665us
>> Version  1.98       ------Sequential Create------ --------Random Create--------
>> freebsd.shrew.lab   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>>                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
>> Latency             18225us      18us      28us   18812us      18us      25us
>>
>> End @ Sun Feb 18 15:03:11 UTC 2024
>>
>> I used identical options to run both that VM and the Linux VM I've
>> been testing. The backing store for each VM has a 1TB partition and
>> the guest interfaces are NVMe.
>> Now I'm really scratching my head.
>>
>> Chuck, were you testing disk performance in Linux VMs or only FreeBSD?
>>
>> Anyone have ideas on why Linux disk performance would drop off a
>> cliff over time?
>>
> I took a detour trying out Xen, but apparently that has its own set of
> performance issues related to the FreeBSD port missing newer features.
> I did install KVM on the same hardware for comparison. I then tested a
> guest provisioned identically to the bhyve VM with a virtio blk device,
> which ran for 2.5 hours with consistently impressive output ...
>
> ================================================================================
> Begin @ Thu Feb 22 20:55:01 CST 2024
>
> Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> linux2.shrew.pr 31G 2191k  99  1.4g  55  1.1g  59 3484k  99  2.5g  83  7817 127
> Latency              4480us    2528us   17656us    2650us     102us    3568us
> Version  2.00       ------Sequential Create------ --------Random Create--------
> linux2.shrew.prv    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency              9722us      90us     123us      61us      20us     998us
>
> End @ Thu Feb 22 20:56:11 CST 2024
> ================================================================================
>
> For comparison, here is the output from a recent run of a Linux VM on
> bhyve using the virtio blk device. Note the performance drop-off
> between the first and second run ...
>
> ================================================================================
> Begin @ Thu Feb 22 22:00:02 CST 2024
>
> Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> linux.shrew. 63640M  694k  99  1.5g  97  772m  70  985k  99  1.4g  75  2302 115
> Latency             11557us   28959us   27540us    8308us   25227us    8605us
> Version  2.00       ------Sequential Create------ --------Random Create--------
> linux.shrew.lab     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency              4058us     125us    1651us      96us      23us     627us
>
> End @ Thu Feb 22 22:03:07 CST 2024
> ================================================================================
> Begin @ Thu Feb 22 22:10:02 CST 2024
>
> Version  2.00       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Name:Size etc        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
> linux.shrew. 63640M  676k  99  406m  99  300m  92  952k  99  373m  99  2145 158
> Latency             11871us     154us   20673us    9926us   22765us   14034us
> Version  2.00       ------Sequential Create------ --------Random Create--------
> linux.shrew.lab     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>                  16     0  95 +++++ +++     0  79     0  96 +++++ +++     0  76
> Latency              4426us     125us    1687us     576us      72us     654us
>
> End @ Thu Feb 22 22:19:19 CST 2024
> ================================================================================
>
> It certainly feels like a deficiency in bhyve that isn't tied to any
> particular storage device model. I did notice a pattern in top that I
> thought was peculiar. When watching the bhyve threads while the
> benchmark test is running, I see several CPU threads running at the
> top of the list, followed by what I assume to be block device threads.
> When the disk performance is high, it looks like this ...
>
>   PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
>  5628 root         68    0    32G    31G CPU24   24   0:53  88.90% bhyve{vcpu 5}
>  5628 root         36    0    32G    31G vmidle   7   0:18  17.86% bhyve{vcpu 15}
>  5628 root         34    0    32G    31G vmidle  26   1:06  16.76% bhyve{vcpu 8}
>  5628 root         21    0    32G    31G uwait   37   0:05   2.69% bhyve{blk-4:0-2}
>  5628 root         21    0    32G    31G uwait   60   0:04   2.64% bhyve{blk-4:0-0}
>  5628 root         21    0    32G    31G uwait   52   0:06   2.62% bhyve{blk-4:0-1}
>  5628 root         21    0    32G    31G uwait   14   0:05   2.58% bhyve{blk-4:0-6}
>  5628 root         21    0    32G    31G RUN     50   0:05   2.51% bhyve{blk-4:0-4}
>  5628 root         20    0    32G    31G uwait   38   0:05   2.51% bhyve{blk-4:0-5}
>  5628 root         21    0    32G    31G uwait   56   0:05   2.46% bhyve{blk-4:0-3}
>  5628 root         20    0    32G    31G uwait   22   0:06   2.45% bhyve{blk-4:0-7}
>
> When disk performance drops off, I see that one of the bhyve CPU
> threads shows its PRI value climb quickly until it hits around 135.
> At that point, that one CPU thread is basically pegged at 100% WCPU
> until the test is done. Other bhyve threads are much less busy ...
>
>   PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
>  5518 root        135    0    32G    31G CPU59   59   6:49  99.97% bhyve{vcpu 2}
>  5518 root         26    0    32G    31G vmidle  36   5:40   8.80% bhyve{vcpu 15}
>  5518 root         23    0    32G    31G vmidle  18   0:41   4.74% bhyve{vcpu 13}
>  5518 root         20    0    32G    31G vmidle  37   0:09   0.85% bhyve{vcpu 10}
>  5518 root         20    0    32G    31G uwait    6   0:20   0.72% bhyve{blk-4:0-2}
>  5518 root         20    0    32G    31G uwait    8   0:20   0.71% bhyve{blk-4:0-7}
>  5518 root         20    0    32G    31G uwait   43   0:22   0.70% bhyve{blk-4:0-0}
>  5518 root         20    0    32G    31G uwait   12   0:20   0.70% bhyve{blk-4:0-5}
>  5518 root         20    0    32G    31G uwait   43   0:19   0.68% bhyve{blk-4:0-3}
>  5518 root         20    0    32G    31G uwait   36   0:21   0.68% bhyve{blk-4:0-6}
>  5518 root         20    0    32G    31G uwait   46   0:20   0.68% bhyve{blk-4:0-4}
>  5518 root         20    0    32G    31G uwait   32   0:20   0.64% bhyve{blk-4:0-1}
>
> This pattern in top repeats each time the benchmark is run unless the
> guest is rebooted.
>
> Does bhyve call pthread_attr_setschedparam at run time under certain
> circumstances, or is that the scheduler doing its thing? Anyone have
> any ideas? I'm pretty much out of them :/
>
> Thanks in advance,
>
> -Matthew
>

Hi Matthew,

Have you tried using UFS2 or even memory-backed host storage for testing
yet? That would help rule out whether it is a ZFS issue or a bhyve issue.

Cheers,

Jason.
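P.S. In case it helps, here is a rough, untested sketch of how such a test
could be set up on the host. The md unit, sizes, mount point, and device
emulation are placeholders to adapt to your setup; the 4:0 slot is only
taken from the blk-4:0 thread names in your top output:

  # Create a 64 GB swap-backed memory disk (use -t malloc instead if you
  # want it fully RAM-backed and have the memory to spare).
  mdconfig -a -t swap -s 64g -u 1

  # Option 1: hand the raw md(4) device straight to the guest, e.g.
  # bhyve ... -s 4:0,nvme,/dev/md1 ...        (or virtio-blk)

  # Option 2: put UFS2 on it and back the guest with a plain file image,
  # which exercises the host filesystem path without involving ZFS.
  newfs -U /dev/md1
  mkdir -p /mnt/mdtest
  mount /dev/md1 /mnt/mdtest
  truncate -s 60g /mnt/mdtest/disk.img
  # bhyve ... -s 4:0,nvme,/mnt/mdtest/disk.img ...

The same file-on-UFS2 arrangement on one of the real SSDs would then
separate a ZFS effect from a bhyve block-layer effect.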