Date:      Mon, 31 Oct 2022 16:03:17 +0800
From:      Zhenlei Huang <zlei.huang@gmail.com>
To:        Paul Procacci <pprocacci@gmail.com>
Cc:        FreeBSD virtualization <freebsd-virtualization@freebsd.org>
Subject:   Re: NFS in bhyve VM mounted via bridge interface
Message-ID:  <3858240B-7225-4ECB-B4A6-4DE006ED869D@gmail.com>
In-Reply-To: <CAFbbPuiHZokb_7Q=TpXy7fFBNfJGtN=9Dt3T2+bx5OZUQOqjbg@mail.gmail.com>
References:  <A4F5B9EF-AA2B-4F34-8F62-A12ECE4E9566@jld3.net> <CAFbbPuiHZokb_7Q=TpXy7fFBNfJGtN=9Dt3T2+bx5OZUQOqjbg@mail.gmail.com>



> On Oct 31, 2022, at 2:02 PM, Paul Procacci <pprocacci@gmail.com> wrote:
> 
> 
> 
> On Mon, Oct 31, 2022 at 12:00 AM John Doherty <bsdlists@jld3.net> wrote:
> I have a machine running FreeBSD 12.3-RELEASE with a zpool that consists 
> of 12 mirrored pairs of 14 TB disks.  I'll call this the "storage 
> server." On that machine, I can write to ZFS file systems at around 950 
> MB/s and read from them at around 1450 MB/s. I'm happy with that.
> 
> I have another machine running Alma linux 8.6 that mounts file systems 
> from the storage server via NFS over a 10 GbE network. On this machine, 
> I can write to and read from an NFS file system at around 450 MB/s. I 
> wish that this were better but it's OK.
> 
> I created a bhyve VM on the storage server that also runs Alma linux 
> 8.6. It has a vNIC that is bridged with the 10 GbE physical NIC and a 
> tap interface:
> 
> [root@ss3] # ifconfig vm-storage
> vm-storage: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 
> mtu 1500
>         ether 82:d3:46:17:4e:ee
>         id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
>         maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
>         root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
>         member: tap1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>                 ifmaxaddr 0 port 10 priority 128 path cost 2000000
>         member: ixl0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>                 ifmaxaddr 0 port 5 priority 128 path cost 2000
>         groups: bridge vm-switch viid-ddece@
>         nd6 options=1<PERFORMNUD>
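(For anyone reproducing the setup: a bridge in the "vm-switch" group like this is what vm-bhyve builds for you. A rough sketch, with the switch name only a guess from the "vm-storage" interface name:

    vm switch create storage
    vm switch add storage ixl0

The by-hand equivalent is "ifconfig bridge create" followed by "ifconfig bridgeX addm ixl0 addm tap1 up".)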
> 
> I mount file systems from the storage server on this VM via NFS. I can 
> write to those file systems at around 250 MB/s and read from them at 
> around 280 MB/s. This surprised me a little: I thought that this might 
> perform better than or at least as well as the physical 10 GbE network 
> but find that it performs significantly worse.
> 
> All my read and write tests here are stupidly simple, using dd to read 
> from /dev/zero and write to a file or to read from a file and write to 
> /dev/null.
> 
> Is anyone else either surprised or unsurprised by these results?
> 
> I have not yet tried passing a physical interface on the storage server 
> through to the VM with PCI passthrough, but the machine does have 
> another 10 GbE interface I could use for this. This stuff is all about 
> 3,200 miles away from me so I need to get someone to plug a cable in for 
> me. I'll be interested to see how that works out, though.
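When the cable does get plugged in, the passthrough side is roughly this on the host. A sketch only; the 4/0/0 bus/slot/function is a placeholder to be read off pciconf -lv:

    # /boot/loader.conf: reserve the spare 10 GbE port for guests at boot
    pptdevs="4/0/0"
    vmm_load="YES"

    # vm-bhyve guest config, if that is what manages the VM
    passthru0="4/0/0"

With plain bhyve(8) the same thing is a device slot, e.g. -s 7:0,passthru,4/0/0.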
> 
> Any comments much appreciated. Thanks.
> 
> 
> 
> I was getting geared up to help you with this and then this happened:
> 
> Host:
> # dd if=17-04-27.mp4 of=/dev/null bs=4096
> 216616+1 records in
> 216616+1 records out
> 887263074 bytes transferred in 76.830892 secs (11548259 bytes/sec)
> 
> VM:
> dd if=17-04-27.mp4 of=/dev/null bs=4096
> 216616+1 records in
> 216616+1 records out
> 887263074 bytes transferred in 7.430017 secs (119416016 bytes/sec)
> 
> I'm totally flabbergasted.  These results are consistent and not at all what I expected to see.
> I even ran the tests on the VM first and the host second.  Call me confused.

I think you should bypass the local cache while testing. Try iflag=direct; see dd(1).
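On the Alma guest that would be GNU dd, something along these lines (the file name is just the test file from above, and testfile is a placeholder):

    # read test, bypassing the guest page cache (GNU dd only)
    dd if=17-04-27.mp4 of=/dev/null bs=1M iflag=direct
    # write test: O_DIRECT plus an fsync at the end
    dd if=/dev/zero of=testfile bs=1M count=8192 oflag=direct conv=fsync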

If the input file 17-04-27.mp4 is on NFS, then you could also verify the network I/O with netstat.
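For instance, on the FreeBSD host, using the member names from the ifconfig output above:

    netstat -w 1 -I ixl0    # per-second packets/bytes on the physical port
    netstat -w 1 -I tap1    # same for the VM's tap member

If the counters barely move during the dd run, the data is coming from cache rather than the wire.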

> 
> Anyways, that's a problem for me to figure out.
> 
> Back to your problem, I had something typed out about checking that rxcsum and txcsum are turned off on
> the interfaces (or at least seeing whether that makes a difference), trying a disk type of nvme, and trying ng_bridge
> with netgraph interfaces, but now I'm concluding my house is made of glass -- Hah! -- so until I get my house in
> order I'm going to refrain from providing details.
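For what it's worth, the offload check would look roughly like this on the host side (a sketch; I have not verified that ixl(4) offloads are the bottleneck here):

    ifconfig ixl0 -rxcsum -txcsum -tso -lro   # disable offloads on the physical member
    ifconfig tap1 -rxcsum -txcsum             # if the tap member advertises them

and the nvme disk type mentioned above is a bhyve device slot along the lines of -s 4,nvme,/path/to/disk.img.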
> 
> Sorry and thanks!
> ~Paul

Best regards,
Zhenlei
