Date:      Mon, 31 Oct 2022 16:03:17 +0800
From:      Zhenlei Huang <zlei.huang@gmail.com>
To:        Paul Procacci <pprocacci@gmail.com>
Cc:        FreeBSD virtualization <freebsd-virtualization@freebsd.org>
Subject:   Re: NFS in bhyve VM mounted via bridge interface
Message-ID:  <3858240B-7225-4ECB-B4A6-4DE006ED869D@gmail.com>
In-Reply-To: <CAFbbPuiHZokb_7Q=TpXy7fFBNfJGtN=9Dt3T2+bx5OZUQOqjbg@mail.gmail.com>
References:  <A4F5B9EF-AA2B-4F34-8F62-A12ECE4E9566@jld3.net> <CAFbbPuiHZokb_7Q=TpXy7fFBNfJGtN=9Dt3T2%2Bbx5OZUQOqjbg@mail.gmail.com>



> On Oct 31, 2022, at 2:02 PM, Paul Procacci <pprocacci@gmail.com> wrote:
>
> On Mon, Oct 31, 2022 at 12:00 AM John Doherty <bsdlists@jld3.net> wrote:
> I have a machine running FreeBSD 12.3-RELEASE with a zpool that consists
> of 12 mirrored pairs of 14 TB disks. I'll call this the "storage
> server." On that machine, I can write to ZFS file systems at around 950
> MB/s and read from them at around 1450 MB/s. I'm happy with that.
>
> I have another machine running Alma Linux 8.6 that mounts file systems
> from the storage server via NFS over a 10 GbE network. On this machine,
> I can write to and read from an NFS file system at around 450 MB/s. I
> wish that this were better, but it's OK.
>
> I created a bhyve VM on the storage server that also runs Alma Linux
> 8.6. It has a vNIC that is bridged with the 10 GbE physical NIC and a
> tap interface:
>
> [root@ss3] # ifconfig vm-storage
> vm-storage: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         ether 82:d3:46:17:4e:ee
>         id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
>         maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
>         root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
>         member: tap1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>                 ifmaxaddr 0 port 10 priority 128 path cost 2000000
>         member: ixl0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>                 ifmaxaddr 0 port 5 priority 128 path cost 2000
>         groups: bridge vm-switch viid-ddece@
>         nd6 options=1<PERFORMNUD>
>
> I mount file systems from the storage server on this VM via NFS. I can
> write to those file systems at around 250 MB/s and read from them at
> around 280 MB/s. This surprised me a little: I thought that this might
> perform better than, or at least as well as, the physical 10 GbE
> network, but I find that it performs significantly worse.
>
> All my read and write tests here are stupidly simple, using dd to read
> from /dev/zero and write to a file, or to read from a file and write to
> /dev/null.
>
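For reference, a write/read pair along those lines might look like the
sketch below, with /mnt/nfs as a hypothetical mount point (bs=1M is the
GNU dd spelling on the Alma clients; FreeBSD dd spells it bs=1m):

  # Write test: stream zeroes into a file on the NFS mount
  dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=16384

  # Read test: stream the same file back to /dev/null
  dd if=/mnt/nfs/testfile of=/dev/null bs=1M

One caveat: /dev/zero compresses perfectly, so with ZFS compression
enabled the write numbers can come out optimistic.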
> Is anyone else either surprised or unsurprised by these results?
>
> I have not yet tried passing a physical interface on the storage server
> through to the VM with PCI passthrough, but the machine does have
> another 10 GbE interface I could use for this. This stuff is all about
> 3,200 miles away from me, so I need to get someone to plug a cable in
> for me. I'll be interested to see how that works out, though.
>
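If you do get that cable plugged in, the usual passthrough recipe is to
reserve the NIC for ppt at boot and then hand it to the guest. A rough
sketch, with a hypothetical PCI selector 2/0/0 (check pciconf -lv for the
real bus/slot/function, and note the host needs VT-d / AMD-Vi enabled):

  # /boot/loader.conf: claim the device for PCI passthrough at boot
  pptdevs="2/0/0"

  # bhyve command line: expose the reserved device to the guest
  bhyve ... -s 6:0,passthru,2/0/0 ... vmname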
> Any comments much appreciated. Thanks.
>
> I was getting geared up to help you with this and then this happened:
>
> Host:
> # dd if=17-04-27.mp4 of=/dev/null bs=4096
> 216616+1 records in
> 216616+1 records out
> 887263074 bytes transferred in 76.830892 secs (11548259 bytes/sec)
>
> VM:
> dd if=17-04-27.mp4 of=/dev/null bs=4096
> 216616+1 records in
> 216616+1 records out
> 887263074 bytes transferred in 7.430017 secs (119416016 bytes/sec)
>
> I'm totally flabbergasted. These results are consistent and not at all
> what I expected to see. I even ran the tests on the VM first and the
> host second. Call me confused.

I think you should bypass the local cache while testing. Try iflag=direct; see dd(1).
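A minimal sketch, rerunning your read test with the cache bypassed
(assuming a dd that supports iflag, as GNU dd and newer FreeBSD dd do):

  # Same read test, but open the input O_DIRECT to skip the cache
  dd if=17-04-27.mp4 of=/dev/null bs=4096 iflag=direct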

If the input file 17-04-27.mp4 is on NFS, then you could also verify the
network I/O with netstat.
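For example, watch per-interface traffic while the test runs (ixl0 here,
going by the bridge output above):

  # Show packet and byte counters for ixl0 at one-second intervals
  netstat -I ixl0 -w 1

If the file really comes over the wire, the input bytes should track the
dd throughput; if the local cache serves it, they will sit near zero.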

>
> Anyways, that's a problem for me to figure out.
>
> Back to your problem: I had something typed out about checking that
> rxcsum and txcsum are turned off on the interfaces, or at least seeing
> whether that makes a difference, trying a disk type of nvme, and trying
> ng_bridge with netgraph interfaces, but now I'm concluding my house is
> made of glass -- Hah! -- so until I get my house in order I'm going to
> refrain from providing details.
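For what it's worth, the checksum-offload part is quick to try; a sketch,
using the member interfaces from the bridge output above:

  # Turn off hardware checksum offload on the physical NIC
  ifconfig ixl0 -rxcsum -txcsum

Whether tap1 accepts the same flags depends on the driver, but the
physical interface is the usual place to start.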
>=20
> Sorry and thanks!
> ~Paul

Best regards,
Zhenlei
