Date: Mon, 31 Oct 2022 16:03:17 +0800
From: Zhenlei Huang <zlei.huang@gmail.com>
To: Paul Procacci <pprocacci@gmail.com>
Cc: FreeBSD virtualization <freebsd-virtualization@freebsd.org>
Subject: Re: NFS in bhyve VM mounted via bridge interface
Message-ID: <3858240B-7225-4ECB-B4A6-4DE006ED869D@gmail.com>
In-Reply-To: <CAFbbPuiHZokb_7Q=TpXy7fFBNfJGtN=9Dt3T2+bx5OZUQOqjbg@mail.gmail.com>
References: <A4F5B9EF-AA2B-4F34-8F62-A12ECE4E9566@jld3.net> <CAFbbPuiHZokb_7Q=TpXy7fFBNfJGtN=9Dt3T2+bx5OZUQOqjbg@mail.gmail.com>
> On Oct 31, 2022, at 2:02 PM, Paul Procacci <pprocacci@gmail.com> wrote:
>
> On Mon, Oct 31, 2022 at 12:00 AM John Doherty <bsdlists@jld3.net> wrote:
>> I have a machine running FreeBSD 12.3-RELEASE with a zpool that consists
>> of 12 mirrored pairs of 14 TB disks. I'll call this the "storage
>> server." On that machine, I can write to ZFS file systems at around 950
>> MB/s and read from them at around 1450 MB/s. I'm happy with that.
>>
>> I have another machine running Alma linux 8.6 that mounts file systems
>> from the storage server via NFS over a 10 GbE network. On this machine,
>> I can write to and read from an NFS file system at around 450 MB/s. I
>> wish that this were better but it's OK.
>>
>> I created a bhyve VM on the storage server that also runs Alma linux
>> 8.6. It has a vNIC that is bridged with the 10 GbE physical NIC and a
>> tap interface:
>>
>> [root@ss3] # ifconfig vm-storage
>> vm-storage: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>         ether 82:d3:46:17:4e:ee
>>         id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
>>         maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
>>         root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
>>         member: tap1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>>                 ifmaxaddr 0 port 10 priority 128 path cost 2000000
>>         member: ixl0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
>>                 ifmaxaddr 0 port 5 priority 128 path cost 2000
>>         groups: bridge vm-switch viid-ddece@
>>         nd6 options=1<PERFORMNUD>
>>
>> I mount file systems from the storage server on this VM via NFS. I can
>> write to those file systems at around 250 MB/s and read from them at
>> around 280 MB/s.
>> This surprised me a little: I thought that this might
>> perform better than or at least as well as the physical 10 GbE network,
>> but find that it performs significantly worse.
>>
>> All my read and write tests here are stupidly simple, using dd to read
>> from /dev/zero and write to a file, or to read from a file and write to
>> /dev/null.
>>
>> Is anyone else either surprised or unsurprised by these results?
>>
>> I have not yet tried passing a physical interface on the storage server
>> through to the VM with PCI passthrough, but the machine does have
>> another 10 GbE interface I could use for this. This stuff is all about
>> 3,200 miles away from me so I need to get someone to plug a cable in for
>> me. I'll be interested to see how that works out, though.
>>
>> Any comments much appreciated. Thanks.
>
> I was getting geared up to help you with this and then this happened:
>
> Host:
> # dd if=17-04-27.mp4 of=/dev/null bs=4096
> 216616+1 records in
> 216616+1 records out
> 887263074 bytes transferred in 76.830892 secs (11548259 bytes/sec)
>
> VM:
> dd if=17-04-27.mp4 of=/dev/null bs=4096
> 216616+1 records in
> 216616+1 records out
> 887263074 bytes transferred in 7.430017 secs (119416016 bytes/sec)
>
> I'm totally flabbergasted. These results are consistent and not at all what I expected to see.
> I even ran the tests on the VM first and the host second. Call me confused.

I think you should bypass the local cache while testing. Try iflag=direct; see dd(1).

If the input file 17-04-27.mp4 is on NFS, then you could also verify the network IO with netstat.

> Anyways, that's a problem for me to figure out.
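[For reference, a minimal sketch of an uncached throughput test along the lines Zhenlei suggests; the file name, sizes, and target directory are placeholders, and iflag=direct requires dd and filesystem support for direct I/O:]

```shell
# Create a test file larger than is comfortable to cache, then read it
# back with iflag=direct so the read bypasses the local buffer cache
# and actually exercises the disk (or the NFS wire, if the file lives
# on an NFS mount).
dd if=/dev/zero of=testfile bs=1M count=256
dd if=testfile of=/dev/null bs=1M iflag=direct
```

Without iflag=direct, a second read of the same file is served from page cache, which would explain a VM "reading" far faster than its host.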
>
> Back to your problem, I had something typed out concerning checking that rxsums and txsums
> are turned off on the interfaces, or at least seeing if that makes a difference, trying a disk
> type of nvme, and trying ng_bridge with netgraph interfaces, but now I'm concluding my house
> is made of glass -- Hah! -- so until I get my house in order I'm going to refrain from
> providing details.
>
> Sorry and thanks!
> ~Paul

Best regards,
Zhenlei
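[For anyone following up on Paul's checksum-offload idea, a configuration sketch; the member interface name ixl0 is taken from the ifconfig output above, and whether this helps bridged/tap traffic is exactly the open question:]

```shell
# Disable TX/RX checksum offload and TSO on the physical bridge member;
# offloaded checksums are a common source of trouble for traffic that
# is bridged to a tap device. These are standard ifconfig(8) flags.
ifconfig ixl0 -txcsum -rxcsum -tso

# Re-enable later if it makes no difference:
# ifconfig ixl0 txcsum rxcsum tso
```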