Date: Mon, 3 Jul 2023 16:24:00 -0400
From: Cheng Cui <cc@freebsd.org>
To: Murali Krishnamurthy <muralik1@vmware.com>
Cc: "Scheffenegger, Richard" <rscheff@freebsd.org>,
    FreeBSD Transport <freebsd-transport@freebsd.org>
Subject: Re: FreeBSD TCP (with iperf3) comparison with Linux
Message-ID: <CAGaXuiLq-ia=9ci=81GnnW2FUTdpPrVa_F2nVm6a6rmLsHhBRw@mail.gmail.com>
In-Reply-To: <PH0PR05MB10064A16ADEE4B35DF6E5DF0FFB29A@PH0PR05MB10064.namprd05.prod.outlook.com>
References: <53aff274-b1a8-0730-6971-2755c7e7b688@freebsd.org>
    <PH0PR05MB100642BD041192E6B7EBDBFE1FB2AA@PH0PR05MB10064.namprd05.prod.outlook.com>
    <CAGaXuiJEsgRXo12iVW_9C-VzT%2BF3E3CuYa-es3qJ9w8n3yrAwg@mail.gmail.com>
    <PH0PR05MB10064A16ADEE4B35DF6E5DF0FFB29A@PH0PR05MB10064.namprd05.prod.outlook.com>

I see. Sorry about the terse description in my previous email.

If the iperf3 report shows poor throughput and increasing numbers in the
"Retr" field, and "netstat -sp tcp" also shows retransmitted packets without
any SACK recovery episodes (SACK is enabled by default), then you are likely
hitting the problem I described, and the root cause is TX queue drops. A
tcpdump trace won't show any packet retransmissions, and the peer won't be
aware of any packet loss, because this is a local problem.

cc@s1:~ % netstat -sp tcp | egrep "tcp:|retrans|SACK"
tcp:
        139 data packets (300416 bytes) retransmitted        <<
        0 data packets unnecessarily retransmitted
        3 retransmit timeouts
        0 retransmitted
        0 SACK recovery episodes                             <<
        0 segment rexmits in SACK recovery episodes
        0 byte rexmits in SACK recovery episodes
        0 SACK options (SACK blocks) received
        0 SACK options (SACK blocks) sent
        0 SACK retransmissions lost
        0 SACK scoreboard overflow

Local packet drops caused by a full TX queue show up in the Drop column of
this command, for example:

cc@s1:~ % netstat -i -I bce4 -nd
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll  Drop
bce4   1500 <Link#5>      00:10:18:56:94:d4   286184     0     0   148079     0     0    54  <<
bce4      - 10.1.1.0/24   10.1.1.2            286183     -     -   582111     -     -     -
cc@s1:~ %

I hope the stats above help with your root-cause analysis. Also, increasing
the TX queue size is a workaround and is specific to a particular NIC, but
you get the idea.

Best Regards,
Cheng Cui

On Mon, Jul 3, 2023 at 11:34 AM Murali Krishnamurthy <muralik1@vmware.com> wrote:

> Cheng,
>
> Thanks for your inputs.
>
> Sorry, I am not familiar with this area.
>
> Few queries,
>
> “I believe the default values for bce tx/rx pages are 2. And I happened to
> find this problem before: when the tx queue was full, it would not enqueue
> packets and started returning errors.
> And this error was misunderstood by the TCP layer as retransmission.”
>
> Could you please elaborate on what is misunderstood by TCP here? Loss of
> packets should lead to retransmissions anyway.
>
> Could you point to some stats where I can see such drops due to the queue
> getting full?
>
> I have a vmx interface in my VM and I have attached a screenshot of the
> ifconfig command for that.
>
> Anything we can understand from that?
>
> Will your suggestion of increasing tx_pages=4 and rx_pages=4 work for this?
> If so, I assume the names would be hw.vmx.tx_pages=4 and hw.vmx.rx_pages?
>
> Regards
>
> Murali
>
> *From: *Cheng Cui <cc@freebsd.org>
> *Date: *Friday, 30 June 2023 at 10:02 PM
> *To: *Murali Krishnamurthy <muralik1@vmware.com>
> *Cc: *Scheffenegger, Richard <rscheff@freebsd.org>, FreeBSD Transport
> <freebsd-transport@freebsd.org>
> *Subject: *Re: FreeBSD TCP (with iperf3) comparison with Linux
>
> *!! External Email*
>
> I used an emulation testbed from Emulab.net with a Dummynet traffic shaper
> adding 100 ms RTT between two nodes. The link capacity is 1 Gbps and both
> nodes are running FreeBSD 13.2.
>
> cc@s1:~ % ping -c 3 r1
> PING r1-link1 (10.1.1.3): 56 data bytes
> 64 bytes from 10.1.1.3: icmp_seq=0 ttl=64 time=100.091 ms
> 64 bytes from 10.1.1.3: icmp_seq=1 ttl=64 time=99.995 ms
> 64 bytes from 10.1.1.3: icmp_seq=2 ttl=64 time=99.979 ms
>
> --- r1-link1 ping statistics ---
> 3 packets transmitted, 3 packets received, 0.0% packet loss
> round-trip min/avg/max/stddev = 99.979/100.022/100.091/0.049 ms
>
> cc@s1:~ % iperf3 -c r1 -t 10 -i 1 -C cubic
> Connecting to host r1, port 5201
> [  5] local 10.1.1.2 port 56089 connected to 10.1.1.3 port 5201
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec  4.19 MBytes  35.2 Mbits/sec    0   1.24 MBytes
> [  5]   1.00-2.00   sec  56.5 MBytes   474 Mbits/sec    6   2.41 MBytes
> [  5]   2.00-3.00   sec  58.6 MBytes   492 Mbits/sec   18   7.17 MBytes
> [  5]   3.00-4.00   sec  65.6 MBytes   550 Mbits/sec   14    606 KBytes
> [  5]   4.00-5.00   sec  60.8 MBytes   510 Mbits/sec   18   7.22 MBytes
> [  5]   5.00-6.00   sec  62.1 MBytes   521 Mbits/sec   12   7.86 MBytes
> [  5]   6.00-7.00   sec  60.9 MBytes   512 Mbits/sec   14   3.43 MBytes
> [  5]   7.00-8.00   sec  62.8 MBytes   527 Mbits/sec   16    372 KBytes
> [  5]   8.00-9.00   sec  59.3 MBytes   497 Mbits/sec   14   1.77 MBytes
> [  5]   9.00-10.00  sec  57.0 MBytes   477 Mbits/sec   18   7.13 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec   548 MBytes   459 Mbits/sec  130             sender
> [  5]   0.00-10.10  sec   540 MBytes   449 Mbits/sec                  receiver
>
> iperf Done.
>
> cc@s1:~ % ifconfig bce4
> bce4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
>         options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>         ether 00:10:18:56:94:d4
>         inet 10.1.1.2 netmask 0xffffff00 broadcast 10.1.1.255
>         media: Ethernet 1000baseT <full-duplex>
>         status: active
>         nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>
> I believe the default values for bce tx/rx pages are 2.
> And I happened to find this problem before: when the tx queue was full, it
> would not enqueue packets and started returning errors.
> And this error was misunderstood by the TCP layer as retransmission.
>
> After adding hw.bce.tx_pages=4 and hw.bce.rx_pages=4 in /boot/loader.conf
> and rebooting:
>
> cc@s1:~ % iperf3 -c r1 -t 10 -i 1 -C cubic
> Connecting to host r1, port 5201
> [  5] local 10.1.1.2 port 20478 connected to 10.1.1.3 port 5201
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec  4.15 MBytes  34.8 Mbits/sec    0   1.17 MBytes
> [  5]   1.00-2.00   sec  83.1 MBytes   697 Mbits/sec    0   12.2 MBytes
> [  5]   2.00-3.00   sec   112 MBytes   939 Mbits/sec    0   12.2 MBytes
> [  5]   3.00-4.00   sec   113 MBytes   944 Mbits/sec    0   12.2 MBytes
> [  5]   4.00-5.00   sec   112 MBytes   940 Mbits/sec    0   12.2 MBytes
> [  5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec    0   12.2 MBytes
> [  5]   6.00-7.00   sec   112 MBytes   938 Mbits/sec    0   12.2 MBytes
> [  5]   7.00-8.00   sec   113 MBytes   944 Mbits/sec    0   12.2 MBytes
> [  5]   8.00-9.00   sec   112 MBytes   938 Mbits/sec    0   12.2 MBytes
> [  5]   9.00-10.00  sec   113 MBytes   947 Mbits/sec    0   12.2 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec   985 MBytes   826 Mbits/sec    0             sender
> [  5]   0.00-10.11  sec   982 MBytes   815 Mbits/sec                  receiver
>
> iperf Done.
>
> Best Regards,
>
> Cheng Cui
>
> On Fri, Jun 30, 2023 at 12:26 PM Murali Krishnamurthy <muralik1@vmware.com>
> wrote:
>
> > Richard,
> >
> > Appreciate the useful inputs you have shared so far. Will try to figure
> > out where the packet drops come from.
> >
> > Regarding HyStart, I see even the BSD code base has support for this. May
> > I know when we can expect it in a release, if it is not already available?
> >
> > Regarding this point: *“Switching to other cc modules may give some more
> > insights.
> > But again, I suspect that momentary (microsecond) burstiness of BSD may be
> > causing this significantly higher loss rate.”*
> >
> > Is there some info somewhere where I can understand more on this in detail?
> >
> > Regards
> >
> > Murali
> >
> > On 30/06/23, 9:35 PM, "owner-freebsd-transport@freebsd.org"
> > <owner-freebsd-transport@freebsd.org> wrote:
> >
> > Hi Murali,
> >
> > > Q. Since you mention two hypervisors - what is the physical network
> > > topology in between these two servers? What theoretical link rates
> > > would be attainable?
> > >
> > > Here is the topology
> > >
> > > Iperf end points are on 2 different hypervisors.
> > >
> > >  -----------   -------------                 -----------   -------------
> > > | Linux VM1 | | BSD 13 VM 1 |               | Linux VM2 | | BSD 13 VM 2 |
> > > |___________| |_____________|               |___________| |_____________|
> > >       |              |                            |              |
> > >  ---------------------------                 ---------------------------
> > > |     ESX Hypervisor 1      |    10G link   |     ESX Hypervisor 2      |
> > > |                           |---------------|                           |
> > > |___________________________| via L2 switch |___________________________|
> > >
> > > Nic is of 10G capacity on both ESX server and it has below config.
> >
> > So, when both VMs run on the same Hypervisor, maybe with another VM to
> > simulate the 100ms delay, can you attain a lossless baseline scenario?
> >
> > > BDP for 16MB Socket buffer: 16 MB * (1000 ms / 100 ms latency) * 8 bits /
> > > 1024 = 1.25 Gbps
> > >
> > > So theoretically we should see close to 1.25Gbps of Bitrate and we see
> > > Linux reaching close to this number.
> >
> > Under no loss, yes.
> >
> > > But BSD is not able to do that.
> > >
> > > Q. Did you run iperf3? Did the transmitting endpoint report any
> > > retransmissions between Linux or FBSD hosts?
> > >
> > > Yes, we used iperf3. I see Linux doing fewer retransmissions compared
> > > to BSD.
> > > On BSD, the best performance was around 600 Mbps bitrate and the number
> > > of retransmissions seen for this number is around 32K.
> > > On Linux, the best performance was around 1.15 Gbps bitrate and the
> > > number of retransmissions seen for this number is only 2K.
> > > So as you pointed out, the number of retransmissions in BSD could be the
> > > real issue here.
> >
> > There are other cc modules available; but I believe one major deviation is
> > that Linux can perform mechanisms like hystart; ACKing every packet when
> > the client detects slow start; perform pacing to achieve more uniform
> > packet transmissions.
> >
> > I think the next step would be to find out at which queue those packet
> > discards are coming from (external switch? delay generator? Vswitch? Eth
> > stack inside the VM?)
> >
> > Or alternatively, provide your ESX hypervisors with vastly more link
> > speed, to rule out any L2 induced packet drops - provided your delay
> > generator is not the source when momentarily overloaded.
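
A side note on the arithmetic quoted above (16 MB socket buffer, 100 ms RTT, about 1.25 Gbps): with at most one window in flight per round trip, throughput is bounded by window / RTT. The Python sketch below only illustrates that bound and the matching bandwidth-delay product. The function names are made up for this note and are not part of iperf3 or FreeBSD, and the exact figure shifts a little depending on whether MB and Gbps are read as binary or decimal units.

```python
def window_limited_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when one window is sent per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


def required_window_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return link_bps * rtt_seconds / 8


if __name__ == "__main__":
    MIB = 1024 * 1024

    # 16 MiB socket buffer over a 100 ms path, as in the quoted estimate.
    rate = window_limited_throughput_bps(16 * MIB, 0.100)
    print(f"16 MiB window, 100 ms RTT: {rate / 1e9:.2f} Gbit/s upper bound")

    # 1 Gbit/s testbed link with 100 ms RTT, as in the Emulab runs.
    bdp = required_window_bytes(1e9, 0.100)
    print(f"1 Gbit/s link, 100 ms RTT: {bdp / 1e6:.1f} MB must be in flight")
```

For the 1 Gbps / 100 ms testbed this gives a bandwidth-delay product of about 12.5 MB, which is roughly in line with the ~12 MByte Cwnd that iperf3 reports in the run without retransmissions. It is only an upper bound and says nothing about where packets are being dropped.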
> >
> > > Is there a way to reduce this packet loss by fine tuning some parameters
> > > w.r.t ring buffer or any other areas?
> >
> > Finding where these arise (looking at queue and port counters) would be
> > the next step. But this is not really my specific area of expertise beyond
> > the high level, vendor independent observations.
> >
> > Switching to other cc modules may give some more insights. But again, I
> > suspect that momentary (microsecond) burstiness of BSD may be causing this
> > significantly higher loss rate.
> >
> > TCP RACK would be another option. That stack has pacing, more fine-grained
> > timing, the RACK loss recovery mechanisms etc. Maybe that helps reduce the
> > packet drops observed by iperf, and consequently, yield a higher overall
> > throughput.
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0|<u></u><u></u></p> </div> <div> <p class=3D"MsoNormal">> |=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94= =E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2= =80=94 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=E2=80=94=E2=80=94=E2= =80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80= =94=E2=80=94=E2=80=94=E2=80=94|<u></u><u></u></p> </div> <div> <p class=3D"MsoNormal">> <u></u><u></u></p> </div> <div> <p class=3D"MsoNormal">> <u></u><u></u></p> </div> <div> <p class=3D"MsoNormal">> Nic is of 10G capacity on both ESX server and i= t has below config.<u></u><u></u></p> </div> <div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div> <div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div> <div> <p class=3D"MsoNormal">So, when both VMs run on the same Hypervisor, maybe = with another VM to simulate the 100ms delay, can you attain a lossless base= line scenario?<u></u><u></u></p> </div> <div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div> <div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div> <div> <p class=3D"MsoNormal">> BDP for 16MB Socket buffer: 16 MB * (1000 ms * = 100ms latency) * 8 bits/ 1024 =3D 1.25 Gbps<u></u><u></u></p> </div> <div> <p class=3D"MsoNormal">> <u></u><u></u></p> </div> <div> <p class=3D"MsoNormal">> So theoretically we should see close to 1.25Gbp= s of Bitrate and we 
see Linux reaching close to this number.<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">Under no loss, yes.<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">> But BSD is not able to do that.<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">> <u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">> <u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">> Q. Did you run iperf3? Did the transmitting endpoint report any retransmissions between Linux or FBSD hosts?<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">> <u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">> Yes, we used iperf3. I see Linux doing fewer retransmissions than BSD.<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">> On BSD, the best performance was a bitrate of around 600 Mbps, with around 32K retransmissions.<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">> On Linux, the best performance was a bitrate of around 1.15 Gbps, with only 2K retransmissions.<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">> So, as you pointed out, the number of retransmissions in BSD could be the real issue here.<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">There are other cc modules available, but I believe one major difference is that Linux can use mechanisms such as HyStart, ACKing every packet when the client detects slow start, and pacing to achieve more uniform packet transmissions.<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">I think the next step would be to find out which queue those packet discards are coming from (external switch? delay generator?
vSwitch? Eth stack inside the VM?)<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">Or alternatively, provide your ESX hypervisors with vastly more link speed to rule out any L2-induced packet drops, provided your delay generator is not itself the source when momentarily overloaded.<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">> Is there a way to reduce this packet loss by fine-tuning some parameters w.r.t. the ring buffer or any other areas?<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">Finding where these drops arise (by looking at queue and port counters) would be the next step. But this is not really my area of expertise beyond high-level, vendor-independent observations.<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">Switching to other cc modules may give some more insights. But again, I suspect that momentary (microsecond-scale) burstiness of BSD may be causing this significantly higher loss rate.<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">TCP RACK would be another option. That stack has pacing, more fine-grained timing, the RACK loss recovery mechanisms, etc.
Maybe that helps reduce the packet drops observed with iperf and, consequently, yields a higher overall throughput.<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div>
<div> <p class=3D"MsoNormal">=C2=A0<u></u><u></u></p> </div>
</div> </div> </div> </blockquote> </div> </div>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
<p class=3D"MsoNormal"><u></u>=C2=A0<u></u></p>
</div> </div> </div> </div> </div></blockquote></div></div>
--000000000000139feb05ff9af368--