Date:      Mon, 3 Jul 2023 16:24:00 -0400
From:      Cheng Cui <cc@freebsd.org>
To:        Murali Krishnamurthy <muralik1@vmware.com>
Cc:        "Scheffenegger, Richard" <rscheff@freebsd.org>, FreeBSD Transport <freebsd-transport@freebsd.org>
Subject:   Re: FreeBSD TCP (with iperf3) comparison with Linux
Message-ID:  <CAGaXuiLq-ia=9ci=81GnnW2FUTdpPrVa_F2nVm6a6rmLsHhBRw@mail.gmail.com>
In-Reply-To: <PH0PR05MB10064A16ADEE4B35DF6E5DF0FFB29A@PH0PR05MB10064.namprd05.prod.outlook.com>
References:  <53aff274-b1a8-0730-6971-2755c7e7b688@freebsd.org> <PH0PR05MB100642BD041192E6B7EBDBFE1FB2AA@PH0PR05MB10064.namprd05.prod.outlook.com> <CAGaXuiJEsgRXo12iVW_9C-VzT%2BF3E3CuYa-es3qJ9w8n3yrAwg@mail.gmail.com> <PH0PR05MB10064A16ADEE4B35DF6E5DF0FFB29A@PH0PR05MB10064.namprd05.prod.outlook.com>


I see. Sorry for the terse description in my previous email.

If the iperf3 report shows poor throughput and an increasing number in the
"Retr" field, and "netstat -sp tcp" shows retransmitted packets without any
SACK recovery episodes (SACK is enabled by default), then you are likely
hitting the problem I described: the root cause is TX queue drops. A tcpdump
trace won't show any packet retransmissions and the peer won't be aware of
the packet loss, as this is a local problem.

cc@s1:~ % netstat -sp tcp | egrep "tcp:|retrans|SACK"
tcp:
139 data packets (300416 bytes) retransmitted       <<
0 data packets unnecessarily retransmitted
3 retransmit timeouts
0 retransmitted
0 SACK recovery episodes                                      <<
0 segment rexmits in SACK recovery episodes
0 byte rexmits in SACK recovery episodes
0 SACK options (SACK blocks) received
0 SACK options (SACK blocks) sent
0 SACK retransmissions lost
0 SACK scoreboard overflow

Local packet drops due to a full TX queue can be found with this command,
for example:

cc@s1:~ % netstat -i -I bce4 -nd
Name   Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll  Drop
bce4   1500 <Link#5>     00:10:18:56:94:d4   286184     0     0   148079     0     0    54   <<
bce4      - 10.1.1.0/24  10.1.1.2            286183     -     -   582111     -     -     -
cc@s1:~ %
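If you want to pull just the TX counters out of that report while a test
runs, a small awk filter over the link-level row works. This is only a
sketch, assuming the column layout shown above; here it is fed a captured
sample line instead of a live netstat run:

```shell
# Extract the Opkts/Oerrs/Drop columns from the link-level (<Link#N>) row.
# In a live run, replace the echo with:  netstat -i -I bce4 -nd
echo 'bce4   1500 <Link#5>      00:10:18:56:94:d4   286184     0     0   148079     0     0    54' |
awk '$3 ~ /Link/ { printf "Opkts=%s Oerrs=%s Drop=%s\n", $8, $9, $11 }'
```

A Drop value that keeps growing while iperf3 runs confirms the loss is local.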

I hope the stats above help with your root-cause analysis. Note that
increasing the TX queue size is a workaround, and the tunable names are
specific to a particular NIC driver. But you get the idea.
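For the bce driver in my test, the workaround was the two lines below in
/boot/loader.conf, followed by a reboot. Whether the vmx driver exposes
equivalently named hw.vmx.* tunables is an assumption to verify against the
driver's man page or the output of `sysctl -a | grep vmx`:

```shell
# /boot/loader.conf -- double the bce TX/RX ring pages (driver default is 2).
# Loader tunables take effect at the next boot.
hw.bce.tx_pages=4
hw.bce.rx_pages=4
```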

Best Regards,
Cheng Cui


On Mon, Jul 3, 2023 at 11:34 AM Murali Krishnamurthy <muralik1@vmware.com>
wrote:

> Cheng,
>
>
>
> Thanks for your inputs.
>
>
>
> Sorry, I am not familiar with this area.
>
>
>
> Few queries,
>
>
>
> “I believe the default values for bce tx/rx pages are 2. And I happened to
> find this problem before that when the tx queue was full, it would not
> enqueue packets and started return errors.
> And this error was misunderstood by the TCP layer as retransmission.”
>
>
>
> Could you please elaborate what is misunderstood by TCP here? Loss of
> packets should anyway lead to retransmissions.
>
>
>
> Could you point to some stats where I can see such drops due to queue
> getting full?
>
>
>
> I have a vmx interface in my VM and I have attached the screenshot of
> ifconfig command for that.
>
> Anything we can understand from that?
>
> Will your suggestion of increasing tx_pages=4 and rx_pages=4 work for
> this? If so, I assume the names would be hw.vmx.tx_pages=4 and
> hw.vmx.rx_pages?
>
>
>
> Regards
>
> Murali
>
>
>
>
>
> *From: *Cheng Cui <cc@freebsd.org>
> *Date: *Friday, 30 June 2023 at 10:02 PM
> *To: *Murali Krishnamurthy <muralik1@vmware.com>
> *Cc: *Scheffenegger, Richard <rscheff@freebsd.org>, FreeBSD Transport <
> freebsd-transport@freebsd.org>
> *Subject: *Re: FreeBSD TCP (with iperf3) comparison with Linux
>
>
> I used an emulation testbed from Emulab.net, with a Dummynet traffic
> shaper adding 100 ms of RTT between the two nodes. The link capacity is
> 1 Gbps and both nodes are running FreeBSD 13.2.
>
> cc@s1:~ % ping -c 3 r1
> PING r1-link1 (10.1.1.3): 56 data bytes
> 64 bytes from 10.1.1.3: icmp_seq=0 ttl=64 time=100.091 ms
> 64 bytes from 10.1.1.3: icmp_seq=1 ttl=64 time=99.995 ms
> 64 bytes from 10.1.1.3: icmp_seq=2 ttl=64 time=99.979 ms
>
> --- r1-link1 ping statistics ---
> 3 packets transmitted, 3 packets received, 0.0% packet loss
> round-trip min/avg/max/stddev = 99.979/100.022/100.091/0.049 ms
>
>
> cc@s1:~ % iperf3 -c r1 -t 10 -i 1 -C cubic
> Connecting to host r1, port 5201
> [  5] local 10.1.1.2 port 56089 connected to 10.1.1.3 port 5201
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec  4.19 MBytes  35.2 Mbits/sec    0   1.24 MBytes
> [  5]   1.00-2.00   sec  56.5 MBytes   474 Mbits/sec    6   2.41 MBytes
> [  5]   2.00-3.00   sec  58.6 MBytes   492 Mbits/sec   18   7.17 MBytes
> [  5]   3.00-4.00   sec  65.6 MBytes   550 Mbits/sec   14    606 KBytes
> [  5]   4.00-5.00   sec  60.8 MBytes   510 Mbits/sec   18   7.22 MBytes
> [  5]   5.00-6.00   sec  62.1 MBytes   521 Mbits/sec   12   7.86 MBytes
> [  5]   6.00-7.00   sec  60.9 MBytes   512 Mbits/sec   14   3.43 MBytes
> [  5]   7.00-8.00   sec  62.8 MBytes   527 Mbits/sec   16    372 KBytes
> [  5]   8.00-9.00   sec  59.3 MBytes   497 Mbits/sec   14   1.77 MBytes
> [  5]   9.00-10.00  sec  57.0 MBytes   477 Mbits/sec   18   7.13 MBytes
>
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec   548 MBytes   459 Mbits/sec  130             sender
> [  5]   0.00-10.10  sec   540 MBytes   449 Mbits/sec                  receiver
>
> iperf Done.
>
> cc@s1:~ % ifconfig bce4
> bce4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
> ether 00:10:18:56:94:d4
> inet 10.1.1.2 netmask 0xffffff00 broadcast 10.1.1.255
> media: Ethernet 1000baseT <full-duplex>
> status: active
> nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>
> I believe the default values for bce tx/rx pages are 2. And I happened to
> find
> this problem before that when the tx queue was full, it would not enqueue
> packets
> and started return errors.
> And this error was misunderstood by the TCP layer as retransmission.
>
> After adding hw.bce.tx_pages=4 and hw.bce.rx_pages=4 in /boot/loader.conf
> and rebooting:
>
> cc@s1:~ % iperf3 -c r1 -t 10 -i 1 -C cubic
> Connecting to host r1, port 5201
> [  5] local 10.1.1.2 port 20478 connected to 10.1.1.3 port 5201
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec  4.15 MBytes  34.8 Mbits/sec    0   1.17 MBytes
> [  5]   1.00-2.00   sec  83.1 MBytes   697 Mbits/sec    0   12.2 MBytes
> [  5]   2.00-3.00   sec   112 MBytes   939 Mbits/sec    0   12.2 MBytes
> [  5]   3.00-4.00   sec   113 MBytes   944 Mbits/sec    0   12.2 MBytes
> [  5]   4.00-5.00   sec   112 MBytes   940 Mbits/sec    0   12.2 MBytes
> [  5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec    0   12.2 MBytes
> [  5]   6.00-7.00   sec   112 MBytes   938 Mbits/sec    0   12.2 MBytes
> [  5]   7.00-8.00   sec   113 MBytes   944 Mbits/sec    0   12.2 MBytes
> [  5]   8.00-9.00   sec   112 MBytes   938 Mbits/sec    0   12.2 MBytes
> [  5]   9.00-10.00  sec   113 MBytes   947 Mbits/sec    0   12.2 MBytes
>
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec   985 MBytes   826 Mbits/sec    0             sender
> [  5]   0.00-10.11  sec   982 MBytes   815 Mbits/sec                  receiver
>
> iperf Done.
>
>
>
> Best Regards,
>
> Cheng Cui
>
>
>
>
>
> On Fri, Jun 30, 2023 at 12:26 PM Murali Krishnamurthy <muralik1@vmware.com>
> wrote:
>
> Richard,
>
>
>
> Appreciate the useful inputs you have shared so far. Will try to figure
> out regarding packet drops.
>
>
>
> Regarding HyStart, I see even the BSD code base has support for this. May
> I know when we can expect to see it in a release, if it is not already
> available?
>
>
>
> Regarding this point: *“Switching to other cc modules may give some more
> insights. But again, I suspect that momentary (microsecond) burstiness of
> BSD may be causing this significantly higher loss rate.”*
>
> Is there some info somewhere where I can understand more on this in
> detail?
>
>
>
> Regards
>
> Murali
>
>
>
>
>
> On 30/06/23, 9:35 PM, "owner-freebsd-transport@freebsd.org" <
> owner-freebsd-transport@freebsd.org> wrote:
>
>
>
> Hi Murali,
>
>
>
> > Q. Since you mention two hypervisors - what is the physical network
> > topology in between these two servers? What theoretical link rates
> > would be attainable?
>
> >
>
> > Here is the topology
>
> >
>
> > Iperf end points are on 2 different hypervisors.
>
> >
>
> >   ___________      _____________                      ___________      _____________
> >  | Linux VM1 |    | BSD 13 VM 1 |                    | Linux VM2 |    | BSD 13 VM 2 |
> >  |___________|    |_____________|                    |___________|    |_____________|
> > |                                |                  |                                |
> > |        ESX Hypervisor 1        |  10G link via    |        ESX Hypervisor 2        |
> > |________________________________|  L2 switch       |________________________________|
>
> >
>
> >
>
> > The NIC is of 10G capacity on both ESX servers and it has the below config.
>
>
>
>
>
> So, when both VMs run on the same Hypervisor, maybe with another VM to
> simulate the 100ms delay, can you attain a lossless baseline scenario?
>
>
>
>
>
> > Throughput ceiling for a 16 MB socket buffer at 100 ms latency:
> > 16 MB * 8 bits/byte / 100 ms = 1.25 Gbps
>
> >
>
> > So theoretically we should see close to 1.25Gbps of Bitrate and we see
> Linux reaching close to this number.
>
>
>
> Under no loss, yes.
>
>
>
>
>
> > But BSD is not able to do that.
>
> >
>
> >
>
> > Q. Did you run iperf3? Did the transmitting endpoint report any
> retransmissions between Linux or FBSD hosts?
>
> >
>
> Yes, we used iperf3. I see Linux doing a smaller number of retransmissions
> compared to BSD.
>
> > On BSD, the best performance was around 600 Mbps bitrate and the number
> of retransmissions for this number seen is around 32K
>
> > On Linux, the best performance was around 1.15 Gbps bitrate and the
> number of retransmissions for this number seen is only 2K.
>
> So as you pointed out, the number of retransmissions in BSD could be the
> real issue here.
>
>
>
> There are other cc modules available; but I believe one major deviation
> is that Linux can perform mechanisms like hystart; ACKing every packet
> when the client detects slow start; perform pacing to achieve more
> uniform packet transmissions.
>
>
>
> I think the next step would be to find out, at which queue those packet
> discards are coming from (external switch? delay generator? Vswitch? Eth
> stack inside the VM?)
>
>
>
> Or alternatively, provide your ESX hypervisors with vastly more link
> speed, to rule out any L2 induced packet drops - provided your delay
> generator is not the source when momentarily overloaded.
>
>
>
> Is there a way to reduce this packet loss by fine tuning some parameters
> w.r.t ring buffer or any other areas?
>
>
>
> Finding where these arise (looking at queue and port counters) would be
> the next step. But this is not really my specific area of expertise
> beyond the high level, vendor independent observations.
>
>
>
> Switching to other cc modules may give some more insights. But again, I
> suspect that momentary (microsecond) burstiness of BSD may be causing
> this significantly higher loss rate.
>
>
>
> TCP RACK would be another option. That stack has pacing, more
> fine-grained timing, the RACK loss recovery mechanisms etc. Maybe that
> helps reduce the observed packet drops by iperf, and consequently, yield
> a higher overall throughput.
>
>
>
>
>
>
>
>
>
>
>
>
>
>

=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0
 |<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt;=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0|=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0
 |<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; =E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=
=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=
=80=94=E2=80=94=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=E2=80=94=E2=80=94=
=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=
=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0 ESX Hypervisor 1=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0|=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0 10G link connected via L2 Switch=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0|=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0 ESX Hypervisor=C2=A0=C2=A02=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0|<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0 |=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=
=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=
=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=
=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0|<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; |=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=
=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=
=80=94 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=
=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=E2=80=94=E2=80=94=E2=
=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=94=E2=80=
=94=E2=80=94=E2=80=94=E2=80=94|<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt;
<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt;
<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; Nic is of 10G capacity on both ESX server and i=
t has below config.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">So, when both VMs run on the same hypervisor, maybe with another VM to simulate the 100 ms delay, can you attain a lossless baseline scenario?</p>
</div>
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; BDP for 16MB Socket buffer: 16 MB * (1000 ms * =
100ms latency) * 8 bits/ 1024 =3D 1.25 Gbps<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt;
<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; So theoretically we should see close to 1.25Gbp=
s of Bitrate and we see Linux reaching close to this number.<u></u><u></u><=
/p>
</div>
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">Under no loss, yes.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; But BSD is not able to do that.<u></u><u></u></=
p>
</div>
<div>
<p class=3D"MsoNormal">&gt;
<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt;
<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; Q. Did you run iperf3? Did the transmitting end=
point report any retransmissions between Linux or FBSD hosts?<u></u><u></u>=
</p>
</div>
<div>
<p class=3D"MsoNormal">&gt;
<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; Yes, we used iper3. I see Linux doing less numb=
er retransmissions compared to BSD.
<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; On BSD, the best performance was around 600 Mbp=
s bitrate and the number of retransmissions for this number seen is around =
32K<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; On Linux, the best performance was around 1.15 =
Gbps bitrate and the number of retransmissions for this number seen is only=
 2K.
<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; So as you pointed the number of retransmissions=
 in BSD could be the real issue here.<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">There are other cc modules available, but I believe one major deviation is that Linux can perform mechanisms like HyStart, have the client ACK every packet when it detects slow start, and perform pacing to achieve more uniform packet transmissions.</p>
</div>
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">I think the next step would be to find out which queue those packet discards are coming from (the external switch? the delay generator? the vSwitch? the Ethernet stack inside the VM?).</p>
</div>
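On the FreeBSD guest itself, a few stock counters can help localize where the discards happen. A minimal sketch using standard FreeBSD tools (output is system-dependent; interface names will differ):

```shell
# Per-interface error/drop counters (Ierrs/Idrop/Oerrs columns):
netstat -i
# TCP retransmit vs. SACK-recovery stats, as quoted earlier in this thread;
# retransmits without SACK episodes point at local TX-queue drops:
netstat -sp tcp | egrep "tcp:|retrans|SACK"
# netisr queue statistics, including any software-queue drops:
netstat -Q
```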
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Alternatively, provide your ESX hypervisors with vastly more link speed to rule out any L2-induced packet drops, provided your delay generator is not the source when momentarily overloaded.</p>
</div>
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">&gt; Is there a way to reduce this packet loss by fi=
ne tuning some parameters w.r.t ring buffer or any other areas?
<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Finding where these arise (looking at queue and port counters) would be the next step. But this is not really my specific area of expertise beyond high-level, vendor-independent observations.</p>
</div>
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Switching to other cc modules may give some more insights. But again, I suspect that the momentary (microsecond-scale) burstiness of BSD may be causing this significantly higher loss rate.</p>
</div>
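For reference, switching congestion-control modules on FreeBSD is a sysctl away. A sketch (CUBIC is used here as an example; module availability depends on the kernel build):

```shell
# Show the cc modules currently loaded and the active default:
sysctl net.inet.tcp.cc.available net.inet.tcp.cc.algorithm
# Load CUBIC and make it the system-wide default for new connections:
kldload cc_cubic
sysctl net.inet.tcp.cc.algorithm=cubic
```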
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">TCP RACK would be another option. That stack has pacing, more fine-grained timing, the RACK loss recovery mechanisms, etc. Maybe that helps reduce the packet drops observed by iperf and, consequently, yields a higher overall throughput.</p>
</div>
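Trying the RACK stack would look roughly like this; a sketch, assuming a kernel built with the extra TCP stacks (on FreeBSD 13 that means building with WITH_EXTRA_TCP_STACKS and `options TCPHPTS`, not a stock GENERIC kernel):

```shell
# Load the RACK TCP stack and make it the default for new connections:
kldload tcp_rack
sysctl net.inet.tcp.functions_default=rack
# Confirm which TCP stacks are registered:
sysctl net.inet.tcp.functions_available
```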
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
<div>
<p class=3D"MsoNormal">=C2=A0<u></u><u></u></p>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>

</div></blockquote></div></div>




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAGaXuiLq-ia=9ci=81GnnW2FUTdpPrVa_F2nVm6a6rmLsHhBRw>