Date:      Thu, 19 Jun 2025 13:55:26 +0200
From:      Olivier Cochard-Labbé <olivier@freebsd.org>
To:        Daniel Braniss <danny@cs.huji.ac.il>
Cc:        freebsd-hackers <hackers@freebsd.org>
Subject:   Re: 100Gb performance
Message-ID:  <CA%2Bq%2BTcpstNn7i_zks-v-9yLJNfYrzbh-KvnU9%2Bgx4xwOPn49YQ@mail.gmail.com>
In-Reply-To: <69416040-9E55-42E9-9203-FF1706F2A51E@cs.huji.ac.il>

On Thu, Jun 19, 2025 at 8:20 AM Daniel Braniss <danny@cs.huji.ac.il> wrote:

> hi,
>
> i am running 14.2 on a DELL PowerEdge R750 with a mellanox/nvidia 100Gb
> nic  mlx5en:
>
> mce0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP>
> metric 0 mtu 1500
>
>
> options=66ef07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,NV,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,HWRXTSTMP,MEXTPG,VXLAN_HWCSUM,VXLAN_HWTSO>
>        ether ...
>        inet ... netmask 0xfffffc00 broadcast ….
>        media: Ethernet 100GBase-KR4 <full-duplex,rxpause,txpause>
>        status: active
>        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
>
> I’m doing a rsync from an iscsilon mounted via this mce0, the best max
> throughput is about 1GBs which is a bit depressing
>
>
Regarding this 1GB/s (8Gb/s): that is about what I get on my side with a
very simple netcat transfer.
By simple transfer, I mean one TCP flow with a single netcat process:

On the receiver host:
nc -l 12345 > /dev/null
On the sender host:
dd if=/dev/zero bs=1G count=100 | nc 1.1.1.30 12345
107374182400 bytes transferred in 77.772515 secs (1380618626 bytes/sec)

Which is about 1.38GB/s (11Gb/s), so close to your 1GB/s.
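
To make the unit conversion explicit, here is a small sketch that turns the bytes/sec figure printed by dd into GB/s and Gb/s. The function name to_rates is only an illustration (it is not part of any tool), and decimal units (1 GB = 10^9 bytes) are assumed:

```shell
#!/bin/sh
# Convert a dd-style bytes/sec figure to GB/s and Gb/s.
# Assumption: decimal units, i.e. 1 GB = 10^9 bytes and 1 byte = 8 bits.
to_rates() {
    awk -v bps="$1" 'BEGIN {
        printf "%.2f GB/s = %.2f Gb/s\n", bps / 1e9, bps * 8 / 1e9
    }'
}

# The single-flow result from the dd run above:
to_rates 1380618626    # prints "1.38 GB/s = 11.04 Gb/s"
```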

Let's dig a little more and, on the sender, display the stats for each
NIC queue.
How many queues did the driver configure on my sender system?
# sysctl dev.mce.0.conf.channels
dev.mce.0.conf.channels: 40
So 40 queues (matching my nproc output), great.
But how many were used during this test?
# sysctl dev.mce.0 | awk '/txstat.*\.bytes/ && $NF != 0'
dev.mce.0.txstat26tc0.bytes: 74
dev.mce.0.txstat20tc0.bytes: 120
dev.mce.0.txstat4tc0.bytes: 112564632016
dev.mce.0.txstat0tc0.bytes: 60

=> Only one queue (number 4 in my example) carried the traffic; the other
counters show just a few bytes.
And it is the same problem on the receiver: one queue, one core.
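
If you want to ignore the queues that only saw a few bytes, a variant of the awk filter above can apply a threshold. This is only a sketch: the function name busy_txq and the 1 MB default threshold are my own assumptions, and it expects the same "name: value" lines that sysctl dev.mce.0 prints:

```shell
#!/bin/sh
# Read "sysctl dev.mce.0" style output on stdin and print only the TX
# queues whose byte counter exceeds a threshold.
# Assumption: the 1 MB default threshold is arbitrary; pass another
# value as $1 if needed.
busy_txq() {
    awk -v min="${1:-1000000}" \
        '/txstat.*\.bytes/ && $NF + 0 > min + 0 { print $1, $NF }'
}

# On the sender, after a test run:
#   sysctl dev.mce.0 | busy_txq
```

With the counters shown above, only the txstat4tc0 line would remain.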

Let's improve this by running 8 nc in parallel. We need 8 different TCP
sessions so that RSS can select 8 different queues:
On the receiver host:
nc -l 12341 > /dev/null &
nc -l 12342 > /dev/null &
nc -l 12343 > /dev/null &
nc -l 12344 > /dev/null &
nc -l 12345 > /dev/null &
nc -l 12346 > /dev/null &
nc -l 12347 > /dev/null &
nc -l 12348 > /dev/null

On the sender host:
dd if=/dev/zero bs=1G count=100 | nc 1.1.1.30 12341 &
dd if=/dev/zero bs=1G count=100 | nc 1.1.1.30 12342 &
dd if=/dev/zero bs=1G count=100 | nc 1.1.1.30 12343 &
dd if=/dev/zero bs=1G count=100 | nc 1.1.1.30 12344 &
dd if=/dev/zero bs=1G count=100 | nc 1.1.1.30 12345 &
dd if=/dev/zero bs=1G count=100 | nc 1.1.1.30 12346 &
dd if=/dev/zero bs=1G count=100 | nc 1.1.1.30 12347 &
dd if=/dev/zero bs=1G count=100 | nc 1.1.1.30 12348
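
The same commands can be written as loops, which makes it easier to change the number of flows. This is a sketch only: the function and variable names (flow_ports, start_receivers, start_senders, BASE_PORT, FLOWS) are made up for the example, and whether each flow really lands on its own queue depends on how RSS hashes the connections, so check the per-queue counters afterwards.

```shell
#!/bin/sh
BASE_PORT=12341   # first port, as in the commands above
FLOWS=8           # number of parallel TCP sessions

# Print one port number per flow, one per line.
flow_ports() {
    i=0
    while [ "$i" -lt "$FLOWS" ]; do
        echo $((BASE_PORT + i))
        i=$((i + 1))
    done
}

# Receiver side: one listener per port, each discarding its input.
start_receivers() {
    for port in $(flow_ports); do
        nc -l "$port" > /dev/null &
    done
    wait
}

# Sender side: one dd | nc pipeline per port, towards the host given as $1.
start_senders() {
    for port in $(flow_ports); do
        dd if=/dev/zero bs=1G count=100 | nc "$1" "$port" &
    done
    wait
}

# Usage: run start_receivers on the receiver first,
# then "start_senders 1.1.1.30" on the sender.
```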

Then we need to add up the output from all those dd:
107374182400 bytes transferred in 103.937552 secs (1033064374 bytes/sec)
107374182400 bytes transferred in 104.474689 secs (1027753071 bytes/sec)
107374182400 bytes transferred in 104.939627 secs (1023199578 bytes/sec)
107374182400 bytes transferred in 105.002306 secs (1022588806 bytes/sec)
107374182400 bytes transferred in 105.674894 secs (1016080345 bytes/sec)
107374182400 bytes transferred in 105.687319 secs (1015960885 bytes/sec)
107374182400 bytes transferred in 106.480994 secs (1008388239 bytes/sec)
107374182400 bytes transferred in 106.837954 secs (1005019084 bytes/sec)

That gives a total of 8152054382 bytes/sec (8.15 GBytes/s or 65Gb/s).
If you check the per-queue stats, you should see that 8 of them were used.
So you need a multi-threaded/parallel rsync equivalent (on both sides) to
fill your link.
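
Adding up the eight dd figures by hand is error-prone, so here is a sketch that does the sum with awk. The function name sum_dd_rates is hypothetical; it assumes the dd summary lines (as printed above) are fed on stdin, for example from a file where you saved them, and it uses decimal units:

```shell
#!/bin/sh
# Sum the "(N bytes/sec)" figures from dd summary lines read on stdin.
# The field separator splits on parentheses, so $2 is "N bytes/sec".
sum_dd_rates() {
    awk -F'[()]' '
        /bytes\/sec/ { split($2, a, " "); total += a[1] }
        END {
            printf "%.0f bytes/sec = %.2f GB/s = %.1f Gb/s\n",
                total, total / 1e9, total * 8 / 1e9
        }'
}

# Usage, assuming the dd output was saved to a (hypothetical) file dd.log:
#   sum_dd_rates < dd.log
```

Fed with the eight lines above, it prints "8152054382 bytes/sec = 8.15 GB/s = 65.2 Gb/s".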


> but tcpdump -i mce0 says:
> store-09# tcpdump -i mce0 host <same net as mce0>
> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> listening on mce0, link-type EN10MB (Ethernet), snapshot length 262144
> bytes
>                                             **********
>
>
Don't worry about the libpcap definition. From contrib/libpcap/pcap/dlt.h:
#define DLT_NULL        0       /* BSD loopback encapsulation */
#define DLT_EN10MB      1       /* Ethernet (10Mb) */
#define DLT_EN3MB       2       /* Experimental Ethernet (3Mb) */
#define DLT_AX25        3       /* Amateur Radio AX.25 */
#define DLT_PRONET      4       /* Proteon ProNET Token Ring */
#define DLT_CHAOS       5       /* Chaos */
#define DLT_IEEE802     6       /* 802.5 Token Ring */
#define DLT_ARCNET      7       /* ARCNET, with BSD-style header */
#define DLT_SLIP        8       /* Serial Line IP */
#define DLT_PPP         9       /* Point-to-point Protocol */
#define DLT_FDDI        10      /* FDDI */

So EN10MB is simply the name libpcap uses for "Ethernet", whatever the
link speed.

Regards,
Olivier
