Date: Mon, 8 Dec 2025 08:33:25 -0800
From: Adrian Chadd <adrian@freebsd.org>
To: Kajetan Staszkiewicz <vegeta@tuxpowered.net>
Cc: Konstantin Belousov <kostikbel@gmail.com>, net@freebsd.org
Subject: Re: RSS causing bad forwarding performance?
Message-ID: <CAJ-VmokOkfAprkPcuomXF4cMM7uRKx-=t-cP1T5+vKNfk-nSaA@mail.gmail.com>
In-Reply-To: <f39ea972-da4b-4310-b5f0-7a6133ca0f0c@tuxpowered.net>
References: <db616c5b-55bb-40dc-aa70-0b132b06bb75@tuxpowered.net> <aTYT1zK__eQfBZ0M@kib.kiev.ua> <f39ea972-da4b-4310-b5f0-7a6133ca0f0c@tuxpowered.net>
On Mon, 8 Dec 2025 at 06:24, Kajetan Staszkiewicz <vegeta@tuxpowered.net> wrote:
>
> On 2025-12-08 00:55, Konstantin Belousov wrote:
>
> > It is somewhat strange that with/without RSS the results differ for UDP.
> > The mlx5en driver always enables hashing the packet into an rx queue, and
> > with a single UDP stream I would expect all packets to hit the same queue.
>
> With a single UDP stream and RSS disabled, the DUT gets 2 CPU cores
> loaded. One is at 100%; I understand this is where the interrupts for
> incoming packets land, and it handles receiving, forwarding and sending
> the packet (with direct ISR dispatch). The other is at around 15-20%;
> my best guess is that it's handling interrupts for confirmations of
> packets sent out through the other NIC.
>
> With a single UDP stream and RSS enabled, the DUT gets only 1 CPU core
> loaded. I understand that thanks to RSS the outbound queue on mce1 is
> the same as the inbound queue on mce0, and thus the same CPU core
> handles the irq for both queues.
>
> > As a consequence, with/without RSS should be the same (low).
>
> It is low with no RSS, but with RSS it's not just low, it's terrible.
>
> > If it were UDP encapsulating some other traffic, e.g. a tunnel that
> > can be further classified by the internal headers, like the inner
> > headers of vxlan, then more than one receive queue could be used.
>
> The script stl/udp_1pkt_simple.py (provided with TRex) creates UDP
> packets from port 1025 to port 12, filled with 0x78, length 10 B. My
> goal is to test packets-per-second performance, so I've chosen this
> test as it creates very short packets.
>
> > BTW, mce cards have huge numbers of supported offloads, but all of
> > them are host-oriented; they would not help for forwarding.
> >
> > Again, a single iperf stream would hit a single send/receive queue.
> > Parallel iperfs between the same machines scale.
>
> It seems that parallel streams forwarded through the machine scale too.
> It's a single stream that kills it, and only with option RSS enabled.

RSS was never really designed for optimising a single flow by having it
consume two CPU cores. It was designed for optimising a /whole lot of
flows/ by directing them to a consistent CPU mapping and, if used in
conjunction with CPU selection for the transmit side, to avoid cross-CPU
locking/synchronisation entirely.

It doesn't help that the RSS defaults (ie only one netisr, not hybrid
mode IIRC, etc) are not the best for lots of flows.

So in short, I think you're testing the wrong thing.

-adrian
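[To make the "consistent CPU mapping" point concrete: RSS hashes each
flow's address/port tuple into a small, fixed set of buckets, and each
bucket is serviced by one CPU. The toy Python sketch below only
illustrates the idea -- it uses crc32 as a stand-in for the real
Toeplitz hash and assumes an 8-bucket, one-CPU-per-bucket layout, so it
is not the actual driver or kernel code. A single flow (fixed 4-tuple)
always lands in the same bucket, so one UDP stream can never use more
than one core, while many flows spread across buckets and scale.]

    import zlib
    from collections import Counter

    NUM_BUCKETS = 8   # assumed here; on FreeBSD this would follow from net.inet.rss.bits

    def rss_bucket(src_ip, src_port, dst_ip, dst_port):
        # crc32 stands in for the Toeplitz hash computed by the NIC/stack
        key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
        return zlib.crc32(key) % NUM_BUCKETS

    # One flow: every packet hashes to the same bucket -> one CPU does all the work.
    print(rss_bucket("10.0.0.1", 1025, "10.0.1.1", 12))

    # Many flows: buckets (and therefore CPUs) get used roughly evenly.
    print(Counter(rss_bucket("10.0.0.1", p, "10.0.1.1", 12)
                  for p in range(1025, 2025)))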
