Date: Tue, 18 May 2021 23:11:03 +0200
From: Vincenzo Maffione <vmaffione@freebsd.org>
To: Kevin Bowling <kevin.bowling@kev009.com>
Cc: Marko Zec <zec@fer.hr>, Francois ten Krooden <ftk@nanoteq.com>,
 Jacques Fourie <jacques.fourie@gmail.com>,
 "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Re: Vector Packet Processing (VPP) portability on FreeBSD
Message-ID: <CA%2B_eA9joMB4C3=hdP9u0r7TkmeLLPX3=o1nCCqtk84kmkjFQkw@mail.gmail.com>
In-Reply-To: <CAK7dMtD2vgzHG4XAxpcUoTnZCpmC2Onwa%2BUd%2Bw1dKb1W_TCxfQ@mail.gmail.com>
References: <AB9BB4D903F59549B2E27CC033B964D6C4F8BECE@NTQ-EXC.nanoteq.co.za>
 <91e21d18a4214af4898dd09f11144493@EX16-05.ad.unipi.it>
 <CA%2BhQ2%2BjQ2fh4TXz02mTxAHJkHBWzfNhd=yRqPG45E7Z4umAsKA@mail.gmail.com>
 <e778ca61766741b0950585f6b26d8fff@EX16-05.ad.unipi.it>
 <CA%2BhQ2%2BhzjT5%2BRXmUUV4PpkXkvgQEJb8JrLPY7LqteV9ixeM7Ew@mail.gmail.com>
 <AB9BB4D903F59549B2E27CC033B964D6C4F8D386@NTQ-EXC.nanoteq.co.za>
 <CALX0vxA3_eDRJmEGBak=e99nOrBkFYEmdnBHEY9JLTmT7tQ2vQ@mail.gmail.com>
 <AB9BB4D903F59549B2E27CC033B964D6C4F8D3BB@NTQ-EXC.nanoteq.co.za>
 <CA%2B_eA9iG=4nemZxM_yETxGTMMC-oXPtMZmWc9DCp%2BqJaCQt4=g@mail.gmail.com>
 <AB9BB4D903F59549B2E27CC033B964D6C4F8D74A@NTQ-EXC.nanoteq.co.za>
 <20210517192054.0907beea@x23>
 <CAK7dMtD2vgzHG4XAxpcUoTnZCpmC2Onwa%2BUd%2Bw1dKb1W_TCxfQ@mail.gmail.com>
On Tue, 18 May 2021 at 09:32, Kevin Bowling <kevin.bowling@kev009.com> wrote:
>
> On Mon, May 17, 2021 at 10:20 AM Marko Zec <zec@fer.hr> wrote:
>
>> On Mon, 17 May 2021 09:53:25 +0000
>> Francois ten Krooden <ftk@Nanoteq.com> wrote:
>>
>> > On 2021/05/16 09:22, Vincenzo Maffione wrote:
>> >
>> > > Hi,
>> > > Yes, you are not using emulated netmap mode.
>> > >
>> > > In the test setup depicted here
>> > > https://github.com/ftk-ntq/vpp/wiki/VPP-throughput-using-netmap-interfaces#test-setup
>> > > I think you should really try to replace VPP with the netmap
>> > > "bridge" application (tools/tools/netmap/bridge.c), and see what
>> > > numbers you get.
>> > >
>> > > You would run the application this way
>> > > # bridge -i ix0 -i ix1
>> > > and this will forward any traffic between ix0 and ix1 (in both
>> > > directions).
>> > >
>> > > These numbers would give you a better idea of where to look next
>> > > (e.g. VPP code improvements or system tuning such as NIC
>> > > interrupts, CPU binding, etc.).
>> >
>> > Thank you for the suggestion.
>> > I did run a test with the bridge this morning, and updated the
>> > results as well.
>> >
>> > +-------------+------------------+
>> > | Packet Size | Throughput (pps) |
>> > +-------------+------------------+
>> > | 64 bytes    | 7.197 Mpps       |
>> > | 128 bytes   | 7.638 Mpps       |
>> > | 512 bytes   | 2.358 Mpps       |
>> > | 1280 bytes  | 964.915 kpps     |
>> > | 1518 bytes  | 815.239 kpps     |
>> > +-------------+------------------+
>>
>> I assume you're on 13.0, where netmap throughput is lower compared to
>> 11.x due to the migration of most drivers to iflib (apparently increased
>> overhead) and different driver defaults. On 11.x I could move 10G line
>> rate from one ix to another at low CPU freqs, whereas on 13.x the CPU
>> must be set to max speed, and still can't do 14.88 Mpps.
>>
>
> I believe this issue is in the combined txrx interrupt filter. It is
> causing a bunch of unnecessary tx re-arms.
>

Could you please elaborate on that?

TX completion is indeed the one thing that changed considerably with the
porting to iflib, and this could be a major contributor to the performance
drop. My understanding is that TX interrupts are not really used anymore on
multi-gigabit NICs such as ix or ixl. Instead, "softirqs" are used, meaning
that a timer is used to perform TX completion. I don't know what the
motivations were for this design decision. I had to decrease the timer
period to 90us to ensure timely completion (see
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=248652).
However, the timer period is currently not adaptive.

>> #1 thing which changed: the default # of packets per ring dropped from
>> 2048 (11.x) to 1024 (13.x). Try changing this in /boot/loader.conf:
>>
>> dev.ixl.0.iflib.override_nrxds=2048
>> dev.ixl.0.iflib.override_ntxds=2048
>> dev.ixl.1.iflib.override_nrxds=2048
>> dev.ixl.1.iflib.override_ntxds=2048
>> etc.
>>
>> For me this increases the throughput of
>>   bridge -i netmap:ixl0 -i netmap:ixl1
>> from 9.3 Mpps to 11.4 Mpps.
>>
>> #2: default interrupt moderation delays seem to be too long. Combined
>> with increasing the ring sizes, reducing dev.ixl.0.rx_itr from 62
>> (default) to 40 increases the throughput further from 11.4 to 14.5 Mpps.
>>
>> Hope this helps,
>>
>> Marko
>>
>> > Except for the 64-byte and 128-byte packets, the other sizes were
>> > matching the maximum rates possible on 10 Gbps. This was when the
>> > bridge application was running on a single core, and the CPU core was
>> > maxing out at 100%.
>> > >> > I think there might be a bit of system tuning needed, but I suspect >> > most of the improvement would be needed in VPP. >> > >> > Regards >> > Francois >> _______________________________________________ >> freebsd-net@freebsd.org mailing list >> https://lists.freebsd.org/mailman/listinfo/freebsd-net >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" >> >