Date:      Tue, 18 May 2021 23:11:03 +0200
From:      Vincenzo Maffione <vmaffione@freebsd.org>
To:        Kevin Bowling <kevin.bowling@kev009.com>
Cc:        Marko Zec <zec@fer.hr>, Francois ten Krooden <ftk@nanoteq.com>,  Jacques Fourie <jacques.fourie@gmail.com>,  "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject:   Re: Vector Packet Processing (VPP) portability on FreeBSD
Message-ID:  <CA+_eA9joMB4C3=hdP9u0r7TkmeLLPX3=o1nCCqtk84kmkjFQkw@mail.gmail.com>
In-Reply-To: <CAK7dMtD2vgzHG4XAxpcUoTnZCpmC2Onwa+Ud+w1dKb1W_TCxfQ@mail.gmail.com>
References:  <AB9BB4D903F59549B2E27CC033B964D6C4F8BECE@NTQ-EXC.nanoteq.co.za> <91e21d18a4214af4898dd09f11144493@EX16-05.ad.unipi.it> <CA+hQ2+jQ2fh4TXz02mTxAHJkHBWzfNhd=yRqPG45E7Z4umAsKA@mail.gmail.com> <e778ca61766741b0950585f6b26d8fff@EX16-05.ad.unipi.it> <CA+hQ2+hzjT5+RXmUUV4PpkXkvgQEJb8JrLPY7LqteV9ixeM7Ew@mail.gmail.com> <AB9BB4D903F59549B2E27CC033B964D6C4F8D386@NTQ-EXC.nanoteq.co.za> <CALX0vxA3_eDRJmEGBak=e99nOrBkFYEmdnBHEY9JLTmT7tQ2vQ@mail.gmail.com> <AB9BB4D903F59549B2E27CC033B964D6C4F8D3BB@NTQ-EXC.nanoteq.co.za> <CA+_eA9iG=4nemZxM_yETxGTMMC-oXPtMZmWc9DCp+qJaCQt4=g@mail.gmail.com> <AB9BB4D903F59549B2E27CC033B964D6C4F8D74A@NTQ-EXC.nanoteq.co.za> <20210517192054.0907beea@x23> <CAK7dMtD2vgzHG4XAxpcUoTnZCpmC2Onwa+Ud+w1dKb1W_TCxfQ@mail.gmail.com>

On Tue, May 18, 2021 at 09:32 Kevin Bowling <kevin.bowling@kev009.com>
wrote:

>
>
> On Mon, May 17, 2021 at 10:20 AM Marko Zec <zec@fer.hr> wrote:
>
>> On Mon, 17 May 2021 09:53:25 +0000
>> Francois ten Krooden <ftk@Nanoteq.com> wrote:
>>
>> > On 2021/05/16 09:22, Vincenzo Maffione wrote:
>> >
>> > >
>> > > Hi,
>> > >   Yes, you are not using emulated netmap mode.
>> > >
>> > >   In the test setup depicted here
>> > > https://github.com/ftk-ntq/vpp/wiki/VPP-throughput-using-netmap-
>> > > interfaces#test-setup
>> > > I think you should really try to replace VPP with the netmap
>> > > "bridge" application (tools/tools/netmap/bridge.c), and see what
>> > > numbers you get.
>> > >
>> > > You would run the application this way
>> > > # bridge -i ix0 -i ix1
>> > > and this will forward any traffic between ix0 and ix1 (in both
>> > > directions).
>> > >
>> > > These numbers would give you a better idea of where to look next
>> > > (e.g. VPP code improvements or system tuning such as NIC
>> > > interrupts, CPU binding, etc.).
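(For context, the bridge program is essentially a tight netmap forwarding
loop. Below is a simplified sketch of the idea, assuming the libnetmap
nm_open()/poll() API; the zero-copy forwarding step, i.e. the hypothetical
forward() helper, is omitted here and lives in the real bridge.c.)

/*
 * Simplified sketch of a netmap forwarder (not the actual bridge.c):
 * open two interfaces in netmap mode and wait for traffic with poll().
 * Forwarding swaps buffer indices between the RX rings of one port and
 * the TX rings of the other; that part is only hinted at below.
 */
#include <stdio.h>
#include <stdlib.h>
#include <poll.h>
#define NETMAP_WITH_LIBS
#include <net/netmap_user.h>

int
main(void)
{
	struct nm_desc *a = nm_open("netmap:ix0", NULL, 0, NULL);
	struct nm_desc *b = nm_open("netmap:ix1", NULL, 0, NULL);
	struct pollfd pfd[2];

	if (a == NULL || b == NULL) {
		perror("nm_open");
		exit(1);
	}
	pfd[0].fd = NETMAP_FD(a);
	pfd[1].fd = NETMAP_FD(b);
	pfd[0].events = pfd[1].events = POLLIN;

	for (;;) {
		poll(pfd, 2, -1);
		/* forward(a, b); forward(b, a);  -- see bridge.c */
	}
}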
>> >
>> > Thank you for the suggestion.
>> > I did run a test with the bridge this morning, and updated the
>> > results as well.
>> > +-------------+------------------+
>> > | Packet Size | Throughput (pps) |
>> > +-------------+------------------+
>> > |   64 bytes  |    7.197 Mpps    |
>> > |  128 bytes  |    7.638 Mpps    |
>> > |  512 bytes  |    2.358 Mpps    |
>> > | 1280 bytes  |  964.915 kpps    |
>> > | 1518 bytes  |  815.239 kpps    |
>> > +-------------+------------------+
>>
>> I assume you're on 13.0 where netmap throughput is lower compared to
>> 11.x due to migration of most drivers to iflib (apparently increased
>> overhead) and different driver defaults.  On 11.x I could move 10G line
>> rate from one ix to another at low CPU freqs, where on 13.x the CPU
>> must be set to max speed, and still can't do 14.88 Mpps.
>>
>
> I believe this issue is in the combined txrx interrupt filter.  It is
> causing a bunch of unnecessary tx re-arms.
>

Could you please elaborate on that?

TX completion is indeed the one thing that changed considerably with the
port to iflib, and it could be a major contributor to the performance drop.
My understanding is that TX interrupts are not really used anymore on
multi-gigabit NICs such as ix or ixl. Instead, "softirqs" are used, meaning
that TX completion is driven by a periodic timer rather than by an
interrupt. I don't know what the motivations for this design decision were.
I had to decrease the timer period to 90us to ensure timely completion (see
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=248652). However, the
timer period is currently not adaptive.
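
Conceptually it works something like the sketch below (a made-up
illustration using callout(9); this is not the actual iflib code, and the
txq_softc/txq_reclaim names are hypothetical):

/*
 * Hypothetical sketch of timer-driven TX completion (not iflib code):
 * instead of re-arming a TX interrupt, a callout fires periodically and
 * reclaims the descriptors the NIC has finished transmitting, so the
 * completion latency is bounded by the (fixed) timer period.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/callout.h>

struct txq_softc {
	struct callout	reclaim_timer;
	sbintime_t	period;			/* e.g. 90 * SBT_1US */
};

static void txq_reclaim(struct txq_softc *txq);	/* driver-specific, assumed */

static void
txq_timer_cb(void *arg)
{
	struct txq_softc *txq = arg;

	txq_reclaim(txq);		/* free descriptors the NIC completed */
	/* Re-arm with a fixed period: it bounds the completion delay but
	 * does not adapt to the traffic rate. */
	callout_reset_sbt(&txq->reclaim_timer, txq->period, 0,
	    txq_timer_cb, txq, 0);
}

static void
txq_timer_start(struct txq_softc *txq)
{
	callout_init(&txq->reclaim_timer, 1 /* MPSAFE */);
	txq->period = 90 * SBT_1US;	/* fixed, non-adaptive period */
	callout_reset_sbt(&txq->reclaim_timer, txq->period, 0,
	    txq_timer_cb, txq, 0);
}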



>
>
>> #1 thing which changed: default # of packets per ring dropped down from
>> 2048 (11.x) to 1024 (13.x).  Try changing this in /boot/loader.conf:
>>
>> dev.ixl.0.iflib.override_nrxds=2048
>> dev.ixl.0.iflib.override_ntxds=2048
>> dev.ixl.1.iflib.override_nrxds=2048
>> dev.ixl.1.iflib.override_ntxds=2048
>> etc.
>>
>> For me this increases the throughput of
>> bridge -i netmap:ixl0 -i netmap:ixl1
>> from 9.3 Mpps to 11.4 Mpps
>>
>> #2: default interrupt moderation delays seem to be too long.  Combined
>> with increasing the ring sizes, reducing dev.ixl.0.rx_itr from 62
>> (default) to 40 increases the throughput further from 11.4 to 14.5 Mpps
>>
>> Hope this helps,
>>
>> Marko
>>
>>
>> > Apart from the 64-byte and 128-byte packets, the other sizes were
>> > matching the maximum rates possible on 10 Gbps. This was when the
>> > bridge application was running on a single core, and the CPU core was
>> > maxing out at 100%.
>> >
>> > I think there might be a bit of system tuning needed, but I suspect
>> > most of the improvement would be needed in VPP.
>> >
>> > Regards
>> > Francois
>>
>


