Date: Mon, 24 May 2021 23:34:16 -0700
From: Kevin Bowling <kevin.bowling@kev009.com>
To: Vincenzo Maffione <vmaffione@freebsd.org>
Cc: Francois ten Krooden <ftk@nanoteq.com>, Jacques Fourie <jacques.fourie@gmail.com>, Marko Zec <zec@fer.hr>, "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>
Subject: Re: Vector Packet Processing (VPP) portability on FreeBSD
Message-ID: <CAK7dMtB37iN0HQMuwX-Fk=SR+nc4fZLa-N863+NOZe9d1ebG_g@mail.gmail.com>
In-Reply-To: <CAK7dMtDWor3KqdEshfaqUH2mgagU+vT2M6jgwAwKiNt9J1ec+w@mail.gmail.com>

The one other thing I want to mention: what this means in effect is that
every queue ends up limited by EITR on ixgbe (around 30k interrupts/s with
the default settings) whether it's a TX or RX workload.  This ends up
working ok if you have sufficient CPU, but seems awkward.  For the TX
workload we should need an order of magnitude fewer interrupts to do 10G.
There was some work to adapt AIM to this new combined handler, but it is
not properly tuned and I'm not sure it should consider TX at all.
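
To put a rough number on that (back-of-the-envelope only, using the ~30k/s
figure above and the 14.88 Mpps line rate Marko quotes below; the exact
default cap may differ on your hardware):

/* Quick sanity check: packets each interrupt must cover at 10G line rate. */
#include <stdio.h>

int
main(void)
{
        const double intr_per_sec  = 30e3;      /* approx. ixgbe EITR default cap */
        const double line_rate_pps = 14.88e6;   /* 10G line rate, 64-byte frames */

        printf("packets per interrupt at line rate: %.0f\n",
            line_rate_pps / intr_per_sec);      /* prints roughly 500 */
        return (0);
}

So each interrupt has to cover on the order of 500 packets just to keep up
at 64-byte line rate, which also suggests a pure TX workload should get by
with far fewer wakeups, as noted above.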

Regards,
Kevin

On Mon, May 24, 2021 at 11:16 PM Kevin Bowling <kevin.bowling@kev009.com> wrote:
> I don't fully understand the issue, but in iflib_fast_intr_rxtx
> https://cgit.freebsd.org/src/tree/sys/net/iflib.c#n1581 it seems like
> we end up re-enabling interrupts as a matter of course instead of only
> handling spurious cases or some low-water threshold (which seems like
> it would be tricky to do here).  The idea is that we want to pace
> interrupts by disabling them in the msix_que handler, and then wait to
> re-enable them only when we have more work to do in the ift_task
> grouptask.
>
> It was a lot easier to reason about this with separate TX and RX
> interrupts.  Doing the combined TXRX is definitely a win in terms of
> reducing MSI-X vector usage (which is important in a lot of FreeBSD
> use cases), but it's tricky to understand.
>
> My time has been sucked away by work, so I haven't been looking at
> this problem in the depth I want to.  I'd be interested in discussing
> it further with anyone who is interested in it.
>
> Regards,
> Kevin
>
> On Tue, May 18, 2021 at 2:11 PM Vincenzo Maffione <vmaffione@freebsd.org> wrote:
> >
> > Il giorno mar 18 mag 2021 alle ore 09:32 Kevin Bowling <kevin.bowling@kev009.com> ha scritto:
> >>
> >> On Mon, May 17, 2021 at 10:20 AM Marko Zec <zec@fer.hr> wrote:
> >>>
> >>> On Mon, 17 May 2021 09:53:25 +0000
> >>> Francois ten Krooden <ftk@Nanoteq.com> wrote:
> >>>
> >>> > On 2021/05/16 09:22, Vincenzo Maffione wrote:
> >>> >
> >>> > > Hi,
> >>> > >   Yes, you are not using emulated netmap mode.
> >>> > >
> >>> > >   In the test setup depicted here
> >>> > > https://github.com/ftk-ntq/vpp/wiki/VPP-throughput-using-netmap-interfaces#test-setup
> >>> > > I think you should really try to replace VPP with the netmap
> >>> > > "bridge" application (tools/tools/netmap/bridge.c), and see what
> >>> > > numbers you get.
> >>> > >
> >>> > > You would run the application this way
> >>> > > # bridge -i ix0 -i ix1
> >>> > > and this will forward any traffic between ix0 and ix1 (in both
> >>> > > directions).
> >>> > >
> >>> > > These numbers would give you a better idea of where to look next
> >>> > > (e.g. VPP code improvements or system tuning such as NIC
> >>> > > interrupts, CPU binding, etc.).
> >>> >
> >>> > Thank you for the suggestion.
> >>> > I did run a test with the bridge this morning, and updated the
> >>> > results as well.
> >>> >
> >>> > +-------------+------------------+
> >>> > | Packet Size | Throughput (pps) |
> >>> > +-------------+------------------+
> >>> > |   64 bytes  |    7.197 Mpps    |
> >>> > |  128 bytes  |    7.638 Mpps    |
> >>> > |  512 bytes  |    2.358 Mpps    |
> >>> > | 1280 bytes  |  964.915 kpps    |
> >>> > | 1518 bytes  |  815.239 kpps    |
> >>> > +-------------+------------------+
> >>>
> >>> I assume you're on 13.0, where netmap throughput is lower compared to
> >>> 11.x due to the migration of most drivers to iflib (apparently
> >>> increased overhead) and different driver defaults.  On 11.x I could
> >>> move 10G line rate from one ix to another at low CPU freqs, whereas
> >>> on 13.x the CPU must be set to max speed, and still can't do
> >>> 14.88 Mpps.
> >>
> >> I believe this issue is in the combined TXRX interrupt filter.  It is
> >> causing a bunch of unnecessary TX re-arms.
> >
> > Could you please elaborate on that?
> >
> > TX completion is indeed the one thing that changed considerably with
> > the porting to iflib, and this could be a major contributor to the
> > performance drop.
> > My understanding is that TX interrupts are not really used anymore on
> > multi-gigabit NICs such as ix or ixl.  Instead, "softirqs" are used,
> > meaning that a timer is used to perform TX completion.  I don't know
> > what the motivations were for this design decision.
> > I had to decrease the timer period to 90us to ensure timely completion
> > (see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=248652).
> > However, the timer period is currently not adaptive.
> >
> >>>
> >>> #1 thing which changed: the default number of packets per ring
> >>> dropped from 2048 (11.x) to 1024 (13.x).  Try changing this in
> >>> /boot/loader.conf:
> >>>
> >>> dev.ixl.0.iflib.override_nrxds=2048
> >>> dev.ixl.0.iflib.override_ntxds=2048
> >>> dev.ixl.1.iflib.override_nrxds=2048
> >>> dev.ixl.1.iflib.override_ntxds=2048
> >>> etc.
> >>>
> >>> For me this increases the throughput of
> >>> bridge -i netmap:ixl0 -i netmap:ixl1
> >>> from 9.3 Mpps to 11.4 Mpps.
> >>>
> >>> #2: default interrupt moderation delays seem to be too long.
> >>> Combined with increasing the ring sizes, reducing dev.ixl.0.rx_itr
> >>> from 62 (default) to 40 increases the throughput further from 11.4
> >>> to 14.5 Mpps.
> >>>
> >>> Hope this helps,
> >>>
> >>> Marko
> >>>
> >>> > Apart from the 64-byte and 128-byte packets, the other sizes were
> >>> > matching the maximum rates possible on 10 Gbps.  This was when the
> >>> > bridge application was running on a single core, and the CPU core
> >>> > was maxing out at 100%.
> >>> >
> >>> > I think there might be a bit of system tuning needed, but I suspect
> >>> > most of the improvement would be needed in VPP.
> >>> >
> >>> > Regards
> >>> > Francois
> >>> _______________________________________________
> >>> freebsd-net@freebsd.org mailing list
> >>> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> >>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
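
To make the hold-off scheme I described in the quoted mail above a bit more
concrete, here is roughly the shape of it.  This is NOT the actual iflib
code (which uses grouptasks and its own filter/task split); struct my_queue
and the que_disable_intr()/que_enable_intr()/que_process() helpers are
placeholders, so treat it as a sketch only:

/* MSI-X filter routine: runs in interrupt context, so keep it minimal. */
static int
que_msix_filter(void *arg)
{
        struct my_queue *que = arg;     /* placeholder per-queue state */

        que_disable_intr(que);          /* mask the vector: hold off further interrupts */
        taskqueue_enqueue(que->tq, &que->task);
        return (FILTER_HANDLED);
}

/* Deferred task: do the real RX/TX work with the vector still masked. */
static void
que_task_fn(void *arg, int pending)
{
        struct my_queue *que = arg;

        while (que_process(que) != 0)   /* drain until there is no more work */
                ;
        que_enable_intr(que);           /* only now re-arm the vector */
}

As I read iflib_fast_intr_rxtx today, it re-arms much more eagerly than
that last step, which is how every queue ends up being paced by EITR
instead.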
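
On the netmap side, for anyone who wants to see what the bridge test quoted
above exercises without reading tools/tools/netmap/bridge.c: below is a
minimal copy-based sketch of the same idea, written against the standard
netmap user API (net/netmap_user.h).  It is not the real bridge.c, which
zero-copy swaps buffers and services all TX rings, so expect the real tool
to be faster:

#define NETMAP_WITH_LIBS
#include <net/netmap_user.h>
#include <sys/ioctl.h>
#include <poll.h>
#include <string.h>
#include <err.h>

/* Copy packets from src's RX rings into dst's first TX ring (one direction). */
static void
forward(struct nm_desc *src, struct nm_desc *dst)
{
        struct netmap_ring *tx = NETMAP_TXRING(dst->nifp, dst->first_tx_ring);
        u_int i;

        for (i = src->first_rx_ring; i <= src->last_rx_ring; i++) {
                struct netmap_ring *rx = NETMAP_RXRING(src->nifp, i);

                while (!nm_ring_empty(rx) && !nm_ring_empty(tx)) {
                        struct netmap_slot *rs = &rx->slot[rx->head];
                        struct netmap_slot *ts = &tx->slot[tx->head];

                        /* bridge.c swaps buffer indices instead of copying */
                        memcpy(NETMAP_BUF(tx, ts->buf_idx),
                            NETMAP_BUF(rx, rs->buf_idx), rs->len);
                        ts->len = rs->len;
                        rx->head = rx->cur = nm_ring_next(rx, rx->head);
                        tx->head = tx->cur = nm_ring_next(tx, tx->head);
                }
        }
}

int
main(void)
{
        /* "ix0"/"ix1" as in the example quoted above; adjust to your NICs. */
        struct nm_desc *d0 = nm_open("netmap:ix0", NULL, 0, NULL);
        struct nm_desc *d1 = nm_open("netmap:ix1", NULL, 0, NULL);
        struct pollfd pfd[2];

        if (d0 == NULL || d1 == NULL)
                err(1, "nm_open");
        pfd[0].fd = d0->fd; pfd[0].events = POLLIN;
        pfd[1].fd = d1->fd; pfd[1].events = POLLIN;

        for (;;) {
                poll(pfd, 2, 1000);                     /* sync RX rings */
                forward(d0, d1);                        /* ix0 -> ix1 */
                forward(d1, d0);                        /* ix1 -> ix0 */
                ioctl(d0->fd, NIOCTXSYNC, NULL);        /* flush TX rings */
                ioctl(d1->fd, NIOCTXSYNC, NULL);
        }
}

The point is mostly how little work the forwarding path itself does: the
numbers in the table above are dominated by the driver, ring sizes, and
interrupt moderation rather than by the application.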