Date:      Mon, 17 Aug 2015 09:54:36 -0700
From:      Maxim Sobolev <sobomax@FreeBSD.org>
To:        Luigi Rizzo <rizzo@iet.unipi.it>
Cc:        Babak Farrokhi <farrokhi@freebsd.org>, "Alexander V. Chernikov" <melifaro@ipfw.ru>, Olivier Cochard-Labbé <olivier@cochard.me>, "freebsd@intel.com" <freebsd@intel.com>, Jev Björsell <jev@sippysoft.com>, FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in FreeBSD 10.1
Message-ID:  <CAH7qZfvtT3VoqkZ8MROErkW9GsPg00Orc0GQ0R3kHpDTsacN+A@mail.gmail.com>
In-Reply-To: <CAH7qZfvqBFtKfR97ZDhGhspKJ30=Mw=y-4UXOWRU_DXtJuyF6w@mail.gmail.com>
References:  <CAH7qZft-CZCKv_7E9PE+4ZN3EExhezMnAb3kvShQzYhRYb2jMg@mail.gmail.com> <77171439377164@web21h.yandex.ru> <55CB2F18.40902@FreeBSD.org> <CA+hQ2+i37JzeUh8drxLSeeXHzYaRH9ZXvFyMBpF0XLHoiMSXMg@mail.gmail.com> <CAH7qZfvqBFtKfR97ZDhGhspKJ30=Mw=y-4UXOWRU_DXtJuyF6w@mail.gmail.com>

I think we are getting better performance today with IXGBE_FDIR
switched off. It's not 100% conclusive yet, though, since we've only pushed it
to a little below 200Kpps. We'll push more traffic tomorrow and see how it
goes.
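
For anyone who wants to repeat the IXGBE_FDIR experiment: the flow-director
code in ix(4) sits behind a compile-time define, so turning it off means
rebuilding the driver. Below is a rough sketch of one way to do that; where
exactly IXGBE_FDIR gets defined (driver header, module Makefile or kernel
config) and the module name differ between driver versions, so treat those
details as assumptions and check the local source tree first.

# Sketch only: locate where IXGBE_FDIR is defined in this source tree
grep -rn IXGBE_FDIR /usr/src/sys/dev/ixgbe /usr/src/sys/modules/ixgbe
# ...remove or comment out the define found above, then rebuild the module:
cd /usr/src/sys/modules/ixgbe && make clean all install
# Reload the driver (module name may be if_ixgbe or if_ix depending on the
# branch); if ix(4) is compiled into the kernel, rebuild the kernel and reboot.
kldunload if_ixgbe && kldload if_ixgbe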

-Maxim

On Fri, Aug 14, 2015 at 10:29 AM, Maxim Sobolev <sobomax@freebsd.org> wrote:

> Hi guys, unfortunately no: neither reducing the number of queues from
> 8 to 6 nor pinning the interrupt rate at 20000 per queue has made any
> difference. The card still goes kaboom at about 200Kpps no matter what. In
> fact I've gone a bit further, and after the first spike went on and pushed
> the interrupt rate even further down to 10000, but again no difference;
> it still blows up at the same mark. It did, however, reduce the interrupt
> rate from 190K to some 130K according to systat -vm, so the moderation
> itself seems to be working fine. We will try disabling IXGBE_FDIR
> tomorrow and see if it helps.
>
> http://sobomax.sippysoft.com/ScreenShot391.png <- systat -vm with
> max_interrupt_rate = 20000 right before overload
>
> http://sobomax.sippysoft.com/ScreenShot392.png <- systat -vm during issue
> unfolding (max_interrupt_rate = 10000)
>
> http://sobomax.sippysoft.com/ScreenShot394.png <- cpu/net monitoring,
> first two spikes are with max_interrupt_rate = 20000, the third one with
> max_interrupt_rate = 10000
>
> -Max
>
> On Wed, Aug 12, 2015 at 5:23 AM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:
>
>> As I was telling Maxim, you should disable AIM because it only matches
>> the max interrupt rate to the average packet size, which is the last thing
>> you want.
>>
>> Setting the interrupt rate with sysctl (one per queue) gives you precise
>> control over the max rate (and hence over the extra latency). 20k
>> interrupts/s give you 50us of latency, and the 2k slots in the queue are
>> still enough to absorb a burst of min-sized frames hitting a single queue
>> (the OS will start dropping long before that level, but that's another
>> story).
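
For concreteness, here is a sketch of that tuning on the command line, using
the per-queue dev.ix.<unit>.queue<N>.interrupt_rate sysctls that appear later
in this thread. The AIM and boot-time max-rate knob names changed between
driver versions (hw.ix.* vs. hw.ixgbe.*), so those two names are assumptions;
check `sysctl -a | grep -i aim` on the box first.

# Disable adaptive interrupt moderation (knob name varies by driver version):
sysctl hw.ix.enable_aim=0            # or hw.ixgbe.enable_aim on older trees
# Cap every queue of ix0 at ~20k interrupts/s (at most ~50us of added latency):
for q in 0 1 2 3 4 5 6 7; do
        sysctl dev.ix.0.queue${q}.interrupt_rate=20000
done
# The equivalent boot-time tunable (again version-dependent) goes into
# /boot/loader.conf, e.g. hw.ix.max_interrupt_rate=20000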
>>
>> Cheers
>> Luigi
>>
>> On Wednesday, August 12, 2015, Babak Farrokhi <farrokhi@freebsd.org>
>> wrote:
>>
>>> I ran into the same problem with almost the same hardware (Intel X520)
>>> on 10-STABLE. HT/SMT is disabled and the cards are configured with 8
>>> queues, with the same sysctl tunings as sobomax@ did. I am not using
>>> lagg, no FLOWTABLE.
>>>
>>> I experimented with pmcstat (RESOURCE_STALLS) a while ago; here [1] and
>>> [2] you can see the results, including pmc output, callchain, flamegraph
>>> and gprof output.
>>>
>>> I am experiencing a huge number of interrupts under a 200Kpps load:
>>>
>>> # sysctl dev.ix | grep interrupt_rate
>>> dev.ix.1.queue7.interrupt_rate: 125000
>>> dev.ix.1.queue6.interrupt_rate: 6329
>>> dev.ix.1.queue5.interrupt_rate: 500000
>>> dev.ix.1.queue4.interrupt_rate: 100000
>>> dev.ix.1.queue3.interrupt_rate: 50000
>>> dev.ix.1.queue2.interrupt_rate: 500000
>>> dev.ix.1.queue1.interrupt_rate: 500000
>>> dev.ix.1.queue0.interrupt_rate: 100000
>>> dev.ix.0.queue7.interrupt_rate: 500000
>>> dev.ix.0.queue6.interrupt_rate: 6097
>>> dev.ix.0.queue5.interrupt_rate: 10204
>>> dev.ix.0.queue4.interrupt_rate: 5208
>>> dev.ix.0.queue3.interrupt_rate: 5208
>>> dev.ix.0.queue2.interrupt_rate: 71428
>>> dev.ix.0.queue1.interrupt_rate: 5494
>>> dev.ix.0.queue0.interrupt_rate: 6250
>>>
>>> [1] http://farrokhi.net/~farrokhi/pmc/6/
>>> [2] http://farrokhi.net/~farrokhi/pmc/7/
>>>
>>> Regards,
>>> Babak
>>>
>>>
>>> Alexander V. Chernikov wrote:
>>> > 12.08.2015, 02:28, "Maxim Sobolev" <sobomax@FreeBSD.org>:
>>> >> Olivier, keep in mind that we are not "kernel forwarding" packets, but
>>> >> "app forwarding", i.e. the packet goes the full way
>>> >> net->kernel->recvfrom->app->sendto->kernel->net, which is why we have
>>> >> much lower PPS limits and which is why I think we are actually
>>> >> benefiting from the extra queues. A single-threaded sendto() in a loop
>>> >> is CPU-bound at about 220K PPS, and while running the test I am
>>> >> observing that outbound traffic from one thread is mapped to a
>>> >> specific queue (well, a pair of queues on two separate adaptors, due
>>> >> to the lagg load balancing action). And the peak performance of that
>>> >> test is at 7 threads, which I believe corresponds to the number of
>>> >> queues. We have plenty of CPU cores in the box (24) with HTT/SMT
>>> >> disabled, and each queue is bound to a specific CPU. This leaves us
>>> >> with at least 8 CPUs fully capable of running our app. If you look at
>>> >> the CPU utilization, we are at about 10% when the issue hits.
>>> >
>>> > In any case, it would be great if you could provide some profiling
>>> > info, since there could be plenty of problematic places, starting from
>>> > TX ring contention to some locks inside UDP or even the (in)famous
>>> > random entropy harvester...
>>> > E.g. something like pmcstat -TS instructions -w1 might be sufficient
>>> > to determine the reason.
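
For reference, a minimal hwpmc(4) session along those lines; the output file
names are just placeholders:

kldload hwpmc                          # if not already loaded / compiled in
# Live top-style view, sampling on instructions, refreshing every second:
pmcstat -T -S instructions -w 1
# Or record samples while the overload is happening and post-process them
# into callchains afterwards:
pmcstat -S instructions -O /tmp/ix.samples
pmcstat -R /tmp/ix.samples -G /tmp/ix.callchains.txt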
>>> >> ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15>
>>> >> port 0x6020-0x603f mem 0xc7c00000-0xc7dfffff,0xc7e04000-0xc7e07fff
>>> >> irq 40 at device 0.0 on pci3
>>> >> ix0: Using MSIX interrupts with 9 vectors
>>> >> ix0: Bound queue 0 to cpu 0
>>> >> ix0: Bound queue 1 to cpu 1
>>> >> ix0: Bound queue 2 to cpu 2
>>> >> ix0: Bound queue 3 to cpu 3
>>> >> ix0: Bound queue 4 to cpu 4
>>> >> ix0: Bound queue 5 to cpu 5
>>> >> ix0: Bound queue 6 to cpu 6
>>> >> ix0: Bound queue 7 to cpu 7
>>> >> ix0: Ethernet address: 0c:c4:7a:5e:be:64
>>> >> ix0: PCI Express Bus: Speed 5.0GT/s Width x8
>>> >> 001.000008 [2705] netmap_attach success for ix0 tx 8/4096 rx
>>> >> 8/4096 queues/slots
>>> >> ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15>
>>> >> port 0x6000-0x601f mem 0xc7a00000-0xc7bfffff,0xc7e00000-0xc7e03fff
>>> >> irq 44 at device 0.1 on pci3
>>> >> ix1: Using MSIX interrupts with 9 vectors
>>> >> ix1: Bound queue 0 to cpu 8
>>> >> ix1: Bound queue 1 to cpu 9
>>> >> ix1: Bound queue 2 to cpu 10
>>> >> ix1: Bound queue 3 to cpu 11
>>> >> ix1: Bound queue 4 to cpu 12
>>> >> ix1: Bound queue 5 to cpu 13
>>> >> ix1: Bound queue 6 to cpu 14
>>> >> ix1: Bound queue 7 to cpu 15
>>> >> ix1: Ethernet address: 0c:c4:7a:5e:be:65
>>> >> ix1: PCI Express Bus: Speed 5.0GT/s Width x8
>>> >> 001.000009 [2705] netmap_attach success for ix1 tx 8/4096 rx
>>> >> 8/4096 queues/slots
>>> >>
>>> >> On Tue, Aug 11, 2015 at 4:14 PM, Olivier Cochard-Labbé <olivier@cochard.me>
>>> >> wrote:
>>> >>
>>> >>>  On Tue, Aug 11, 2015 at 11:18 PM, Maxim Sobolev <sobomax@freebsd.org>
>>> >>>  wrote:
>>> >>>
>>> >>>>  Hi folks,
>>> >>>
>>> >>>  Hi,
>>> >>>>  We've been trying to migrate some of our high-PPS systems to new
>>> >>>>  hardware that has four X540-AT2 10G NICs and observed that interrupt
>>> >>>>  time goes through the roof after we cross around 200K PPS in and
>>> >>>>  200K out (two ports in LACP). The previous hardware was stable up to
>>> >>>>  about 350K PPS in and 350K out. I believe the old one was equipped
>>> >>>>  with the I350 and had an identical LACP configuration. The new box
>>> >>>>  also has a better CPU with more cores (i.e. 24 cores vs. 16 cores
>>> >>>>  before). The CPU itself is 2 x E5-2690 v3.
>>> >>>
>>> >>>  200K PPS, and even 350K PPS, are very low values indeed.
>>> >>>  On an Intel Xeon L5630 (4 cores only) with one X540-AT2
>>> >>>  (that is, 2 10-Gigabit ports) I've reached about 1.8Mpps
>>> >>>  (fastforwarding enabled) [1].
>>> >>>  But my setup didn't use lagg(4): can you disable the lagg
>>> >>>  configuration and re-measure your performance without lagg?
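
If it helps, taking lagg(4) out of the picture for a quick test looks roughly
like this (the interface name and address below are placeholders):

# Tear down the aggregate and put a test address directly on one port:
ifconfig lagg0 destroy
ifconfig ix0 inet 192.0.2.10/24 up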
>>> >>>
>>> >>>  Do you let the Intel NIC drivers use 8 queues per port too?
>>> >>>  In my use case (forwarding smallest-size UDP packets), I got better
>>> >>>  behaviour by limiting the NIC queues to 4 (hw.ix.num_queues or
>>> >>>  hw.ixgbe.num_queues, I don't remember which) when my system had
>>> >>>  8 cores. And this with Gigabit Intel [2] or Chelsio NICs [3].
>>> >>>
>>> >>>  Don't forget to disable TSO and LRO too.
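
A sketch of those knobs; the queue-count tunable is boot-time only and, as
noted above, its exact name depends on the driver version:

# /boot/loader.conf entry, takes effect after a reboot
# (hw.ix.num_queues on newer drivers, hw.ixgbe.num_queues on older ones):
echo 'hw.ix.num_queues=4' >> /boot/loader.conf
# TSO and LRO can be toggled at runtime on each port:
ifconfig ix0 -tso -lro
ifconfig ix1 -tso -lro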
>>> >>>
>>> >>>  Regards,
>>> >>>
>>> >>>  Olivier
>>> >>>
>>> >>>  [1]
>>> >>>  http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_an_ibm_system_x3550_m3_with_10-gigabit_intel_x540-at2#graphs
>>> >>>  [2]
>>> >>>  http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_superserver_5018a-ftn4#graph1
>>> >>>  [3]
>>> >>>  http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_hp_proliant_dl360p_gen8_with_10-gigabit_with_10-gigabit_chelsio_t540-cr#reducing_nic_queues
>>
>>
>>
>> --
>> -----------------------------------------+-------------------------------
>>  Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
>>  http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
>>  TEL      +39-050-2217533               . via Diotisalvi 2
>>  Mobile   +39-338-6809875               . 56122 PISA (Italy)
>> -----------------------------------------+-------------------------------
>>
>>
>


