Date: Wed, 12 Aug 2015 17:47:29 -0700
From: Maxim Sobolev <sobomax@FreeBSD.org>
To: Adrian Chadd <adrian.chadd@gmail.com>
Cc: Luigi Rizzo <rizzo@iet.unipi.it>, "Alexander V. Chernikov" <melifaro@ipfw.ru>, FreeBSD Net <freebsd-net@freebsd.org>, Babak Farrokhi <farrokhi@freebsd.org>, "freebsd@intel.com" <freebsd@intel.com>, Jev Björsell <jev@sippysoft.com>, Olivier Cochard-Labbé <olivier@cochard.me>
Subject: Re: Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in FreeBSD 10.1
Message-ID: <CAH7qZft9u=oDjwmoA92H4eSEUJt2Zp8uwm8sN-T292fZc6A9qA@mail.gmail.com>
In-Reply-To: <CAJ-VmokoPn2vy2p6OPbcGi8QLPrrvLZbKWGdChkwjeF5Zh0c=Q@mail.gmail.com>
References: <CAH7qZfuMhdHvO8w+t9-3DbgsCMph8OCOoni7duYRerp6wO66Xw@mail.gmail.com> <CAJ-VmokoPn2vy2p6OPbcGi8QLPrrvLZbKWGdChkwjeF5Zh0c=Q@mail.gmail.com>
Here we go (ix2 and ix3 are not used):

ix0@pci0:3:0:0: class=0x020000 card=0x152815d9 chip=0x15288086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller 10-Gigabit X540-AT2'
    class      = network
    subclass   = ethernet
ix1@pci0:3:0:1: class=0x020000 card=0x152815d9 chip=0x15288086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller 10-Gigabit X540-AT2'
    class      = network
    subclass   = ethernet
ix2@pci0:4:0:0: class=0x020000 card=0x152815d9 chip=0x15288086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller 10-Gigabit X540-AT2'
    class      = network
    subclass   = ethernet
ix3@pci0:4:0:1: class=0x020000 card=0x152815d9 chip=0x15288086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller 10-Gigabit X540-AT2'
    class      = network
    subclass   = ethernet

On Wed, Aug 12, 2015 at 8:23 AM, Adrian Chadd <adrian.chadd@gmail.com> wrote:
> Right, and for the ixgbe hardware?
>
> -a
>
> On 12 August 2015 at 08:05, Maxim Sobolev <sobomax@freebsd.org> wrote:
> > igb0@pci0:7:0:0: class=0x020000 card=0x153315d9 chip=0x15338086 rev=0x03 hdr=0x00
> >     vendor     = 'Intel Corporation'
> >     device     = 'I210 Gigabit Network Connection'
> >     class      = network
> >     subclass   = ethernet
> > igb1@pci0:8:0:0: class=0x020000 card=0x153315d9 chip=0x15338086 rev=0x03 hdr=0x00
> >     vendor     = 'Intel Corporation'
> >     device     = 'I210 Gigabit Network Connection'
> >     class      = network
> >     subclass   = ethernet
> >
> > On Wed, Aug 12, 2015 at 8:03 AM, Maxim Sobolev <sobomax@sippysoft.com> wrote:
> >
> >> Ok, so my current settings are:
> >>
> >> hw.ix.max_interrupt_rate: 20000
> >> dev.ix.0.queue0.interrupt_rate: 20000
> >> dev.ix.0.queue1.interrupt_rate: 20000
> >> dev.ix.0.queue2.interrupt_rate: 20000
> >> dev.ix.0.queue3.interrupt_rate: 20000
> >> dev.ix.0.queue4.interrupt_rate: 20000
> >> dev.ix.0.queue5.interrupt_rate: 20000
> >> dev.ix.1.queue0.interrupt_rate: 20000
> >> dev.ix.1.queue1.interrupt_rate: 20000
> >> dev.ix.1.queue2.interrupt_rate: 20000
> >> dev.ix.1.queue3.interrupt_rate: 20000
> >> dev.ix.1.queue4.interrupt_rate: 20000
> >> dev.ix.1.queue5.interrupt_rate: 20000
> >> dev.ix.0.enable_aim: 0
> >> dev.ix.1.enable_aim: 0
> >> dev.ix.2.enable_aim: 0
> >> dev.ix.3.enable_aim: 0
> >> hw.ix.num_queues: 6
> >>
> >> We also happen to have an I210-based system with only 4 hardware queues;
> >> it would be interesting to see how it stacks up.
> >>
> >> On Wed, Aug 12, 2015 at 5:23 AM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:
> >>
> >>> As I was telling Maxim, you should disable AIM because it only matches
> >>> the max interrupt rate to the average packet size, which is the last
> >>> thing you want.
> >>>
> >>> Setting the interrupt rate with sysctl (one per queue) gives you precise
> >>> control over the max rate (and hence the extra latency). 20k interrupts/s
> >>> give you 50us of latency, and the 2k slots in the queue are still enough
> >>> to absorb a burst of min-sized frames hitting a single queue (the OS will
> >>> start dropping long before that level, but that's another story).
> >>>
> >>> Cheers
> >>> Luigi
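For the archives, what Luigi suggests maps onto the knobs already listed above roughly as follows. The 20000 figure is just the value we happen to run with, and I am assuming the per-queue interrupt_rate sysctls are writable at runtime on this driver version; if they are not, the same cap can only go in as the loader tunable:

    # /boot/loader.conf -- applied at boot
    hw.ix.max_interrupt_rate="20000"
    hw.ix.num_queues="6"

    # runtime (sh): turn off adaptive interrupt moderation (AIM) and pin
    # every queue to 20k interrupts/s, i.e. at most 1/20000 s = 50 us of
    # added latency per Luigi's arithmetic
    for ifn in 0 1; do
        sysctl dev.ix.${ifn}.enable_aim=0
        for q in 0 1 2 3 4 5; do
            sysctl dev.ix.${ifn}.queue${q}.interrupt_rate=20000
        done
    done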
> >>> On Wednesday, August 12, 2015, Babak Farrokhi <farrokhi@freebsd.org> wrote:
> >>>
> >>>> I ran into the same problem with almost the same hardware (Intel X520)
> >>>> on 10-STABLE. HT/SMT is disabled and the cards are configured with 8
> >>>> queues, with the same sysctl tunings as sobomax@ did. I am not using
> >>>> lagg, no FLOWTABLE.
> >>>>
> >>>> I experimented with pmcstat (RESOURCE_STALLS) a while ago and here [1]
> >>>> [2] you can see the results, including pmc output, callchain, flamegraph
> >>>> and gprof output.
> >>>>
> >>>> I am experiencing a huge number of interrupts with 200kpps load:
> >>>>
> >>>> # sysctl dev.ix | grep interrupt_rate
> >>>> dev.ix.1.queue7.interrupt_rate: 125000
> >>>> dev.ix.1.queue6.interrupt_rate: 6329
> >>>> dev.ix.1.queue5.interrupt_rate: 500000
> >>>> dev.ix.1.queue4.interrupt_rate: 100000
> >>>> dev.ix.1.queue3.interrupt_rate: 50000
> >>>> dev.ix.1.queue2.interrupt_rate: 500000
> >>>> dev.ix.1.queue1.interrupt_rate: 500000
> >>>> dev.ix.1.queue0.interrupt_rate: 100000
> >>>> dev.ix.0.queue7.interrupt_rate: 500000
> >>>> dev.ix.0.queue6.interrupt_rate: 6097
> >>>> dev.ix.0.queue5.interrupt_rate: 10204
> >>>> dev.ix.0.queue4.interrupt_rate: 5208
> >>>> dev.ix.0.queue3.interrupt_rate: 5208
> >>>> dev.ix.0.queue2.interrupt_rate: 71428
> >>>> dev.ix.0.queue1.interrupt_rate: 5494
> >>>> dev.ix.0.queue0.interrupt_rate: 6250
> >>>>
> >>>> [1] http://farrokhi.net/~farrokhi/pmc/6/
> >>>> [2] http://farrokhi.net/~farrokhi/pmc/7/
> >>>>
> >>>> Regards,
> >>>> Babak
> >>>>
> >>>> Alexander V. Chernikov wrote:
> >>>> > 12.08.2015, 02:28, "Maxim Sobolev" <sobomax@FreeBSD.org>:
> >>>> >> Olivier, keep in mind that we are not "kernel forwarding" packets,
> >>>> >> but "app forwarding", i.e. the packet goes the full way
> >>>> >> net->kernel->recvfrom->app->sendto->kernel->net, which is why we have
> >>>> >> much lower PPS limits and which is why I think we are actually
> >>>> >> benefiting from the extra queues. Single-thread sendto() in a loop is
> >>>> >> CPU-bound at about 220K PPS, and while running the test I am observing
> >>>> >> that outbound traffic from one thread is mapped into a specific queue
> >>>> >> (well, a pair of queues on two separate adaptors, due to lagg load
> >>>> >> balancing action). And the peak performance of that test is at 7
> >>>> >> threads, which I believe corresponds to the number of queues. We have
> >>>> >> plenty of CPU cores in the box (24) with HTT/SMT disabled and one CPU
> >>>> >> is mapped to a specific queue. This leaves us with at least 8 CPUs
> >>>> >> fully capable of running our app. If you look at the CPU utilization,
> >>>> >> we are at about 10% when the issue hits.
> >>>> >
> >>>> > In any case, it would be great if you could provide some profiling
> >>>> > info, since there could be plenty of problematic places, starting from
> >>>> > TX ring contention to some locks inside udp or even the (in)famous
> >>>> > random entropy harvester...
> >>>> > e.g. something like pmcstat -TS instructions -w1 might be sufficient
> >>>> > to determine the reason
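In case somebody wants to reproduce the profiling run Alexander is suggesting, a minimal sequence would be something like the following, run while the box is taking the 200K+ PPS load (hwpmc(4) has to be available first):

    # load the hwpmc(4) sampling driver unless it is compiled into the kernel
    kldload hwpmc

    # system-wide sampling on the "instructions" event, top(1)-style output,
    # refreshing once per second; Ctrl-C to stop
    pmcstat -T -S instructions -w 1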
> >>>> >> ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15>
> >>>> >> port 0x6020-0x603f mem 0xc7c00000-0xc7dfffff,0xc7e04000-0xc7e07fff
> >>>> >> irq 40 at device 0.0 on pci3
> >>>> >> ix0: Using MSIX interrupts with 9 vectors
> >>>> >> ix0: Bound queue 0 to cpu 0
> >>>> >> ix0: Bound queue 1 to cpu 1
> >>>> >> ix0: Bound queue 2 to cpu 2
> >>>> >> ix0: Bound queue 3 to cpu 3
> >>>> >> ix0: Bound queue 4 to cpu 4
> >>>> >> ix0: Bound queue 5 to cpu 5
> >>>> >> ix0: Bound queue 6 to cpu 6
> >>>> >> ix0: Bound queue 7 to cpu 7
> >>>> >> ix0: Ethernet address: 0c:c4:7a:5e:be:64
> >>>> >> ix0: PCI Express Bus: Speed 5.0GT/s Width x8
> >>>> >> 001.000008 [2705] netmap_attach success for ix0 tx 8/4096 rx
> >>>> >> 8/4096 queues/slots
> >>>> >> ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15>
> >>>> >> port 0x6000-0x601f mem 0xc7a00000-0xc7bfffff,0xc7e00000-0xc7e03fff
> >>>> >> irq 44 at device 0.1 on pci3
> >>>> >> ix1: Using MSIX interrupts with 9 vectors
> >>>> >> ix1: Bound queue 0 to cpu 8
> >>>> >> ix1: Bound queue 1 to cpu 9
> >>>> >> ix1: Bound queue 2 to cpu 10
> >>>> >> ix1: Bound queue 3 to cpu 11
> >>>> >> ix1: Bound queue 4 to cpu 12
> >>>> >> ix1: Bound queue 5 to cpu 13
> >>>> >> ix1: Bound queue 6 to cpu 14
> >>>> >> ix1: Bound queue 7 to cpu 15
> >>>> >> ix1: Ethernet address: 0c:c4:7a:5e:be:65
> >>>> >> ix1: PCI Express Bus: Speed 5.0GT/s Width x8
> >>>> >> 001.000009 [2705] netmap_attach success for ix1 tx 8/4096 rx
> >>>> >> 8/4096 queues/slots
> >>>> >>
> >>>> >> On Tue, Aug 11, 2015 at 4:14 PM, Olivier Cochard-Labbé
> >>>> >> <olivier@cochard.me> wrote:
> >>>> >>
> >>>> >>> On Tue, Aug 11, 2015 at 11:18 PM, Maxim Sobolev <sobomax@freebsd.org>
> >>>> >>> wrote:
> >>>> >>>
> >>>> >>>> Hi folks,
> >>>> >>>>
> >>>> >>> Hi,
> >>>> >>>
> >>>> >>>> We've been trying to migrate some of our high-PPS systems to new
> >>>> >>>> hardware that has four X540-AT2 10G NICs and observed that interrupt
> >>>> >>>> time goes through the roof after we cross around 200K PPS in and
> >>>> >>>> 200K out (two ports in LACP). The previous hardware was stable up to
> >>>> >>>> about 350K PPS in and 350K out. I believe the old one was equipped
> >>>> >>>> with the I350 and had the identical LACP configuration. The new box
> >>>> >>>> also has a better CPU with more cores (i.e. 24 cores vs. 16 cores
> >>>> >>>> before). The CPU itself is 2 x E5-2690 v3.
> >>>> >>>
> >>>> >>> 200K PPS, and even 350K PPS, are very low values indeed.
> >>>> >>> On an Intel Xeon L5630 (4 cores only) with one X540-AT2
> >>>> >>> (that is, 2 10-Gigabit ports) I've reached about 1.8Mpps
> >>>> >>> (fastforwarding enabled) [1].
> >>>> >>> But my setup didn't use lagg(4): can you disable the lagg
> >>>> >>> configuration and re-measure your performance without lagg?
> >>>> >>>
> >>>> >>> Do you let the Intel NIC drivers use 8 queues per port too?
> >>>> >>> In my use case (forwarding the smallest UDP packet size), I obtain
> >>>> >>> better behaviour by limiting the NIC queues to 4 (hw.ix.num_queues or
> >>>> >>> hw.ixgbe.num_queues, I don't remember) when my system had 8 cores.
> >>>> >>> And this with Gigabit Intel [2] or Chelsio NICs [3].
> >>>> >>>
> >>>> >>> Don't forget to disable TSO and LRO too.
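For reference, the changes Olivier is talking about would look roughly like this on a box with ix0 and ix1 in the lagg; the 4-queue figure is his suggestion for an 8-core system rather than anything tested in this thread:

    # /boot/loader.conf -- limit the ix(4) driver to 4 queue pairs per port
    hw.ix.num_queues="4"

    # runtime (sh): disable TCP segmentation offload and large receive
    # offload on both lagg member ports
    ifconfig ix0 -tso -lro
    ifconfig ix1 -tso -lro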
> >>>> >>>
> >>>> >>> Regards,
> >>>> >>>
> >>>> >>> Olivier
> >>>> >>>
> >>>> >>> [1] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_an_ibm_system_x3550_m3_with_10-gigabit_intel_x540-at2#graphs
> >>>> >>> [2] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_superserver_5018a-ftn4#graph1
> >>>> >>> [3] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_hp_proliant_dl360p_gen8_with_10-gigabit_with_10-gigabit_chelsio_t540-cr#reducing_nic_queues
> >>>
> >>>
> >>> --
> >>> -----------------------------------------+-------------------------------
> >>>  Prof. Luigi RIZZO, rizzo@iet.unipi.it   . Dip. di Ing. dell'Informazione
> >>>  http://www.iet.unipi.it/~luigi/         . Universita` di Pisa
> >>>  TEL      +39-050-2217533                . via Diotisalvi 2
> >>>  Mobile   +39-338-6809875                . 56122 PISA (Italy)
> >>> -----------------------------------------+-------------------------------
> >>
> >>
> >> --
> >> Maksym Sobolyev
> >> Sippy Software, Inc.
> >> Internet Telephony (VoIP) Experts
> >> Tel (Canada): +1-778-783-0474
> >> Tel (Toll-Free): +1-855-747-7779
> >> Fax: +1-866-857-6942
> >> Web: http://www.sippysoft.com
> >> MSN: sales@sippysoft.com
> >> Skype: SippySoft