Date: Wed, 12 Aug 2015 14:23:21 +0200
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: Babak Farrokhi <farrokhi@freebsd.org>
Cc: "Alexander V. Chernikov" <melifaro@ipfw.ru>, Olivier Cochard-Labbé <olivier@cochard.me>, Maxim Sobolev <sobomax@freebsd.org>, "freebsd@intel.com" <freebsd@intel.com>, Jev Björsell <jev@sippysoft.com>, FreeBSD Net <freebsd-net@freebsd.org>
Subject: Re: Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in FreeBSD 10.1
Message-ID: <CA+hQ2+i37JzeUh8drxLSeeXHzYaRH9ZXvFyMBpF0XLHoiMSXMg@mail.gmail.com>
In-Reply-To: <55CB2F18.40902@FreeBSD.org>
References: <CAH7qZft-CZCKv_7E9PE+4ZN3EExhezMnAb3kvShQzYhRYb2jMg@mail.gmail.com> <77171439377164@web21h.yandex.ru> <55CB2F18.40902@FreeBSD.org>
As I was telling Maxim, you should disable AIM because it only matches the
max interrupt rate to the average packet size, which is the last thing you
want. Setting the interrupt rate with sysctl (one per queue) gives you
precise control over the max rate (and hence the extra latency). 20k
interrupts/s gives you 50 us of latency, and the 2k slots in the queue are
still enough to absorb a burst of min-sized frames hitting a single queue
(the OS will start dropping long before that level, but that's another
story).

Cheers
Luigi

On Wednesday, August 12, 2015, Babak Farrokhi <farrokhi@freebsd.org> wrote:
> I ran into the same problem with almost the same hardware (Intel X520)
> on 10-STABLE. HT/SMT is disabled and the cards are configured with 8
> queues, with the same sysctl tunings as sobomax@ used. I am not using
> lagg, and no FLOWTABLE.
>
> I experimented with pmcstat (RESOURCE_STALLS) a while ago and here [1]
> [2] you can see the results, including pmc output, callchains, flame
> graph and gprof output.
>
> I am seeing a huge number of interrupts under a 200 kpps load:
>
> # sysctl dev.ix | grep interrupt_rate
> dev.ix.1.queue7.interrupt_rate: 125000
> dev.ix.1.queue6.interrupt_rate: 6329
> dev.ix.1.queue5.interrupt_rate: 500000
> dev.ix.1.queue4.interrupt_rate: 100000
> dev.ix.1.queue3.interrupt_rate: 50000
> dev.ix.1.queue2.interrupt_rate: 500000
> dev.ix.1.queue1.interrupt_rate: 500000
> dev.ix.1.queue0.interrupt_rate: 100000
> dev.ix.0.queue7.interrupt_rate: 500000
> dev.ix.0.queue6.interrupt_rate: 6097
> dev.ix.0.queue5.interrupt_rate: 10204
> dev.ix.0.queue4.interrupt_rate: 5208
> dev.ix.0.queue3.interrupt_rate: 5208
> dev.ix.0.queue2.interrupt_rate: 71428
> dev.ix.0.queue1.interrupt_rate: 5494
> dev.ix.0.queue0.interrupt_rate: 6250
>
> [1] http://farrokhi.net/~farrokhi/pmc/6/
> [2] http://farrokhi.net/~farrokhi/pmc/7/
>
> Regards,
> Babak
>
> Alexander V.
> Chernikov wrote:
> > 12.08.2015, 02:28, "Maxim Sobolev" <sobomax@FreeBSD.org>:
> >> Olivier, keep in mind that we are not "kernel forwarding" packets, but
> >> "app forwarding", i.e. the packet goes the full way
> >> net->kernel->recvfrom->app->sendto->kernel->net, which is why we have
> >> much lower PPS limits and which is why I think we are actually
> >> benefiting from the extra queues. Single-thread sendto() in a loop is
> >> CPU-bound at about 220K PPS, and while running the test I am observing
> >> that outbound traffic from one thread is mapped into a specific queue
> >> (well, a pair of queues on two separate adaptors, due to lagg load
> >> balancing action). And the peak performance of that test is at 7
> >> threads, which I believe corresponds to the number of queues. We have
> >> plenty of CPU cores in the box (24) with HTT/SMT disabled and one CPU
> >> is mapped to a specific queue. This leaves us with at least 8 CPUs
> >> fully capable of running our app. If you look at the CPU utilization,
> >> we are at about 10% when the issue hits.
> >
> > In any case, it would be great if you could provide some profiling info,
> > since there could be plenty of problematic places, starting from TX rings
> > contention to some locks inside udp or even the (in)famous random entropy
> > harvester... e.g.
> > something like pmcstat -TS instructions -w1 might be sufficient to
> > determine the reason.
> >> ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15>
> >> port 0x6020-0x603f mem 0xc7c00000-0xc7dfffff,0xc7e04000-0xc7e07fff
> >> irq 40 at device 0.0 on pci3
> >> ix0: Using MSIX interrupts with 9 vectors
> >> ix0: Bound queue 0 to cpu 0
> >> ix0: Bound queue 1 to cpu 1
> >> ix0: Bound queue 2 to cpu 2
> >> ix0: Bound queue 3 to cpu 3
> >> ix0: Bound queue 4 to cpu 4
> >> ix0: Bound queue 5 to cpu 5
> >> ix0: Bound queue 6 to cpu 6
> >> ix0: Bound queue 7 to cpu 7
> >> ix0: Ethernet address: 0c:c4:7a:5e:be:64
> >> ix0: PCI Express Bus: Speed 5.0GT/s Width x8
> >> 001.000008 [2705] netmap_attach success for ix0 tx 8/4096 rx 8/4096
> >> queues/slots
> >> ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15>
> >> port 0x6000-0x601f mem 0xc7a00000-0xc7bfffff,0xc7e00000-0xc7e03fff
> >> irq 44 at device 0.1 on pci3
> >> ix1: Using MSIX interrupts with 9 vectors
> >> ix1: Bound queue 0 to cpu 8
> >> ix1: Bound queue 1 to cpu 9
> >> ix1: Bound queue 2 to cpu 10
> >> ix1: Bound queue 3 to cpu 11
> >> ix1: Bound queue 4 to cpu 12
> >> ix1: Bound queue 5 to cpu 13
> >> ix1: Bound queue 6 to cpu 14
> >> ix1: Bound queue 7 to cpu 15
> >> ix1: Ethernet address: 0c:c4:7a:5e:be:65
> >> ix1: PCI Express Bus: Speed 5.0GT/s Width x8
> >> 001.000009 [2705] netmap_attach success for ix1 tx 8/4096 rx 8/4096
> >> queues/slots
> >>
> >> On Tue, Aug 11, 2015 at 4:14 PM, Olivier Cochard-Labbé
> >> <olivier@cochard.me> wrote:
> >>
> >>> On Tue, Aug 11, 2015 at 11:18 PM, Maxim Sobolev
> >>> <sobomax@freebsd.org> wrote:
> >>>
> >>>> Hi folks,
> >>>>
> >>> Hi,
> >>>
> >>>> We've been trying to migrate some of our high-PPS systems to new
> >>>> hardware that has four X540-AT2 10G NICs and observed that interrupt
> >>>> time goes through the roof after we cross around 200K PPS in and 200K
> >>>> out (two ports in LACP). The previous hardware was stable up to about
> >>>> 350K PPS in and 350K out. I believe the old one was equipped with the
> >>>> I350 and had an identical LACP configuration. The new box also has a
> >>>> better CPU with more cores (i.e. 24 cores vs. 16 before). The CPU
> >>>> itself is 2 x E5-2690 v3.
> >>>
> >>> 200K PPS, and even 350K PPS, are very low values indeed.
> >>> On an Intel Xeon L5630 (4 cores only) with one X540-AT2 (so 2
> >>> 10-gigabit ports) I've reached about 1.8 Mpps (fastforwarding
> >>> enabled) [1].
> >>> But my setup didn't use lagg(4): can you disable the lagg
> >>> configuration and re-measure your performance without lagg?
> >>>
> >>> Do you let the Intel NIC driver use 8 queues per port too?
> >>> In my use case (forwarding the smallest UDP packet size), I obtained
> >>> better behaviour by limiting NIC queues to 4 (hw.ix.num_queues or
> >>> hw.ixgbe.num_queues, I don't remember which) when my system had 8
> >>> cores. And this was with Gigabit Intel [2] or Chelsio [3] NICs.
> >>>
> >>> Don't forget to disable TSO and LRO too.
> >>>
> >>> Regards,
> >>>
> >>> Olivier
> >>>
> >>> [1]
> >>> http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_an_ibm_system_x3550_m3_with_10-gigabit_intel_x540-at2#graphs
> >>> [2]
> >>> http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_superserver_5018a-ftn4#graph1
> >>> [3]
> >>> http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_hp_proliant_dl360p_gen8_with_10-gigabit_with_10-gigabit_chelsio_t540-cr#reducing_nic_queues
> >> _______________________________________________
> >> freebsd-net@freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

-- 
-----------------------------------------+-------------------------------
 Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
 TEL      +39-050-2217533               . via Diotisalvi 2
 Mobile   +39-338-6809875               . 56122 PISA (Italy)
-----------------------------------------+-------------------------------
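[Editor's note] Luigi's advice at the top of the thread can be sketched as a
small script. This is a hedged sketch, not a recipe: the AIM knob and the
per-queue sysctl names are assumptions that vary across ixgbe driver
versions, so check `sysctl dev.ix` on your own system first. The script only
prints the commands; review the output and pipe it to sh (as root) to apply.

```shell
#!/bin/sh
# Sketch of Luigi's tuning: disable adaptive interrupt moderation (AIM)
# and pin every RX/TX queue to a fixed maximum interrupt rate.
# NOTE: the sysctl names below are assumptions -- verify on your driver.
IFACE_IDX=0        # dev.ix.0.* -- adjust to your NIC
NQUEUES=8          # number of queues the driver reports
RATE=20000         # 20k interrupts/s ~= 50 us of added latency

# AIM knob (name varies; on some versions it is the loader tunable
# hw.ix.enable_aim instead of a per-device sysctl):
echo "sysctl dev.ix.${IFACE_IDX}.enable_aim=0"

q=0
while [ "$q" -lt "$NQUEUES" ]; do
    echo "sysctl dev.ix.${IFACE_IDX}.queue${q}.interrupt_rate=${RATE}"
    q=$((q + 1))
done

# Sanity check: the max extra latency in microseconds is 1e6 / RATE.
echo "max added latency: $((1000000 / RATE)) us"
```

Printing instead of executing makes the script safe to run anywhere as a
preview; only the piped-to-sh form needs root.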
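[Editor's note] The pmcstat one-liner Alexander suggests can be expanded into
the usual sampling workflow. The flags follow pmcstat(8), but the
'instructions' event name is CPU-dependent (list valid events with
`pmccontrol -L`) and the file paths are arbitrary. These commands need root
on the loaded FreeBSD box, so they are shown as a command fragment rather
than a runnable script.

```shell
# Live, top(1)-style view sampling the 'instructions' event, 1 s refresh:
pmcstat -T -S instructions -w 1

# Record 30 s of samples to a file, then dump the callchains for offline
# analysis (input for gprof-style reports or a flame graph):
pmcstat -S instructions -O /tmp/samples.pmc sleep 30
pmcstat -R /tmp/samples.pmc -G /tmp/stacks.txt
```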
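[Editor's note] Olivier's two suggestions (fewer queues, TSO/LRO off),
written out as a fragment. The tunable name is an assumption: as he says, it
is hw.ix.num_queues or hw.ixgbe.num_queues depending on driver version, and
it is a boot-time loader tunable, so it takes effect only after a reboot.

```shell
# Limit the ix(4) driver to 4 queues per port (boot-time loader tunable;
# the name may be hw.ixgbe.num_queues on older drivers):
echo 'hw.ix.num_queues=4' >> /boot/loader.conf   # then reboot

# Disable TSO and LRO on both lagg member ports at runtime:
ifconfig ix0 -tso -lro
ifconfig ix1 -tso -lro
```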