Date: Wed, 12 Aug 2015 23:47:09 -0700
From: Adrian Chadd <adrian.chadd@gmail.com>
To: Maxim Sobolev <sobomax@freebsd.org>
Cc: Luigi Rizzo <rizzo@iet.unipi.it>, "Alexander V. Chernikov" <melifaro@ipfw.ru>,
    FreeBSD Net <freebsd-net@freebsd.org>, Babak Farrokhi <farrokhi@freebsd.org>,
    "freebsd@intel.com" <freebsd@intel.com>, Jev Björsell <jev@sippysoft.com>,
    Olivier Cochard-Labbé <olivier@cochard.me>
Subject: Re: Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in FreeBSD 10.1
Message-ID: <CAJ-VmonoS+D8BCRJBGUv_4RUxWL=7KX-JB1SA07WZPBmJzj5Ew@mail.gmail.com>
In-Reply-To: <CAH7qZft9u=oDjwmoA92H4eSEUJt2Zp8uwm8sN-T292fZc6A9qA@mail.gmail.com>
References: <CAH7qZfuMhdHvO8w+t9-3DbgsCMph8OCOoni7duYRerp6wO66Xw@mail.gmail.com>
    <CAJ-VmokoPn2vy2p6OPbcGi8QLPrrvLZbKWGdChkwjeF5Zh0c=Q@mail.gmail.com>
    <CAH7qZft9u=oDjwmoA92H4eSEUJt2Zp8uwm8sN-T292fZc6A9qA@mail.gmail.com>
Hi,

Try this:

* I'd disable AIM and hard-set the interrupt rate to something sensible;
* I'd edit sys/conf/files and sys/dev/ixgbe/Makefile on 10.1 and remove
  the '-DIXGBE_FDIR' bit that enables flow director - the software setup
  for flow director is buggy, and it causes things to get wildly unhappy.
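Completely untested sketch of both bullets (the knob names are from memory, so
cross-check them against what sysctl -a reports on that box; the grep just
locates where the flag gets added to the build, assuming a stock /usr/src):

    # /boot/loader.conf - cap the per-queue interrupt rate at attach time
    hw.ix.max_interrupt_rate="20000"

    # runtime: turn off adaptive interrupt moderation (AIM) on each port ...
    sysctl dev.ix.0.enable_aim=0
    sysctl dev.ix.1.enable_aim=0

    # ... and/or pin the rate per queue, e.g. queue 0 on ix0
    sysctl dev.ix.0.queue0.interrupt_rate=20000

    # find where -DIXGBE_FDIR is set before removing it and rebuilding
    grep -rn IXGBE_FDIR /usr/src/sys/conf /usr/src/sys/modules/ixgbe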
-adrian


On 12 August 2015 at 17:47, Maxim Sobolev <sobomax@freebsd.org> wrote:
> Here we go (ix2 and ix3 are not used):
>
> ix0@pci0:3:0:0: class=0x020000 card=0x152815d9 chip=0x15288086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = 'Ethernet Controller 10-Gigabit X540-AT2'
>     class      = network
>     subclass   = ethernet
> ix1@pci0:3:0:1: class=0x020000 card=0x152815d9 chip=0x15288086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = 'Ethernet Controller 10-Gigabit X540-AT2'
>     class      = network
>     subclass   = ethernet
> ix2@pci0:4:0:0: class=0x020000 card=0x152815d9 chip=0x15288086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = 'Ethernet Controller 10-Gigabit X540-AT2'
>     class      = network
>     subclass   = ethernet
> ix3@pci0:4:0:1: class=0x020000 card=0x152815d9 chip=0x15288086 rev=0x01 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = 'Ethernet Controller 10-Gigabit X540-AT2'
>     class      = network
>     subclass   = ethernet
>
>
> On Wed, Aug 12, 2015 at 8:23 AM, Adrian Chadd <adrian.chadd@gmail.com> wrote:
>>
>> Right, and for the ixgbe hardware?
>>
>> -a
>>
>> On 12 August 2015 at 08:05, Maxim Sobolev <sobomax@freebsd.org> wrote:
>> > igb0@pci0:7:0:0: class=0x020000 card=0x153315d9 chip=0x15338086 rev=0x03 hdr=0x00
>> >     vendor     = 'Intel Corporation'
>> >     device     = 'I210 Gigabit Network Connection'
>> >     class      = network
>> >     subclass   = ethernet
>> > igb1@pci0:8:0:0: class=0x020000 card=0x153315d9 chip=0x15338086 rev=0x03 hdr=0x00
>> >     vendor     = 'Intel Corporation'
>> >     device     = 'I210 Gigabit Network Connection'
>> >     class      = network
>> >     subclass   = ethernet
>> >
>> > On Wed, Aug 12, 2015 at 8:03 AM, Maxim Sobolev <sobomax@sippysoft.com> wrote:
>> >> Ok, so my current settings are:
>> >>
>> >> hw.ix.max_interrupt_rate: 20000
>> >> dev.ix.0.queue0.interrupt_rate: 20000
>> >> dev.ix.0.queue1.interrupt_rate: 20000
>> >> dev.ix.0.queue2.interrupt_rate: 20000
>> >> dev.ix.0.queue3.interrupt_rate: 20000
>> >> dev.ix.0.queue4.interrupt_rate: 20000
>> >> dev.ix.0.queue5.interrupt_rate: 20000
>> >> dev.ix.1.queue0.interrupt_rate: 20000
>> >> dev.ix.1.queue1.interrupt_rate: 20000
>> >> dev.ix.1.queue2.interrupt_rate: 20000
>> >> dev.ix.1.queue3.interrupt_rate: 20000
>> >> dev.ix.1.queue4.interrupt_rate: 20000
>> >> dev.ix.1.queue5.interrupt_rate: 20000
>> >> dev.ix.0.enable_aim: 0
>> >> dev.ix.1.enable_aim: 0
>> >> dev.ix.2.enable_aim: 0
>> >> dev.ix.3.enable_aim: 0
>> >> hw.ix.num_queues: 6
>> >>
>> >> We also happen to have an I210-based system with only 4 hardware queues;
>> >> it would be interesting to see how it stacks up.
>> >>
>> >> On Wed, Aug 12, 2015 at 5:23 AM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:
>> >>>
>> >>> As I was telling Maxim, you should disable AIM because it only matches
>> >>> the max interrupt rate to the average packet size, which is the last
>> >>> thing you want.
>> >>>
>> >>> Setting the interrupt rate with sysctl (one per queue) gives you precise
>> >>> control over the max rate (and hence the extra latency). 20k interrupts/s
>> >>> give you 50us of latency, and the 2k slots in the queue are still enough
>> >>> to absorb a burst of min-sized frames hitting a single queue (the OS will
>> >>> start dropping long before that level, but that's another story).
>> >>>
>> >>> Cheers
>> >>> Luigi
>> >>>
>> >>> On Wednesday, August 12, 2015, Babak Farrokhi <farrokhi@freebsd.org> wrote:
>> >>>
>> >>>> I ran into the same problem with almost the same hardware (Intel X520)
>> >>>> on 10-STABLE. HT/SMT is disabled and the cards are configured with 8
>> >>>> queues, with the same sysctl tunings as sobomax@ did. I am not using
>> >>>> lagg, no FLOWTABLE.
>> >>>>
>> >>>> I experimented with pmcstat (RESOURCE_STALLS) a while ago and here [1]
>> >>>> [2] you can see the results, including pmc output, callchain, flamegraph
>> >>>> and gprof output.
>> >>>>
>> >>>> I am experiencing a huge number of interrupts with 200kpps load:
>> >>>>
>> >>>> # sysctl dev.ix | grep interrupt_rate
>> >>>> dev.ix.1.queue7.interrupt_rate: 125000
>> >>>> dev.ix.1.queue6.interrupt_rate: 6329
>> >>>> dev.ix.1.queue5.interrupt_rate: 500000
>> >>>> dev.ix.1.queue4.interrupt_rate: 100000
>> >>>> dev.ix.1.queue3.interrupt_rate: 50000
>> >>>> dev.ix.1.queue2.interrupt_rate: 500000
>> >>>> dev.ix.1.queue1.interrupt_rate: 500000
>> >>>> dev.ix.1.queue0.interrupt_rate: 100000
>> >>>> dev.ix.0.queue7.interrupt_rate: 500000
>> >>>> dev.ix.0.queue6.interrupt_rate: 6097
>> >>>> dev.ix.0.queue5.interrupt_rate: 10204
>> >>>> dev.ix.0.queue4.interrupt_rate: 5208
>> >>>> dev.ix.0.queue3.interrupt_rate: 5208
>> >>>> dev.ix.0.queue2.interrupt_rate: 71428
>> >>>> dev.ix.0.queue1.interrupt_rate: 5494
>> >>>> dev.ix.0.queue0.interrupt_rate: 6250
>> >>>>
>> >>>> [1] http://farrokhi.net/~farrokhi/pmc/6/
>> >>>> [2] http://farrokhi.net/~farrokhi/pmc/7/
>> >>>>
>> >>>> Regards,
>> >>>> Babak
>> >>>>
>> >>>>
>> >>>> Alexander V. Chernikov wrote:
>> >>>> > 12.08.2015, 02:28, "Maxim Sobolev" <sobomax@FreeBSD.org>:
>> >>>> >> Olivier, keep in mind that we are not "kernel forwarding" packets,
>> >>>> >> but "app forwarding", i.e. the packet goes the full way
>> >>>> >> net->kernel->recvfrom->app->sendto->kernel->net, which is why we have
>> >>>> >> much lower PPS limits and which is why I think we are actually
>> >>>> >> benefiting from the extra queues. Single-thread sendto() in a loop is
>> >>>> >> CPU-bound at about 220K PPS, and while running the test I am observing
>> >>>> >> that outbound traffic from one thread is mapped into a specific queue
>> >>>> >> (well, a pair of queues on two separate adaptors, due to lagg load
>> >>>> >> balancing action). And the peak performance of that test is at 7
>> >>>> >> threads, which I believe corresponds to the number of queues. We have
>> >>>> >> plenty of CPU cores in the box (24) with HTT/SMT disabled, and each
>> >>>> >> queue is bound to a specific CPU. This leaves us with at least 8 CPUs
>> >>>> >> fully capable of running our app. If you look at the CPU utilization,
>> >>>> >> we are at about 10% when the issue hits.
>> >>>> >
>> >>>> > In any case, it would be great if you could provide some profiling
>> >>>> > info, since there could be plenty of problematic places, starting from
>> >>>> > TX ring contention to some locks inside udp or even the (in)famous
>> >>>> > random entropy harvester.
>> >>>> > E.g. something like pmcstat -TS instructions -w1 might be sufficient
>> >>>> > to determine the reason.
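Seconding the profiling ask - a rough sketch of how I'd drive pmcstat for this
(assuming hwpmc(4) is available on that kernel and that the generic
"instructions" alias maps onto those CPUs; adjust to taste):

    # load the sampling driver if it is not already in the kernel
    kldload hwpmc

    # live top-style view: sample on instructions, refresh every second
    pmcstat -T -S instructions -w 1

    # or log ~30s of samples and fold them into a callgraph afterwards
    pmcstat -S instructions -O /tmp/ix.pmc &
    PMCPID=$!
    sleep 30
    kill $PMCPID
    pmcstat -R /tmp/ix.pmc -G /tmp/ix-callgraph.txt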
>> >>>> >> ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15>
>> >>>> >> port 0x6020-0x603f mem 0xc7c00000-0xc7dfffff,0xc7e04000-0xc7e07fff
>> >>>> >> irq 40 at device 0.0 on pci3
>> >>>> >> ix0: Using MSIX interrupts with 9 vectors
>> >>>> >> ix0: Bound queue 0 to cpu 0
>> >>>> >> ix0: Bound queue 1 to cpu 1
>> >>>> >> ix0: Bound queue 2 to cpu 2
>> >>>> >> ix0: Bound queue 3 to cpu 3
>> >>>> >> ix0: Bound queue 4 to cpu 4
>> >>>> >> ix0: Bound queue 5 to cpu 5
>> >>>> >> ix0: Bound queue 6 to cpu 6
>> >>>> >> ix0: Bound queue 7 to cpu 7
>> >>>> >> ix0: Ethernet address: 0c:c4:7a:5e:be:64
>> >>>> >> ix0: PCI Express Bus: Speed 5.0GT/s Width x8
>> >>>> >> 001.000008 [2705] netmap_attach success for ix0 tx 8/4096 rx 8/4096 queues/slots
>> >>>> >> ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15>
>> >>>> >> port 0x6000-0x601f mem 0xc7a00000-0xc7bfffff,0xc7e00000-0xc7e03fff
>> >>>> >> irq 44 at device 0.1 on pci3
>> >>>> >> ix1: Using MSIX interrupts with 9 vectors
>> >>>> >> ix1: Bound queue 0 to cpu 8
>> >>>> >> ix1: Bound queue 1 to cpu 9
>> >>>> >> ix1: Bound queue 2 to cpu 10
>> >>>> >> ix1: Bound queue 3 to cpu 11
>> >>>> >> ix1: Bound queue 4 to cpu 12
>> >>>> >> ix1: Bound queue 5 to cpu 13
>> >>>> >> ix1: Bound queue 6 to cpu 14
>> >>>> >> ix1: Bound queue 7 to cpu 15
>> >>>> >> ix1: Ethernet address: 0c:c4:7a:5e:be:65
>> >>>> >> ix1: PCI Express Bus: Speed 5.0GT/s Width x8
>> >>>> >> 001.000009 [2705] netmap_attach success for ix1 tx 8/4096 rx 8/4096 queues/slots
>> >>>> >>
>> >>>> >> On Tue, Aug 11, 2015 at 4:14 PM, Olivier Cochard-Labbé <olivier@cochard.me> wrote:
>> >>>> >>
>> >>>> >>> On Tue, Aug 11, 2015 at 11:18 PM, Maxim Sobolev <sobomax@freebsd.org> wrote:
>> >>>> >>>
>> >>>> >>>> Hi folks,
>> >>>> >>>
>> >>>> >>> Hi,
>> >>>> >>>
>> >>>> >>>> We've been trying to migrate some of our high-PPS systems to new
>> >>>> >>>> hardware that has four X540-AT2 10G NICs and observed that interrupt
>> >>>> >>>> time goes through the roof after we cross around 200K PPS in and
>> >>>> >>>> 200K out (two ports in LACP). The previous hardware was stable up to
>> >>>> >>>> about 350K PPS in and 350K out. I believe the old one was equipped
>> >>>> >>>> with the I350 and had the identical LACP configuration. The new box
>> >>>> >>>> also has a better CPU with more cores (i.e. 24 cores vs. 16 cores
>> >>>> >>>> before). The CPU itself is 2 x E5-2690 v3.
>> >>>> >>>
>> >>>> >>> 200K PPS, and even 350K PPS, are very low values indeed.
>> >>>> >>> On an Intel Xeon L5630 (4 cores only) with one X540-AT2 (i.e. two
>> >>>> >>> 10-gigabit ports) I've reached about 1.8Mpps (fastforwarding
>> >>>> >>> enabled) [1]. But my setup didn't use lagg(4): can you disable the
>> >>>> >>> lagg configuration and re-measure your performance without lagg?
>> >>>> >>>
>> >>>> >>> Do you let the Intel NIC drivers use 8 queues per port too?
>> >>>> >>> In my use case (forwarding the smallest UDP packet size), I obtain
>> >>>> >>> better behaviour by limiting the NIC queues to 4 (hw.ix.num_queues or
>> >>>> >>> hw.ixgbe.num_queues, I don't remember which) when my system has 8
>> >>>> >>> cores. And this with Gigabit Intel [2] or Chelsio [3] NICs.
>> >>>> >>>
>> >>>> >>> Don't forget to disable TSO and LRO too.
>> >>>> >>>
>> >>>> >>> Regards,
>> >>>> >>>
>> >>>> >>> Olivier
>> >>>> >>>
>> >>>> >>> [1] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_an_ibm_system_x3550_m3_with_10-gigabit_intel_x540-at2#graphs
>> >>>> >>> [2] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_superserver_5018a-ftn4#graph1
>> >>>> >>> [3] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_hp_proliant_dl360p_gen8_with_10-gigabit_with_10-gigabit_chelsio_t540-cr#reducing_nic_queues
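And on Olivier's last two points (queue count and TSO/LRO), the knobs on 10.1
should look roughly like this - untested on your exact setup, and note that
hw.ix.num_queues is a loader tunable, so it only takes effect at driver attach:

    # /boot/loader.conf - the queue count is read at attach time
    hw.ix.num_queues="4"

    # runtime: turn off TSO and LRO on the 10G ports (repeat per ix/lagg member)
    ifconfig ix0 -tso -lro
    ifconfig ix1 -tso -lro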
>> >>>
>> >>> --
>> >>> -----------------------------------------+-------------------------------
>> >>> Prof. Luigi RIZZO, rizzo@iet.unipi.it    .  Dip. di Ing. dell'Informazione
>> >>> http://www.iet.unipi.it/~luigi/          .  Universita` di Pisa
>> >>> TEL +39-050-2217533                      .  via Diotisalvi 2
>> >>> Mobile +39-338-6809875                   .  56122 PISA (Italy)
>> >>> -----------------------------------------+-------------------------------
>> >>
>> >> --
>> >> Maksym Sobolyev
>> >> Sippy Software, Inc.
>> >> Internet Telephony (VoIP) Experts
>> >> Tel (Canada): +1-778-783-0474
>> >> Tel (Toll-Free): +1-855-747-7779
>> >> Fax: +1-866-857-6942
>> >> Web: http://www.sippysoft.com
>> >> MSN: sales@sippysoft.com
>> >> Skype: SippySoft