From: Maxim Sobolev <sobomax@sippysoft.com>
Date: Fri, 14 Aug 2015 10:29:48 -0700
Subject: Re: Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in FreeBSD 10.1
To: Luigi Rizzo
Cc: Babak Farrokhi, "Alexander V. Chernikov", Olivier Cochard-Labbé,
 "freebsd@intel.com", Jev Björsell, FreeBSD Net

Hi guys, unfortunately no: neither reducing the number of queues from 8 to 6
nor pinning the interrupt rate at 20000 per queue has made any difference.
The card still goes kaboom at about 200 Kpps no matter what. In fact I've
gone a bit further and, after the first spike, pushed the interrupt rate even
further down to 10000, but again no difference; it still blows up at the same
mark. It did reduce the interrupt rate from 190K to some 130K according to
systat -vm, so the moderation itself seems to be working fine. We will try
disabling IXGBE_FDIR tomorrow and see if it helps.
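For reference, one way to do the per-queue pinning is via the interrupt_rate
sysctls that show up in Babak's listing further down (this assumes they are
writable on this driver version; the max_interrupt_rate loader tunable in the
screenshot captions below is the boot-time equivalent):

  # cap every queue of both ports at 20000 interrupts/s
  for ix in 0 1; do
    for q in 0 1 2 3 4 5 6 7; do
      sysctl dev.ix.${ix}.queue${q}.interrupt_rate=20000
    done
  done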
http://sobomax.sippysoft.com/ScreenShot391.png <- systat -vm with
max_interrupt_rate = 20000 right before overload
http://sobomax.sippysoft.com/ScreenShot392.png <- systat -vm during issue
unfolding (max_interrupt_rate = 10000)
http://sobomax.sippysoft.com/ScreenShot394.png <- cpu/net monitoring, first
two spikes are with max_interrupt_rate = 20000, the third one with
max_interrupt_rate = 10000

-Max

On Wed, Aug 12, 2015 at 5:23 AM, Luigi Rizzo wrote:

> As I was telling Maxim, you should disable AIM because it only matches
> the max interrupt rate to the average packet size, which is the last thing
> you want.
>
> Setting the interrupt rate with sysctl (one per queue) gives you precise
> control over the max rate (and hence the extra latency). 20k interrupts/s
> give you 50us of latency, and the 2k slots in the queue are still enough
> to absorb a burst of min-sized frames hitting a single queue (the OS will
> start dropping long before that level, but that's another story).
>
> Cheers
> Luigi
>
> On Wednesday, August 12, 2015, Babak Farrokhi wrote:
>
>> I ran into the same problem with almost the same hardware (Intel X520)
>> on 10-STABLE. HT/SMT is disabled and the cards are configured with 8
>> queues, with the same sysctl tunings as sobomax@ did. I am not using
>> lagg, no FLOWTABLE.
>>
>> I experimented with pmcstat (RESOURCE_STALLS) a while ago and here [1]
>> [2] you can see the results, including pmc output, callchain, flamegraph
>> and gprof output.
>>
>> I am experiencing a huge number of interrupts with a 200 kpps load:
>>
>> # sysctl dev.ix | grep interrupt_rate
>> dev.ix.1.queue7.interrupt_rate: 125000
>> dev.ix.1.queue6.interrupt_rate: 6329
>> dev.ix.1.queue5.interrupt_rate: 500000
>> dev.ix.1.queue4.interrupt_rate: 100000
>> dev.ix.1.queue3.interrupt_rate: 50000
>> dev.ix.1.queue2.interrupt_rate: 500000
>> dev.ix.1.queue1.interrupt_rate: 500000
>> dev.ix.1.queue0.interrupt_rate: 100000
>> dev.ix.0.queue7.interrupt_rate: 500000
>> dev.ix.0.queue6.interrupt_rate: 6097
>> dev.ix.0.queue5.interrupt_rate: 10204
>> dev.ix.0.queue4.interrupt_rate: 5208
>> dev.ix.0.queue3.interrupt_rate: 5208
>> dev.ix.0.queue2.interrupt_rate: 71428
>> dev.ix.0.queue1.interrupt_rate: 5494
>> dev.ix.0.queue0.interrupt_rate: 6250
>>
>> [1] http://farrokhi.net/~farrokhi/pmc/6/
>> [2] http://farrokhi.net/~farrokhi/pmc/7/
>>
>> Regards,
>> Babak
>>
>>
>> Alexander V. Chernikov wrote:
>> > 12.08.2015, 02:28, "Maxim Sobolev":
>> >> Olivier, keep in mind that we are not "kernel forwarding" packets, but
>> >> "app forwarding", i.e. the packet goes the full way
>> >> net->kernel->recvfrom->app->sendto->kernel->net, which is why we have
>> >> much lower PPS limits and which is why I think we are actually
>> >> benefiting from the extra queues. Single-threaded sendto() in a loop
>> >> is CPU-bound at about 220K PPS, and while running the test I am
>> >> observing that outbound traffic from one thread is mapped into a
>> >> specific queue (well, a pair of queues on two separate adapters, due
>> >> to the lagg load balancing action). And the peak performance of that
>> >> test is at 7 threads, which I believe corresponds to the number of
>> >> queues. We have plenty of CPU cores in the box (24) with HTT/SMT
>> >> disabled, and each queue is bound to a specific CPU. This leaves us
>> >> with at least 8 CPUs fully capable of running our app. If you look at
>> >> the CPU utilization, we are at about 10% when the issue hits.
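A quick way to check the thread-to-queue mapping described above is to watch
the per-queue interrupt counters and the aggregate packet rate while a single
sender thread runs ("lagg0" below is just an assumed interface name):

  # per-queue interrupt counters - only the queue(s) the flow hashes to
  # should keep climbing while one sender thread is active
  vmstat -i | egrep 'ix[01]:que'

  # in/out packet rate on the lagg, sampled once per second
  netstat -I lagg0 -w 1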
>> >
>> > In any case, it would be great if you could provide some profiling info,
>> > since there could be plenty of problematic places, starting from TX ring
>> > contention to some locks inside udp or even the (in)famous random entropy
>> > harvester... E.g. something like pmcstat -TS instructions -w1 might be
>> > sufficient to determine the reason.
>> >> ix0: port 0x6020-0x603f mem 0xc7c00000-0xc7dfffff,0xc7e04000-0xc7e07fff
>> >> irq 40 at device 0.0 on pci3
>> >> ix0: Using MSIX interrupts with 9 vectors
>> >> ix0: Bound queue 0 to cpu 0
>> >> ix0: Bound queue 1 to cpu 1
>> >> ix0: Bound queue 2 to cpu 2
>> >> ix0: Bound queue 3 to cpu 3
>> >> ix0: Bound queue 4 to cpu 4
>> >> ix0: Bound queue 5 to cpu 5
>> >> ix0: Bound queue 6 to cpu 6
>> >> ix0: Bound queue 7 to cpu 7
>> >> ix0: Ethernet address: 0c:c4:7a:5e:be:64
>> >> ix0: PCI Express Bus: Speed 5.0GT/s Width x8
>> >> 001.000008 [2705] netmap_attach success for ix0 tx 8/4096 rx 8/4096
>> >> queues/slots
>> >> ix1: port 0x6000-0x601f mem 0xc7a00000-0xc7bfffff,0xc7e00000-0xc7e03fff
>> >> irq 44 at device 0.1 on pci3
>> >> ix1: Using MSIX interrupts with 9 vectors
>> >> ix1: Bound queue 0 to cpu 8
>> >> ix1: Bound queue 1 to cpu 9
>> >> ix1: Bound queue 2 to cpu 10
>> >> ix1: Bound queue 3 to cpu 11
>> >> ix1: Bound queue 4 to cpu 12
>> >> ix1: Bound queue 5 to cpu 13
>> >> ix1: Bound queue 6 to cpu 14
>> >> ix1: Bound queue 7 to cpu 15
>> >> ix1: Ethernet address: 0c:c4:7a:5e:be:65
>> >> ix1: PCI Express Bus: Speed 5.0GT/s Width x8
>> >> 001.000009 [2705] netmap_attach success for ix1 tx 8/4096 rx 8/4096
>> >> queues/slots
>> >>
>> >> On Tue, Aug 11, 2015 at 4:14 PM, Olivier Cochard-Labbé
>> >> <olivier@cochard.me> wrote:
>> >>
>> >>> On Tue, Aug 11, 2015 at 11:18 PM, Maxim Sobolev wrote:
>> >>>
>> >>>> Hi folks,
>> >>>>
>> >>>> We've been trying to migrate some of our high-PPS systems to new
>> >>>> hardware that has four X540-AT2 10G NICs and observed that interrupt
>> >>>> time goes through the roof after we cross around 200K PPS in and 200K
>> >>>> out (two ports in LACP). The previous hardware was stable up to about
>> >>>> 350K PPS in and 350K out. I believe the old one was equipped with the
>> >>>> I350 and had an identical LACP configuration. The new box also has a
>> >>>> better CPU with more cores (i.e. 24 cores vs. 16 cores before). The
>> >>>> CPU itself is 2 x E5-2690 v3.
>> >>>
>> >>> Hi,
>> >>>
>> >>> 200K PPS, and even 350K PPS, are very low values indeed.
>> >>> On an Intel Xeon L5630 (4 cores only) with one X540-AT2 (i.e. two
>> >>> 10-gigabit ports) I've reached about 1.8 Mpps (fastforwarding
>> >>> enabled) [1].
>> >>> But my setup didn't use lagg(4): can you disable the lagg configuration
>> >>> and re-measure your performance without lagg?
>> >>>
>> >>> Do you let the Intel NIC driver use 8 queues per port too?
>> >>> In my use case (forwarding the smallest UDP packet size), I obtained
>> >>> better behaviour by limiting the NIC queues to 4 (hw.ix.num_queues or
>> >>> hw.ixgbe.num_queues, don't remember) when my system had 8 cores. And
>> >>> this with Gigabit Intel [2] or Chelsio NICs [3].
>> >>>
>> >>> Don't forget to disable TSO and LRO too.
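Concretely, that tuning would look something like the following (the queue
tunable is hw.ix.num_queues on newer drivers and hw.ixgbe.num_queues on older
ones, so check which one your sysctl tree actually has):

  # /boot/loader.conf - limit each port to 4 queues (needs a reboot)
  hw.ix.num_queues=4

  # disable TSO and LRO on the physical ports at run time
  ifconfig ix0 -tso -lro
  ifconfig ix1 -tso -lro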
>> >>>
>> >>> Regards,
>> >>>
>> >>> Olivier
>> >>>
>> >>> [1] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_an_ibm_system_x3550_m3_with_10-gigabit_intel_x540-at2#graphs
>> >>> [2] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_superserver_5018a-ftn4#graph1
>> >>> [3] http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_hp_proliant_dl360p_gen8_with_10-gigabit_with_10-gigabit_chelsio_t540-cr#reducing_nic_queues
>
> --
> -----------------------------------------+-------------------------------
> Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
> http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
> TEL +39-050-2217533                    . via Diotisalvi 2
> Mobile +39-338-6809875                 . 56122 PISA (Italy)
> -----------------------------------------+-------------------------------