In-Reply-To: <CA+hQ2+i37JzeUh8drxLSeeXHzYaRH9ZXvFyMBpF0XLHoiMSXMg@mail.gmail.com>
References: <CAH7qZft-CZCKv_7E9PE+4ZN3EExhezMnAb3kvShQzYhRYb2jMg@mail.gmail.com>
 <77171439377164@web21h.yandex.ru> <55CB2F18.40902@FreeBSD.org>
 <CA+hQ2+i37JzeUh8drxLSeeXHzYaRH9ZXvFyMBpF0XLHoiMSXMg@mail.gmail.com>
Date: Wed, 12 Aug 2015 08:03:25 -0700
Message-ID: <CAH7qZfujwHU-bbuv6cA21PP2Lswo9C9KCkPVs6-vp-51igSjgw@mail.gmail.com>
Subject: Re: Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in
 FreeBSD 10.1
From: Maxim Sobolev <sobomax@sippysoft.com>
To: Luigi Rizzo <rizzo@iet.unipi.it>
Cc: Babak Farrokhi <farrokhi@freebsd.org>,
 "Alexander V. Chernikov" <melifaro@ipfw.ru>,
 Olivier Cochard-Labbé <olivier@cochard.me>,
 "freebsd@intel.com" <freebsd@intel.com>,
 Jev Björsell <jev@sippysoft.com>,
 FreeBSD Net <freebsd-net@freebsd.org>
Content-Type: text/plain; charset=UTF-8

Ok, so my current settings are:

hw.ix.max_interrupt_rate: 20000
dev.ix.0.queue0.interrupt_rate: 20000
dev.ix.0.queue1.interrupt_rate: 20000
dev.ix.0.queue2.interrupt_rate: 20000
dev.ix.0.queue3.interrupt_rate: 20000
dev.ix.0.queue4.interrupt_rate: 20000
dev.ix.0.queue5.interrupt_rate: 20000
dev.ix.1.queue0.interrupt_rate: 20000
dev.ix.1.queue1.interrupt_rate: 20000
dev.ix.1.queue2.interrupt_rate: 20000
dev.ix.1.queue3.interrupt_rate: 20000
dev.ix.1.queue4.interrupt_rate: 20000
dev.ix.1.queue5.interrupt_rate: 20000
dev.ix.0.enable_aim: 0
dev.ix.1.enable_aim: 0
dev.ix.2.enable_aim: 0
dev.ix.3.enable_aim: 0
hw.ix.num_queues: 6
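
For context on why 20000 is a reasonable cap, the arithmetic in Luigi's note quoted below can be checked with a few lines of Python. This is only a sketch: the 10 Gbit/s line rate and the 84 bytes of wire time per minimum-sized frame (64-byte frame plus preamble and inter-frame gap) are standard Ethernet figures assumed here, not numbers from this thread, and the function names are made up for illustration.

```python
# Assumptions (not from the thread): 10 Gbit/s line rate, and 84 bytes on the
# wire per minimum-sized Ethernet frame (64-byte frame + 8 preamble + 12 IFG).
LINE_RATE_BPS = 10_000_000_000
MIN_FRAME_WIRE_BYTES = 64 + 8 + 12
RING_SLOTS = 2048  # the "2k slots" figure from Luigi's note


def interrupt_interval_us(rate_per_sec):
    """Worst-case added latency in microseconds for a given interrupt cap."""
    return 1_000_000 / rate_per_sec


def frames_per_interval(rate_per_sec):
    """Min-sized frames arriving between two interrupts at full line rate."""
    pps = LINE_RATE_BPS / (MIN_FRAME_WIRE_BYTES * 8)
    return pps / rate_per_sec


if __name__ == "__main__":
    rate = 20000  # value used for hw.ix.max_interrupt_rate above
    print(interrupt_interval_us(rate))       # 50.0
    print(round(frames_per_interval(rate)))  # 744
    print(frames_per_interval(rate) < RING_SLOTS)
```

At 20000 interrupts/s a single queue would see at most about 744 minimum-sized frames between interrupts even if the whole 10G line rate landed on it, which is well under the 2k slots Luigi mentions.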

We also happen to have an I210-based system with only 4 hardware queues; it
would be interesting to see how it stacks up.
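
One thing that stands out in Babak's interrupt_rate listing quoted below: every value equals 500000 divided by a small integer and rounded down (500000/1, 500000/4, 500000/79, ...), and so does the 20000 configured here (500000/25). That pattern suggests the sysctl may be reporting a throttling interval that AIM keeps adjusting, rather than a measured interrupt count, but that interpretation is an assumption and is not confirmed anywhere in this thread. The 500000 base and the helper name below are likewise inferred purely for illustration.

```python
# Values copied from Babak's "sysctl dev.ix | grep interrupt_rate" output.
BABAK_RATES = [
    125000, 6329, 500000, 100000, 50000, 500000, 500000, 100000,  # ix1
    500000, 6097, 10204, 5208, 5208, 71428, 5494, 6250,           # ix0
]

# Assumption: the base is guessed from the largest value in the listing.
BASE = 500000


def divisor_for(rate, base=BASE):
    """Return the integer n with base // n == rate, or None if there is none."""
    n = base // rate
    return n if base // n == rate else None


if __name__ == "__main__":
    for rate in BABAK_RATES + [20000]:
        print(rate, divisor_for(rate))
```

A value that does not fit the pattern, such as 6300, comes back as None, so the match across all sixteen entries is unlikely to be a coincidence.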

On Wed, Aug 12, 2015 at 5:23 AM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:

> As I was telling Maxim, you should disable aim because it only matches
> the max interrupt rate to the average packet size, which is the last thing
> you want.
>
> Setting the interrupt rate with sysctl (one per queue) gives you precise
> control over the max rate (and hence the extra latency). 20k interrupts/s
> give you 50us of latency, and the 2k slots in the queue are still enough to
> absorb a burst of min-sized frames hitting a single queue (the OS will
> start dropping long before that level, but that's another story).
>
> Cheers
> Luigi
>
> On Wednesday, August 12, 2015, Babak Farrokhi <farrokhi@freebsd.org>
> wrote:
>
>> I ran into the same problem with almost the same hardware (Intel X520)
>> on 10-STABLE. HT/SMT is disabled and cards are configured with 8 queues,
>> with the same sysctl tunings as sobomax@ did. I am not using lagg, no
>> FLOWTABLE.
>>
>> I experimented with pmcstat (RESOURCE_STALLS) a while ago and here [1]
>> [2] you can see the results, including pmc output, callchain, flamegraph
>> and gprof output.
>>
>> I am experiencing a huge number of interrupts with 200kpps load:
>>
>> # sysctl dev.ix | grep interrupt_rate
>> dev.ix.1.queue7.interrupt_rate: 125000
>> dev.ix.1.queue6.interrupt_rate: 6329
>> dev.ix.1.queue5.interrupt_rate: 500000
>> dev.ix.1.queue4.interrupt_rate: 100000
>> dev.ix.1.queue3.interrupt_rate: 50000
>> dev.ix.1.queue2.interrupt_rate: 500000
>> dev.ix.1.queue1.interrupt_rate: 500000
>> dev.ix.1.queue0.interrupt_rate: 100000
>> dev.ix.0.queue7.interrupt_rate: 500000
>> dev.ix.0.queue6.interrupt_rate: 6097
>> dev.ix.0.queue5.interrupt_rate: 10204
>> dev.ix.0.queue4.interrupt_rate: 5208
>> dev.ix.0.queue3.interrupt_rate: 5208
>> dev.ix.0.queue2.interrupt_rate: 71428
>> dev.ix.0.queue1.interrupt_rate: 5494
>> dev.ix.0.queue0.interrupt_rate: 6250
>>
>> [1] http://farrokhi.net/~farrokhi/pmc/6/
>> [2] http://farrokhi.net/~farrokhi/pmc/7/
>>
>> Regards,
>> Babak
>>
>>
>> Alexander V. Chernikov wrote:
>> > 12.08.2015, 02:28, "Maxim Sobolev" <sobomax@FreeBSD.org>:
>> >> Olivier, keep in mind that we are not "kernel forwarding" packets, but
>> >> "app forwarding", i.e. the packet goes full way
>> >> net->kernel->recvfrom->app->sendto->kernel->net, which is why we have
>> >> much lower PPS limits and which is why I think we are actually
>> >> benefiting from the extra queues. Single-thread sendto() in a loop is
>> >> CPU-bound at about 220K PPS, and while running the test I am observing
>> >> that outbound traffic from one thread is mapped into a specific queue
>> >> (well, pair of queues on two separate adaptors, due to lagg load
>> >> balancing action). And the peak performance of that test is at 7
>> >> threads, which I believe corresponds to the number of queues. We have
>> >> plenty of CPU cores in the box (24) with HTT/SMT disabled and one CPU
>> >> is mapped to a specific queue. This leaves us with at least 8 CPUs
>> >> fully capable of running our app. If you look at the CPU utilization,
>> >> we are at about 10% when the issue hits.
>> >
>> > In any case, it would be great if you could provide some profiling info
>> > since there could be plenty of problematic places starting from TX rings
>> > contention to some locks inside udp or even (in)famous random entropy
>> > harvester.
>> > e.g. something like pmcstat -TS instructions -w1 might be sufficient to
>> > determine the reason
>> >> ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15>
>> >> port 0x6020-0x603f mem 0xc7c00000-0xc7dfffff,0xc7e04000-0xc7e07fff
>> >> irq 40 at device 0.0 on pci3
>> >> ix0: Using MSIX interrupts with 9 vectors
>> >> ix0: Bound queue 0 to cpu 0
>> >> ix0: Bound queue 1 to cpu 1
>> >> ix0: Bound queue 2 to cpu 2
>> >> ix0: Bound queue 3 to cpu 3
>> >> ix0: Bound queue 4 to cpu 4
>> >> ix0: Bound queue 5 to cpu 5
>> >> ix0: Bound queue 6 to cpu 6
>> >> ix0: Bound queue 7 to cpu 7
>> >> ix0: Ethernet address: 0c:c4:7a:5e:be:64
>> >> ix0: PCI Express Bus: Speed 5.0GT/s Width x8
>> >> 001.000008 [2705] netmap_attach success for ix0 tx 8/4096 rx
>> >> 8/4096 queues/slots
>> >> ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15>
>> >> port 0x6000-0x601f mem 0xc7a00000-0xc7bfffff,0xc7e00000-0xc7e03fff
>> >> irq 44 at device 0.1 on pci3
>> >> ix1: Using MSIX interrupts with 9 vectors
>> >> ix1: Bound queue 0 to cpu 8
>> >> ix1: Bound queue 1 to cpu 9
>> >> ix1: Bound queue 2 to cpu 10
>> >> ix1: Bound queue 3 to cpu 11
>> >> ix1: Bound queue 4 to cpu 12
>> >> ix1: Bound queue 5 to cpu 13
>> >> ix1: Bound queue 6 to cpu 14
>> >> ix1: Bound queue 7 to cpu 15
>> >> ix1: Ethernet address: 0c:c4:7a:5e:be:65
>> >> ix1: PCI Express Bus: Speed 5.0GT/s Width x8
>> >> 001.000009 [2705] netmap_attach success for ix1 tx 8/4096 rx
>> >> 8/4096 queues/slots
>> >>
>> >> On Tue, Aug 11, 2015 at 4:14 PM, Olivier Cochard-Labbé
>> >> <olivier@cochard.me> wrote:
>> >>
>> >>>  On Tue, Aug 11, 2015 at 11:18 PM, Maxim Sobolev <sobomax@freebsd.org>
>> >>>  wrote:
>> >>>
>> >>>>  Hi folks,
>> >>>>
>> >>>  Hi,
>> >>>
>> >>>>  We've been trying to migrate some of our high-PPS systems to new
>> >>>>  hardware that has four X540-AT2 10G NICs and observed that interrupt
>> >>>>  time goes through the roof after we cross around 200K PPS in and 200K
>> >>>>  out (two ports in LACP). The previous hardware was stable up to about
>> >>>>  350K PPS in and 350K out. I believe the old one was equipped with the
>> >>>>  I350 and had the identical LACP configuration. The new box also has
>> >>>>  a better CPU with more cores (i.e. 24 cores vs. 16 cores before). The
>> >>>>  CPU itself is 2 x E5-2690 v3.
>> >>>  200K PPS, and even 350K PPS, are very low values indeed.
>> >>>  On an Intel Xeon L5630 (4 cores only) with one X540-AT2 (so two
>> >>>  10-Gigabit ports) I've reached about 1.8Mpps (fastforwarding
>> >>>  enabled) [1].
>> >>>  But my setup didn't use lagg(4): can you disable the lagg configuration
>> >>>  and re-measure your performance without lagg?
>> >>>
>> >>>  Do you let the Intel NIC driver use 8 queues per port too?
>> >>>  In my use case (forwarding smallest UDP packet size), I obtain better
>> >>>  behaviour by limiting NIC queues to 4 (hw.ix.num_queues or
>> >>>  hw.ixgbe.num_queues, don't remember) when my system has 8 cores. And
>> >>>  this with Gigabit Intel [2] or Chelsio NIC [3].
>> >>>
>> >>>  Don't forget to disable TSO and LRO too.
>> >>>
>> >>>  Regards,
>> >>>
>> >>>  Olivier
>> >>>
>> >>>  [1]
>> >>>  http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_an_ibm_system_x3550_m3_with_10-gigabit_intel_x540-at2#graphs
>> >>>  [2]
>> >>>  http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_superserver_5018a-ftn4#graph1
>> >>>  [3]
>> >>>  http://bsdrp.net/documentation/examples/forwarding_performance_lab_of_a_hp_proliant_dl360p_gen8_with_10-gigabit_with_10-gigabit_chelsio_t540-cr#reducing_nic_queues
>> >> _______________________________________________
>> >> freebsd-net@freebsd.org mailing list
>> >> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> >> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
>
>
>
> --
> -----------------------------------------+-------------------------------
>  Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
>  http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
>  TEL      +39-050-2217533               . via Diotisalvi 2
>  Mobile   +39-338-6809875               . 56122 PISA (Italy)
> -----------------------------------------+-------------------------------
>
>


-- 
Maksym Sobolyev
Sippy Software, Inc.
Internet Telephony (VoIP) Experts
Tel (Canada): +1-778-783-0474
Tel (Toll-Free): +1-855-747-7779
Fax: +1-866-857-6942
Web: http://www.sippysoft.com
MSN: sales@sippysoft.com
Skype: SippySoft