Date: Tue, 11 Aug 2015 16:04:58 -0700
From: Maxim Sobolev <sobomax@FreeBSD.org>
To: hiren panchasara <hiren@strugglingcoder.info>
Cc: Adrian Chadd <adrian.chadd@gmail.com>, FreeBSD Net <freebsd-net@freebsd.org>,
 freebsd@intel.com, "Jev Björsell" <jev@sippysoft.com>
Subject: Re: Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in FreeBSD 10.1
Message-ID: <CAH7qZfsDV0rjE3Me2LyO_FncVg5FRgwyPHMUQk3SBeA=VHY_tg@mail.gmail.com>
In-Reply-To: <20150811221626.GE96509@strugglingcoder.info>
References: <CAH7qZftMB34PM1CqNhdg7AWhsq6YknUDgc60ASfT2Z0L1z8XCQ@mail.gmail.com>
 <CAJ-Vmo=7XzE0SYfG__Y7qee9jZ1qKOOuNPY2TFPJfD2-06Mk5g@mail.gmail.com>
 <20150811221626.GE96509@strugglingcoder.info>
Here it is, the distribution looks pretty normal to me:

dev.ix.0.queue0.tx_packets: 846233384
dev.ix.0.queue0.rx_packets: 856092418
dev.ix.0.queue1.tx_packets: 980356163
dev.ix.0.queue1.rx_packets: 922935329
dev.ix.0.queue2.tx_packets: 970700307
dev.ix.0.queue2.rx_packets: 907776311
dev.ix.0.queue3.tx_packets: 951911927
dev.ix.0.queue3.rx_packets: 903933007
dev.ix.0.queue4.tx_packets: 960075438
dev.ix.0.queue4.rx_packets: 909830391
dev.ix.0.queue5.tx_packets: 957304026
dev.ix.0.queue5.rx_packets: 889722162
dev.ix.0.queue6.tx_packets: 946175921
dev.ix.0.queue6.rx_packets: 898922310
dev.ix.0.queue7.tx_packets: 936382885
dev.ix.0.queue7.rx_packets: 890026885
dev.ix.1.queue0.tx_packets: 844847347
dev.ix.1.queue0.rx_packets: 840770906
dev.ix.1.queue1.tx_packets: 978807036
dev.ix.1.queue1.rx_packets: 906148213
dev.ix.1.queue2.tx_packets: 969026390
dev.ix.1.queue2.rx_packets: 906644000
dev.ix.1.queue3.tx_packets: 950384414
dev.ix.1.queue3.rx_packets: 890646445
dev.ix.1.queue4.tx_packets: 958536903
dev.ix.1.queue4.rx_packets: 887900309
dev.ix.1.queue5.tx_packets: 955802045
dev.ix.1.queue5.rx_packets: 884884583
dev.ix.1.queue6.tx_packets: 944802927
dev.ix.1.queue6.rx_packets: 883266179
dev.ix.1.queue7.tx_packets: 934953601
dev.ix.1.queue7.rx_packets: 886399283
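For a quick numeric check of that spread, a one-liner along these
lines should do (an untested sketch; it assumes the "name: value"
output format shown above):

    sysctl dev.ix | grep rx_packets | \
        awk -F': ' '{ n++; s += $2; if ($2 > max) max = $2;
                      if (!min || $2 < min) min = $2 }
            END { printf "queues %d min %d max %d avg %.0f\n",
                  n, min, max, s / n }'

On the numbers above, the busiest and quietest rx queues are within
about 10% of each other.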
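And on the question about per-queue interrupt rates (quoted below):
short of a proper grapher, sampling vmstat(8) in a loop gives a rough
picture, since the driver registers one interrupt source per queue
(source names like "irq264: ix0:que 0"; the irq numbers are
system-specific):

    # print each ix0 queue's cumulative interrupt count and its
    # average rate once a second
    while :; do vmstat -i | grep 'ix0:que'; echo '---'; sleep 1; done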
On Tue, Aug 11, 2015 at 3:16 PM, hiren panchasara
<hiren@strugglingcoder.info> wrote:

> On 08/11/15 at 03:01P, Adrian Chadd wrote:
> > hi,
> >
> > Are you able to graph per-queue interrupt rates?
> >
> > It looks like the traffic is distributed differently (the first two
> > queues are taking interrupts).
>
> Yeah, also check out "# sysctl dev.ix | grep packets"
>
> > Does 10.1 have the flow director code disabled? I remember there was
> > some .. interesting behaviour with ixgbe where it'd look at traffic
> > and set up flow director rules to try and "balance" things. It was
> > buggy and programmed the hardware badly, so we disabled it in at
> > least -HEAD.
>
> Looks like we don't build with IXGBE_FDIR by default on 10, so I
> assume it's off.
>
> There were some lagg/hashing related changes recently, so let us know
> if that is hurting you.
>
> Cheers,
> Hiren
>
> > -adrian
> >
> > On 11 August 2015 at 14:18, Maxim Sobolev <sobomax@freebsd.org> wrote:
> > > Hi folks,
> > >
> > > We've been trying to migrate some of our high-PPS systems to new
> > > hardware that has four X540-AT2 10G NICs and observed that
> > > interrupt time goes through the roof after we cross around 200K
> > > PPS in and 200K out (two ports in LACP). The previous hardware
> > > was stable up to about 350K PPS in and 350K out. I believe the
> > > old one was equipped with the I350 and had an identical LACP
> > > configuration. The new box also has a better CPU with more cores
> > > (24 cores vs. 16 cores before). The CPU itself is 2 x E5-2690 v3.
> > >
> > > After hitting this limit with the default settings, I've tried to
> > > tweak the following settings:
> > >
> > > hw.ix.rx_process_limit="-1"
> > > hw.ix.tx_process_limit="-1"
> > > hw.ix.enable_aim="0"
> > > hw.ix.max_interrupt_rate="-1"
> > > hw.ix.rxd="4096"
> > > hw.ix.txd="4096"
> > >
> > > dev.ix.0.fc=0
> > > dev.ix.1.fc=0
> > > dev.ix.2.fc=0
> > > dev.ix.3.fc=0
> > >
> > > hw.intr_storm_threshold=0
> > >
> > > But there is little or no effect on performance. The workload is
> > > just a lot of small UDP packets being relayed between a bunch of
> > > hosts. The symptoms are always the same - the box runs nice and
> > > cool until it hits the said PPS threshold, with the kernel
> > > spending just a few percent in interrupts, and then it jumps
> > > straight to 100% interrupt time, thereby scaring some traffic
> > > away due to packet loss and such, so that the load drops and the
> > > system goes into the "cool" state again. It looks very much like
> > > some contention in the driver or in the hardware. Linked are some
> > > monitoring screenshots displaying the issue unfolding, as well as
> > > systat -vm screenshots from the "cool" state.
> > >
> > > http://sobomax.sippysoft.com/ScreenShot387.png <- CPU utilization
> > > right before the "bang event"
> > > http://sobomax.sippysoft.com/ScreenShot382.png <- the issue itself
> > > http://sobomax.sippysoft.com/ScreenShot385.png <- systat -vm a few
> > > minutes after traffic declined somewhat
> > >
> > > We are now trying to get the customer to install a 1Gig NIC so
> > > that we can run it and compare performance with the rest of the
> > > hardware and software being essentially the same.
> > >
> > > Any ideas on how to improve/resolve this problem are welcome.
> > > Thanks!
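On the IXGBE_FDIR point above: one way to double-check that the flow
director code really is compiled out is to grep the driver sources and
build glue for the option (a rough sketch; paths assume a stock 10.x
source tree at /usr/src that matches the running kernel and modules):

    # hits outside #ifdef/#ifndef guards would mean FDIR is built in
    grep -rn IXGBE_FDIR /usr/src/sys/dev/ixgbe \
        /usr/src/sys/modules/ixgbe /usr/src/sys/conf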
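Also, for anyone replicating the tweaks quoted above: the hw.ix.*
knobs are boot-time loader tunables, while the flow-control and
interrupt-storm settings can be changed on a live system, roughly like
this (a sketch; the values are the ones from the quoted message):

    # /boot/loader.conf -- read by the driver at attach, needs a reboot
    hw.ix.enable_aim="0"
    hw.ix.max_interrupt_rate="-1"
    hw.ix.rxd="4096"
    hw.ix.txd="4096"

    # run-time sysctls (in /etc/sysctl.conf, use just the name=value part)
    sysctl dev.ix.0.fc=0
    sysctl hw.intr_storm_threshold=0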