Date:      Tue, 11 Aug 2015 14:18:19 -0700
From:      Maxim Sobolev <sobomax@FreeBSD.org>
To:        FreeBSD Net <freebsd-net@freebsd.org>, freebsd@intel.com
Cc:        Jev Björsell <jev@sippysoft.com>
Subject:   Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in FreeBSD 10.1
Message-ID:  <CAH7qZftMB34PM1CqNhdg7AWhsq6YknUDgc60ASfT2Z0L1z8XCQ@mail.gmail.com>

Hi folks,

We've been trying to migrate some of our high-PPS systems to new hardware that
has four X540-AT2 10G NICs, and we observed that interrupt time goes through
the roof once we cross around 200K PPS in and 200K PPS out (two ports in LACP).
The previous hardware was stable up to about 350K PPS in and 350K PPS out. I
believe the old box was equipped with I350 NICs and had an identical LACP
configuration. The new box also has a better CPU with more cores (24 cores
vs. 16 before); the CPUs are 2 x E5-2690 v3.
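The PPS figures above are presumably aggregate rates across the two LACP member ports. As a minimal sketch of how such figures can be derived, assume two readings of each member port's cumulative packet counters are available. How they are collected (e.g. from `netstat -i`) is left out, and the `CounterSample` type and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CounterSample:
    """One reading of a port's cumulative packet counters (hypothetical structure)."""
    t: float        # time of the reading, in seconds
    ipackets: int   # cumulative packets received on the port
    opackets: int   # cumulative packets sent on the port


def port_pps(prev: CounterSample, cur: CounterSample) -> Tuple[float, float]:
    """Return (in_pps, out_pps) for one port between two readings."""
    dt = cur.t - prev.t
    if dt <= 0:
        raise ValueError("readings must be taken in increasing time order")
    return ((cur.ipackets - prev.ipackets) / dt,
            (cur.opackets - prev.opackets) / dt)


def lagg_pps(ports: List[Tuple[CounterSample, CounterSample]]) -> Tuple[float, float]:
    """Sum per-port rates over all LACP member ports to get the aggregate rate."""
    total_in = total_out = 0.0
    for prev, cur in ports:
        rate_in, rate_out = port_pps(prev, cur)
        total_in += rate_in
        total_out += rate_out
    return total_in, total_out
```

With two member ports each moving 100K packets per second in each direction over a one-second interval, `lagg_pps` reports 200K PPS in and 200K PPS out, i.e. the point where the new box falls over. Counter wraparound is ignored here on the assumption that the counters are 64-bit.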

After hitting this limit with the default settings, I tried tweaking the
following settings:

hw.ix.rx_process_limit="-1"
hw.ix.tx_process_limit="-1"
hw.ix.enable_aim="0"
hw.ix.max_interrupt_rate="-1"
hw.ix.rxd="4096"
hw.ix.txd="4096"

dev.ix.0.fc=0
dev.ix.1.fc=0
dev.ix.2.fc=0
dev.ix.3.fc=0

hw.intr_storm_threshold=0

But there was little or no effect on performance. The workload is just a
lot of small UDP packets being relayed between a bunch of hosts. The symptoms
are always the same: the box runs nice and cool until it hits that PPS
threshold, with the kernel spending just a few percent of its time in
interrupts, and then interrupt time jumps straight to 100%. That scares some
traffic away due to packet loss and such, so the load drops and the system
goes back into the "cool" state. It looks very much like some contention in
the driver or in the hardware. Linked below are some monitoring screenshots
showing the issue unfolding, as well as systat -vm screenshots from the
"cool" state.

http://sobomax.sippysoft.com/ScreenShot387.png <- CPU utilization right
before the "bang event"
http://sobomax.sippysoft.com/ScreenShot382.png <- issue itself
http://sobomax.sippysoft.com/ScreenShot385.png <- systat -vm a few minutes
after traffic declined somewhat
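The symptom described above is a step rather than a gradual slope: interrupt time sits at a few percent, then jumps to 100% past a certain rate. If the monitoring data can be exported as (PPS, interrupt %) pairs, a small helper like the following sketch can locate that knee. The sample format, the `find_knee` name and the 50% cutoff are all assumptions, not anything taken from the graphs.

```python
from typing import Iterable, Optional, Tuple


def find_knee(samples: Iterable[Tuple[float, float]],
              threshold_pct: float = 50.0) -> Optional[float]:
    """Return the lowest PPS whose interrupt time is at or above threshold_pct.

    samples: hypothetical (pps, interrupt_percent) pairs exported from monitoring.
    Returns None if interrupt time never reaches the threshold.
    """
    # Walk the samples in increasing PPS order so the first hit is the knee.
    for rate, intr_pct in sorted(samples):
        if intr_pct >= threshold_pct:
            return rate
    return None
```

Because the transition is so abrupt, the exact cutoff matters little: any threshold between "a few percent" and 100% should land on the same sample.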

We are now trying to get the customer to install a 1G NIC so that we can run
it and compare performance, with the rest of the hardware and software being
essentially the same.

Any ideas on how to improve/resolve this problem are welcome. Thanks!


