From: Maxim Sobolev <sobomax@sippysoft.com>
To: hiren panchasara <hiren@strugglingcoder.info>
Cc: Adrian Chadd, FreeBSD Net <freebsd-net@freebsd.org>, freebsd@intel.com,
    Jev Björsell
Date: Tue, 11 Aug 2015 16:04:58 -0700
Subject: Re: Poor high-PPS performance of the 10G ixgbe(9) NIC/driver in
    FreeBSD 10.1

Here it is, the distribution looks pretty normal to me.
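A quick aside before the dump: to turn these cumulative counters into
per-queue rates, a loop along these lines should do (just a sketch,
assuming the dev.ix.0/dev.ix.1 sysctl trees shown below):

#!/bin/sh
# Print a timestamp plus the per-queue rx/tx packet counters once a
# second; diffing consecutive samples gives per-second, per-queue rates.
while :; do
    date +%s
    sysctl dev.ix.0 dev.ix.1 | grep -E 'queue[0-7]\.(rx|tx)_packets'
    sleep 1
done

Per-queue interrupt rates, which Adrian asked about, are visible in
"vmstat -i | grep ix", though the rate column there is an average since
boot, so two samples need to be diffed for an instantaneous view.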
dev.ix.0.queue0.tx_packets: 846233384
dev.ix.0.queue0.rx_packets: 856092418
dev.ix.0.queue1.tx_packets: 980356163
dev.ix.0.queue1.rx_packets: 922935329
dev.ix.0.queue2.tx_packets: 970700307
dev.ix.0.queue2.rx_packets: 907776311
dev.ix.0.queue3.tx_packets: 951911927
dev.ix.0.queue3.rx_packets: 903933007
dev.ix.0.queue4.tx_packets: 960075438
dev.ix.0.queue4.rx_packets: 909830391
dev.ix.0.queue5.tx_packets: 957304026
dev.ix.0.queue5.rx_packets: 889722162
dev.ix.0.queue6.tx_packets: 946175921
dev.ix.0.queue6.rx_packets: 898922310
dev.ix.0.queue7.tx_packets: 936382885
dev.ix.0.queue7.rx_packets: 890026885
dev.ix.1.queue0.tx_packets: 844847347
dev.ix.1.queue0.rx_packets: 840770906
dev.ix.1.queue1.tx_packets: 978807036
dev.ix.1.queue1.rx_packets: 906148213
dev.ix.1.queue2.tx_packets: 969026390
dev.ix.1.queue2.rx_packets: 906644000
dev.ix.1.queue3.tx_packets: 950384414
dev.ix.1.queue3.rx_packets: 890646445
dev.ix.1.queue4.tx_packets: 958536903
dev.ix.1.queue4.rx_packets: 887900309
dev.ix.1.queue5.tx_packets: 955802045
dev.ix.1.queue5.rx_packets: 884884583
dev.ix.1.queue6.tx_packets: 944802927
dev.ix.1.queue6.rx_packets: 883266179
dev.ix.1.queue7.tx_packets: 934953601
dev.ix.1.queue7.rx_packets: 886399283

On Tue, Aug 11, 2015 at 3:16 PM, hiren panchasara
<hiren@strugglingcoder.info> wrote:
> On 08/11/15 at 03:01P, Adrian Chadd wrote:
> > hi,
> >
> > Are you able to graph per-queue interrupt rates?
> >
> > It looks like the traffic is distributed differently (the first two
> > queues are taking interrupts).
>
> Yeah, also check out "# sysctl dev.ix | grep packets"
>
> > Does 10.1 have the flow director code disabled? I remember there was
> > some .. interesting behaviour with ixgbe where it'd look at traffic
> > and set up flow director rules to try and "balance" things. It was
> > buggy and programmed the hardware badly, so we disabled it in at
> > least -HEAD.
>
> Looks like we don't build with IXGBE_FDIR by default on 10, so I
> assume it's off.
>
> There were some lagg/hashing related changes recently, so let us know
> if that is hurting you.
>
> Cheers,
> Hiren
>
> > -adrian
> >
> > On 11 August 2015 at 14:18, Maxim Sobolev wrote:
> > > Hi folks,
> > >
> > > We've been trying to migrate some of our high-PPS systems to new
> > > hardware that has four X540-AT2 10G NICs, and observed that
> > > interrupt time goes through the roof after we cross around 200K
> > > PPS in and 200K out (two ports in LACP). The previous hardware was
> > > stable up to about 350K PPS in and 350K out. I believe the old one
> > > was equipped with the I350 and had the identical LACP
> > > configuration. The new box also has a better CPU with more cores
> > > (24 cores vs. 16 cores before). The CPU itself is 2 x E5-2690 v3.
> > >
> > > After hitting this limit with the default settings, I've tried to
> > > tweak the following settings:
> > >
> > > hw.ix.rx_process_limit="-1"
> > > hw.ix.tx_process_limit="-1"
> > > hw.ix.enable_aim="0"
> > > hw.ix.max_interrupt_rate="-1"
> > > hw.ix.rxd="4096"
> > > hw.ix.txd="4096"
> > >
> > > dev.ix.0.fc=0
> > > dev.ix.1.fc=0
> > > dev.ix.2.fc=0
> > > dev.ix.3.fc=0
> > >
> > > hw.intr_storm_threshold=0
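Side note for anyone reproducing: the hw.ix.* knobs above are loader
tunables and belong in /boot/loader.conf (they take effect on the next
boot), while the dev.ix.N.fc and hw.intr_storm_threshold ones can also
be flipped at runtime, e.g. for the four ix ports in this box:

for i in 0 1 2 3; do sysctl dev.ix.$i.fc=0; done
sysctl hw.intr_storm_threshold=0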
> > > But there is little or no effect on the performance. The workload
> > > is just a lot of small UDP packets being relayed between a bunch
> > > of hosts. The symptoms are always the same: the box runs nice and
> > > cool until it hits the said PPS threshold, with the kernel
> > > spending just a few percent in interrupts, and then it jumps
> > > straight to 100% interrupt time, thereby scaring some traffic away
> > > due to packet loss and such, so that the load drops and the system
> > > goes into the "cool" state again. It looks very much like some
> > > contention in the driver or in the hardware. Linked are some
> > > monitoring screenshots displaying the issue unfolding, as well as
> > > systat -vm screenshots from the "cool" state.
> > >
> > > http://sobomax.sippysoft.com/ScreenShot387.png <- CPU utilization
> > > right before the "bang event"
> > > http://sobomax.sippysoft.com/ScreenShot382.png <- the issue itself
> > > http://sobomax.sippysoft.com/ScreenShot385.png <- systat -vm a few
> > > minutes after traffic declined somewhat
> > >
> > > We are now trying to get the customer to install a 1Gig NIC so
> > > that we can run it and compare performance, with the rest of the
> > > hardware and software being essentially the same.
> > >
> > > Any ideas on how to improve/resolve this problem are welcome.
> > > Thanks!
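P.S. To pin down whether the loss around the "bang event" shows up as
interface-level drops, a timestamped log like this can help (a sketch;
netstat's -d flag adds the drop columns, and ix0/ix1 are the two LACP
ports from this thread):

#!/bin/sh
# Once a second: timestamp plus packet/error/drop counters for the two
# ix ports; Idrop climbing while input packets stall suggests the
# NIC/driver is dropping on the receive side.
while :; do
    date '+%H:%M:%S'
    netstat -ndI ix0
    netstat -ndI ix1
    sleep 1
done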