From owner-freebsd-hackers@FreeBSD.ORG Tue May 28 14:51:21 2013
From: Axel Fischer
To: Andre Oppermann
Cc: freebsd-hackers
Subject: Re: Low Tx-Rx performance with 10Gb NICs
Date: Tue, 28 May 2013 16:48:16 +0200
Message-ID: <1369752496.14405.42.camel@EL-DT095.site>
In-Reply-To: <51A4BBBC.8020405@freebsd.org>
References: <175CCF5F49938B4D99B2E3EF7F558EBE381FA6E5AA@SC-VEXCH4.marvell.com> <1369406798.20748.30.camel@EL-DT095.site> <51A4BBBC.8020405@freebsd.org>
List-Id: Technical Discussions relating to FreeBSD

Hi Andre, all,

The driver we used is the Intel ixgbe driver. We use this driver as a reference for our own Marvell driver.
As on Linux, our approach is to guarantee lock-free data transmission between rx and tx, since our HW supports this. The ixgbe driver should not perform any serialization between rx and tx either. Furthermore, we know from the FreeBSD performance guide that the Intel NIC really does reach more than 18 Gbit/s.

The low performance we saw is likely related to the fact that we initially did not enable LRO on the Intel driver, which means that we fed the protocol stack with single or multiple rx frames on a per-interrupt basis. Enabling LRO and setting processor affinity changed the behaviour, and we finally saw 18 Gbit/s on Intel.

See my results / questions below:
=================================

- We got the expected performance on FreeBSD 9.0 (32-bit) and 9.1 (64-bit) with ixgbe when:
  1) LRO is enabled (in software in the kernel, not on the NIC), and
  2) the processor affinity of the NIC's receive interrupts is set to one CPU (e.g. 7), and
  3) the processor affinity of the netperf process is set to CPU 0 (rx) and CPU 1 (tx).

But if we do not enable LRO, or do not set the processor affinity of the NIC's rx interrupts, we get very poor performance of about 11 Gbit/s. In particular, tx performance decreased dramatically during a tx/rx netperf test using 4 tx streams and 4 rx streams.

=> So my questions are:

1) Are processor affinity and LRO essential requirements for appropriate duplex 10 Gbit performance?
2) Why does tx performance decrease dramatically if we do not use LRO or processor affinity on the receive side?
3) Is the behaviour possibly related to the Intel platform (CPU related) (Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz)?

Best Regards,
Axel

-------- Forwarded Message --------
From: Andre Oppermann
To: Axel Fischer
Cc: freebsd-hackers
Subject: Re: Low Tx-Rx performance with 10Gb NICs
Date: Tue, 28 May 2013 07:14:20 -0700

On 24.05.2013 16:46, Axel Fischer wrote:
> Hi Igor,
>
> my name is Axel Fischer, working at Marvell SC.

Hi Axel,

> In addition to your reply to my colleague Lino
> Sanfilippo I did some performance measurements
> on FreeBSD 9 with a commercial 10 GBit network
> card.

Which driver?

> Unlike on other OS the FreeBSD performance
> for duplex rx/tx operation never exceeded the
> limit about 9.5 GBit/s. Normally a performance
> of at least 16 GBit (up to line speed 20 GBit
> in duplex mode) is expected.
> As Lino already mentioned the CPU/bus system
> (in general the HW) does not set a limit.
> Furthermore I noticed that the CPU(s) load
> is not very high, about 30 %.

Your RX/TX is probably serialized in the driver and you can only make use of one core.

> Here is an overview of the measurements:
>
> netperf rx-tx 4 streams / 60s
>
> 1768.16 Mb/s Port=2001 RX
>  999.33 Mb/s Port=2002 RX
>   72.16 Mb/s Port=1001 TX
>   61.49 Mb/s Port=1002 TX
> 2302.76 Mb/s Port=2003 RX
>   73.48 Mb/s Port=1003 TX
> 2416.23 Mb/s Port=2004 RX
>   76.02 Mb/s Port=1004 TX
> ====
> RX+TX Total Result: Mb/s 7769.63
>
>
> CPU load:
>
> last pid: 1739; load averages: 0.97, 0.49, 0.21 up 0+00:02:26 11:02:52
> 46 processes: 2 running, 44 sleeping
> CPU 0: 2.0% user, 0.0% nice, 23.2% system,  0.4% interrupt, 74.4% idle
> CPU 1: 1.2% user, 0.0% nice, 19.7% system,  0.4% interrupt, 78.7% idle
> CPU 2: 0.0% user, 0.0% nice,  0.0% system, 80.7% interrupt, 19.3% idle
> CPU 3: 0.0% user, 0.0% nice,  0.8% system,  1.6% interrupt, 97.6% idle
> CPU 4: 2.4% user, 0.0% nice, 25.6% system,  0.0% interrupt, 72.0% idle
> CPU 5: 3.1% user, 0.0% nice, 25.6% system,  0.0% interrupt, 71.3% idle
> CPU 6: 0.0% user, 0.0% nice,  0.4% system, 32.7% interrupt, 66.9% idle
> CPU 7: 0.4% user, 0.0% nice,  1.6% system,  0.0% interrupt, 98.0% idle
> Mem: 14M Active, 7548K Inact, 66M Wired, 24K Cache, 16M Buf, 3326M Free
> Swap: 4096M Total, 4096M Free
>
> Additionally I noticed the following TCP errors
> with netstat -s ...:
>
> 1186 data packets (1717328 bytes) retransmitted

This may happen and is typically not cause for concern on a loaded
system.

> 6847875 window update packets

Normal.

> 2319 duplicate acks

Related to the retransmits.

> 25831 out-of-order packets (37403288 bytes)

This is unusual. What kind of test setup do you have, back-to-back cards or a switch in between? Out-of-order packets normally shouldn't happen unless the traffic goes over the internet.

> 3733 discarded due to memory problems (drops)
> 1186 segment rexmits in SACK recovery episodes
> 1717328 byte rexmits in SACK recovery episodes

Again related to retransmits.

> My questions:
>
> - What is the max. performance (duplex) on
> FreeBSD 9 that you have measured with a 10 GBit
> NIC ?
> (Expected > 16 GBit/s on appropriate HW)

Certain large CDNs are known to push more than 20 Gbit/s of production traffic per machine.

Please also see my other message from today to hackers@, "Re: preemptive kernel", with Message-ID: <51A4B991.3070805@freebsd.org>.

-- 
Axel Fischer | R&D Software / SW Engineer | Marvell Semiconductor Germany GmbH
afischer@marvell.com
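
As a quick cross-check of the netperf figures quoted in this thread, the per-stream results can be tallied by direction. This is only a minimal sketch: the numbers are copied from the quoted measurement, while the names `results` and `summarize` are made up for illustration and are not part of netperf or any FreeBSD tool.

```python
# Per-stream netperf results in Mb/s, copied from the quoted measurement
# (direction, port, throughput). The structure itself is an assumption
# chosen for this sketch, not netperf output format.
results = [
    ("RX", 2001, 1768.16),
    ("RX", 2002, 999.33),
    ("TX", 1001, 72.16),
    ("TX", 1002, 61.49),
    ("RX", 2003, 2302.76),
    ("TX", 1003, 73.48),
    ("RX", 2004, 2416.23),
    ("TX", 1004, 76.02),
]


def summarize(streams):
    """Sum the throughput per direction and return a dict of totals."""
    totals = {"RX": 0.0, "TX": 0.0}
    for direction, _port, mbps in streams:
        totals[direction] += mbps
    return totals


totals = summarize(results)
combined = totals["RX"] + totals["TX"]

print(f"RX total: {totals['RX']:.2f} Mb/s")
print(f"TX total: {totals['TX']:.2f} Mb/s")
print(f"RX+TX:    {combined:.2f} Mb/s")
print(f"TX share: {100 * totals['TX'] / combined:.1f} %")
```

The combined figure reproduces the 7769.63 Mb/s total from the quoted run, and the split shows that the tx streams contribute under 4 % of it, which is the tx collapse the questions above are about.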