From: Daniel Kalchev
Date: Thu, 08 Dec 2011 12:06:26 +0200
To: Luigi Rizzo
Cc: Andre Oppermann, Jack Vogel, current@freebsd.org
Message-ID: <4EE08C22.2040500@digsys.bg>
In-Reply-To: <20111207202341.GA72820@onelab2.iet.unipi.it>
Subject: Re: quick summary results with ixgbe (was Re: datapoints on 10G throughput with TCP ?

On 07.12.11 22:23, Luigi Rizzo wrote:
>
> Sorry, forgot to mention that the above is with TSO DISABLED
> (which is not the default). TSO seems to have a very bad
> interaction with HWCSUM and non-zero mitigation.

I have this on both sender and receiver

# ifconfig ix1
ix1: flags=8843 metric 0 mtu 1500
        options=4bb
        ether 00:25:90:35:22:f1
        inet 10.2.101.11 netmask 0xffffff00 broadcast 10.2.101.255
        media: Ethernet autoselect (autoselect )
        status: active

without LRO on either end

# nuttcp -t -T 5 -w 128 -v 10.2.101.11
nuttcp-t: v6.1.2: socket
nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
nuttcp-t: time limit = 5.00 seconds
nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.051 ms
nuttcp-t: send window size = 131768, receive window size = 66608
nuttcp-t: 1802.4049 MB in 5.06 real seconds = 365077.76 KB/sec = 2990.7170 Mbps
nuttcp-t: host-retrans = 0
nuttcp-t: 28839 I/O calls, msec/call = 0.18, calls/sec = 5704.44
nuttcp-t: 0.0user 4.5sys 0:05real 90% 108i+1459d 630maxrss 0+2pf 87706+1csw

nuttcp-r: v6.1.2: socket
nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
nuttcp-r: accept from 10.2.101.12
nuttcp-r: send window size = 33304, receive window size = 131768
nuttcp-r: 1802.4049 MB in 5.18 real seconds = 356247.49 KB/sec = 2918.3794 Mbps
nuttcp-r: 529295 I/O calls, msec/call = 0.01, calls/sec = 102163.86
nuttcp-r: 0.1user 3.7sys 0:05real 73% 116i+1567d 618maxrss 0+15pf 230404+0csw

with LRO on receiver

# nuttcp -t -T 5 -w 128 -v 10.2.101.11
nuttcp-t: v6.1.2: socket
nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
nuttcp-t: time limit = 5.00 seconds
nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.067 ms
nuttcp-t: send window size = 131768, receive window size = 66608
nuttcp-t: 2420.5000 MB in 5.02 real seconds = 493701.04 KB/sec = 4044.3989 Mbps
nuttcp-t: host-retrans = 2
nuttcp-t: 38728 I/O calls, msec/call = 0.13, calls/sec = 7714.08
nuttcp-t: 0.0user 4.1sys 0:05real 83% 107i+1436d 630maxrss 0+2pf 4896+0csw

nuttcp-r: v6.1.2: socket
nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
nuttcp-r: accept from 10.2.101.12
nuttcp-r: send window size = 33304, receive window size = 131768
nuttcp-r: 2420.5000 MB in 5.15 real seconds = 481679.37 KB/sec = 3945.9174 Mbps
nuttcp-r: 242266 I/O calls, msec/call = 0.02, calls/sec = 47080.98
nuttcp-r: 0.0user 2.4sys 0:05real 49% 112i+1502d 618maxrss 0+15pf 156333+0csw

About 1/4 improvement...

With LRO on both sender and receiver

# nuttcp -t -T 5 -w 128 -v 10.2.101.11
nuttcp-t: v6.1.2: socket
nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
nuttcp-t: time limit = 5.00 seconds
nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.049 ms
nuttcp-t: send window size = 131768, receive window size = 66608
nuttcp-t: 2585.7500 MB in 5.02 real seconds = 527402.83 KB/sec = 4320.4840 Mbps
nuttcp-t: host-retrans = 1
nuttcp-t: 41372 I/O calls, msec/call = 0.12, calls/sec = 8240.67
nuttcp-t: 0.0user 4.6sys 0:05real 93% 106i+1421d 630maxrss 0+2pf 4286+0csw

nuttcp-r: v6.1.2: socket
nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
nuttcp-r: accept from 10.2.101.12
nuttcp-r: send window size = 33304, receive window size = 131768
nuttcp-r: 2585.7500 MB in 5.15 real seconds = 514585.31 KB/sec = 4215.4829 Mbps
nuttcp-r: 282820 I/O calls, msec/call = 0.02, calls/sec = 54964.34
nuttcp-r: 0.0user 2.7sys 0:05real 55% 114i+1540d 618maxrss 0+15pf 188794+147csw

Even better...

With LRO on sender only:

# nuttcp -t -T 5 -w 128 -v 10.2.101.11
nuttcp-t: v6.1.2: socket
nuttcp-t: buflen=65536, nstream=1, port=5001 tcp -> 10.2.101.11
nuttcp-t: time limit = 5.00 seconds
nuttcp-t: connect to 10.2.101.11 with mss=1448, RTT=0.054 ms
nuttcp-t: send window size = 131768, receive window size = 66608
nuttcp-t: 2077.5437 MB in 5.02 real seconds = 423740.81 KB/sec = 3471.2847 Mbps
nuttcp-t: host-retrans = 0
nuttcp-t: 33241 I/O calls, msec/call = 0.15, calls/sec = 6621.01
nuttcp-t: 0.0user 4.5sys 0:05real 92% 109i+1468d 630maxrss 0+2pf 49532+25csw

nuttcp-r: v6.1.2: socket
nuttcp-r: buflen=65536, nstream=1, port=5001 tcp
nuttcp-r: accept from 10.2.101.12
nuttcp-r: send window size = 33304, receive window size = 131768
nuttcp-r: 2077.5437 MB in 5.15 real seconds = 413415.33 KB/sec = 3386.6984 Mbps
nuttcp-r: 531979 I/O calls, msec/call = 0.01, calls/sec = 103378.67
nuttcp-r: 0.0user 4.5sys 0:05real 88% 110i+1474d 618maxrss 0+15pf 117367+0csw

> also remember that hw.ixgbe.max_interrupt_rate has only
> effect at module load -- i.e. you set it with the bootloader,
> or with kenv before loading the module.

I have this in /boot/loader.conf

kern.ipc.nmbclusters=512000
hw.ixgbe.max_interrupt_rate=0

on both sender and receiver.

> Please retry the measurements disabling tso (on both sides, but
> it really matters only on the sender). Also, LRO requires HWCSUM.

How do I set HWCSUM? Is this different from RXCSUM/TXCSUM?

Still I get nowhere near what you get on my hardware...

Here is what pciconf -vlbc has to say

ix0@pci0:3:0:0: class=0x020000 card=0xffffffff chip=0x10fc8086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
    bar [10] = type Memory, range 64, base 0xfbc00000, size 2097152, enabled
    bar [18] = type I/O Port, range 32, base 0xdc00, size 32, enabled
    bar [20] = type Memory, range 64, base 0xfbbfc000, size 16384, enabled
    cap 01[40] = powerspec 3 supports D0 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 64 messages in map 0x20 enabled
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) link x8(x8)
    cap 03[e0] = VPD
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 002590ffff363f80
    ecap 000e[150] = unknown 1
    ecap 0010[160] = unknown 1
ix1@pci0:3:0:1: class=0x020000 card=0xffffffff chip=0x10fc8086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
    bar [10] = type Memory, range 64, base 0xfb800000, size 2097152, enabled
    bar [18] = type I/O Port, range 32, base 0xd880, size 32, enabled
    bar [20] = type Memory, range 64, base 0xfbbf8000, size 16384, enabled
    cap 01[40] = powerspec 3 supports D0 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 64 messages in map 0x20 enabled
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) link x8(x8)
    cap 03[e0] = VPD
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 002590ffff363f80
    ecap 000e[150] = unknown 1
    ecap 0010[160] = unknown 1

I am using ix1, as the blade enclosure has only one 10G switch and it happens to be on the 'second' position.

Daniel
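
For a side-by-side view of the four nuttcp runs in this message, here is a minimal sketch that puts the sender-side (nuttcp-t) Mbps figures next to each other and computes the gain over the run without LRO. The numbers are copied from the output in this message. The names `runs`, `gain_over` and `kb_per_sec_to_mbps` are made up for this illustration. The unit conversion (1 KB = 1024 bytes, 1 Mbps = 10^6 bits/s) is an assumption, chosen because it reproduces the KB/sec and Mbps pairs that nuttcp printed here.

```python
# Sender-side (nuttcp-t) throughput in Mbps, copied from the four runs
# reported in this message. Labels are illustrative only.
runs = {
    "no LRO": 2990.7170,
    "LRO on receiver": 4044.3989,
    "LRO on both": 4320.4840,
    "LRO on sender only": 3471.2847,
}


def gain_over(baseline: float, value: float) -> float:
    """Relative gain of value over baseline, in percent."""
    return (value / baseline - 1.0) * 100.0


def kb_per_sec_to_mbps(kb_per_sec: float) -> float:
    """Convert KB/sec to Mbps.

    Assumption: 1 KB = 1024 bytes and 1 Mbps = 10**6 bits/s, which matches
    the KB/sec and Mbps pairs in the nuttcp output quoted in this message.
    """
    return kb_per_sec * 1024 * 8 / 1e6


baseline = runs["no LRO"]
for name, mbps in runs.items():
    # Print each run with its gain relative to the no-LRO baseline.
    print(f"{name:20s} {mbps:10.4f} Mbps  {gain_over(baseline, mbps):+6.1f}%")
```

On these figures, LRO on the receiver alone is roughly 35% faster than the baseline, LRO on both ends roughly 44%, and LRO on the sender only roughly 16%. These are single 5-second runs, so the differences should be read as indicative.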