From: Olivier Cochard-Labbé
Date: Thu, 2 Feb 2017 02:01:23 +0100
Subject: Re: Disappointing packets-per-second performance results on a Dell, PE R530
To: Jordan Caraballo
Cc: "freebsd-net@freebsd.org"

On Wed, Feb 1, 2017 at 1:00 AM, Jordan Caraballo <jordancaraballo87@gmail.com> wrote:

> Hi Oliver, my bad, I missed that one. Here is the info:
>
> * Switch with 48x 10G ports and 12x 40G ports was used
> * (48) 10G connected nodes were used.
> * (24) nodes on each side of the firewall
> * Packet per second (PPS) tests were run using 'iperf'
> * Bandwidth tests were run using 'nuttcp'
> * Parallelization was handled by using pdsh
> * Each of the 24 sending nodes ran either:
>     iperf3 -c "" -u -A 5 -l 512 -b 0 -t "" -J
>     nuttcp -fparse -l 128k -w1m -T "" ""

Your current performance (1.5 Mpps) suggests that only one core, perhaps two at most, is in use.

Can you confirm that your iperf bench runs 24 distinct flows simultaneously (different source/destination IP pairs)?

source-IP-1 -> target-IP-1
source-IP-2 -> target-IP-2
...
until source-IP-24 -> target-IP-24

On your dual-CPU machine with 18 cores, the Chelsio driver should create, for each port:
- 8 RX queues (rxq NIC)
- 16 TX queues (txq NIC)

Can you check in /var/run/dmesg.boot that you have something like this:

t5nex0: mem 0xfb780000-0xfb7fffff,0xfa000000-0xfaffffff,0xf9ff0000-0xf9ff1fff irq 40 at device 0.4 numa-domain 0 on pci7
cxl0: numa-domain 0 on t5nex0
cxl0: Ethernet address: 00:07:43:2e:e4:70
cxl0: 16 txq, 8 rxq (NIC); 8 txq, 2 rxq (TOE)

=> Notice the "16 txq, 8 rxq (NIC)" line.

Now, as Slawa says, we should see 8 IRQs assigned to these 8 rxq, all equally used.

After your bench, the output of "vmstat -ia | grep t5nex0:0a" should show at least 8 lines with evenly distributed counts, as in this example:

[root@hp]~# vmstat -ia | grep t5nex0:0a
irq292: t5nex0:0a0                    37          0
irq293: t5nex0:0a1                288498        629
irq294: t5nex0:0a2                225410        492
irq295: t5nex0:0a3                306227        668
irq296: t5nex0:0a4                282679        617
irq297: t5nex0:0a5                313143        683
irq298: t5nex0:0a6                318727        695
irq299: t5nex0:0a7                308669        673

(My example is not perfect because queue 0 looks under-utilized, but you get the idea.)
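
If eyeballing the counters gets tedious across several runs, the check can be scripted. Below is a minimal Python sketch that parses the vmstat lines shown above and flags any queue whose interrupt total falls well below the average. The function names (parse_vmstat, underused) and the 0.5 threshold are illustrative assumptions, not part of any FreeBSD tool, and the sample data is simply the example output from this message.

```python
import re

# Sample copied from the example in this message. On a live system, feed in
# the real output of `vmstat -ia | grep t5nex0:0a` instead.
SAMPLE = """\
irq292: t5nex0:0a0                    37          0
irq293: t5nex0:0a1                288498        629
irq294: t5nex0:0a2                225410        492
irq295: t5nex0:0a3                306227        668
irq296: t5nex0:0a4                282679        617
irq297: t5nex0:0a5                313143        683
irq298: t5nex0:0a6                318727        695
irq299: t5nex0:0a7                308669        673
"""

# Columns: "irqN:", queue name, total interrupts, rate.
LINE_RE = re.compile(r"^irq\d+:\s+(\S+)\s+(\d+)\s+(\d+)$")


def parse_vmstat(text):
    """Return a list of (queue_name, total_interrupts) tuples."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append((match.group(1), int(match.group(2))))
    return rows


def underused(rows, ratio=0.5):
    """Return queues whose total is below `ratio` times the mean total.

    The 0.5 default is an arbitrary illustrative threshold (assumption).
    """
    if not rows:
        return []
    mean = sum(total for _, total in rows) / len(rows)
    return [name for name, total in rows if total < ratio * mean]


if __name__ == "__main__":
    rows = parse_vmstat(SAMPLE)
    print(f"{len(rows)} queues found")
    for name in underused(rows):
        print(f"{name} looks under-utilized")
```

Fewer than 8 parsed lines, or several flagged queues, would point to traffic not being spread across the rx queues, which is what the 24-distinct-flows question is meant to rule out.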