From owner-freebsd-net@FreeBSD.ORG Fri Jul 11 17:28:26 2014
Message-ID: <53C01EB5.6090701@gmail.com>
Date: Fri, 11 Jul 2014 13:28:21 -0400
From: John Jasen <jjasen@gmail.com>
To: FreeBSD Net <freebsd-net@freebsd.org>, Navdeep Parhar
Subject: tuning routing using cxgbe and T580-CR cards?

In testing two Chelsio T580-CR dual-port cards with FreeBSD 10-STABLE, I've been able to use a collection of clients to generate approximately 1.5-1.6 million TCP packets per second sustained, and to routinely hit 10 GB/s, both measured by netstat -d -b -w1 -W (I usually use -h for the quick read, accepting the loss of granularity).

Performance has so far been stellar, and I honestly suspect I will need more CPU depth and horsepower to get much faster. Still, I'm curious whether there is any gain left in tweaking performance settings: under multiple streams, with N targets connecting to N servers, interrupt load on all CPUs pegs at 99-100%. Will tweaking configs help, or is that a free clue to get more horsepower?

So far, except for temporarily turning off pflogd and setting the following sysctl variables, I've not done any performance tuning on the system:

/etc/sysctl.conf
net.inet.ip.fastforwarding=1
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.interrupt=0

a) One of the first things I did in prior testing was to turn hyperthreading off. I presume this is still prudent, as HT doesn't help with interrupt handling? (A sketch of how I've been doing this is below.)

b) I briefly experimented with using cpuset(1) to pin interrupts to physical CPUs, but it offered no improvement and, if anything, appeared to decrease performance by 10-20% (see the cpuset sketch below). Has anyone else tried this? What were your results?

c) The defaults for the cxgbe driver appear to be 8 rx queues and N tx queues, with N being the number of CPUs detected. For a system running multiple cards, routing or firewalling, does this make sense, or would balancing tx and rx be preferable? And would reducing the queues per card based on NUMBER-CPUS and NUM-CHELSIO-PORTS make sense at all?

d) dev.cxl.$PORT.qsize_rxq and dev.cxl.$PORT.qsize_txq both default to 1024. They appear not to be writable while if_cxgbe is loaded, so I speculate they are either not meant to be messed with, or are loader.conf tunables (a loader.conf sketch covering (c) and (d) is below)? Is there any benefit to adjusting them?

e) dev.t5nex.$CARD.toe.sndbuf: 262144. This one is writable, but changing the value did not yield an immediate benefit. Am I barking up the wrong tree by trying?

f) Based on prior experiments with other vendors' cards, I tried tweaks to the net.isr.* settings (examples below), but did not see any benefit worth discussing. Does that match others' experience?

g) Are there other settings I should be looking at that might squeeze out a few more packets?
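To keep the questions concrete, here are rough sketches of the knobs I'm referring to. For (a), this is how I've been turning HT off without a trip to the BIOS; it assumes the machdep.hyperthreading_allowed loader tunable is present in this release:

# /boot/loader.conf
# Keep the scheduler off the HTT logical CPUs; roughly equivalent
# to disabling hyperthreading in the BIOS.
machdep.hyperthreading_allowed="0"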
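For (b), my cpuset experiment looked roughly like the following; the IRQ numbers and vector names are illustrative only, so check vmstat -i on the actual box for the real ones:

# List the card's interrupt vectors, then bind one per physical core.
vmstat -i | grep t5nex
cpuset -l 0 -x 264    # bind irq264 to CPU 0
cpuset -l 1 -x 265    # bind irq265 to CPU 1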
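For (c) and (d), the knobs I have in mind are the loader-time tunables from cxgbe(4); the values below are examples rather than recommendations, and I'm guessing the *10g names also cover the T580's 40G ports:

# /boot/loader.conf
hw.cxgbe.nrxq10g="4"       # rx queues per port (driver default: 8)
hw.cxgbe.ntxq10g="4"       # tx queues per port (default: number of CPUs)
hw.cxgbe.qsize_rxq="2048"  # rx descriptors per queue (default: 1024)
hw.cxgbe.qsize_txq="2048"  # tx descriptors per queue (default: 1024)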
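And for (f), this is the flavor of net.isr tweak I tried, set from the loader since at least some of these are read-only at runtime; the values are examples only:

# /boot/loader.conf
net.isr.maxthreads="8"    # more netisr threads (default is 1, if I recall)
net.isr.bindthreads="1"   # pin the netisr threads to their CPUs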
Thanks in advance!

--
John Jasen (jjasen@gmail.com)