Date: Fri, 11 Jul 2014 13:28:21 -0400
From: John Jasem <jjasen@gmail.com>
To: FreeBSD Net <freebsd-net@freebsd.org>, Navdeep Parhar <nparhar@gmail.com>
Subject: tuning routing using cxgbe and T580-CR cards?
Message-ID: <53C01EB5.6090701@gmail.com>
In testing two Chelsio T580-CR dual-port cards with FreeBSD 10-STABLE, I've been able to use a collection of clients to generate approximately 1.5-1.6 million TCP packets per second sustained, and routinely hit 10GB/s, both measured by netstat -d -b -w1 -W (I usually use -h for the quick read, accepting the loss of granularity).

While performance has so far been stellar, and I honestly suspect I will need more CPU depth and horsepower to get much faster, I'm curious whether there is any gain to be had from tweaking performance settings. I'm seeing, under multiple streams with N targets connecting to N servers, interrupts on all CPUs peg at 99-100%, and I'd like to know whether tweaking configs will help, or whether it's a free clue to get more horsepower.

So far, except for temporarily turning off pflogd and setting the following sysctl variables, I've not done any performance tuning on the system:

/etc/sysctl.conf
net.inet.ip.fastforwarding=1
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.interrupt=0

a) One of the first things I did in prior testing was to turn hyperthreading off. I presume this is still prudent, as HT doesn't help with interrupt handling?

b) I briefly experimented with using cpuset(1) to pin interrupts to physical CPUs, but it offered no performance improvement, and indeed appeared to decrease performance by 10-20%. Has anyone else tried this? What were your results?

c) The defaults for the cxgbe driver appear to be 8 rx queues and N tx queues, with N being the number of CPUs detected. For a system running multiple cards, routing or firewalling, does this make sense, or would balancing tx and rx be more ideal? And would reducing queues per card based on NUM-CPUS and NUM-CHELSIO-PORTS make sense at all?

d) dev.cxl.$PORT.qsize_rxq: 1024 and dev.cxl.$PORT.qsize_txq: 1024. These appear not to be writeable once if_cxgbe is loaded, so I speculate they are either not to be messed with, or are loader.conf variables?
Is there any benefit to messing with them?

e) dev.t5nex.$CARD.toe.sndbuf: 262144. These are writeable, but changing the values did not yield an immediate benefit. Am I barking up the wrong tree in trying?

f) Based on prior experiments with other vendors' cards, I tried tweaks to the net.isr.* settings, but did not see any benefits worth discussing. Am I correct in this speculation, based on others' experience?

g) Are there other settings I should be looking at that might squeeze out a few more packets?

Thanks in advance!

-- John Jasen (jjasen@gmail.com)
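For what it's worth, the queue knobs in (c) and (d) do look like boot-time loader(8) tunables rather than run-time sysctls. A sketch of what /boot/loader.conf might contain, assuming the hw.cxgbe.* tunable names documented in cxgbe(4) apply to this driver version (the queue counts shown are illustrative values, not recommendations):

```shell
# /boot/loader.conf -- sketch only; tunable names assume the hw.cxgbe.*
# knobs from cxgbe(4). These are read when if_cxgbe loads and are
# read-only (dev.cxl.* / dev.t5nex.*) once the driver has attached.
if_cxgbe_load="YES"

# Per-port rx/tx queue counts (question c). Example values only,
# e.g. something like NUM-CPUS / NUM-CHELSIO-PORTS for this box.
hw.cxgbe.nrxq10g="4"
hw.cxgbe.ntxq10g="4"

# Descriptors per rx/tx queue (question d); 1024 is the observed default.
hw.cxgbe.qsize_rxq="1024"
hw.cxgbe.qsize_txq="1024"
```

Changing these requires a reboot (or unloading and reloading if_cxgbe), which is presumably why the dev.cxl.$PORT.* sysctls show up as read-only on a running system.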