From: John Jasen
Date: Thu, 24 Jul 2014 09:06:56 -0400
CC: FreeBSD Net
Subject: Re: fastforward/routing: a 3 million packet-per-second system?

As a follow-up, based on man cxgbe(4), I changed the following /boot/loader.conf settings:

# from discussions with the cxgbe maintainer and testing: values higher than 48 seem to cause kernel panics.
hw.cxgbe.ntxq10g=48
# note: to go above 30-some tx queues, you need to set the following to turn off TOE queues
hw.cxgbe.toecaps_allowed=0
# testing hw.cxgbe.cong_drop options
hw.cxgbe.cong_drop=1

Changing ntxq10g and toecaps_allowed improved the pmc metrics, which previously had shown a lot of time spent in transmit states.

Changing cong_drop is ... probably not a good idea for production environments. However, when testing with cong_drop=1, which lets the hardware drop frames instead of sending tx_pause, I saw two things:

a) the packet-per-second rate reliably crossed 4 million packets per second, as measured via netstat -db 1.
b) on several occasions the test system became completely unresponsive over the ssh sessions I had established to run netstat, top, et al.

On 07/22/2014 02:07 PM, Adrian Chadd wrote:
> Hi!
>
> Well, what's missing is some dtrace/pmc/lockdebugging investigations
> into the system to see where it's currently maxing out at.
>
> I wonder if you're seeing contention on the transmit paths as drivers
> queue frames from one set of driver threads/queues to another
> potentially completely different set of driver transmit
> threads/queues.
>
> -a
>
> On 22 July 2014 08:18, John Jasen wrote:
>> Feedback and/or tips and tricks more than welcome.
>>
>> Outstanding questions:
>>
>> Would increasing the number of processor cores help?
>>
>> Would a system where both processor QPI ports connect to each other
>> mitigate QPI bottlenecks?
>>
>> Are there further performance optimizations I am missing?
>>
>> Server Description:
>>
>> The system in question is a Dell PowerEdge R820 with 16GB of RAM and two
>> Intel(R) Xeon(R) CPU E5-4610 0 @ 2.40GHz processors.
>>
>> Onboard, in an x16 PCIe slot, I have one Chelsio T580-CR dual-port 40GbE
>> NIC, and in an x8 slot, another dual-port T580-CR.
>>
>> I am running FreeBSD 10.0-STABLE.
>>
>> BIOS tweaks:
>>
>> Hyperthreading (or Logical Processors) is turned off.
>> Memory Node Interleaving is turned off, but it did not appear to impact
>> performance.
>>
>> /boot/loader.conf contents:
>> # for CARP+PF testing
>> carp_load="YES"
>> # load cxgbe drivers
>> cxgbe_load="YES"
>> # maxthreads appears not to exceed the number of CPUs
>> net.isr.maxthreads=12
>> # bindthreads may be indicated when using cpuset(1) on interrupts
>> net.isr.bindthreads=1
>> # random guess based on googling
>> net.isr.maxqlimit=60480
>> net.link.ifqmaxlen=90000
>> # discussions with the cxgbe maintainer and the list led me to try this.
>> # It allows more interrupts to be fixed to CPUs, which in some cases
>> # improves interrupt balancing.
>> hw.cxgbe.ntxq10g=16
>> hw.cxgbe.nrxq10g=16
>>
>> /etc/sysctl.conf contents:
>>
>> # the following is also enabled by rc.conf gateway_enable.
>> net.inet.ip.fastforwarding=1
>> # recommendations from the BSD Router Project
>> kern.random.sys.harvest.ethernet=0
>> kern.random.sys.harvest.point_to_point=0
>> kern.random.sys.harvest.interrupt=0
>> # probably should be removed, as cxgbe does not seem to affect/be
>> # affected by irq storm settings
>> hw.intr_storm_threshold=25000000
>> # based on Calomel.Org performance suggestions. With 4x40GbE, it seemed
>> # reasonable to use the 100GbE settings
>> kern.ipc.maxsockbuf=1258291200
>> net.inet.tcp.recvbuf_max=1258291200
>> net.inet.tcp.sendbuf_max=1258291200
>> # attempting to play with the ULE scheduler, making it serve packets
>> # rather than netstat
>> kern.sched.slice=1
>> kern.sched.interact=1
>>
>> /etc/rc.conf contains:
>>
>> hostname="fbge1"
>> # should remove, especially given the duplicate entry below
>> ifconfig_igb0="DHCP"
>> sshd_enable="YES"
>> #ntpd_enable="YES"
>> # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
>> dumpdev="AUTO"
>> # OpenBSD PF options to play with later. Very bad for raw packet rates.
>> #pf_enable="YES"
>> #pflog_enable="YES"
>> # enable packet forwarding
>> # these enable the forwarding and fastforwarding sysctls. inet6 does not
>> # have fastforward
>> gateway_enable="YES"
>> ipv6_gateway_enable="YES"
>> # enable OpenBSD ftp-proxy
>> # should comment out until actively playing with PF
>> ftpproxy_enable="YES"
>> # left in place, commented out from prior testing
>> #ifconfig_mlxen1="inet 172.16.2.1 netmask 255.255.255.0 mtu 9000"
>> #ifconfig_mlxen0="inet 172.16.1.1 netmask 255.255.255.0 mtu 9000"
>> #ifconfig_mlxen3="inet 172.16.7.1 netmask 255.255.255.0 mtu 9000"
>> #ifconfig_mlxen2="inet 172.16.8.1 netmask 255.255.255.0 mtu 9000"
>> # -lro and -tso options added per mailing list suggestion from Bjoern A.
>> # Zeeb (bzeeb-lists at lists.zabbadoz.net)
>> ifconfig_cxl0="inet 172.16.3.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
>> ifconfig_cxl1="inet 172.16.4.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
>> ifconfig_cxl2="inet 172.16.5.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
>> ifconfig_cxl3="inet 172.16.6.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
>> # aliases instead of reconfiguring test clients. See the commented-out
>> # entries above
>> ifconfig_cxl0_alias0="172.16.7.1 netmask 255.255.255.0"
>> ifconfig_cxl1_alias0="172.16.8.1 netmask 255.255.255.0"
>> ifconfig_cxl2_alias0="172.16.1.1 netmask 255.255.255.0"
>> ifconfig_cxl3_alias0="172.16.2.1 netmask 255.255.255.0"
>> # for remote monitoring/admin of the test device
>> ifconfig_igb0="inet 172.30.60.60 netmask 255.255.0.0"
>>
>> Additional configurations:
>>
>> cpuset-chelsio-6cpu-high
>> #!/usr/local/bin/bash
>> # Original provided by Navdeep Parhar
>> # takes vmstat -ai output into a list, and assigns interrupts in order to
>> # the available CPU cores.
>> # Modified: to assign only to the 'high CPUs', ie: on core1.
>> # See: http://lists.freebsd.org/pipermail/freebsd-net/2014-July/039317.html
>> ncpu=12
>> irqlist=$(vmstat -ia | egrep 't4nex|t5nex|cxgbc' | cut -f1 -d: | cut -c4-)
>> i=6
>> for irq in $irqlist; do
>>     cpuset -l $i -x $irq
>>     i=$((i+1))
>>     [ $i -ge $ncpu ] && i=6
>> done
>>
>> Client Description:
>>
>> Two Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz processors
>> 64 GB RAM
>> Mellanox Technologies MT27500 Family [ConnectX-3]
>> CentOS 6.4 with updates
>> iperf3 installed from yum repositories: iperf3-3.0.3-3.el6.x86_64
>>
>> Test setup:
>>
>> I've found that about 3 streams between CentOS clients is the best way
>> to get the most out of them.
>> Above certain points, the -b flag does not change results.
>> -N is an artifact from using TCP.
>> -l is needed, as -M doesn't work for UDP.
>>
>> I usually use launch scripts similar to the following:
>>
>> for i in `seq 41 60`; do ssh loader$i "export TIME=120; export STREAMS=1; \
>> export PORT=52$i; export PKT=64; export RATE=2000m; \
>> /root/iperf-test-8port-udp" & done
>>
>> The scripts execute the following on each host:
>>
>> #!/bin/bash
>> PORT1=$PORT
>> PORT2=$(($PORT+1000))
>> PORT3=$(($PORT+2000))
>> iperf3 -c loader41-40gbe -u -b 10000m -i 0 -N -l $PKT -t$TIME -P$STREAMS -p$PORT1 &
>> iperf3 -c loader42-40gbe -u -b 10000m -i 0 -N -l $PKT -t$TIME -P$STREAMS -p$PORT1 &
>> iperf3 -c loader43-40gbe -u -b 10000m -i 0 -N -l $PKT -t$TIME -P$STREAMS -p$PORT1 &
>> ... (through all clients and all three ports) ...
>> iperf3 -c loader60-40gbe -u -b 10000m -i 0 -N -l $PKT -t$TIME -P$STREAMS -p$PORT3 &
>>
>> Results:
>>
>> Summarized, netstat -w 1 -q 240 -bd, run through:
>> cat test4-tuning | egrep -v {'packets | input '} | awk '{ipackets+=$1}
>>   {idrops+=$3} {opackets+=$5} {odrops+=$9} END {print "input " ipackets/NR,
>>   "idrops " idrops/NR, "opackets " opackets/NR, "odrops " odrops/NR}'
>>
>> input 1.10662e+07 idrops 8.01783e+06 opackets 3.04516e+06 odrops 3152.4
>>
>> Snapshot of raw output:
>>
>>              input                (Total)           output
>>    packets  errs    idrops       bytes   packets  errs      bytes colls drops
>>   11189148     0   7462453  1230805216   3725006     0  409750710     0   799
>>   10527505     0   6746901  1158024978   3779096     0  415700708     0   127
>>   10606163     0   6850760  1166676673   3751780     0  412695761     0  1535
>>   10749324     0   7132014  1182425799   3617558     0  397930956     0  5972
>>   10695667     0   7022717  1176521907   3669342     0  403627236     0  1461
>>   10441173     0   6762134  1148528662   3675048     0  404255540     0  6021
>>   10683773     0   7005635  1175215014   3676962     0  404465671     0  2606
>>   10869859     0   7208696  1195683372   3658432     0  402427698     0   979
>>   11948989     0   8310926  1314387881   3633773     0  399714986     0   725
>>   12426195     0   8864415  1366877194   3562311     0  391853156     0  2762
>>   13006059     0   9432389  1430661751   3570067     0  392706552     0  5158
>>   12822243     0   9098871  1410443600   3715177     0  408668500     0  4064
>>   13317864     0   9683602  1464961374   3632156     0  399536131     0  3684
>>   13701905     0  10182562  1507207982   3523101     0  387540859     0  8690
>>   13820227     0  10244870  1520221820   3562038     0  391823322     0  2426
>>   14437060     0  10955483  1588073033   3480105     0  382810557     0  2619
>>   14518471     0  11119573  1597028105   3397439     0  373717355     0  5691
>>   14890287     0  11675003  1637926521   3199812     0  351978304     0 11007
>>   14923610     0  11749091  1641594441   3171436     0  348857468     0  7389
>>   14738704     0  11609730  1621254991   3117715     0  342948394     0  2597
>>   14753975     0  11549735  1622935026   3207393     0  352812846     0  4798
>>
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
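The interrupt-binding loop in cpuset-chelsio-6cpu-high can be exercised without touching a live system. Below is a minimal dry-run sketch, assuming bash; plan_irq_binding is a hypothetical name, and the IRQ numbers in the example are made up rather than read from vmstat -ia. It only prints the cpuset(1) commands the quoted loop would run, walking round-robin from CPU 6 to CPU 11 and then wrapping back to 6.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of the original script): print the
# cpuset(1) commands that would pin each IRQ to a CPU, cycling round-robin
# from CPU $first up to CPU $ncpu-1, the same walk the quoted loop makes
# with first=6 and ncpu=12.
plan_irq_binding() {
    local first=$1 ncpu=$2
    shift 2
    local i=$first irq
    for irq in "$@"; do
        echo "cpuset -l $i -x $irq"
        i=$((i + 1))
        # wrap back to the first "high" CPU after the last one
        if [ "$i" -ge "$ncpu" ]; then
            i=$first
        fi
    done
}

# Example with made-up IRQ numbers; on the test box the list would come from
#   vmstat -ia | egrep 't4nex|t5nex|cxgbc' | cut -f1 -d: | cut -c4-
plan_irq_binding 6 12 264 265 266 267 268 269 270 271
```

Feeding it the real list (plan_irq_binding 6 12 $irqlist) and reading the output before running anything is one way to check how the queue interrupts will be spread across the "high" CPUs.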
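The averaging pipeline in the Results section relies on egrep -v to drop netstat's header lines. Because there is no comma inside the braces, bash does not brace-expand {'packets | input '}, so egrep most likely receives the literal pattern {packets | input }, which does not appear to match the headers; if the capture contains netstat's repeated header lines, they would be counted in NR and pull the averages down. Below is a minimal, more defensive sketch, assuming the same netstat -w 1 -bd column layout as the snapshot; summarize_netstat is a hypothetical name. It averages only lines whose first field is an integer and also reports bytes per input packet and the fraction of input packets dropped.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original thread): average the per-second
# lines of `netstat -w 1 -bd` output. Header lines are skipped by requiring
# the first field to be an integer, instead of relying on an egrep pattern.
# Assumed column layout, matching the quoted snapshot:
#   $1 input packets  $2 errs  $3 idrops  $4 input bytes
#   $5 output packets $6 errs  $7 output bytes  $8 colls  $9 drops
summarize_netstat() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 9 {
            n++
            ipk += $1; idr += $3; ibytes += $4
            opk += $5; odr += $9
        }
        END {
            if (n == 0) { print "no samples"; exit 1 }
            printf "samples %d\n", n
            printf "input %.0f idrops %.0f opackets %.0f odrops %.1f\n",
                ipk / n, idr / n, opk / n, odr / n
            # average bytes per input packet, and share of input dropped
            printf "bytes/input-packet %.1f\n", ibytes / ipk
            printf "input drop fraction %.3f\n", idr / ipk
        }' "$@"
}

# Example: the two header lines plus the first two rows of the snapshot.
summarize_netstat <<'EOF'
            input        (Total)           output
   packets  errs idrops      bytes    packets  errs      bytes colls drops
  11189148     0 7462453 1230805216   3725006     0  409750710     0   799
  10527505     0 6746901 1158024978   3779096     0  415700708     0   127
EOF
```

On those two rows it reports 2 samples, 110.0 bytes per input packet and an input drop fraction of 0.654. The 110 bytes would be consistent with a 64-byte UDP payload plus UDP (8), IPv4 (20) and Ethernet (14) headers and a 4-byte FCS. Applied to the summary line instead, idrops/input is 8.01783e+06 / 1.10662e+07, about 0.72, and input minus idrops (about 3.048e+06) is close to the reported 3.04516e+06 output packets, which fits most offered packets being dropped on receive rather than on transmit.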