Date:      Thu, 24 Jul 2014 09:06:56 -0400
From:      John Jasen <jjasen@gmail.com>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: fastforward/routing: a 3 million packet-per-second system?
Message-ID:  <53D104F0.6010103@gmail.com>
In-Reply-To: <CAJ-VmomWpc=3dtasbDhhrUpGywPio3_9W2b-RTAeJjq3nahhOQ@mail.gmail.com>
References:  <53CE80DD.9090109@gmail.com> <CAJ-VmomWpc=3dtasbDhhrUpGywPio3_9W2b-RTAeJjq3nahhOQ@mail.gmail.com>

As a follow-up, after reading cxgbe(4) I changed the following
/boot/loader.conf settings.

<snip>
# From discussions with the cxgbe maintainer and from testing; values higher
# than 48 seem to cause kernel panics.
hw.cxgbe.ntxq10g=48
# Note: to go above 30-some-odd tx queues, you need to set the following
# to turn off TOE queues.
hw.cxgbe.toecaps_allowed=0
# testing hw.cxgbe.cong_drop options
hw.cxgbe.cong_drop=1
</snip>
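
To confirm the tunables actually took effect after a reboot, something along
these lines works (a quick sketch; the exact sysctl names the driver exposes
for the queue counts may vary by driver version, the greps are just filters):

<snip>
#!/bin/sh
# Loader tunables set in /boot/loader.conf show up in the kernel environment.
kenv hw.cxgbe.ntxq10g
kenv hw.cxgbe.toecaps_allowed
kenv hw.cxgbe.cong_drop
# Cross-check what the driver actually configured for the ports.
sysctl -a | grep -E 't5nex|cxl|cxgbe' | grep -iE 'txq|rxq|cong'
</snip>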

Changing ntxq10g and toecaps_allowed improved the pmc metrics, which had
previously shown a lot of time spent in transmit states.
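
For anyone who wants to repeat the profiling, sampling along these lines
should work (a rough sketch, not necessarily the exact commands used here;
hwpmc must be available, and the unhalted-cycles event alias can vary by CPU
and hwpmc version):

<snip>
#!/bin/sh
# Load hwpmc if it is not compiled into the kernel (ignore error if loaded).
kldload hwpmc 2>/dev/null
# Live, top-like view of where CPU time goes while traffic is flowing.
pmcstat -T -S unhalted-cycles -w 5
# Or record samples for 60 seconds and build a callgraph afterwards.
pmcstat -S unhalted-cycles -O /tmp/samples.pmc sleep 60
pmcstat -R /tmp/samples.pmc -G /tmp/callgraph.txt
</snip>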


Changing cong_drop is ... probably not a good idea for production
environments. However, testing with cong_drop=1, which drops packets on
congestion rather than sending tx_pause, I saw two things:

a) the packet rate reliably crossed 4 million packets per second, as
measured via netstat -db 1.

b) there were several occasions where the test system became completely
unresponsive over the ssh sessions I had established to run netstat,
top, et al.
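
Given (b), it is probably worth capturing the counters on the box itself
rather than over ssh, so the data survives if the sessions wedge. A minimal
sketch (the output path is just an example):

<snip>
#!/bin/sh
# Collect 240 one-second samples locally, detached from the login session,
# so the capture survives even if ssh becomes unresponsive.
nohup netstat -w 1 -q 240 -b -d > /var/tmp/netstat-run.log 2>&1 &
</snip>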


On 07/22/2014 02:07 PM, Adrian Chadd wrote:
> Hi!
>
> Well, what's missing is some dtrace/pmc/lockdebugging investigations
> into the system to see where it's currently maxing out at.
>
> I wonder if you're seeing contention on the transmit paths as drivers
> queue frames from one set of driver threads/queues to another
> potentially completely different set of driver transmit
> threads/queues.
>
>
>
>
> -a
>
>
> On 22 July 2014 08:18, John Jasen <jjasen@gmail.com> wrote:
>> Feedback and/or tips and tricks more than welcome.
>>
>> Outstanding questions:
>>
>> Would increasing the number of processor cores help?
>>
>> Would a system where both processor QPI ports connect to each other
>> mitigate QPI bottlenecks?
>>
>> Are there further performance optimizations I am missing?
>>
>> Server Description:
>>
>> The system in question is a Dell Poweredge R820, 16GB of RAM, and two
>> Intel(R) Xeon(R) CPU E5-4610 0 @ 2.40GHz.
>>
>> Onboard, in a 16x PCIe slot, I have one Chelsio T-580-CR two-port 40GbE
>> NIC, and in an 8x slot, another T-580-CR dual port.
>>
>> I am running FreeBSD 10.0-STABLE.
>>
>> BIOS tweaks:
>>
>> Hyperthreading (or Logical Processors) is turned off.
>> Memory Node Interleaving is turned off, but did not appear to impact
>> performance.
>>
>> /boot/loader.conf contents:
>> #for CARP+PF testing
>> carp_load="YES"
>> #load cxgbe drivers.
>> cxgbe_load="YES"
>> #maxthreads appears not to exceed the number of CPUs.
>> net.isr.maxthreads=12
>> #bindthreads may be indicated when using cpuset(1) on interrupts
>> net.isr.bindthreads=1
>> #random guess based on googling
>> net.isr.maxqlimit=60480
>> net.link.ifqmaxlen=90000
>> #discussions with cxgbe maintainer and list led me to trying this. Allows more
>> #interrupts to be fixed to CPUs, which in some cases, improves interrupt balancing.
>> hw.cxgbe.ntxq10g=16
>> hw.cxgbe.nrxq10g=16
>>
>> /etc/sysctl.conf contents:
>>
>> #the following is also enabled by rc.conf gateway_enable.
>> net.inet.ip.fastforwarding=1
>> #recommendations from BSD router project
>> kern.random.sys.harvest.ethernet=0
>> kern.random.sys.harvest.point_to_point=0
>> kern.random.sys.harvest.interrupt=0
>> #probably should be removed, as cxgbe does not seem to affect/be affected
>> #by irq storm settings
>> hw.intr_storm_threshold=25000000
>> #based on Calomel.Org performance suggestions; with 4x40GbE it seemed
>> #reasonable to use the 100GbE settings
>> kern.ipc.maxsockbuf=1258291200
>> net.inet.tcp.recvbuf_max=1258291200
>> net.inet.tcp.sendbuf_max=1258291200
>> #attempting to play with ULE scheduler, making it serve packets versus
>> #netstat
>> kern.sched.slice=1
>> kern.sched.interact=1
>>
>> /etc/rc.conf contains:
>>
>> hostname="fbge1"
>> #should remove, especially given below duplicate entry
>> ifconfig_igb0="DHCP"
>> sshd_enable="YES"
>> #ntpd_enable="YES"
>> # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
>> dumpdev="AUTO"
>> # OpenBSD PF options to play with later. very bad for raw packet rates.
>> #pf_enable="YES"
>> #pflog_enable="YES"
>> # enable packet forwarding
>> # these enable forwarding and fastforwarding sysctls. inet6 does not
>> # have fastforward
>> gateway_enable="YES"
>> ipv6_gateway_enable="YES"
>> # enable OpenBSD ftp-proxy
>> # should comment out until actively playing with PF
>> ftpproxy_enable="YES"
>> #left in place, commented out from prior testing
>> #ifconfig_mlxen1="inet 172.16.2.1 netmask 255.255.255.0 mtu 9000"
>> #ifconfig_mlxen0="inet 172.16.1.1 netmask 255.255.255.0 mtu 9000"
>> #ifconfig_mlxen3="inet 172.16.7.1 netmask 255.255.255.0 mtu 9000"
>> #ifconfig_mlxen2="inet 172.16.8.1 netmask 255.255.255.0 mtu 9000"
>> # -lro and -tso options added per mailing list suggestion from Bjoern A.
>> # Zeeb (bzeeb-lists at lists.zabbadoz.net)
>> ifconfig_cxl0="inet 172.16.3.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
>> ifconfig_cxl1="inet 172.16.4.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
>> ifconfig_cxl2="inet 172.16.5.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
>> ifconfig_cxl3="inet 172.16.6.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
>> # aliases instead of reconfiguring test clients. See above commented-out
>> # entries
>> ifconfig_cxl0_alias0="172.16.7.1 netmask 255.255.255.0"
>> ifconfig_cxl1_alias0="172.16.8.1 netmask 255.255.255.0"
>> ifconfig_cxl2_alias0="172.16.1.1 netmask 255.255.255.0"
>> ifconfig_cxl3_alias0="172.16.2.1 netmask 255.255.255.0"
>> # for remote monitoring/admin of the test device
>> ifconfig_igb0="inet 172.30.60.60 netmask 255.255.0.0"
>>
>> Additional configurations:
>> cpuset-chelsio-6cpu-high
>> # Original provided by  Navdeep Parhar <nparhar@gmail.com>
>> # takes vmstat -ai output into a list, and assigns interrupts in order to
>> # the available CPU cores.
>> # Modified: to assign only to the 'high CPUs', ie: on core1.
>> # See: http://lists.freebsd.org/pipermail/freebsd-net/2014-July/039317.html
>> #!/usr/local/bin/bash
>> ncpu=12
>> irqlist=$(vmstat -ia | egrep 't4nex|t5nex|cxgbc' | cut -f1 -d: | cut -c4-)
>> i=6
>> for irq in $irqlist; do
>>         cpuset -l $i -x $irq
>>         i=$((i+1))
>>         [ $i -ge $ncpu ] && i=6
>> done
>>
>> Client Description:
>>
>> Two Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz processors
>> 64 GB ram
>> Mellanox Technologies MT27500 Family [ConnectX-3]
>> Centos 6.4 with updates
>> iperf3 installed from yum repositories: iperf3-3.0.3-3.el6.x86_64
>>
>> Test setup:
>>
>> I've found that about 3 streams between the CentOS clients is the best way
>> to get the most out of them.
>> Above a certain point, the -b flag does not change the results.
>> -N is an artifact from using TCP.
>> -l is needed, as -M doesn't work for UDP.
>>
>> I usually use launch scripts similar to the following:
>>
>>  for i in `seq 41 60`; do ssh loader$i "export TIME=120; export
>> STREAMS=1; export PORT=52$i; export PKT=64; export RATE=2000m;
>> /root/iperf-test-8port-udp" & done
>>
>> The scripts execute the following on each host.
>>
>> #!/bin/bash
>> PORT1=$PORT
>> PORT2=$(($PORT+1000))
>> PORT3=$(($PORT+2000))
>> iperf3 -c loader41-40gbe -u -b 10000m -i 0 -N -l $PKT -t$TIME -P$STREAMS -p$PORT1 &
>> iperf3 -c loader42-40gbe -u -b 10000m -i 0 -N -l $PKT -t$TIME -P$STREAMS -p$PORT1 &
>> iperf3 -c loader43-40gbe -u -b 10000m -i 0 -N -l $PKT -t$TIME -P$STREAMS -p$PORT1 &
>> ... (through all clients and all three ports) ...
>> iperf3 -c loader60-40gbe -u -b 10000m -i 0 -N -l $PKT -t$TIME -P$STREAMS -p$PORT3 &
>>
>>
>> Results:
>>
>> Summarized, netstat -w 1 -q 240 -bd, run through:
>> cat test4-tuning | egrep -v 'packets|input' | awk '{ipackets+=$1}
>> {idrops+=$3} {opackets+=$5} {odrops+=$9} END {print "input "
>> ipackets/NR, "idrops " idrops/NR, "opackets " opackets/NR, "odrops "
>> odrops/NR}'
>>
>> input 1.10662e+07 idrops 8.01783e+06 opackets 3.04516e+06 odrops 3152.4
>>
>> Snapshot of raw output:
>>
>>            input        (Total)           output
>>    packets  errs idrops      bytes    packets  errs      bytes colls drops
>>   11189148     0 7462453 1230805216    3725006     0  409750710     0   799
>>   10527505     0 6746901 1158024978    3779096     0  415700708     0   127
>>   10606163     0 6850760 1166676673    3751780     0  412695761     0  1535
>>   10749324     0 7132014 1182425799    3617558     0  397930956     0  5972
>>   10695667     0 7022717 1176521907    3669342     0  403627236     0  1461
>>   10441173     0 6762134 1148528662    3675048     0  404255540     0  6021
>>   10683773     0 7005635 1175215014    3676962     0  404465671     0  2606
>>   10869859     0 7208696 1195683372    3658432     0  402427698     0   979
>>   11948989     0 8310926 1314387881    3633773     0  399714986     0   725
>>   12426195     0 8864415 1366877194    3562311     0  391853156     0  2762
>>   13006059     0 9432389 1430661751    3570067     0  392706552     0  5158
>>   12822243     0 9098871 1410443600    3715177     0  408668500     0  4064
>>   13317864     0 9683602 1464961374    3632156     0  399536131     0  3684
>>   13701905     0 10182562 1507207982    3523101     0  387540859     0  8690
>>   13820227     0 10244870 1520221820    3562038     0  391823322     0  2426
>>   14437060     0 10955483 1588073033    3480105     0  382810557     0  2619
>>   14518471     0 11119573 1597028105    3397439     0  373717355     0  5691
>>   14890287     0 11675003 1637926521    3199812     0  351978304     0 11007
>>   14923610     0 11749091 1641594441    3171436     0  348857468     0  7389
>>   14738704     0 11609730 1621254991    3117715     0  342948394     0  2597
>>   14753975     0 11549735 1622935026    3207393     0  352812846     0  4798
>>
>>
>>
>>
>>
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"



