Date: Tue, 22 Jul 2014 11:07:07 -0700
Subject: Re: fastforward/routing: a 3 million packet-per-second system?
From: Adrian Chadd
To: John Jasen
Cc: FreeBSD Net

Hi!

Well, what's missing is some dtrace/pmc/lockdebugging investigations into
the system to see where it's currently maxing out at.

I wonder if you're seeing contention on the transmit paths as drivers
queue frames from one set of driver threads/queues to another
potentially completely different set of driver transmit threads/queues.

-a

On 22 July 2014 08:18, John Jasen wrote:
> Feedback and/or tips and tricks more than welcome.
>
> Outstanding questions:
>
> Would increasing the number of processor cores help?
>
> Would a system where both processor QPI ports connect to each other
> mitigate QPI bottlenecks?
>
> Are there further performance optimizations I am missing?
>
> Server Description:
>
> The system in question is a Dell Poweredge R820, 16GB of RAM, and two
> Intel(R) Xeon(R) CPU E5-4610 0 @ 2.40GHz.
>
> Onboard, in a 16x PCIe slot, I have one Chelsio T-580-CR two-port 40GbE
> NIC, and in an 8x slot, another T-580-CR dual port.
>
> I am running FreeBSD 10.0-STABLE.
>
> BIOS tweaks:
>
> Hyperthreading (or Logical Processors) is turned off.
> Memory Node Interleaving is turned off, but did not appear to impact
> performance.
>
> /boot/loader.conf contents:
> #for CARP+PF testing
> carp_load="YES"
> #load cxgbe drivers.
> cxgbe_load="YES"
> #maxthreads appears to not exceed CPU.
> net.isr.maxthreads=12
> #bindthreads may be indicated when using cpuset(1) on interrupts
> net.isr.bindthreads=1
> #random guess based on googling
> net.isr.maxqlimit=60480
> net.link.ifqmaxlen=90000
> #discussions with cxgbe maintainer and list led me to trying this.
> #Allows more interrupts to be fixed to CPUs, which in some cases, improves interrupt balancing.
> hw.cxgbe.ntxq10g=16
> hw.cxgbe.nrxq10g=16
>
> /etc/sysctl.conf contents:
>
> #the following is also enabled by rc.conf gateway_enable.
> net.inet.ip.fastforwarding=1
> #recommendations from BSD router project
> kern.random.sys.harvest.ethernet=0
> kern.random.sys.harvest.point_to_point=0
> kern.random.sys.harvest.interrupt=0
> #probably should be removed, as cxgbe does not seem to affect/be affected by irq storm settings
> hw.intr_storm_threshold=25000000
> #based on Calomel.Org performance suggestions. 4x40GbE, seemed reasonable to use 100GbE settings
> kern.ipc.maxsockbuf=1258291200
> net.inet.tcp.recvbuf_max=1258291200
> net.inet.tcp.sendbuf_max=1258291200
> #attempting to play with ULE scheduler, making it serve packets versus netstat
> kern.sched.slice=1
> kern.sched.interact=1
>
> /etc/rc.conf contains:
>
> hostname="fbge1"
> #should remove, especially given below duplicate entry
> ifconfig_igb0="DHCP"
> sshd_enable="YES"
> #ntpd_enable="YES"
> # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
> dumpdev="AUTO"
> # OpenBSD PF options to play with later. very bad for raw packet rates.
> #pf_enable="YES"
> #pflog_enable="YES"
> # enable packet forwarding
> # these enable forwarding and fastforwarding sysctls. inet6 does not have fastforward
> gateway_enable="YES"
> ipv6_gateway_enable="YES"
> # enable OpenBSD ftp-proxy
> # should comment out until actively playing with PF
> ftpproxy_enable="YES"
> #left in place, commented out from prior testing
> #ifconfig_mlxen1="inet 172.16.2.1 netmask 255.255.255.0 mtu 9000"
> #ifconfig_mlxen0="inet 172.16.1.1 netmask 255.255.255.0 mtu 9000"
> #ifconfig_mlxen3="inet 172.16.7.1 netmask 255.255.255.0 mtu 9000"
> #ifconfig_mlxen2="inet 172.16.8.1 netmask 255.255.255.0 mtu 9000"
> # -lro and -tso options added per mailing list suggestion from Bjoern A.
> # Zeeb (bzeeb-lists at lists.zabbadoz.net)
> ifconfig_cxl0="inet 172.16.3.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
> ifconfig_cxl1="inet 172.16.4.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
> ifconfig_cxl2="inet 172.16.5.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
> ifconfig_cxl3="inet 172.16.6.1 netmask 255.255.255.0 mtu 9000 -lro -tso up"
> # aliases instead of reconfiguring test clients. See above commented out entries
> ifconfig_cxl0_alias0="172.16.7.1 netmask 255.255.255.0"
> ifconfig_cxl1_alias0="172.16.8.1 netmask 255.255.255.0"
> ifconfig_cxl2_alias0="172.16.1.1 netmask 255.255.255.0"
> ifconfig_cxl3_alias0="172.16.2.1 netmask 255.255.255.0"
> # for remote monitoring/admin of the test device
> ifconfig_igb0="inet 172.30.60.60 netmask 255.255.0.0"
>
> Additional configurations:
> cpuset-chelsio-6cpu-high
> # Original provided by Navdeep Parhar
> # takes vmstat -ai output into a list, and assigns interrupts in order to
> # the available CPU cores.
> # Modified: to assign only to the 'high CPUs', ie: on core1.
> # See: http://lists.freebsd.org/pipermail/freebsd-net/2014-July/039317.html
> #!/usr/local/bin/bash
> ncpu=12
> irqlist=$(vmstat -ia | egrep 't4nex|t5nex|cxgbc' | cut -f1 -d: | cut -c4-)
> i=6
> for irq in $irqlist; do
>     cpuset -l $i -x $irq
>     i=$((i+1))
>     [ $i -ge $ncpu ] && i=6
> done
>
> Client Description:
>
> Two Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz processors
> 64 GB ram
> Mellanox Technologies MT27500 Family [ConnectX-3]
> Centos 6.4 with updates
> iperf3 installed from yum repositories: iperf3-3.0.3-3.el6.x86_64
>
> Test setup:
>
> I've found about 3 streams between Centos clients is about the best way
> to get the most out of them.
> Above certain points, the -b flag does not change results.
> -N is an artifact from using TCP
> -l is needed, as -M doesn't work for UDP.
>
> I usually use launch scripts similar to the following:
>
> for i in `seq 41 60`; do ssh loader$i "export TIME=120; export STREAMS=1; export PORT=52$i; export PKT=64; export RATE=2000m; /root/iperf-test-8port-udp" & done
>
> The scripts execute the following on each host.
>
> #!/bin/bash
> PORT1=$PORT
> PORT2=$(($PORT+1000))
> PORT3=$(($PORT+2000))
> iperf3 -c loader41-40gbe -u -b 10000m -i 0 -N -l $PKT -t$TIME -P$STREAMS -p$PORT1 &
> iperf3 -c loader42-40gbe -u -b 10000m -i 0 -N -l $PKT -t$TIME -P$STREAMS -p$PORT1 &
> iperf3 -c loader43-40gbe -u -b 10000m -i 0 -N -l $PKT -t$TIME -P$STREAMS -p$PORT1 &
> ... (through all clients and all three ports) ...
> iperf3 -c loader60-40gbe -u -b 10000m -i 0 -N -l $PKT -t$TIME -P$STREAMS -p$PORT3 &
>
>
> Results:
>
> Summarized, netstat -w 1 -q 240 -bd, run through:
> cat test4-tuning | egrep -v {'packets | input '} | awk '{ipackets+=$1} {idrops+=$3} {opackets+=$5} {odrops+=$9} END {print "input " ipackets/NR, "idrops " idrops/NR, "opackets " opackets/NR, "odrops " odrops/NR}'
>
> input 1.10662e+07 idrops 8.01783e+06 opackets 3.04516e+06 odrops 3152.4
>
> Snapshot of raw output:
>
>           input (Total)                         output
>  packets errs   idrops      bytes packets errs     bytes colls drops
> 11189148    0  7462453 1230805216 3725006    0 409750710     0   799
> 10527505    0  6746901 1158024978 3779096    0 415700708     0   127
> 10606163    0  6850760 1166676673 3751780    0 412695761     0  1535
> 10749324    0  7132014 1182425799 3617558    0 397930956     0  5972
> 10695667    0  7022717 1176521907 3669342    0 403627236     0  1461
> 10441173    0  6762134 1148528662 3675048    0 404255540     0  6021
> 10683773    0  7005635 1175215014 3676962    0 404465671     0  2606
> 10869859    0  7208696 1195683372 3658432    0 402427698     0   979
> 11948989    0  8310926 1314387881 3633773    0 399714986     0   725
> 12426195    0  8864415 1366877194 3562311    0 391853156     0  2762
> 13006059    0  9432389 1430661751 3570067    0 392706552     0  5158
> 12822243    0  9098871 1410443600 3715177    0 408668500     0  4064
> 13317864    0  9683602 1464961374 3632156    0 399536131     0
>                                                                 3684
> 13701905    0 10182562 1507207982 3523101    0 387540859     0  8690
> 13820227    0 10244870 1520221820 3562038    0 391823322     0  2426
> 14437060    0 10955483 1588073033 3480105    0 382810557     0  2619
> 14518471    0 11119573 1597028105 3397439    0 373717355     0  5691
> 14890287    0 11675003 1637926521 3199812    0 351978304     0 11007
> 14923610    0 11749091 1641594441 3171436    0 348857468     0  7389
> 14738704    0 11609730 1621254991 3117715    0 342948394     0  2597
> 14753975    0 11549735 1622935026 3207393    0 352812846     0  4798
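
The averaging pipeline quoted in the Results section could also be written as a small shell function. What follows is only a minimal sketch, not the script that produced the posted numbers. The function name summarize_netstat and the extra "forwarded" percentage (output packets as a share of input packets) are assumptions introduced here for illustration; the column positions ($1, $3, $5, $9) are taken from the quoted awk command.

```shell
#!/bin/sh
# Summarize saved `netstat -w 1 -bd` samples: per-column averages plus the
# share of input packets that made it back out (hypothetical extra field).
summarize_netstat() {
    # Only rows with nine columns and a numeric first field are counted,
    # which skips the repeated "input (Total) output" and
    # "packets errs ..." header lines without a separate egrep pass.
    awk 'NF == 9 && $1 ~ /^[0-9]+$/ {
             ipackets += $1; idrops += $3; opackets += $5; odrops += $9; n++
         }
         END {
             if (n == 0) { print "no samples"; exit 1 }
             printf "input %.0f idrops %.0f opackets %.0f odrops %.1f forwarded %.1f%%\n",
                    ipackets / n, idrops / n, opackets / n, odrops / n,
                    100 * opackets / ipackets
         }' "$@"
}
```

Called as `summarize_netstat test4-tuning`, it would read the capture file named in the quoted pipeline; with no argument it reads standard input. Dividing by a count of data rows rather than NR keeps the averages correct even if header lines are left in the file.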