From owner-freebsd-net@FreeBSD.ORG Wed Apr 8 17:18:45 2015
Message-ID: <552562E9.9060103@freebsd.org>
Date: Wed, 08 Apr 2015 10:18:33 -0700
From: Lawrence Stewart
To: Marek Salwerowicz, "freebsd-net@freebsd.org"
Subject: Re: RTT and TCP Window size doubts, bandwidth issues
In-Reply-To: <552455EB.7080609@wp.pl>
List-Id: Networking and TCP/IP with FreeBSD

On 04/07/15 15:10, Marek Salwerowicz wrote:
> Hi list,
>
> I am trying to find the correct sysctl setup for the following
> machines (VMs under VMware Workstation 8) to test large TCP window
> sizes.
>
> There are 2 boxes, each with the following setup:
>
> % uname -a
> FreeBSD freeA 10.1-RELEASE-p6 FreeBSD 10.1-RELEASE-p6 #0: Tue
> Feb 24 19:00:21 UTC 2015
> root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
>
> - 4GB of RAM
> - 1 NIC
>
> Without any modifications, iperf reports a bandwidth of ~1Gbit/s
> between the hosts.
>
> Server side:
>
> # iperf -s
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 64.0 KByte (default)
> ------------------------------------------------------------
>
> Client side:
>
> # iperf -c 192.168.108.140
> ------------------------------------------------------------
> Client connecting to 192.168.108.140, TCP port 5001
> TCP window size: 32.5 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.108.141 port 35282 connected with 192.168.108.140 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  1.46 GBytes  1.25 Gbits/sec
>
> I want to simulate (using dummynet) a link between the hosts with a
> bandwidth of 400Mbit/s and a latency of ~20ms. In order to do that, I
> create an ipfw pipe on one box:
>
> IPFW="ipfw -q"
> $IPFW pipe 1 config bw 400Mbit/s delay 10ms
> $IPFW add 1500 pipe 1 ip from any to any

You should set up 2 pipes, one for each direction of traffic, and you
might also want to explicitly set the queue size in bytes in addition
to the bw and delay parameters.

> After running ipfw, the bandwidth with the default kernel sysctls
> becomes much lower.
>
> Client side:
>
> ------------------------------------------------------------
> Client connecting to 192.168.108.140, TCP port 5001
> TCP window size: 32.5 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.108.141 port 35340 connected with 192.168.108.140 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.1 sec  12.5 MBytes  10.4 Mbits/sec
>
> I'd like to achieve a bandwidth of ~400Mbit/s.
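Concretely, the two-pipe setup suggested above might look like the
sketch below. The rule numbers, the queue size, and the use of Marek's
host addresses as classifiers are illustrative assumptions, not part of
the original advice; run this as root on the dummynet box.

```shell
# Sketch of a bidirectional dummynet setup (illustrative; adjust rule
# numbers, queue size, and addresses to your own topology).
IPFW="ipfw -q"

# One pipe per direction, each contributing half of the ~20ms RTT.
# The explicit queue size (~1 BDP, assumed here) keeps the bottleneck
# buffer itself from becoming the limiting factor.
$IPFW pipe 1 config bw 400Mbit/s delay 10ms queue 1000Kbytes
$IPFW pipe 2 config bw 400Mbit/s delay 10ms queue 1000Kbytes

# Classify traffic by direction into the matching pipe.
$IPFW add 1500 pipe 1 ip from 192.168.108.141 to 192.168.108.140
$IPFW add 1510 pipe 2 ip from 192.168.108.140 to 192.168.108.141
```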
Better to calculate the BDP then: 400E6 b/s x 20E-3 s = 8E6 bits =
~977KB, so you'll want to make sure your socket buffers and window can
accommodate ~1MB of data.

> I've modified the following sysctls (on both the client and server
> side):
>
> kern.ipc.maxsockbuf=33554432             # (default 2097152)
> net.inet.tcp.sendbuf_max=33554432        # (default 2097152)
> net.inet.tcp.recvbuf_max=33554432        # (default 2097152)
> net.inet.tcp.cc.algorithm=htcp           # (default newreno; also enabled in /boot/loader.conf)
> net.inet.tcp.cc.htcp.adaptive_backoff=1  # (default 0; disabled)
> net.inet.tcp.cc.htcp.rtt_scaling=1       # (default 0; disabled)

You shouldn't need to tune any of the above, and NewReno should be
capable of filling a 1MB pipe.

> net.inet.tcp.mssdflt=1460                # (default 536)
> net.inet.tcp.minmss=1300                 # (default 216)

You should not change either of these.

> net.inet.tcp.rfc1323=1                   # (default 1)
> net.inet.tcp.rfc3390=1                   # (default 1)

3390 has no effect if you have net.inet.tcp.experimental.initcwnd10=1,
which is the default on 10.X (unfortunately, IMO).

> net.inet.tcp.sendspace=8388608           # (default 32768)
> net.inet.tcp.recvspace=8388608           # (default 65536)

You shouldn't need to tune these if sockbuf autotuning is enabled,
which will grow the buffers up to sendbuf_max; that defaults to larger
than 1MB, so you should be fine.

> net.inet.tcp.sendbuf_inc=32768           # (default 8192)
> net.inet.tcp.recvbuf_inc=65536           # (default 16384)

Tuning these is ok, though also probably unnecessary.
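The BDP arithmetic above can be checked directly; a minimal sketch
using the 400Mbit/s x 20ms numbers from this thread:

```shell
# Bandwidth-delay product for the simulated link.
BW_BPS=400000000    # link bandwidth, bits per second
RTT_MS=20           # round-trip time in milliseconds (2 x 10ms one-way)

BDP_BITS=$(( BW_BPS * RTT_MS / 1000 ))  # 8,000,000 bits in flight
BDP_BYTES=$(( BDP_BITS / 8 ))           # 1,000,000 bytes
BDP_KB=$(( BDP_BYTES / 1024 ))          # ~976 KB, i.e. the ~977KB above

echo "BDP: ${BDP_BYTES} bytes (~${BDP_KB} KB)"
```

So the socket buffers only need to hold ~1MB of in-flight data; Marek's
8MB sendspace/recvspace settings are already comfortably above that.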
> But the results are not as good as I expected.
>
> Server side:
>
> # iperf -s
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 8.00 MByte (default)
> ------------------------------------------------------------
>
> Client side:
>
> # iperf -c 192.168.108.140
> ------------------------------------------------------------
> Client connecting to 192.168.108.140, TCP port 5001
> TCP window size: 8.00 MByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.108.141 port 21894 connected with 192.168.108.140 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.1 sec  24.2 MBytes  20.2 Mbits/sec
>
> I was trying to follow these articles:
> - http://www.psc.edu/index.php/networking/641-tcp-tune
> - https://fasterdata.es.net/host-tuning/freebsd/
>
> but I can't really figure out what should be tuned, or how, in order
> to achieve good results. If anyone could find some time and give me
> some hints, I'd be pleased!

On the sender (iperf client):

kldload siftr
sysctl net.inet.siftr.enabled=1
<run the iperf test>
sysctl net.inet.siftr.enabled=0

You can then post-process /var/log/siftr.log to pull out just the
iperf-related lines:

grep "192.168.108.140,5001" /var/log/siftr.log > iperf.siftr

Then you can look at socket buffer occupancy, cwnd vs snd_wnd vs time,
loss recovery, etc. to see why things are not behaving.

Cheers,
Lawrence
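The post-processing Lawrence describes could start with something like
the sketch below, which pulls a cwnd-vs-time series out of SIFTR's
comma-separated log. The field positions ($2 = timestamp, $9 = cwnd)
are assumptions to be checked against siftr(4) on your release, and
the two-record sample stands in for a real iperf.siftr file.

```shell
# In a real run this would read the grep'd iperf.siftr; a synthetic
# two-record sample in the ASSUMED SIFTR layout stands in here.
siftr_sample='o,1428514713.123,192.168.108.141,35340,192.168.108.140,5001,65535,1460,14600
o,1428514713.145,192.168.108.141,35340,192.168.108.140,5001,65535,1460,29200'

# $2 = timestamp, $9 = cwnd -- assumed field positions; verify them
# against siftr(4) before trusting output from a real log.
cwnd_vs_time=$(printf '%s\n' "$siftr_sample" | awk -F, '{ print $2, $9 }')
printf '%s\n' "$cwnd_vs_time"
```

The resulting two-column output is easy to feed to gnuplot or a
spreadsheet to see how cwnd grows (or fails to grow) over the run.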