Date: Wed, 22 Aug 2007 14:50:47 -0700
From: "Kevin Oberman" <oberman@es.net>
To: Max Laier <max@love2party.net>
Cc: freebsd-net@freebsd.org
Subject: Re: Unable to set socket size > 16MB
Message-ID: <20070822215047.174954507D@ptavv.es.net>
In-Reply-To: <200708211114.09938.max@love2party.net>

On Tuesday Aug. 21, 2007, Max Laier <max@love2party.net> wrote:
> On Monday 20 August 2007, Kevin Oberman wrote:
> > I am trying to tune a FreeBSD system for ~100 ms. RTT at 10 Gbps. (I
> > posted another message about this back on 8/17). I am running current
> > of late July 31.
> >
> > I am using iperf and I have confirmed (with gdb) that it is passing
> > setsockopt a size of 67108864 and setsockopt is returning 0. When I
> > capture the SYN packets, I am seeing a window of 64K and a scale
> > factor of 8. For 64 MB, the scale factor should be 10.
> >
> > Is there some hidden limitation that would restrict this or is there a
> > bug involved? I have set net.inet.tcp.(send|recv)space to
> > 64m. kern.ipc.maxsockbuf=134217728.
> >
> > Here is the 3-way handshake:
> > 13:57:45.571614 IP lbl.52460 > perf-bnl.commplex-link: S
> >   4070670678:4070670678(0) win 65535 <mss 8960,nop,wscale
> >   8,sackOK,timestamp 345761341 0>
> > 13:57:45.665645 IP perf-bnl.commplex-link > lbl.52460: S
> >   3909263475:3909263475(0) ack 4070670679 win 65535 <mss
> >   8960,nop,wscale 8,nop,nop,timestamp 3623078172 345761341>
> > 13:57:45.665683 IP lbl.52460 > perf-bnl.commplex-link: . ack 1
> >   win 65535 <nop,nop,timestamp 345761435 3623078172>
> >
> > Any reason for this? Any workaround or fix? Or am I missing something?
>
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_syncache.c#rev1.104
> seems to be the culprit:
>
>         while (wscale < TCP_MAX_WINSHIFT &&
>             (0x1 << wscale) < tcp_minmss)   /* 216 */
>                 wscale++;
>
> It's obvious that the above will bound wscale to 8 with the default of 216
> for minmss. You should be able to set a higher minmss for a temporary
> work around, but this calculation really seems wrong to me. Esp. given
> the following comment for tcp_minmss:
>
> ...
>  * with packet generation and sending. Set to zero to disable MINMSS
>  * checking. This setting prevents us from sending too small packets.
>  */

Thanks, Max! That fixed this problem very nicely. I changed minmss and
my window went to 64M.

I really agree that the code is simply wrong. I'm a bit at a loss as to
why this check is done, but I'm probably missing something obvious. In
any case, the work-around worked!

Now on to the next issue. This got my bandwidth up from 1.4G to 2.3G.
Still pretty pathetic.

I look at a tcptrace and see that the receive window is always sitting
at > 50 MB, but the "Outstanding Data" climbs to about 28 MB and stops
there. Complete flat line from there to the end of the run.

I do see a lot of packets that update the window size only. (That is,
they have the ACK bit set, but just keep ACKing the same sequence number
and differ only in the window size.) Often I get hundreds of them in a
row. I think this points out a problem, but probably not what is killing
my throughput.

I see an old (2004) message to net@freebsd.org which reports this and
Andre asked him to submit a PR and send him the PR number. I have no
idea if it was ever submitted, but the code is unchanged and I suspect
it never was. In any case, I can't find it. I guess I'll go ahead and
submit a PR.

I'm starting to suspect that the TCP code has a number of issues in
cases where there is a great deal of outstanding data due to high
bandwidth and long RTTs. This is probably something that has not ever
had too much attention or exercise.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net			Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751
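
A minimal sketch of the loop quoted above, pulled out of the kernel so the bound can be checked by hand. It assumes only what the thread shows (the rev 1.104 loop and the default minmss of 216) plus the standard constants TCP_MAX_WINSHIFT (14) and TCP_MAXWIN (65535). The helper names syncache_wscale, max_window and report are hypothetical and exist only for this illustration; this is not the kernel code path itself.

```c
#include <stdint.h>
#include <stdio.h>

#define TCP_MAX_WINSHIFT 14      /* largest window shift TCP allows */
#define TCP_MAXWIN       65535   /* largest unscaled 16-bit window */

/*
 * Hypothetical helper: the loop quoted from tcp_syncache.c rev 1.104,
 * with tcp_minmss passed in instead of read from the sysctl.
 */
static int
syncache_wscale(int tcp_minmss)
{
	int wscale = 0;

	while (wscale < TCP_MAX_WINSHIFT && (0x1 << wscale) < tcp_minmss)
		wscale++;
	return (wscale);
}

/* Hypothetical helper: largest window advertisable at a given shift. */
static uint32_t
max_window(int wscale)
{
	return ((uint32_t)TCP_MAXWIN << wscale);
}

/* Print the shift and the resulting window ceiling for one minmss value. */
static void
report(int tcp_minmss)
{
	int ws = syncache_wscale(tcp_minmss);

	printf("minmss=%d -> wscale=%d, max window=%lu bytes\n",
	    tcp_minmss, ws, (unsigned long)max_window(ws));
}
```

With the default of 216 the loop stops at a shift of 8, which caps the advertised window at 65535 << 8 = 16,776,960 bytes, the roughly 16 MB ceiling in the subject line. In this model any minmss from 513 through 1024 gives a shift of 10 (65535 << 10 = 67,107,840 bytes), and a minmss of 0 gives a shift of 0.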