Date:      Wed, 22 Aug 2007 14:50:47 -0700
From:      "Kevin Oberman" <oberman@es.net>
To:        Max Laier <max@love2party.net>
Cc:        freebsd-net@freebsd.org
Subject:   Re: Unable to set socket size > 16MB
Message-ID:  <20070822215047.174954507D@ptavv.es.net>
In-Reply-To: <200708211114.09938.max@love2party.net>


On Tuesday Aug. 21, 2007, Max Laier <max@love2party.net> wrote:

> On Monday 20 August 2007, Kevin Oberman wrote:
> > I am trying to tune a FreeBSD system for ~100 ms. RTT at 10 Gbps. (I
> > posted another message about this back on 8/17). I am running current
> > of late July 31.
> >
> > I am using iperf and I have confirmed (with gdb) that it is passing
> > setsockopt a size of 67108864 and setsockopt is returning 0. When I
> > capture the SYN packets, I am seeing a window of 64K and a scale
> > factor of 8. For 64 MB, the scale factor should be 10.
> >
> > Is there some hidden limitation that would restrict this or is there a
> > bug involved? I have set net.inet.tcp.(send|recv)space to
> > 64m. kern.ipc.maxsockbuf=134217728.
> >
> > Here is the 3-way handshake:
> > 13:57:45.571614 IP lbl.52460 > perf-bnl.commplex-link: S
> >     4070670678:4070670678(0) win 65535 <mss 8960,nop,wscale
> >     8,sackOK,timestamp 345761341 0>
> > 13:57:45.665645 IP perf-bnl.commplex-link > lbl.52460: S
> >     3909263475:3909263475(0) ack 4070670679 win 65535 <mss 8960,nop,wscale
> >     8,nop,nop,timestamp 3623078172 345761341>
> > 13:57:45.665683 IP lbl.52460 > perf-bnl.commplex-link: . ack 1 win 65535
> >     <nop,nop,timestamp 345761435 3623078172>
> >
> > Any reason for this? Any workaround or fix? Or am I missing something?
>  
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet/tcp_syncache.c#rev1.104 
> seems to be the culprit:
> 
>         while (wscale < TCP_MAX_WINSHIFT &&
>             (0x1 << wscale) < tcp_minmss)       /* 216 */
>                 wscale++;
> 
> It's obvious that the above will bound wscale to 8 with the default of 216 
> for minmss.  You should be able to set a higher minmss for a temporary 
> work around, but this calculation really seems wrong to me.  Esp. given 
> the following comment for tcp_minmss:
> 
>  ...
>  * with packet generation and sending. Set to zero to disable MINMSS
>  * checking. This setting prevents us from sending too small packets.
>  */

Thanks, Max! That fixed this problem very nicely. I changed minmss and
my window went to 64M.

I really agree that the code is simply wrong. I'm a bit at a loss as to
why this check is done, but I'm probably missing something obvious. In
any case, the work-around worked!
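
To make the arithmetic concrete, here is a minimal sketch that models only
the loop Max quoted, in isolation from the rest of tcp_syncache.c. The
function names wscale_from_minmss and max_window are hypothetical and exist
only for this illustration. TCP_MAX_WINSHIFT is 14 per RFC 1323, and 65535
is the largest unscaled window.

```c
#include <stdint.h>

#define TCP_MAX_WINSHIFT 14     /* largest shift RFC 1323 allows */
#define TCP_MAXWIN       65535  /* largest unscaled 16-bit window */

/*
 * Hypothetical stand-alone model of the quoted loop: wscale stops
 * growing as soon as 2^wscale is no longer smaller than minmss.
 */
int wscale_from_minmss(int minmss)
{
    int wscale = 0;

    while (wscale < TCP_MAX_WINSHIFT && (0x1 << wscale) < minmss)
        wscale++;
    return wscale;
}

/* Largest window, in bytes, that can be advertised with a given shift. */
uint64_t max_window(int wscale)
{
    return (uint64_t)TCP_MAXWIN << wscale;
}
```

With the default minmss of 216 this model yields a shift of 8, and
max_window(8) is 16776960 bytes, which is the 16 MB ceiling in the subject
line. In this model a shift of 10 (max_window(10) is 67107840 bytes, roughly
64 MB) needs minmss to be at least 513.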

Now on to the next issue. This got my bandwidth up from 1.4G to 2.3G.
Still pretty pathetic. I look at a tcptrace and see that the receive
window is always sitting at > 50 MB, but the "Outstanding Data" climbs
to about 28 MB and stops there. Complete flat line from there to the end
of the run.
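
As a rough sanity check on those numbers, the sketch below relates data in
flight to throughput. It assumes the RTT is exactly 100 ms and that "MB"
means 2^20 bytes; both are assumptions, and the function names are
hypothetical.

```c
/*
 * Throughput ceiling in bits per second when at most bytes_in_flight
 * bytes can be unacknowledged during one round trip.
 */
double window_limited_bps(double bytes_in_flight, double rtt_seconds)
{
    return bytes_in_flight * 8.0 / rtt_seconds;
}

/* Bandwidth-delay product: bytes that must be in flight to fill the path. */
double bdp_bytes(double link_bps, double rtt_seconds)
{
    return link_bps * rtt_seconds / 8.0;
}
```

Under those assumptions, window_limited_bps(28.0 * 1024 * 1024, 0.1) comes
to about 2.35 Gbps, close to the 2.3G measured, which suggests the cap on
outstanding data is what limits throughput rather than the receive window.
bdp_bytes(10e9, 0.1) is 125,000,000 bytes, so filling the 10 Gbps path at
this RTT would take more than a 64 MB window in any case.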

I do see a lot of packets that update the window size only. (That is
they have the ACK bit set, but just keep ACKing the same sequence number
and differ only in the window size.) Often I get hundreds of them in a
row. I think this points out a problem, but probably not what is killing
my throughput.

I see an old (2004) message to net@freebsd.org which reports this and
Andre asked him to submit a PR and send him the PR number. I have no
idea if it was ever submitted, but the code is unchanged and I suspect
it never was. In any case, I can't find it.

I guess I'll go ahead and submit a PR.

I'm starting to suspect that the TCP code has a number of issues in
cases where there is a great deal of outstanding data due to high
bandwidth and long RTTs. This is probably something that has not ever
had too much attention or exercise.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net			Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751



