From owner-freebsd-net@freebsd.org  Wed Aug 26 21:30:09 2015
Return-Path: <owner-freebsd-net@freebsd.org>
Delivered-To: freebsd-net@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 751D19C2C6A
 for <freebsd-net@mailman.ysv.freebsd.org>;
 Wed, 26 Aug 2015 21:30:09 +0000 (UTC)
 (envelope-from chris@stankevitz.com)
Received: from mango.stankevitz.com (mango.stankevitz.com [208.79.93.194])
 by mx1.freebsd.org (Postfix) with ESMTP id 51B2F155A
 for <freebsd-net@freebsd.org>; Wed, 26 Aug 2015 21:30:09 +0000 (UTC)
 (envelope-from chris@stankevitz.com)
Received: from Chriss-MacBook-Pro.local (209-203-101-124.static.twtelecom.net
 [209.203.101.124])
 (using TLSv1.2 with cipher AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mango.stankevitz.com (Postfix) with ESMTPSA id D8123145E;
 Wed, 26 Aug 2015 14:30:07 -0700 (PDT)
Message-ID: <55DE2FDF.5030707@stankevitz.com>
Date: Wed, 26 Aug 2015 14:30:07 -0700
From: Chris Stankevitz <chris@stankevitz.com>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10;
 rv:31.0) Gecko/20100101 Thunderbird/31.7.0
MIME-Version: 1.0
To: John-Mark Gurney <jmg@funkthat.com>
CC: freebsd-net@freebsd.org
Subject: Re: ssh over WAN: TCP window too small
References: <55DCF080.7080208@stankevitz.com>
 <20150826010323.GN33167@funkthat.com> <55DD2A98.2010605@stankevitz.com>
 <20150826082457.GQ33167@funkthat.com>
In-Reply-To: <20150826082457.GQ33167@funkthat.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net/>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 26 Aug 2015 21:30:09 -0000

John-Mark,

Thanks again.  I appreciate you teaching me "how to fish".  I basically 
spent all morning reading kdump output.

On 8/26/15 1:24 AM, John-Mark Gurney wrote:
 > It really looks like we should set TCPRcvBufPoll by default on
 > FreeBSD...

According to /etc/ssh/sshd_config, TCPRcvBufPoll defaults to true. 
ktrace (thank you for showing me this) shows the ssh process 
continuously polling SO_RCVBUF.

In my case with TCPRcvBufPoll on and HPNBufferSize and TCPRcvBuf unset, 
ssh never sets SO_RCVBUF or SO_SNDBUF so presumably FreeBSD has complete 
control over those values (you saw the same thing).

But perhaps all this is moot in my case because netstat shows that the 
sender and receiver have sufficiently large SND/RCV buffers:  (can you 
confirm my interpretation is correct?)

Sender   S-BMAX  662816
Receiver R-BMAX  528416

>> I will experiment with TCPRcvBuf.
>
> It does look like the values are in KB


Yes, README.hpn says they are in KB.  However, even though it is 
described in README.hpn, I cannot set TCPRcvBuf in sshd_config:

/etc/ssh/sshd_config: line 141: Bad configuration option: TCPRcvBuf
/etc/ssh/sshd_config: line 141: Bad configuration option: TcpRcvBuf

However, as I described above, I believe the buffer size is a red herring.

ktrace shows pretty clearly what is happening.  ssh isn't even bothering 
to read from /dev/zero until select(2) gives the green light.  And in my 
case select(2) blocks ssh for 0.1 second for every ~30KB of data:


  94146 ssh      6.592042 CALL  getsockopt(0x3,SOL_SOCKET,SO_RCVBUF
  94146 ssh      6.592054 RET   getsockopt 0
  94146 ssh      6.592065 CALL  select(0x7,0x803817168,0x803817178,0,0)
  94146 ssh      6.685873 RET   select 1
  94146 ssh      6.685907 CALL  read(0x3,0x7fffffffae40,0x2000)
  94146 ssh      6.685921 GIO   fd 3 read 36 bytes
   [ read of fd 3 snipped]
  94146 ssh      6.685931 RET   read 36/0x24
  94146 ssh      6.685950 CALL  getpid
  94146 ssh      6.685962 RET   getpid 94146/0x16fc2
  94146 ssh      6.685974 CALL  getpid
  94146 ssh      6.685984 RET   getpid 94146/0x16fc2
  94146 ssh      6.685996 CALL  getpid
  94146 ssh      6.686006 RET   getpid 94146/0x16fc2
  94146 ssh      6.686017 CALL  getpid
  94146 ssh      6.686027 RET   getpid 94146/0x16fc2
  94146 ssh      6.686091 CALL  getsockopt(0x3,SOL_SOCKET,SO_RCVBUF
  94146 ssh      6.686103 RET   getsockopt 0
  94146 ssh      6.686116 CALL  select(0x7,0x803817168,0x803817178,0,0)
  94146 ssh      6.686128 RET   select 2
  94146 ssh      6.686140 CALL  read(0x4,0x7fffffff6c70,0x4000)
  94146 ssh      6.686154 GIO   fd 4 read 4096 bytes
   [ read of stdin (/dev/zero) snipped)


Note in the above the blocking call to select at time 6.592065 that 
takes ~0.1 second.  This is the reason my connection is slow.  The 
content appears to be encrypted... I presume it's an application-level 
ssh ack.

BTW when I disable HPN (HPNdisabled=yes) that same select happens, but 
it blocks for less time (~0.05 second) and roughly doubles my throughput.

> Also, don't forget that if you set this in .ssh/config, you only set
> the client size recive buffer, not the server side, so you'd probably
> need to add this to the server's sshd_config to enable it for server
> receive side...

I understand.

> ktrace -i ssh <params>...

Thank you, this is awesome.  Is there a way for me to ktrace 'b' in this 
example: `a | b | c`?  I tried `ktrace a | b | c` and `ktrace test.sh` 
(which included a|b|c) but neither seemed to work.  I'm using stream 
redirection now but it doesn't afford me the bs control of dd.  Perhaps 
named pipes is the solution.

> Oh, I forgot to ask to make sure that net.inet.tcp.{send,recv}buf_auto
> is enabled:

Unfortunately it is enabled :-/

> Maybe a dump of your net.inet.tcp might also be helpful...

This should all be standard out-of-the-box:

net.inet.tcp.rfc1323: 1
net.inet.tcp.mssdflt: 536
net.inet.tcp.keepidle: 7200000
net.inet.tcp.keepintvl: 75000
net.inet.tcp.sendspace: 32768
net.inet.tcp.recvspace: 65536
net.inet.tcp.keepinit: 75000
net.inet.tcp.delacktime: 100
net.inet.tcp.v6mssdflt: 1220
net.inet.tcp.cc.algorithm: newreno
net.inet.tcp.cc.available: newreno
net.inet.tcp.hostcache.cachelimit: 15360
net.inet.tcp.hostcache.hashsize: 512
net.inet.tcp.hostcache.bucketlimit: 30
net.inet.tcp.hostcache.count: 6
net.inet.tcp.hostcache.expire: 3600
net.inet.tcp.hostcache.prune: 300
net.inet.tcp.hostcache.purge: 0
net.inet.tcp.log_in_vain: 0
net.inet.tcp.blackhole: 0
net.inet.tcp.delayed_ack: 1
net.inet.tcp.drop_synfin: 0
net.inet.tcp.rfc3042: 1
net.inet.tcp.rfc3390: 1
net.inet.tcp.experimental.initcwnd10: 1
net.inet.tcp.rfc3465: 1
net.inet.tcp.abc_l_var: 2
net.inet.tcp.ecn.enable: 0
net.inet.tcp.ecn.maxretries: 1
net.inet.tcp.insecure_rst: 0
net.inet.tcp.recvbuf_auto: 1
net.inet.tcp.recvbuf_inc: 16384
net.inet.tcp.recvbuf_max: 2097152
net.inet.tcp.path_mtu_discovery: 1
net.inet.tcp.tso: 1
net.inet.tcp.sendbuf_auto: 1
net.inet.tcp.sendbuf_inc: 8192
net.inet.tcp.sendbuf_max: 2097152
net.inet.tcp.reass.maxsegments: 255100
net.inet.tcp.reass.cursegments: 0
net.inet.tcp.reass.overflows: 471
net.inet.tcp.sack.enable: 1
net.inet.tcp.sack.maxholes: 128
net.inet.tcp.sack.globalmaxholes: 65536
net.inet.tcp.sack.globalholes: 0
net.inet.tcp.minmss: 216
net.inet.tcp.log_debug: 0
net.inet.tcp.tcbhashsize: 524288
net.inet.tcp.do_tcpdrain: 1
net.inet.tcp.pcbcount: 43
net.inet.tcp.icmp_may_rst: 1
net.inet.tcp.isn_reseed_interval: 0
net.inet.tcp.soreceive_stream: 0
net.inet.tcp.syncookies: 1
net.inet.tcp.syncookies_only: 0
net.inet.tcp.syncache.bucketlimit: 30
net.inet.tcp.syncache.cachelimit: 15375
net.inet.tcp.syncache.count: 0
net.inet.tcp.syncache.hashsize: 512
net.inet.tcp.syncache.rexmtlimit: 3
net.inet.tcp.syncache.rst_on_sock_fail: 1
net.inet.tcp.msl: 30000
net.inet.tcp.rexmit_min: 30
net.inet.tcp.rexmit_slop: 200
net.inet.tcp.always_keepalive: 1
net.inet.tcp.fast_finwait2_recycle: 0
net.inet.tcp.finwait2_timeout: 60000
net.inet.tcp.keepcnt: 8
net.inet.tcp.rexmit_drop_options: 0
net.inet.tcp.per_cpu_timers: 0
net.inet.tcp.timer_race: 0
net.inet.tcp.maxtcptw: 27767
net.inet.tcp.nolocaltimewait: 0


>> S-BCNT  57344
>
> You were probably unlucky when you sampled this value, and caught it at
> a bad time...  Also, look at how much CPU time ssh uses...  ssh can
> introduce additional latency that isn't apparent from the network...

S-BCNT is always ~60KB when HPN is enabled.  This jives with my "ssh is 
spending all of its time waiting on select(2)" theory.

> It's very possible that we don't set any of these values, so what
> happens is that ssh reads the value of the receive buffer at startup,
> which is 64k or so, and only does buffering in that size..  Then you
> end up w/ a latency not of your network, but of the speed at which
> your computer can encrypt at...  Just a thought, but you could also
> measure latency between writes using ktrace to help figure this
> out...

Yes, I believe something like this is happening now.

Thank you again for your help... this thread is proving to me one of 
those quantum leap moments for me in terms of FreeBSD knowledge.

Chris