From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Lawrence Stewart
Cc: Andre Oppermann, Steven Hartland, freebsd-net@freebsd.org
Date: Fri, 5 Aug 2011 10:57:43 +0400
Subject: Re: tcp failing to recover from a packet loss under 8.2-RELEASE?
Message-ID: <20110805065743.GC94016@zxy.spb.ru>
In-Reply-To: <4E3AA66A.6060605@freebsd.org>
References: <1F95A4C2D54E4F369830143CBDB5FF86@multiplay.co.uk> <4E37C0F2.4080004@freebsd.org> <2B063B6D95AA4C27B004C50D96393F91@multiplay.co.uk> <4E3AA66A.6060605@freebsd.org>

On Fri, Aug 05, 2011 at 12:02:18AM +1000, Lawrence Stewart wrote:

> > Setting net.inet.tcp.reass.maxsegments=8148 and rerunning the
> > tests appears to result in a solid 14MB/s, it's still running a
> > full soak test but looking very promising :)
>
> This is exactly the tuning required to drive high BDP links
> successfully. The unfortunate problem with my reassembly change was
> that by removing the global count of reassembly segments and using the
> uma zone to enforce the restrictions on memory use, we wouldn't
> necessarily have room for the last segment (particularly if a single
> flow has a BDP larger than the max size of the reassembly queue -
> which is the case for you and Slawa).
>
> This is bad, as Andre explained in his message, because we could stall
> connections. I hadn't even considered the idea of allocating on the
> stack as Andre has suggested in his patch, which I believe is an
> appropriate solution to the stalling problem, assuming the function
> will never return with the stack-allocated tqe still in the reassembly
> queue. My longer term goal is discussed below.
>
> > So I suppose the question is should maxsegments be larger by
> > default due to the recent changes e.g.
> > - V_tcp_reass_maxseg = nmbclusters / 16;
> > + V_tcp_reass_maxseg = nmbclusters / 8;
> >
> > or is the correct fix something more involved?
>
> I'm not sure if bumping the value is appropriate - we have always
> expected users to tune their network stack to perform well when used
> in "unusual" scenarios - a large BDP fibre path still being in the
> "unusual" category.
>
> The real fix, which is somewhere down on my todo list, is to make all
> these memory constraints elastic and respond to VM pressure, thus
> negating the need for a hard limit at all.
> This would solve many if not most of the TCP tuning problems we
> currently have with one fell swoop and would greatly reduce the need
> for tuning in many situations that currently fall into the "needs
> manual tuning" basket.

Autotuning without limits is a bad idea - it opens the door to DoS.
Perhaps this problem could be solved by preallocating a "hidden" tqe
element for the segment that arrives in order and is ready to be
delivered to the application? I.e. when we create the reassembly queue
for a TCP connection, we also allocate one queue element (with room for
the data payload) that is used only when data is ready for the
application. Allocating it as part of the queue, rather than embedding
it in struct tcpcb, avoids breaking the ABI. A rough sketch of the idea
is at the end of this mail.

> Andre and Steven, I'm a bit too sleepy to properly review your
> combined proposed changes right now and will follow up in the next
> few days instead.
>
> Cheers,
> Lawrence
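
By the way, on the maxsegments sizing question: the reassembly queue has
to be able to hold about one full window of the largest flow, i.e.
roughly BDP / MSS entries. With an RTT of 100 ms (my assumption, just to
illustrate) and the 14 MB/s Steven sees, that is

    14,000,000 B/s * 0.1 s / 1448 B  ~=  970 segments

and at 500 ms RTT it is already ~4800, so the nmbclusters/16 default can
easily be too small for a single fast flow.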
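
And here is the rough sketch I mentioned above. It only illustrates the
idea in userland C - the struct and function names are made up and this
is not the real tcp_reass.c code; in the kernel the allocations would be
uma_zalloc()/uma_zfree() on the reassembly zone and the payload would
live in an mbuf:

/*
 * Sketch: every reassembly queue carries one preallocated ("hidden")
 * entry, so the segment that fills the hole can always be queued even
 * when the regular (limited) allocator has nothing left.
 */
#include <stdlib.h>
#include <stdbool.h>
#include <sys/queue.h>

struct tseg_qent {
	LIST_ENTRY(tseg_qent) tqe_q;	/* reassembly queue linkage */
	int	tqe_len;		/* segment length */
	void	*tqe_data;		/* would be an mbuf in the kernel */
};
LIST_HEAD(tsegqe_head, tseg_qent);

struct reass_queue {
	struct tsegqe_head q;		/* per-connection reassembly queue */
	struct tseg_qent spare;		/* preallocated "hidden" element */
	bool spare_used;
};

/* Get an entry: try the limited allocator first, fall back to the spare. */
static struct tseg_qent *
tqe_get(struct reass_queue *rq)
{
	struct tseg_qent *te;

	te = malloc(sizeof(*te));	/* kernel: uma_zalloc(..., M_NOWAIT) */
	if (te == NULL && !rq->spare_used) {
		rq->spare_used = true;	/* last-resort entry, always there */
		te = &rq->spare;
	}
	return (te);
}

/* Return an entry once its data has been delivered to the application. */
static void
tqe_put(struct reass_queue *rq, struct tseg_qent *te)
{
	if (te == &rq->spare)
		rq->spare_used = false;
	else
		free(te);		/* kernel: uma_zfree() */
}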