From owner-freebsd-net@FreeBSD.ORG  Tue Jan 15 08:29:54 2013
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 6894BB7;
 Tue, 15 Jan 2013 08:29:54 +0000 (UTC)
 (envelope-from lstewart@freebsd.org)
Received: from lauren.room52.net (lauren.room52.net [210.50.193.198])
 by mx1.freebsd.org (Postfix) with ESMTP id F3FD538F;
 Tue, 15 Jan 2013 08:29:53 +0000 (UTC)
Received: from lstewart1.loshell.room52.net
 (ppp59-167-184-191.static.internode.on.net [59.167.184.191])
 by lauren.room52.net (Postfix) with ESMTPSA id 478A37E88D;
 Tue, 15 Jan 2013 19:29:51 +1100 (EST)
Message-ID: <50F5137F.1060207@freebsd.org>
Date: Tue, 15 Jan 2013 19:29:51 +1100
From: Lawrence Stewart <lstewart@freebsd.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:13.0) Gecko/20120613 Thunderbird/13.0
MIME-Version: 1.0
To: John Baldwin <jhb@freebsd.org>
Subject: Re: Some questions about the new TCP congestion control code
References: <201301141604.29864.jhb@freebsd.org>
In-Reply-To: <201301141604.29864.jhb@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=0.0 required=5.0 tests=UNPARSEABLE_RELAY
 autolearn=unavailable version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on lauren.room52.net
Cc: net@freebsd.org
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
 <mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 15 Jan 2013 08:29:54 -0000

Hi John,

On 01/15/13 08:04, John Baldwin wrote:
> I was looking at TCP congestion control at work recently and noticed a few 

Poor you ;)

> "odd" things in the current code.  First, there is this chunk of code in 
> cc_ack_received() in tcp_input.c:
> 
> static void inline
> cc_ack_received(struct tcpcb *tp, struct tcphdr *th, uint16_t type)
> {
> 	INP_WLOCK_ASSERT(tp->t_inpcb);
> 
> 	tp->ccv->bytes_this_ack = BYTES_THIS_ACK(tp, th);
> 	if (tp->snd_cwnd == min(tp->snd_cwnd, tp->snd_wnd))
> 		tp->ccv->flags |= CCF_CWND_LIMITED;
> 	else
> 		tp->ccv->flags &= ~CCF_CWND_LIMITED;
> 
> 
> Due to hysterical raisins, snd_cwnd and snd_wnd are u_long values, not 
> integers, so the call to min() results in truncation on 64-bit hosts.

Good catch, but I don't think it matters in practice as neither snd_cwnd
or snd_wnd will grow past the 32-bit boundary.

> It should probably be ulmin() instead.  However, this line seems to be a really 
> obfuscated way to just write:
> 
> 	if (tp->snd_cwnd <= tp->snd_wnd)

You are correct, though I'd argue the meaning of the existing code as
written is clearer compared to your suggested change.

> If that is correct, I would vote for changing this to use the much simpler 
> logic.

Agreed. While I find the existing code slightly clearer in meaning, it's
not significant enough to warrant keeping it as is when your suggested
change is simpler, fixes a bug and achieves the same thing. Happy for
you to change it or I can do it if you prefer.

> Secondly, in the particular case I was investigating at work (restart of an 
> idle connnection), the newreno congestion control code in 8.x and later uses a 
> different algorithm than in 7.  Specifically, in 7 TCP would reuse the same 
> logic used for an initial cwnd (honoring ss_fltsz).  In 8 this no longer 
> happens (instead, 2 is hardcoded).  A guess at a possible fix might look 
> something like this:
> 
> Index: cc_newreno.c
> ===================================================================
> --- cc_newreno.c	(revision 243660)
> +++ cc_newreno.c	(working copy)
> @@ -169,8 +169,21 @@ newreno_after_idle(struct cc_var *ccv)
>  	if (V_tcp_do_rfc3390)
>  		rw = min(4 * CCV(ccv, t_maxseg),
>  		    max(2 * CCV(ccv, t_maxseg), 4380));
> +#if 1
>  	else
>  		rw = CCV(ccv, t_maxseg) * 2;
> +#else
> +	/* XXX: This is missing a lot of stuff that used to be in 7. */
> +#ifdef INET6
> +	else if ((isipv6 ? in6_localaddr(&CCV(ccv, t_inpcb->in6p_faddr)) :
> +		in_localaddr(CCV(ccv, t_inpcb->inp_faddr))))
> +#else
> +	else if (in_localaddr(CCV(ccv, t_inpcb->inp_faddr)))
> +#endif
> +		rw = V_ss_fltsz_local * CCV(ccv, t_maxseg);
> +	else
> +		rw = V_ss_fltsz * CCV(ccv, t_maxseg);
> +#endif
>  
>  	CCV(ccv, snd_cwnd) = min(rw, CCV(ccv, snd_cwnd));
>  }
> 
> (But using the #else clause instead of the current #if 1 code).  Was this 
> change in 8 intentional?

It was. Unlike connection initialisation which still honours ss_fltsz in
cc_conn_init(), restarting an idle connection based on ss_fltsz seemed
particularly dubious and as such was omitted from the refactored code.

The ultimate goal was to remove the ss_fltsz hack completely and
implement a smarter mechanism, but that hasn't quite happened yet. The
removal of ss_fltsz from 10.x without providing a replacement mechanism
is not ideal and should probably be addressed.

I'm guessing you're not using rfc3390 because you want to override the
initial window based on specific local knowledge of the path between
sender and receiver?

Cheers,
Lawrence