From owner-freebsd-net@FreeBSD.ORG Tue Jan 15 08:29:54 2013 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 6894BB7; Tue, 15 Jan 2013 08:29:54 +0000 (UTC) (envelope-from lstewart@freebsd.org) Received: from lauren.room52.net (lauren.room52.net [210.50.193.198]) by mx1.freebsd.org (Postfix) with ESMTP id F3FD538F; Tue, 15 Jan 2013 08:29:53 +0000 (UTC) Received: from lstewart1.loshell.room52.net (ppp59-167-184-191.static.internode.on.net [59.167.184.191]) by lauren.room52.net (Postfix) with ESMTPSA id 478A37E88D; Tue, 15 Jan 2013 19:29:51 +1100 (EST) Message-ID: <50F5137F.1060207@freebsd.org> Date: Tue, 15 Jan 2013 19:29:51 +1100 From: Lawrence Stewart User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:13.0) Gecko/20120613 Thunderbird/13.0 MIME-Version: 1.0 To: John Baldwin Subject: Re: Some questions about the new TCP congestion control code References: <201301141604.29864.jhb@freebsd.org> In-Reply-To: <201301141604.29864.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=0.0 required=5.0 tests=UNPARSEABLE_RELAY autolearn=unavailable version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on lauren.room52.net Cc: net@freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Jan 2013 08:29:54 -0000 Hi John, On 01/15/13 08:04, John Baldwin wrote: > I was looking at TCP congestion control at work recently and noticed a few Poor you ;) > "odd" things in the current code. First, there is this chunk of code in > cc_ack_received() in tcp_input.c: > > static void inline > cc_ack_received(struct tcpcb *tp, struct tcphdr *th, uint16_t type) > { > INP_WLOCK_ASSERT(tp->t_inpcb); > > tp->ccv->bytes_this_ack = BYTES_THIS_ACK(tp, th); > if (tp->snd_cwnd == min(tp->snd_cwnd, tp->snd_wnd)) > tp->ccv->flags |= CCF_CWND_LIMITED; > else > tp->ccv->flags &= ~CCF_CWND_LIMITED; > > > Due to hysterical raisins, snd_cwnd and snd_wnd are u_long values, not > integers, so the call to min() results in truncation on 64-bit hosts. Good catch, but I don't think it matters in practice as neither snd_cwnd or snd_wnd will grow past the 32-bit boundary. > It should probably be ulmin() instead. However, this line seems to be a really > obfuscated way to just write: > > if (tp->snd_cwnd <= tp->snd_wnd) You are correct, though I'd argue the meaning of the existing code as written is clearer compared to your suggested change. > If that is correct, I would vote for changing this to use the much simpler > logic. Agreed. While I find the existing code slightly clearer in meaning, it's not significant enough to warrant keeping it as is when your suggested change is simpler, fixes a bug and achieves the same thing. Happy for you to change it or I can do it if you prefer. > Secondly, in the particular case I was investigating at work (restart of an > idle connnection), the newreno congestion control code in 8.x and later uses a > different algorithm than in 7. Specifically, in 7 TCP would reuse the same > logic used for an initial cwnd (honoring ss_fltsz). In 8 this no longer > happens (instead, 2 is hardcoded). A guess at a possible fix might look > something like this: > > Index: cc_newreno.c > =================================================================== > --- cc_newreno.c (revision 243660) > +++ cc_newreno.c (working copy) > @@ -169,8 +169,21 @@ newreno_after_idle(struct cc_var *ccv) > if (V_tcp_do_rfc3390) > rw = min(4 * CCV(ccv, t_maxseg), > max(2 * CCV(ccv, t_maxseg), 4380)); > +#if 1 > else > rw = CCV(ccv, t_maxseg) * 2; > +#else > + /* XXX: This is missing a lot of stuff that used to be in 7. */ > +#ifdef INET6 > + else if ((isipv6 ? in6_localaddr(&CCV(ccv, t_inpcb->in6p_faddr)) : > + in_localaddr(CCV(ccv, t_inpcb->inp_faddr)))) > +#else > + else if (in_localaddr(CCV(ccv, t_inpcb->inp_faddr))) > +#endif > + rw = V_ss_fltsz_local * CCV(ccv, t_maxseg); > + else > + rw = V_ss_fltsz * CCV(ccv, t_maxseg); > +#endif > > CCV(ccv, snd_cwnd) = min(rw, CCV(ccv, snd_cwnd)); > } > > (But using the #else clause instead of the current #if 1 code). Was this > change in 8 intentional? It was. Unlike connection initialisation which still honours ss_fltsz in cc_conn_init(), restarting an idle connection based on ss_fltsz seemed particularly dubious and as such was omitted from the refactored code. The ultimate goal was to remove the ss_fltsz hack completely and implement a smarter mechanism, but that hasn't quite happened yet. The removal of ss_fltsz from 10.x without providing a replacement mechanism is not ideal and should probably be addressed. I'm guessing you're not using rfc3390 because you want to override the initial window based on specific local knowledge of the path between sender and receiver? Cheers, Lawrence