From owner-freebsd-net@FreeBSD.ORG Wed Mar 30 12:38:25 2011 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9CBD1106566B; Wed, 30 Mar 2011 12:38:25 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 750778FC15; Wed, 30 Mar 2011 12:38:25 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 2769F46B55; Wed, 30 Mar 2011 08:38:25 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id B89BF8A027; Wed, 30 Mar 2011 08:38:24 -0400 (EDT) From: John Baldwin To: "Stefan `Sec` Zehl" Date: Wed, 30 Mar 2011 08:38:09 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110325; KDE/4.5.5; amd64; ; ) References: <4D8B99B4.4070404@FreeBSD.org> <201103281423.52202.jhb@freebsd.org> <20110328183810.GF23803@ice.42.org> In-Reply-To: <20110328183810.GF23803@ice.42.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201103300838.09608.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Wed, 30 Mar 2011 08:38:24 -0400 (EDT) Cc: freebsd-net@freebsd.org, Doug Barton Subject: Re: The tale of a TCP bug X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 30 Mar 2011 12:38:25 -0000 On Monday, March 28, 2011 2:38:10 pm Stefan `Sec` Zehl wrote: > Hi, > > On Mon, Mar 28, 2011 at 14:23 -0400, John Baldwin wrote: > > > > No, this is not really right. Your patch from your blog is the best > > fix actually. The reason we want to let 'win' be larger than > > TCP_MAXWIN is that if the remote end sends more data than we've > > advertised but we have room in the socket buffer, we want to go ahead > > and accept the data as valid and ACK it rather than dropping the data > > that is beyond rcv_adv. My change above to rcv_wnd would break this. > > Also, for the TCPS_SYN_SENT case we don't know what 'rcv_scale' is > > until just before we update 'rcv_adv'. This should be the same > > as your patch: > > > > Index: tcp_input.c > > =================================================================== > > --- tcp_input.c (revision 220098) > > +++ tcp_input.c (working copy) > > @@ -1756,7 +1756,8 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, > > (TF_RCVD_SCALE|TF_REQ_SCALE)) { > > tp->rcv_scale = tp->request_r_scale; > > } > > - tp->rcv_adv += tp->rcv_wnd; > > + tp->rcv_adv += imin(tp->rcv_wnd, > > + TCP_MAXWIN << tp->rcv_scale); > > tp->snd_una++; /* SYN is acked */ > > /* > > * If there's data, delay ACK; if there's also a FIN > > > > I've applied this to my test-VM, and as expected it now passes my two > testcases. As far as I'm concerned this fixes it for me. > > I'm interested to see if my adv_neg counting hack together with this > patch still registers any hits. -- If nobody beats me to it, I'll try it > out on my webserver tomorrow. There is at least one case I know of related to a bug I reported earlier where a window probe from a remote connection can cause rcv_nxt to advance past rcv_adv by one. However, I think we want to know about those cases, and we should probably be treating rcv_adv - rcv_nxt as if it is zero in that case, not -1 (my patch in my original e-mail does just that in a different place in tcp_output() when we calculate the window "for real"). -- John Baldwin