From owner-freebsd-net@FreeBSD.ORG Mon Mar 28 14:21:04 2011 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5DBC8106564A; Mon, 28 Mar 2011 14:21:04 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 168AF8FC17; Mon, 28 Mar 2011 14:21:04 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id A5ED346B49; Mon, 28 Mar 2011 10:21:03 -0400 (EDT) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 327AD8A02A; Mon, 28 Mar 2011 10:21:03 -0400 (EDT) From: John Baldwin To: "Stefan `Sec` Zehl" Date: Mon, 28 Mar 2011 10:21:00 -0400 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110311; KDE/4.5.5; amd64; ; ) References: <4D8B99B4.4070404@FreeBSD.org> <201103251640.16147.jhb@freebsd.org> <20110326140212.GB45402@ice.42.org> In-Reply-To: <20110326140212.GB45402@ice.42.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201103281021.00673.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.6 (bigwig.baldwin.cx); Mon, 28 Mar 2011 10:21:03 -0400 (EDT) Cc: freebsd-net@freebsd.org, Doug Barton Subject: Re: The tale of a TCP bug X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 28 Mar 2011 14:21:04 -0000 On Saturday, March 26, 2011 10:02:12 am Stefan `Sec` Zehl wrote: > Hi again, > > On Fri, Mar 25, 2011 at 16:40 -0400, John Baldwin wrote: > > Reading some more. I'm trying to understand the breakage in your case. > > > > You are saying that FreeBSD is the sender, who has data to send, yet is not > > sending any window probes because it never starts the persist timer when the > > initial window is zero? Is that correct? > > Yes. The receiver never sends a window update on its own, but when > probed will "admit" to a bigger window. > > > And the problem is that the code that uses 'adv' to determine if it > > sound send a window update to the remote end is falsely succeeding due > > to the overflow causing tcp_output() to 'goto send' but that it then > > fails to send any data because it thinks the remote window is full? > > Yes, as far as I remember (I did that part of debugging 2 Months ago, > when I submitted the PR %-) that's what happens. > > > So one thing I don't quite follow is how you are having rcv_nxt > > > rcv_adv. I saw this when the other side would send a window probe, > > and then the receiving side would take the -1 remaining window and > > explode it into the maximum window size when it ACKd. > > No, it's not rcv_nxt > rcv_adv. It's > > (rcv_adv - rcv_nxt) > min(recwin, (long)TCP_MAXWIN << tp->rcv_scale) > > My sample case has (rcv_adv - rcv_nxt) = 65536, but > (TCP_MAXWIN << tp->rcv_scale) = 65535 (as there is no window scaling in > effect) Ahhhh. > > Are you seeing the other end of the connection send a window probe, but > > FreeBSD is not setting the persist timer so that it will send its own window > > probes? > > No, the dump looks like this: > > | 10.42.0.25.44852 > 10.42.0.2.1516: Flags [S], > | seq 3339144437, win 65535, options [...], length 0 > > FreeBSD sending the first SYN. > [rcv_adv=0, rcv_nxt=0] > > | 10.42.0.2.1516 > 10.42.0.25.44852: Flags [S.], > | seq 42, ack 3339144438, win 0, length 0 > > The other end SYN|ACKing with a window size of 0. > > | 10.42.0.25.44852 > 10.42.0.2.1516: Flags [.], > | seq 1, ack 1, win 65535, length 0 > > FreeBSD ACKing, and (correctly) sending no data. > [rcv_adv=67779, rcv_nxt=43], thus resulting in adv=-1/0xffffffff Ahh, and this is the real bug. And this goes back to the calculation of 'rcv_wnd' in tcp_input(). How about this: Index: tcp_input.c =================================================================== --- tcp_input.c (revision 220098) +++ tcp_input.c (working copy) @@ -1694,6 +1694,8 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, win = sbspace(&so->so_rcv); if (win < 0) win = 0; + if (win > TCP_MAXWIN << tp->rcv_scale) + win = TCP_MAXWIN << tp->rcv_scale; tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt)); /* Reset receive buffer auto scaling when not in bulk receive mode. */ This is basically the same as your patch except that it ensures that 'rcv_wnd' is accurate for any other uses. It looks like the syncache code is already correct as it uses a similar test to initialize 'sc_wnd': /* * Initial receive window: clip sbspace to [0 .. TCP_MAXWIN]. * win was derived from socket earlier in the function. */ win = imax(win, 0); win = imin(win, TCP_MAXWIN); sc->sc_wnd = win; -- John Baldwin