Message-ID: <4DCE93BF.7000803@FreeBSD.org>
Date: Sat, 14 May 2011 10:37:51 -0400
From: John Baldwin
To: Mikolaj Golub
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r221346 - head/sys/netinet

On 5/14/11 6:37 AM, Mikolaj Golub wrote:
> Hi,
>
> On Mon, 2 May 2011 21:05:52 +0000 (UTC) John Baldwin wrote:
>
> JB> Author: jhb
> JB> Date: Mon May 2 21:05:52 2011
> JB> New Revision: 221346
> JB> URL: http://svn.freebsd.org/changeset/base/221346
>
> JB> Log:
> JB> Handle a rare edge case with nearly full TCP receive buffers. If a TCP
> JB> buffer fills up causing the remote sender to enter into persist mode, but
> JB> there is still room available in the receive buffer when a window probe
> JB> arrives (either due to window scaling, or due to the local application
> JB> very slowly draining data from the receive buffer), then the single byte
> JB> of data in the window probe is accepted. However, this can cause rcv_nxt
> JB> to be greater than rcv_adv. This condition will only last until the next
> JB> ACK packet is pushed out via tcp_output(), and since the previous ACK
> JB> advertised a zero window, the ACK should be pushed out while the TCP
> JB> pcb is write-locked.
> JB>
> JB> During the window while rcv_nxt is greater than rcv_adv, a few places
> JB> would compute the remaining receive window via rcv_adv - rcv_nxt.
> JB> However, this value was then (uint32_t)-1. On a 64 bit machine this
> JB> could expand to a positive 2^32 - 1 when cast to a long. In particular,
> JB> when calculating the receive window in tcp_output(), the result would be
> JB> that the receive window was computed as 2^32 - 1 resulting in advertising
> JB> a far larger window to the remote peer than actually existed.
> JB>
> JB> Fix various places that compute the remaining receive window to either
> JB> assert that it is not negative (i.e. rcv_nxt <= rcv_adv), or treat the
> JB> window as full if rcv_nxt is greater than rcv_adv.
> JB>
> JB> Reviewed by: bz
> JB> MFC after: 1 month
>
> JB> Modified:
> JB> head/sys/netinet/tcp_input.c
> JB> head/sys/netinet/tcp_output.c
> JB> head/sys/netinet/tcp_timewait.c
>
> JB> Modified: head/sys/netinet/tcp_input.c
> JB> ==============================================================================
> JB> --- head/sys/netinet/tcp_input.c Mon May 2 21:04:37 2011 (r221345)
> JB> +++ head/sys/netinet/tcp_input.c Mon May 2 21:05:52 2011 (r221346)
> JB> @@ -1831,6 +1831,9 @@ tcp_do_segment(struct mbuf *m, struct tc
> JB>          win = sbspace(&so->so_rcv);
> JB>          if (win < 0)
> JB>                  win = 0;
> JB> +        KASSERT(SEQ_GEQ(tp->rcv_adv, tp->rcv_nxt),
> JB> +            ("tcp_input negative window: tp %p rcv_nxt %u rcv_adv %u", tp,
> JB> +            tp->rcv_adv, tp->rcv_nxt));
>
> I am getting this when running tests with HAST (both primary and secondary HAST
> instances on the same host).
>
> HAST is synchronizing data in MAXPHYS (131072 bytes) blocks. The sender splits
> them into smaller chunks of MAX_SEND_SIZE (32768 bytes), while the receiver
> receives the whole block by calling recv() with the MSG_WAITALL option.

Can you capture a tcpdump (probably easiest to do from the other host)?

-- 
John Baldwin