Date: Sat, 14 May 2011 10:37:51 -0400
From: John Baldwin <jhb@FreeBSD.org>
To: Mikolaj Golub <trociny@freebsd.org>
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r221346 - head/sys/netinet
Message-ID: <4DCE93BF.7000803@FreeBSD.org>
In-Reply-To: <86pqnlbmao.fsf@kopusha.home.net>
References: <201105022105.p42L5q3j054498@svn.freebsd.org> <86pqnlbmao.fsf@kopusha.home.net>

On 5/14/11 6:37 AM, Mikolaj Golub wrote:
> Hi,
>
> On Mon, 2 May 2011 21:05:52 +0000 (UTC) John Baldwin wrote:
>
> JB> Author: jhb
> JB> Date: Mon May 2 21:05:52 2011
> JB> New Revision: 221346
> JB> URL: http://svn.freebsd.org/changeset/base/221346
>
> JB> Log:
> JB>   Handle a rare edge case with nearly full TCP receive buffers. If a TCP
> JB>   buffer fills up causing the remote sender to enter into persist mode, but
> JB>   there is still room available in the receive buffer when a window probe
> JB>   arrives (either due to window scaling, or due to the local application
> JB>   very slowly draining data from the receive buffer), then the single byte
> JB>   of data in the window probe is accepted. However, this can cause rcv_nxt
> JB>   to be greater than rcv_adv. This condition will only last until the next
> JB>   ACK packet is pushed out via tcp_output(), and since the previous ACK
> JB>   advertised a zero window, the ACK should be pushed out while the TCP
> JB>   pcb is write-locked.
> JB>
> JB>   During the window while rcv_nxt is greater than rcv_adv, a few places
> JB>   would compute the remaining receive window via rcv_adv - rcv_nxt.
> JB>   However, this value was then (uint32_t)-1. On a 64 bit machine this
> JB>   could expand to a positive 2^32 - 1 when cast to a long. In particular,
> JB>   when calculating the receive window in tcp_output(), the result would be
> JB>   that the receive window was computed as 2^32 - 1 resulting in advertising
> JB>   a far larger window to the remote peer than actually existed.
> JB>
> JB>   Fix various places that compute the remaining receive window to either
> JB>   assert that it is not negative (i.e. rcv_nxt <= rcv_adv), or treat the
> JB>   window as full if rcv_nxt is greater than rcv_adv.
> JB>
> JB>   Reviewed by: bz
> JB>   MFC after: 1 month
>
> JB> Modified:
> JB>   head/sys/netinet/tcp_input.c
> JB>   head/sys/netinet/tcp_output.c
> JB>   head/sys/netinet/tcp_timewait.c
>
> JB> Modified: head/sys/netinet/tcp_input.c
> JB> ==============================================================================
> JB> --- head/sys/netinet/tcp_input.c Mon May 2 21:04:37 2011 (r221345)
> JB> +++ head/sys/netinet/tcp_input.c Mon May 2 21:05:52 2011 (r221346)
> JB> @@ -1831,6 +1831,9 @@ tcp_do_segment(struct mbuf *m, struct tc
> JB>          win = sbspace(&so->so_rcv);
> JB>          if (win < 0)
> JB>                  win = 0;
> JB> +        KASSERT(SEQ_GEQ(tp->rcv_adv, tp->rcv_nxt),
> JB> +            ("tcp_input negative window: tp %p rcv_nxt %u rcv_adv %u", tp,
> JB> +            tp->rcv_adv, tp->rcv_nxt));
>
> I am getting this when running tests with HAST (both primary and secondary HAST
> instances on the same host).
>
> HAST is synchronizing data in MAXPHYS (131072 bytes) blocks. The sender splits
> them into smaller chunks of MAX_SEND_SIZE (32768 bytes), while the receiver
> receives the whole block calling recv() with the MSG_WAITALL option.

Can you capture a tcpdump (probably easiest to do from the other host)?

-- 
John Baldwin
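
For readers following the arithmetic in the quoted commit log, here is a minimal standalone sketch of the wraparound it describes. It is not the kernel code: naive_window(), clamped_window() and window_examples() are hypothetical names chosen for illustration, the SEQ_GEQ() macro is redefined locally in the usual sequence-space style, and int64_t stands in for a 64-bit long so the effect shows up regardless of platform.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sequence-space comparison, written in the style of the SEQ_GEQ()
 * macro used in the quoted diff (local redefinition for this sketch).
 */
#define SEQ_GEQ(a, b)	((int32_t)((a) - (b)) >= 0)

/*
 * Naive computation: the 32-bit unsigned difference is widened to a
 * signed 64-bit value.  If rcv_nxt is one past rcv_adv, the difference
 * is (uint32_t)-1, which widens to a positive 2^32 - 1.
 */
static int64_t
naive_window(uint32_t rcv_adv, uint32_t rcv_nxt)
{
	return ((int64_t)(rcv_adv - rcv_nxt));
}

/*
 * Clamped computation: treat the window as full (nothing remaining)
 * whenever rcv_nxt has moved past rcv_adv.
 */
static int64_t
clamped_window(uint32_t rcv_adv, uint32_t rcv_nxt)
{
	if (SEQ_GEQ(rcv_adv, rcv_nxt))
		return ((int64_t)(rcv_adv - rcv_nxt));
	return (0);
}

/* Hypothetical self-check exercising both helpers. */
static void
window_examples(void)
{
	/* rcv_nxt one byte past rcv_adv, as after an accepted window probe. */
	assert(naive_window(1000, 1001) == 4294967295LL);
	assert(clamped_window(1000, 1001) == 0);

	/* Normal case: one byte of advertised window remains. */
	assert(clamped_window(1001, 1000) == 1);
}
```

The clamped variant corresponds to the "treat the window as full" option named in the log; the KASSERT in the quoted diff takes the other option and asserts that the condition cannot occur at that point.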