Date:      Sat, 14 May 2011 10:37:51 -0400
From:      John Baldwin <jhb@FreeBSD.org>
To:        Mikolaj Golub <trociny@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r221346 - head/sys/netinet
Message-ID:  <4DCE93BF.7000803@FreeBSD.org>
In-Reply-To: <86pqnlbmao.fsf@kopusha.home.net>
References:  <201105022105.p42L5q3j054498@svn.freebsd.org> <86pqnlbmao.fsf@kopusha.home.net>

On 5/14/11 6:37 AM, Mikolaj Golub wrote:
> Hi,
>
> On Mon, 2 May 2011 21:05:52 +0000 (UTC) John Baldwin wrote:
>
>   JB>  Author: jhb
>   JB>  Date: Mon May  2 21:05:52 2011
>   JB>  New Revision: 221346
>   JB>  URL: http://svn.freebsd.org/changeset/base/221346
>
>   JB>  Log:
>   JB>    Handle a rare edge case with nearly full TCP receive buffers.  If a TCP
>   JB>    buffer fills up causing the remote sender to enter into persist mode, but
>   JB>    there is still room available in the receive buffer when a window probe
>   JB>    arrives (either due to window scaling, or due to the local application
>   JB>    very slowly draining data from the receive buffer), then the single byte
>   JB>    of data in the window probe is accepted.  However, this can cause rcv_nxt
>   JB>    to be greater than rcv_adv.  This condition will only last until the next
>   JB>    ACK packet is pushed out via tcp_output(), and since the previous ACK
>   JB>    advertised a zero window, the ACK should be pushed out while the TCP
>   JB>    pcb is write-locked.
>   JB>
>   JB>    During the window while rcv_nxt is greater than rcv_adv, a few places
>   JB>    would compute the remaining receive window via rcv_adv - rcv_nxt.
>   JB>    However, this value was then (uint32_t)-1.  On a 64 bit machine this
>   JB>    could expand to a positive 2^32 - 1 when cast to a long.  In particular,
>   JB>    when calculating the receive window in tcp_output(), the result would be
>   JB>    that the receive window was computed as 2^32 - 1 resulting in advertising
>   JB>    a far larger window to the remote peer than actually existed.
>   JB>
>   JB>    Fix various places that compute the remaining receive window to either
>   JB>    assert that it is not negative (i.e. rcv_nxt <= rcv_adv), or treat the
>   JB>    window as full if rcv_nxt is greater than rcv_adv.
>   JB>
>   JB>    Reviewed by:        bz
>   JB>    MFC after:        1 month
>
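
The arithmetic described in the log can be reproduced in userland. The following is a minimal sketch, not the committed kernel code: the helper names (naive_window, asserted_window, guarded_window) are hypothetical, SEQ_GT/SEQ_GEQ are local stand-ins for the kernel's sequence-space macros, and int64_t stands in for a 64-bit long so the result does not depend on the platform.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's sequence-space comparisons (assumption). */
#define SEQ_GT(a, b)	((int32_t)((a) - (b)) > 0)
#define SEQ_GEQ(a, b)	((int32_t)((a) - (b)) >= 0)

/*
 * Unguarded form: the 32-bit unsigned difference is widened to a signed
 * 64-bit value, so a "negative" window becomes a large positive one.
 */
static int64_t
naive_window(uint32_t rcv_adv, uint32_t rcv_nxt)
{
	return ((int64_t)(rcv_adv - rcv_nxt));
}

/* First remedy from the log: assert that the window is not negative. */
static int64_t
asserted_window(uint32_t rcv_adv, uint32_t rcv_nxt)
{
	assert(SEQ_GEQ(rcv_adv, rcv_nxt));
	return ((int64_t)(rcv_adv - rcv_nxt));
}

/* Second remedy: treat the window as full when rcv_nxt has passed rcv_adv. */
static int64_t
guarded_window(uint32_t rcv_adv, uint32_t rcv_nxt)
{
	if (SEQ_GT(rcv_nxt, rcv_adv))
		return (0);
	return ((int64_t)(rcv_adv - rcv_nxt));
}
```

With rcv_adv = 1000 and rcv_nxt = 1001 (one window-probe byte accepted past the advertised edge), naive_window() yields 2^32 - 1 while guarded_window() yields 0.
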
>   JB>  Modified:
>   JB>    head/sys/netinet/tcp_input.c
>   JB>    head/sys/netinet/tcp_output.c
>   JB>    head/sys/netinet/tcp_timewait.c
>
>   JB>  Modified: head/sys/netinet/tcp_input.c
>   JB>  ==============================================================================
>   JB>  --- head/sys/netinet/tcp_input.c        Mon May  2 21:04:37 2011        (r221345)
>   JB>  +++ head/sys/netinet/tcp_input.c        Mon May  2 21:05:52 2011        (r221346)
>   JB>  @@ -1831,6 +1831,9 @@ tcp_do_segment(struct mbuf *m, struct tc
>   JB>           win = sbspace(&so->so_rcv);
>   JB>           if (win < 0)
>   JB>                   win = 0;
>   JB>  +        KASSERT(SEQ_GEQ(tp->rcv_adv, tp->rcv_nxt),
>   JB>  +            ("tcp_input negative window: tp %p rcv_nxt %u rcv_adv %u", tp,
>   JB>  +            tp->rcv_adv, tp->rcv_nxt));
>
> I am hitting this assertion when running tests with HAST (with both the
> primary and secondary HAST instances on the same host).
>
> HAST is synchronizing data in MAXPHYS (131072 bytes) blocks. The sender splits
> them into smaller chunks of MAX_SEND_SIZE (32768 bytes), while the receiver
> receives the whole block by calling recv() with the MSG_WAITALL option.
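
The send/receive pattern described here can be sketched as follows. This is an illustration only, not HAST's code: the function names are hypothetical, and it uses a Unix-domain socketpair with a forked sender, so it shows the chunked send() against a single recv(MSG_WAITALL) but is not expected to exercise the TCP window logic or trigger the KASSERT.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE	131072	/* MAXPHYS, as quoted in the message */
#define CHUNK_SIZE	32768	/* MAX_SEND_SIZE, as quoted in the message */

/* Send len bytes in pieces of at most chunk bytes; 0 on success, -1 on error. */
static int
send_chunked(int fd, const unsigned char *buf, size_t len, size_t chunk)
{
	assert(chunk > 0);
	while (len > 0) {
		size_t want = len < chunk ? len : chunk;
		ssize_t done = send(fd, buf, want, 0);

		if (done <= 0)
			return (-1);
		buf += done;
		len -= (size_t)done;
	}
	return (0);
}

/* Receive exactly len bytes with one MSG_WAITALL call; 0 on success. */
static int
recv_whole(int fd, unsigned char *buf, size_t len)
{
	ssize_t got = recv(fd, buf, len, MSG_WAITALL);

	return (got == (ssize_t)len ? 0 : -1);
}

/* Hypothetical self-check: child sends, parent receives; 1 if the bytes match. */
static int
roundtrip_ok(size_t block, size_t chunk)
{
	unsigned char *out = malloc(block), *in = malloc(block);
	int sv[2], status, ok = 0;
	pid_t pid;
	size_t i;

	if (out == NULL || in == NULL)
		goto done;
	for (i = 0; i < block; i++)
		out[i] = (unsigned char)(i * 31u);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		goto done;
	pid = fork();
	if (pid == 0) {
		/* Child: act as the sender, then exit without returning. */
		close(sv[0]);
		_exit(send_chunked(sv[1], out, block, chunk) == 0 ? 0 : 1);
	}
	close(sv[1]);
	if (pid > 0) {
		ok = recv_whole(sv[0], in, block) == 0 &&
		    memcmp(in, out, block) == 0;
		if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
		    WEXITSTATUS(status) != 0)
			ok = 0;
	}
	close(sv[0]);
done:
	free(out);
	free(in);
	return (ok);
}
```

Calling roundtrip_ok(BLOCK_SIZE, CHUNK_SIZE) mirrors the sizes quoted above: four 32768-byte sends satisfied by one 131072-byte receive.
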

Can you capture a tcpdump (probably easiest to do from the other host)?

-- 
John Baldwin


