Date:      Mon, 30 Jun 1997 22:11:20 -0700 (PDT)
From:      John Polstra <jdp@polstra.com>
To:        FreeBSD-gnats-submit@FreeBSD.ORG
Subject:   kern/3998: Dropped TCP connections
Message-ID:  <199707010511.WAA28279@austin.polstra.com>
Resent-Message-ID: <199707010520.WAA09254@hub.freebsd.org>


>Number:         3998
>Category:       kern
>Synopsis:       Unusual traffic pattern drops TCP connections even using loopback interface
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jun 30 22:20:00 PDT 1997
>Last-Modified:
>Originator:     John Polstra
>Organization:
Polstra & Co., Seattle, WA
>Release:        FreeBSD 2.2-STABLE i386, FreeBSD-3.0 current, and probably earlier versions
>Environment:

	Not relevant.

>Description:

	Under certain circumstances, the FreeBSD TCP stack gets
	into an unusual state that causes it to drop a perfectly
	good connection.  I have diagnosed the cause of the problem,
	and I have a fix for it which I am going to commit soon.
	The purpose of this PR is to record the explanation of the
	problem and the test case which demonstrates it.

>How-To-Repeat:

	The attached test case demonstrates the bug.  The test case
	establishes a TCP connection between a client C and a server
	S.  Communication then proceeds as follows:

	* S sends data continually to C until S is blocked by flow
	  control.  During this time, C simply sleeps without reading
	  any data.  The goal is to fill up the channel S->C and put
	  S into the "persist" state.

	* After the persist state has been attained, S goes into an
	  infinite loop reading whatever data comes in from C.

	* C reads 1 byte of data.

	* C then delays for 10 seconds to allow time for S to send
	  at least one window probe.  When the probe comes in, C
	  accepts its 1 byte of data, because there is now room for
	  it in the socket's receive buffer.  This triggers the bug
	  by causing C's tp->rcv_nxt to advance beyond its tp->rcv_adv.
	  Elsewhere, in tcp_input.c, the difference
	  (tp->rcv_adv - tp->rcv_nxt) is passed to max(), which
	  treats its arguments as unsigned numbers.  Since the
	  difference is now negative, it appears as a very large
	  unsigned number.  This effectively creates a negative-sized
	  receive window, and causes all subsequent segments from S
	  to be dropped.

	* C then goes into a loop writing data continuously to S.
	  But S's acknowledgments all get dropped by C, because they
	  fall outside of C's bogus negative receive window.  C
	  therefore retransmits repeatedly until it eventually times
	  out and drops the connection.

>Fix:
	
	Use a signed calculation in the crucial place:
	
Index: tcp_input.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_input.c,v
retrieving revision 1.58
diff -u -r1.58 tcp_input.c
--- tcp_input.c	1997/04/27 20:01:13	1.58
+++ tcp_input.c	1997/07/01 05:05:35
@@ -604,7 +604,7 @@
 	win = sbspace(&so->so_rcv);
 	if (win < 0)
 		win = 0;
-	tp->rcv_wnd = max(win, (int)(tp->rcv_adv - tp->rcv_nxt));
+	tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));
 	}
 
 	switch (tp->t_state) {

As mentioned above, I am going to commit this fix.

Here is the test case for posterity:

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	Makefile
#	Makefile.inc
#	README
#	client
#	client/Makefile
#	client/client.c
#	include
#	include/looptest.h
#	server
#	server/Makefile
#	server/server.c
#
echo x - Makefile
sed 's/^X//' >Makefile << 'END-of-Makefile'
XSUBDIR=	client server
X
X.include <bsd.subdir.mk>
END-of-Makefile
echo x - Makefile.inc
sed 's/^X//' >Makefile.inc << 'END-of-Makefile.inc'
XCFLAGS+=	-I${.CURDIR}/../include -Wall
END-of-Makefile.inc
echo x - README
sed 's/^X//' >README << 'END-of-README'
XTo build the test programs:  Type "make" in the directory containing
Xthis README file.
X
XTo demonstrate the bug in the TCP stack:  First type "server/server".
XDo that in a separate window, or run it in the background.  Then
Xtype "client/client".  This will start the test using the loopback
Xaddress of 127.0.0.1.  The port that is used is 5995.  You can
Xchange it in "include/looptest.h" if necessary.
X
XWhat the test does:
X
X    The server listens at port 5995 with a wildcard IP address.
X
X    The client connects to the server.
X
X    The server sends data continuously until 3 seconds go by without
X    any progress.  The goal is to fill up the channel and get into
X    the "persist" state.  While this is going on, the client just
X    sleeps without reading any data from the connection.
X
X    The server then goes into an infinite loop reading whatever
X    comes in over the connection.
X
X    The client reads 1 byte.
X
X    The client then delays for 10 seconds to allow time for the
X    server to send at least one window probe.  When the probe comes
X    in, the client accepts its 1 byte of data, because there is
X    room for it in the socket's receive buffer.  This advances
X    rcv_nxt past rcv_adv and triggers the bug.
X
X    The client then goes into a loop writing data continuously to
X    the connection.
X
XWhat you will see:  Data flow will grind to a halt fairly quickly.
XThe server will start outputting periodic messages saying, "No data
Xreceived for 10 seconds".  If you observe with "tcpdump -n -i lo0
Xport 5995" you'll see the client side doing retransmits and the
Xserver side acknowledging them.  But the acknowledgments are dropped
Xby the client side and so the retransmits continue.  If you wait
Xlong enough, the system will drop the connection.
X
XUsing other IP addresses:  You can optionally specify the IP address
Xto connect to on the client's command line, in dotted notation.
XBy default, it uses the loopback address.  It fails just as reliably
Xif you specify the "real" IP address of the host.  I think it would
Xfail between two different hosts as well, though the time delays
Xmight need to be adjusted.
X
XWARNING: With a slightly different version of this test on my
XFreeBSD-2.2 system, specifying the "real" IP address of the host
Xhung the entire system (twice) without a panic or a message of any
Xkind.  A reboot was necessary to recover from it.  This particular
Xversion of the test doesn't seem to do that -- it just fails the
Xsame as when the loopback address is used.  I still have the version
Xthat hangs the system, and I'll use it later to investigate the
Xhangs.
X
XJohn Polstra <jdp@polstra.com>
X11 June 1997
END-of-README
echo c - client
mkdir -p client > /dev/null 2>&1
echo x - client/Makefile
sed 's/^X//' >client/Makefile << 'END-of-client/Makefile'
XSRCS=	client.c
XPROG=	client
XNOMAN=	true
X
X.include <bsd.prog.mk>
END-of-client/Makefile
echo x - client/client.c
sed 's/^X//' >client/client.c << 'END-of-client/client.c'
X#include <sys/types.h>
X#include <sys/socket.h>
X#include <sys/time.h>
X#include <sys/wait.h>
X
X#include <netinet/in.h>
X#include <arpa/inet.h>
X
X#include <assert.h>
X#include <err.h>
X#include <errno.h>
X#include <fcntl.h>
X#include <stdio.h>
X#include <stdlib.h>
X#include <string.h>
X#include <unistd.h>
X
X#include "looptest.h"
X
X#define A_BUNCH	(1024*1024)
X
Xstatic void read_a_little(int fd);
Xstatic void write_a_bunch(int fd);
X
Xint
Xmain(int argc, char **argv)
X{
X    char *ipaddr = "127.0.0.1";
X    int s;
X    struct sockaddr_in peer_addr;
X
X    if (argc > 2) {
X	char *name = strrchr(argv[0], '/');
X	if (name == NULL)
X	    name = argv[0];
X	else
X	    name++;
X	errx(1, "Usage: %s [ipaddr]", name);
X    }
X
X    if (argc > 1)
X	ipaddr = argv[1];
X
X    memset(&peer_addr, 0, sizeof peer_addr);
X    peer_addr.sin_len = sizeof peer_addr;
X    peer_addr.sin_family = AF_INET;
X    peer_addr.sin_port = htons(PORT);
X    if (inet_aton(ipaddr, &peer_addr.sin_addr) != 1)
X	errx(1, "Invalid IP address \"%s\"", ipaddr);
X
X    s = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
X    if (s == -1)
X	err(1, "socket");
X
X    if (connect(s, (struct sockaddr *) &peer_addr, sizeof peer_addr) == -1)
X	err(1, "connect");
X
X    printf("Delaying while server fills the channel\n");
X    sleep(9);	/* Delay long enough for server to fill up channel. */
X
X    read_a_little(s);
X
X    printf("Delaying long enough for server to send a window probe\n");
X    sleep(10);
X
X    for( ; ; )
X	write_a_bunch(s);
X
X    return 0;
X}
X
Xstatic void
Xread_a_little(int fd)
X{
X    char buf[1];
X    int n;
X
X    printf("Reading a little\n");
X    n = read(fd, buf, sizeof buf);
X    if (n == -1)
X	err(1, "read");
X    if (n == 0)
X	errx(1, "unexpected EOF");
X    if (n < sizeof buf)
X	errx(1, "short read");
X}
X
Xstatic void
Xwrite_a_bunch(int fd)
X{
X    static char zeroes[8*1024];
X    int nleft = A_BUNCH;
X
X    printf("Writing a bunch\n");
X    while (nleft > 0) {
X	int n = sizeof zeroes;
X
X	if (n > nleft)
X	    n = nleft;
X	n = write(fd, zeroes, n);
X	if (n == -1)
X	    err(1, "write");
X	nleft -= n;
X    }
X}
END-of-client/client.c
echo c - include
mkdir -p include > /dev/null 2>&1
echo x - include/looptest.h
sed 's/^X//' >include/looptest.h << 'END-of-include/looptest.h'
X#ifndef LOOPTEST_H
X#define LOOPTEST_H 1
X
X#define PORT	5995
X
X#endif
END-of-include/looptest.h
echo c - server
mkdir -p server > /dev/null 2>&1
echo x - server/Makefile
sed 's/^X//' >server/Makefile << 'END-of-server/Makefile'
XSRCS=	server.c
XPROG=	server
XNOMAN=	true
X
X.include <bsd.prog.mk>
END-of-server/Makefile
echo x - server/server.c
sed 's/^X//' >server/server.c << 'END-of-server/server.c'
X#include <sys/types.h>
X#include <sys/socket.h>
X#include <sys/time.h>
X
X#include <netinet/in.h>
X
X#include <err.h>
X#include <errno.h>
X#include <fcntl.h>
X#include <stdio.h>
X#include <string.h>
X#include <unistd.h>
X
X#include "looptest.h"
X
Xstatic void fill_up_channel(int fd);
Xstatic void read_read_read(int fd);
X
Xint
Xmain(int argc, char **argv)
X{
X    int s;
X    struct sockaddr_in my_addr;
X    struct sockaddr_in peer_addr;
X    int peer_addrlen;
X    int fd;
X    int flags;
X
X    s = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
X    if (s == -1)
X	err(1, "socket");
X
X    memset(&my_addr, 0, sizeof my_addr);
X    my_addr.sin_len = sizeof my_addr;
X    my_addr.sin_family = AF_INET;
X    my_addr.sin_port = htons(PORT);
X    my_addr.sin_addr.s_addr = INADDR_ANY;
X    if (bind(s, (const struct sockaddr *) &my_addr, sizeof my_addr) == -1)
X	err(1, "bind");
X
X    if (listen(s, 5) == -1)
X	err(1, "listen");
X    peer_addrlen = sizeof peer_addr;
X
X    /* Wait for a connection from the client. */
X    fd = accept(s, (struct sockaddr *) &peer_addr, &peer_addrlen);
X    if (fd == -1)
X	err(1, "accept");
X    close(s);
X
X    /* Set up non-blocking I/O on the socket. */
X    flags = fcntl(fd, F_GETFL, 0);
X    if (flags == -1)
X	err(1, "fcntl(F_GETFL)");
X    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
X	err(1, "fcntl(F_SETFL)");
X
X    fill_up_channel(fd);
X
X    read_read_read(fd);
X
X    return 0;
X}
X
X/*
X * Fill up the channel by sending data continuously until 3 seconds go by
X * without any progress.
X */
Xstatic void
Xfill_up_channel(int fd)
X{
X    fd_set wfds;
X    struct timeval t;
X    int n;
X    static char zeroes[8*1024];
X
X    printf("Filling up the channel to client\n");
X    FD_ZERO(&wfds);
X    for ( ; ; ) {
X	/* Wait until we can write some more, up to a limit of 3 seconds. */
X	FD_SET(fd, &wfds);
X	t.tv_sec = 3;
X	t.tv_usec = 0;
X	n = select(fd+1, NULL, &wfds, NULL, &t);
X	if (n == -1)
X	    err(1, "select");
X	if (n == 0)	/* Timed out. */
X	    break;
X	if (FD_ISSET(fd, &wfds)) {
X	    /* Write as much as possible. */
X	    do
X		n = write(fd, zeroes, sizeof zeroes);
X	    while (n > 0);
X	    if (n == -1 && errno != EAGAIN)
X		err(1, "write");
X	}
X    }
X}
X
X/*
X * Read data continuously and discard it.
X */
Xstatic void
Xread_read_read(int fd)
X{
X    fd_set rfds;
X    struct timeval t;
X    char buf[8*1024];
X    int n;
X
X    printf("Reading continuously\n");
X    FD_ZERO(&rfds);
X    for ( ; ; ) {
X	/* Wait until we can read some more, up to a limit of 10 seconds. */
X	FD_SET(fd, &rfds);
X	t.tv_sec = 10;
X	t.tv_usec = 0;
X	n = select(fd+1, &rfds, NULL, NULL, &t);
X	if (n == -1)
X	    err(1, "select");
X	if (n == 0) {	/* Timed out. */
X	    printf("No data received for 10 seconds\n");
X	    continue;
X	}
X	if (FD_ISSET(fd, &rfds)) {
X	    /* Read as much as possible. */
X	    for ( ; ; ) {
X		n = read(fd, buf, sizeof buf);
X		if (n == -1) {
X		    if (errno == EAGAIN)
X			break;
X		    err(1, "read");
X		}
X		if (n == 0)		/* EOF */
X		    errx(1, "unexpected EOF");
X	    }
X	}
X    }
X}
END-of-server/server.c
exit

>Audit-Trail:
>Unformatted:


