From owner-freebsd-bugs Tue May 27 18:40:59 1997
Date: Tue, 27 May 1997 18:40:08 -0700 (PDT)
From: "Amr A. Awadallah"
To: freebsd-bugs@FreeBSD.org
cc: freebsd-hackers@FreeBSD.org, Chetan Rai, Nick W McKeown
Newsgroups: comp.protocols.tcp-ip
Subject: Bug in FreeBSD TCP Stack: False slow-start on duplicate ACKs

Hi,

We suspect we've found a bug in the kernel TCP code that we are working with now (FreeBSD 2.1.6-RELEASE #32). The same code also appears in TCP/IP Illustrated, Volume 2, page 973 (last paragraph), and in the online source for the current release of FreeBSD. We don't know if somebody has already reported the bug.

The bug is as follows: after fast retransmission the current implementation increases cwnd by 1 segment for each duplicate ACK that comes in (following the original 3 dup-ACKs that caused the fast retransmission and recovery to be invoked). But if fast recovery is activated, cwnd is already above ssthresh (in fact it is ssthresh + 3 segments), hence cwnd should be increased by one segment only for each window (congestion avoidance) rather than for each ACK (slow-start).

We have traces of cwnd against time that show this bug explicitly. The bug appears as a burst of back-to-back segments and a sudden increase in the window size.

We note that incrementing cwnd by 1 for each dup-ACK would be correct if fast recovery were not implemented (as in Tahoe).
This is because without fast recovery the cwnd value would be dropped to 1 anyway, and doing slow-start (incrementing cwnd by 1 for each ACK) would be correct. But fast recovery is a part of Reno (4.3BSD onwards). Therefore the bug was probably caused by an oversight when fast recovery was added to the kernel TCP code.

We further stress that this bug adversely affects TCP and network performance. When TCP enters fast recovery, there has been a packet loss (probably due to congestion), and using slow start at this point leads to further congestion (given that cwnd is above ssthresh already).

We checked the online source for the current release of FreeBSD at:

    ftp://ftp.FreeBSD.ORG/pub/FreeBSD/FreeBSD-current/src/sys/netinet/tcp_input.c

and the bug is still there. The problem is in tcp_input.c (line 1284), function tcp_input():

    } else if (tp->t_dupacks > tcprexmtthresh) {
            tp->snd_cwnd += tp->t_maxseg;
            (void) tcp_output(tp);
            goto drop;
    }

A further problem is that cwnd is not even checked against TCP_MAXWIN. Thus for a large window with lots of duplicate ACKs, cwnd may exceed TCP_MAXWIN!

The suggested fix for the bug is as follows:

    } else if (tp->t_dupacks > tcprexmtthresh) {
            tp->snd_cwnd = min(tp->snd_cwnd +
                (tp->t_maxseg * tp->t_maxseg / tp->snd_cwnd),
                TCP_MAXWIN << tp->snd_scale);
            (void) tcp_output(tp);
            goto drop;
    }

We would truly appreciate it if somebody could tell us whether this bug has already been reported.

Thanks in advance,

Amr A. Awadallah
Chetan Rai

------------------------------------------------------------------------
Amr A. Awadallah, PhD Student,
Computer Systems Laboratory, Electrical Engineering,
Stanford University.
For more info please refer to "http://www-leland.stanford.edu/~aaa"