Date: Mon, 29 Jan 1996 17:25:26 -0800
From: Jerry Chen <chen@ipsilon.com>
To: bugs@FreeBSD.org
Subject: a TCP bug in FreeBSD 2.1?
Message-ID: <199601300125.RAA00660@relay.ipsilon.com>
In tcp_output() of FreeBSD 2.1:

559     /*
560      * Calculate receive window.  Don't shrink window,
561      * but avoid silly window syndrome.
562      */
563     if (win < (long)(so->so_rcv.sb_hiwat / 4) && win < (long)tp->t_maxseg)
564         win = 0;
565     if (win > (long)TCP_MAXWIN << tp->rcv_scale)
566         win = (long)TCP_MAXWIN << tp->rcv_scale;
567     if (win < (long)(tp->rcv_adv - tp->rcv_nxt))
568         win = (long)(tp->rcv_adv - tp->rcv_nxt);
569     ti->ti_win = htons((u_short) (win>>tp->rcv_scale));

It seems to me there is a bug. To trigger it, the application has to set
the receive window to 64K bytes.

The symptom is that the first time you run a test such as ttcp, it is
okay. However, the second time and every later time you run the same
test, the receiving side advertises a receive window of 0 during the
three-way handshake (connection setup). The sending side cannot send any
data and has to wait for about 5 seconds. When the persist timer
expires, the sending side probes by sending 1 byte of data. This causes
the receiving side to advertise a window of 65535 bytes, and then
everything is fine. But we have already lost 5 seconds, and this hurts
performance.

Why does the receiving side advertise a receive window of 0? In line
566, win is set to 65535. But in line 568 it is raised back to 64K,
because (tp->rcv_adv - tp->rcv_nxt) is 64K during the connection setup.
In line 569, 64K becomes 0 during the long to u_short conversion.

The first time we run the test, (tp->rcv_adv - tp->rcv_nxt) is 0 during
the connection setup. The second time and later, it is 64K when TCP is
sending out the SYN and ACK. That is why the problem does not show up
when we run the test for the first time.

What causes the difference?
It comes from the code in tcp_input() for transaction TCP:

    if ((to.to_flag & TOF_CC) != 0) {
        if (taop->tao_cc != 0 && CC_GT(to.to_cc, taop->tao_cc)) {
            taop->tao_cc = to.to_cc;
            tp->t_state = TCPS_ESTABLISHED;

            /*
             * If there is a FIN, or if there is data and the
             * connection is local, then delay SYN,ACK(SYN) in
             * the hope of piggy-backing it on a response
             * segment.  Otherwise must send ACK now in case
             * the other side is slow starting.
             */
            if ((tiflags & TH_FIN) || (ti->ti_len != 0 &&
                in_localaddr(inp->inp_faddr)))
                tp->t_flags |= (TF_DELACK | TF_NEEDSYN);
            else
                tp->t_flags |= (TF_ACKNOW | TF_NEEDSYN);
            tp->rcv_adv += tp->rcv_wnd;

The above code is executed only when tao_cc is non-zero. The first time
the test is run, tao_cc is 0, so tp->rcv_adv is not advanced by
tp->rcv_wnd here. On later runs tao_cc is non-zero and the last
statement above adds the 64K window to tp->rcv_adv. So TCP behaves
differently between the first run and later runs.

How should we fix the bug? I think we should swap lines 565-566 with
lines 567-568 so it becomes the following:

565     if (win < (long)(tp->rcv_adv - tp->rcv_nxt))
566         win = (long)(tp->rcv_adv - tp->rcv_nxt);
567     if (win > (long)TCP_MAXWIN << tp->rcv_scale)
568         win = (long)TCP_MAXWIN << tp->rcv_scale;
569     ti->ti_win = htons((u_short) (win>>tp->rcv_scale));

I never considered myself a TCP expert. Please correct me if I am
wrong.

Thanks.

Jerry
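The effect of the ordering can be reproduced outside the kernel. What follows is only a minimal user-space sketch, not the kernel code: the function names adv_win_original() and adv_win_swapped() are made up for illustration, the already_adv parameter stands in for (tp->rcv_adv - tp->rcv_nxt), the silly-window check in lines 563-564 is left out, and htons() is omitted because byte order does not change whether the value is zero.

```c
#include <assert.h>
#include <stdint.h>

#define TCP_MAXWIN       65535L  /* largest value of the 16-bit window field */
#define TCP_MAX_WINSHIFT 14      /* largest permitted window scale shift */

/*
 * Hypothetical helper: original ordering (lines 565-568 as shipped).
 * Clamp to the maximum first, then raise to what was already advertised.
 */
static uint16_t
adv_win_original(long win, long already_adv, int rcv_scale)
{
    assert(rcv_scale >= 0 && rcv_scale <= TCP_MAX_WINSHIFT);
    if (win > (TCP_MAXWIN << rcv_scale))
        win = TCP_MAXWIN << rcv_scale;
    if (win < already_adv)
        win = already_adv;              /* can exceed the clamp again */
    return (uint16_t)(win >> rcv_scale); /* 65536 truncates to 0 */
}

/*
 * Hypothetical helper: proposed ordering (the two tests swapped).
 * Raise to what was already advertised first, then clamp last.
 */
static uint16_t
adv_win_swapped(long win, long already_adv, int rcv_scale)
{
    assert(rcv_scale >= 0 && rcv_scale <= TCP_MAX_WINSHIFT);
    if (win < already_adv)
        win = already_adv;
    if (win > (TCP_MAXWIN << rcv_scale))
        win = TCP_MAXWIN << rcv_scale;  /* the clamp always has the last word */
    return (uint16_t)(win >> rcv_scale);
}
```

With a 64K buffer and no window scaling, adv_win_original(65536, 0, 0) models the first run and adv_win_original(65536, 65536, 0) models later runs; only the second one truncates to 0, while adv_win_swapped() stays at 65535 in both cases.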