Date: Mon, 29 Jan 1996 17:25:26 -0800
From: Jerry Chen <chen@ipsilon.com>
To: bugs@FreeBSD.org
Subject: a TCP bug in FreeBSD 2.1?
Message-ID: <199601300125.RAA00660@relay.ipsilon.com>
In tcp_output() of FreeBSD 2.1:
559 /*
560 * Calculate receive window. Don't shrink window,
561 * but avoid silly window syndrome.
562 */
563 if (win < (long)(so->so_rcv.sb_hiwat / 4) && win < (long)tp->t_maxseg)
564 win = 0;
565 if (win > (long)TCP_MAXWIN << tp->rcv_scale)
566 win = (long)TCP_MAXWIN << tp->rcv_scale;
567 if (win < (long)(tp->rcv_adv - tp->rcv_nxt))
568 win = (long)(tp->rcv_adv - tp->rcv_nxt);
569 ti->ti_win = htons((u_short) (win>>tp->rcv_scale));
It seems to me there is a bug. To trigger it, the application has to set
the receive window to 64K bytes. The first time you run a test such as
ttcp, everything is fine. On the second and later runs of the same test,
however, the receiving side advertises a receive window of 0 during the
three-way handshake (connection setup). The sending side cannot transmit
any data and has to wait about 5 seconds. When the persist timer expires,
the sender probes with 1 byte of data. This causes the receiving side to
advertise a window of 65535 bytes, and from then on everything is fine.
But we have already lost 5 seconds, and this hurts performance.
Why does the receiving side advertise a window of 0? Line 566 clamps win
to 65535, but during connection setup (tp->rcv_adv - tp->rcv_nxt) is 64K,
so line 568 raises win back to 64K (65536). In line 569, the conversion
from long to u_short then turns 65536 into 0.
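The truncation can be reproduced outside the kernel. The following is a minimal sketch, not the kernel code: window_field_original() is a hypothetical helper that copies the logic of lines 563-569, with plain parameters standing in for the tcpcb and socket fields, and it returns the value in host order (htons() is left out because byte order does not affect the result). TCP_MAXWIN is 65535, as in <netinet/tcp.h>.

```c
#include <assert.h>
#include <stdint.h>

#define TCP_MAXWIN 65535L   /* largest unscaled window */

/*
 * Hypothetical stand-in for lines 563-569 of tcp_output().
 *   win        free space in the receive buffer
 *   hiwat      so->so_rcv.sb_hiwat
 *   maxseg     tp->t_maxseg
 *   advertised tp->rcv_adv - tp->rcv_nxt
 *   rcv_scale  tp->rcv_scale
 */
static uint16_t
window_field_original(long win, long hiwat, long maxseg,
    long advertised, int rcv_scale)
{
	assert(rcv_scale >= 0 && rcv_scale <= 14);

	if (win < hiwat / 4 && win < maxseg)
		win = 0;
	if (win > TCP_MAXWIN << rcv_scale)      /* clamp first ... */
		win = TCP_MAXWIN << rcv_scale;
	if (win < advertised)                   /* ... then raise again */
		win = advertised;
	/* Conversion to 16 bits keeps only the low 16 bits of win. */
	return (uint16_t)(win >> rcv_scale);
}
```

With a 64K buffer, no window scaling, and advertised equal to 65536, the helper returns 0. With advertised equal to 0 (the first run) it returns 65535.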
The first time we run the test, (tp->rcv_adv - tp->rcv_nxt) is 0 during
connection setup. On the second and later runs of the same test, it is
64K when TCP sends out the SYN,ACK. That is why the problem does not show
up on the first run.
What causes the difference? It comes from the code in tcp_input() for
transaction TCP:
678 if ((to.to_flag & TOF_CC) != 0) {
679 if (taop->tao_cc != 0 && CC_GT(to.to_cc, taop->tao_cc)) {
680 taop->tao_cc = to.to_cc;
681 tp->t_state = TCPS_ESTABLISHED;
682
683 /*
684 * If there is a FIN, or if there is data and the
685 * connection is local, then delay SYN,ACK(SYN) in
686 * the hope of piggy-backing it on a response
687 * segment. Otherwise must send ACK now in case
688 * the other side is slow starting.
689 */
690 if ((tiflags & TH_FIN) || (ti->ti_len != 0 &&
691 in_localaddr(inp->inp_faddr)))
692 tp->t_flags |= (TF_DELACK | TF_NEEDSYN);
693 else
694 tp->t_flags |= (TF_ACKNOW | TF_NEEDSYN);
695 tp->rcv_adv += tp->rcv_wnd;
The above code is executed when tao_cc is non-zero. The first time the
test is run, tao_cc is 0. So, TCP behaves differently between the first
time and later times.
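The difference between the first and later runs can be sketched with a toy model. This is only an illustration: struct toy_tcpcb is a hypothetical stand-in for the three tcpcb fields involved, tao_cc_valid stands for the test on line 679 succeeding, and the sketch assumes rcv_adv equals rcv_nxt before line 695 runs.

```c
#include <stdint.h>

/* Toy stand-in for the relevant tcpcb fields; not the real structure. */
struct toy_tcpcb {
	uint32_t rcv_nxt;   /* next sequence number expected */
	uint32_t rcv_adv;   /* highest sequence number advertised */
	uint32_t rcv_wnd;   /* receive window */
};

/*
 * Models only line 695. Returns the value that tcp_output() later
 * computes as (tp->rcv_adv - tp->rcv_nxt).
 */
static long
advertised_during_setup(struct toy_tcpcb *tp, int tao_cc_valid)
{
	if (tao_cc_valid)
		tp->rcv_adv += tp->rcv_wnd;     /* line 695 */
	return (long)(tp->rcv_adv - tp->rcv_nxt);
}
```

When tao_cc is 0 the difference stays 0. Once tao_cc is non-zero and the CC test passes, the difference equals rcv_wnd, which is 64K in the test described here.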
How should we fix the bug? I think we should swap lines 565-566 with
lines 567-568, so that the code becomes the following:
565 if (win < (long)(tp->rcv_adv - tp->rcv_nxt))
566 win = (long)(tp->rcv_adv - tp->rcv_nxt);
567 if (win > (long)TCP_MAXWIN << tp->rcv_scale)
568 win = (long)TCP_MAXWIN << tp->rcv_scale;
569 ti->ti_win = htons((u_short) (win>>tp->rcv_scale));
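The effect of the reordering can be checked with a small user-space sketch. window_field_swapped() is a hypothetical helper that mirrors the proposed lines 563-569, with plain parameters in place of the tcpcb and socket fields. It returns the value in host order, since htons() does not change the outcome.

```c
#include <assert.h>
#include <stdint.h>

#define TCP_MAXWIN 65535L   /* largest unscaled window */

/*
 * Hypothetical stand-in for the proposed ordering.
 *   win        free space in the receive buffer
 *   hiwat      so->so_rcv.sb_hiwat
 *   maxseg     tp->t_maxseg
 *   advertised tp->rcv_adv - tp->rcv_nxt
 *   rcv_scale  tp->rcv_scale
 */
static uint16_t
window_field_swapped(long win, long hiwat, long maxseg,
    long advertised, int rcv_scale)
{
	assert(rcv_scale >= 0 && rcv_scale <= 14);

	if (win < hiwat / 4 && win < maxseg)
		win = 0;
	if (win < advertised)                   /* raise first ... */
		win = advertised;
	if (win > TCP_MAXWIN << rcv_scale)      /* ... then clamp last */
		win = TCP_MAXWIN << rcv_scale;
	/* win >> rcv_scale now always fits in 16 bits. */
	return (uint16_t)(win >> rcv_scale);
}
```

Because the clamp runs last, a 64K value of advertised yields 65535 instead of wrapping to 0.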
I have never considered myself a TCP expert. Please correct me if I am wrong.
Thanks.
Jerry
