Date:      Mon, 29 Jan 1996 17:25:26 -0800
From:      Jerry Chen <chen@ipsilon.com>
To:        bugs@FreeBSD.org
Subject:   a TCP bug in FreeBSD 2.1?
Message-ID:  <199601300125.RAA00660@relay.ipsilon.com>

In tcp_output() of FreeBSD 2.1

559     /*
560      * Calculate receive window.  Don't shrink window,
561      * but avoid silly window syndrome.
562      */
563     if (win < (long)(so->so_rcv.sb_hiwat / 4) && win < (long)tp->t_maxseg)  
564             win = 0;
565     if (win > (long)TCP_MAXWIN << tp->rcv_scale)
566             win = (long)TCP_MAXWIN << tp->rcv_scale;
567     if (win < (long)(tp->rcv_adv - tp->rcv_nxt))
568             win = (long)(tp->rcv_adv - tp->rcv_nxt);
569     ti->ti_win = htons((u_short) (win>>tp->rcv_scale));

It seems to me there is a bug.  To trigger it, the application has to set
the receive window to 64 Kbytes.  The first time you run a test such as
ttcp, everything is fine.  On the second and later runs of the same test,
however, the receiving side advertises a receive window of 0 during the
three-way handshake (connection setup).  The transmitting side cannot
send any data and has to wait about 5 seconds.  When the persist timer
expires, the transmitting side probes by sending 1 byte of data.  This
causes the receiving side to advertise a window of 65535 bytes, and from
then on everything is fine.  But by then we have already lost 5 seconds,
and this hurts performance.

Why does the receiving side advertise a receive window of 0?  Line 566
does clamp win to 65535, but during connection setup lines 567-568 then
raise it back to 64K (65536), because (tp->rcv_adv - tp->rcv_nxt) is 64K
at that point.  In line 569, the long to u_short conversion turns 64K
into 0.
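
The truncation can be reproduced outside the kernel.  What follows is a
minimal user-space sketch, not the kernel code: window_field_current() is
a hypothetical helper, TCP_MAXWIN is assumed to be 65535, uint16_t stands
in for u_short, and the htons() byte swap is left out because it does not
affect the result.

```c
#include <stdint.h>

#define TCP_MAXWIN 65535L	/* assumed: largest unscaled window */

/*
 * Hypothetical stand-in for lines 565-569 in their current order:
 * clamp first, then apply the "don't shrink" rule, then truncate
 * to 16 bits.  already_advertised models (tp->rcv_adv - tp->rcv_nxt).
 */
static uint16_t
window_field_current(long win, long already_advertised, int rcv_scale)
{
	if (win > (TCP_MAXWIN << rcv_scale))
		win = TCP_MAXWIN << rcv_scale;
	if (win < already_advertised)
		win = already_advertised;	/* can exceed the clamp again */
	return (uint16_t)(win >> rcv_scale);	/* 65536 wraps to 0 */
}
```

With no window scaling, a 64K window and nothing advertised yet gives
65535, but the same window with 64K already advertised gives 0.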

The first time we run the test, (tp->rcv_adv - tp->rcv_nxt) is 0 during
connection setup.  On the second and later runs, it is 64K when TCP is
sending out the SYN and ACK.  That is why the problem does not show up
the first time.  What causes the difference?  It comes from the code in
tcp_input() for transaction TCP:

678         if ((to.to_flag & TOF_CC) != 0) {
679             if (taop->tao_cc != 0 && CC_GT(to.to_cc, taop->tao_cc)) {
680                 taop->tao_cc = to.to_cc;
681                 tp->t_state = TCPS_ESTABLISHED;
682
683                 /*
684                  * If there is a FIN, or if there is data and the
685                  * connection is local, then delay SYN,ACK(SYN) in
686                  * the hope of piggy-backing it on a response
687                  * segment.  Otherwise must send ACK now in case
688                  * the other side is slow starting.
689                  */
690                 if ((tiflags & TH_FIN) || (ti->ti_len != 0 &&
691                     in_localaddr(inp->inp_faddr)))
692                         tp->t_flags |= (TF_DELACK | TF_NEEDSYN);
693                 else
694                         tp->t_flags |= (TF_ACKNOW | TF_NEEDSYN);
695                 tp->rcv_adv += tp->rcv_wnd;

The above code is executed when tao_cc is non-zero.  The first time the
test is run, tao_cc is 0.  So, TCP behaves differently between the first
time and later times.
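
The effect of line 695 on the value compared in lines 567-568 can be
sketched in user space.  This is only an illustration under assumptions:
already_advertised() is a hypothetical helper, tcp_seq is modelled as a
32-bit unsigned integer, and rcv_adv is assumed to start out equal to
rcv_nxt with rcv_wnd equal to 64K.

```c
#include <stdint.h>

typedef uint32_t tcp_seq;	/* sequence numbers are 32 bits and wrap */

/*
 * Hypothetical helper: the quantity (tp->rcv_adv - tp->rcv_nxt),
 * computed with wrapping 32-bit arithmetic.
 */
static long
already_advertised(tcp_seq rcv_adv, tcp_seq rcv_nxt)
{
	return (long)(int32_t)(rcv_adv - rcv_nxt);
}

/* Hypothetical helper: models "tp->rcv_adv += tp->rcv_wnd" (line 695). */
static tcp_seq
advance_rcv_adv(tcp_seq rcv_adv, uint32_t rcv_wnd)
{
	return rcv_adv + rcv_wnd;
}
```

When line 695 is skipped the difference stays 0; once it runs with a 64K
rcv_wnd the difference is 65536, even if the sequence number wraps.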

How should we fix the bug?  I think we should swap lines 565-566 with
lines 567-568 so that the code becomes the following:

565     if (win < (long)(tp->rcv_adv - tp->rcv_nxt))
566             win = (long)(tp->rcv_adv - tp->rcv_nxt);
567     if (win > (long)TCP_MAXWIN << tp->rcv_scale)
568             win = (long)TCP_MAXWIN << tp->rcv_scale;
569     ti->ti_win = htons((u_short) (win>>tp->rcv_scale));
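
The swapped order can be checked with a small user-space sketch.  As
before this is not the kernel code: window_field_swapped() is a
hypothetical helper, TCP_MAXWIN is assumed to be 65535, uint16_t stands
in for u_short, and htons() is omitted.

```c
#include <stdint.h>

#define TCP_MAXWIN 65535L	/* assumed: largest unscaled window */

/*
 * Hypothetical stand-in for the proposed order: apply the
 * "don't shrink" rule first, clamp last, then truncate to 16 bits.
 * already_advertised models (tp->rcv_adv - tp->rcv_nxt).
 */
static uint16_t
window_field_swapped(long win, long already_advertised, int rcv_scale)
{
	if (win < already_advertised)
		win = already_advertised;
	if (win > (TCP_MAXWIN << rcv_scale))
		win = TCP_MAXWIN << rcv_scale;	/* clamp is now final */
	return (uint16_t)(win >> rcv_scale);
}
```

Because the clamp runs last, a 64K window with 64K already advertised
now yields 65535 instead of 0, while smaller windows are unaffected.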

I never considered myself a TCP expert.  Please correct me if I am wrong.
Thanks.

Jerry 