Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 10 Apr 2010 11:05:32 GMT
From:      Richard Scheffenegger <rs@netapp.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/145600: TCP/ECN behaves different to CE/CWR than ns2 reference
Message-ID:  <201004101105.o3AB5Wnc008859@www.freebsd.org>
Resent-Message-ID: <201004101110.o3ABA274029323@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help

>Number:         145600
>Category:       kern
>Synopsis:       TCP/ECN behaves different to CE/CWR than ns2 reference
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Apr 10 11:10:02 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Richard Scheffenegger
>Release:        8.0-RC2
>Organization:
NetApp
>Environment:
FreeBSD rsFreeBSD.vie.demo 8.0-RC2 FreeBSD 8.0-RC2 #16: Sat Nov 14 22:25:28 CET 2009 root@rsFreeBSD.vie.demo:/usr/obj/usr/src/sys/GENERIC i386
>Description:
Used TBIT (www.icir.org/tbit/) to verify the RFC3168 compliance of the FreeBSD
ECN implementiation.

First, the stock IP stack filers out the CE and ECT(1) codepoints - even when TCP negotiates successfully for ECN.

This can be seen by "netstat -sp tcp":

>How-To-Repeat:
Run NewECN test of TBIT against an ECN activated freebsd stack (with deactivated CE/ECT(1) filtering in the IP stack).

SImilar, injecting a CWR/CE data segment into an ECN-enabled TCP stream can demonstrate the problem.
>Fix:
rsFreeBSD# diff -U 10 netinet.orig/tcp_input.c netinet/tcp_input.c
--- netinet.orig/tcp_input.c    2009-10-25 02:10:29.000000000 +0100
+++ netinet/tcp_input.c 2010-04-10 10:30:12.000000000 +0200
@@ -1127,36 +1127,37 @@
        /*
         * Unscale the window into a 32-bit value.
         * For the SYN_SENT state the scale is zero.
         */
        tiwin = th->th_win << tp->snd_scale;

        /*
         * TCP ECN processing.
         */
        if (tp->t_flags & TF_ECN_PERMIT) {
+
+               if (thflags & TH_CWR)
+                        tp->t_flags &= ~TF_ECN_SND_ECE;
+
                switch (iptos & IPTOS_ECN_MASK) {
                case IPTOS_ECN_CE:
                        tp->t_flags |= TF_ECN_SND_ECE;
                        TCPSTAT_INC(tcps_ecn_ce);
                        break;
                case IPTOS_ECN_ECT0:
                        TCPSTAT_INC(tcps_ecn_ect0);
                        break;
                case IPTOS_ECN_ECT1:
                        TCPSTAT_INC(tcps_ecn_ect1);
                        break;
                }

-               if (thflags & TH_CWR)
-                       tp->t_flags &= ~TF_ECN_SND_ECE;
-
                /*
                 * Congestion experienced.
                 * Ignore if we are already trying to recover.
                 */
                if ((thflags & TH_ECE) &&
                    SEQ_LEQ(th->th_ack, tp->snd_recover)) {
                        TCPSTAT_INC(tcps_ecn_rcwnd);
                        tcp_congestion_exp(tp);
                }
        }


Patch attached with submission follows:

rsFreeBSD# diff -U 10 netinet.orig/tcp_input.c netinet/tcp_input.c
--- netinet.orig/tcp_input.c    2009-10-25 02:10:29.000000000 +0100
+++ netinet/tcp_input.c 2010-04-10 10:30:12.000000000 +0200
@@ -1127,36 +1127,37 @@
        /*
         * Unscale the window into a 32-bit value.
         * For the SYN_SENT state the scale is zero.
         */
        tiwin = th->th_win << tp->snd_scale;

        /*
         * TCP ECN processing.
         */
        if (tp->t_flags & TF_ECN_PERMIT) {
+
+               if (thflags & TH_CWR)
+                        tp->t_flags &= ~TF_ECN_SND_ECE;
+
                switch (iptos & IPTOS_ECN_MASK) {
                case IPTOS_ECN_CE:
                        tp->t_flags |= TF_ECN_SND_ECE;
                        TCPSTAT_INC(tcps_ecn_ce);
                        break;
                case IPTOS_ECN_ECT0:
                        TCPSTAT_INC(tcps_ecn_ect0);
                        break;
                case IPTOS_ECN_ECT1:
                        TCPSTAT_INC(tcps_ecn_ect1);
                        break;
                }

-               if (thflags & TH_CWR)
-                       tp->t_flags &= ~TF_ECN_SND_ECE;
-
                /*
                 * Congestion experienced.
                 * Ignore if we are already trying to recover.
                 */
                if ((thflags & TH_ECE) &&
                    SEQ_LEQ(th->th_ack, tp->snd_recover)) {
                        TCPSTAT_INC(tcps_ecn_rcwnd);
                        tcp_congestion_exp(tp);
                }
        }


>Release-Note:
>Audit-Trail:
>Unformatted:
 >>      0 packets with ECN CE bit set
         154704 packets with ECN ECT(0) bit set
 >>      0 packets with ECN ECT(1) bit set
         7 successful ECN handshakes
         0 times ECN reduced the congestion window
 
 these two lines will stay zero even when a CE / ECT(1) frame is deliberately sent (as is the case in the TBIT NewECN test).
 
 After activating PF (with red scheduling), these two codepoints get delivered by IP to TCP.
 
 Further testing revealed, that another test case - the behavior when a CE/CWD segment is received, deviates from the reference implementation in ns2 by the original authors of RFC3168.
 
 Reference implementation (from tcp-sink.cc in ns2):
 
 
 	if ( (sf != 0 && sf->cong_action()) || of->cong_action() ) 
 		// Sender has responsed to congestion. 
 		acker_->update_ecn_unacked(0);
 	if ( (sf != 0 && sf->ect() && sf->ce())  || 
 			(of->ect() && of->ce()) )
 		// New report of congestion.  
 		acker_->update_ecn_unacked(1);
 	if ( (sf != 0 && sf->ect()) || of->ect() )
 		// Set EcnEcho bit.  
 		nf->ecnecho() = acker_->ecn_unacked();
 
 Basically, CWR is checked first, and ECE cleared; if that segment also contains the CE codepoint again, ECE is set anew.
 
 Implementation in tcp_input.c:
 
         /*
          * TCP ECN processing.
          */
         if (tp->t_flags & TF_ECN_PERMIT) {
                 switch (iptos & IPTOS_ECN_MASK) {
                 case IPTOS_ECN_CE:
                         tp->t_flags |= TF_ECN_SND_ECE;
                         TCPSTAT_INC(tcps_ecn_ce);
                         break;
                 case IPTOS_ECN_ECT0:
                         TCPSTAT_INC(tcps_ecn_ect0);
                         break;
                 case IPTOS_ECN_ECT1:
                         TCPSTAT_INC(tcps_ecn_ect1);
                         break;
                 }
 
                 if (thflags & TH_CWR)
                         tp->t_flags &= ~TF_ECN_SND_ECE;
  
                 /*
                  * Congestion experienced.
                  * Ignore if we are already trying to recover.
                  */
                 if ((thflags & TH_ECE) &&
                     SEQ_LEQ(th->th_ack, tp->snd_recover)) {
                         TCPSTAT_INC(tcps_ecn_rcwnd);
                         tcp_congestion_exp(tp);
                 }
         }
 
 Here, CE is checked first, and CWR later - resulting in a normal ACK returned for a CE/CWR, instead of a ECE/ACK.
 
 
 This issue is problematic, as it precludes further (currently investigated) enhancements of ECN senders to react more informed about ECNs.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201004101105.o3AB5Wnc008859>