From owner-freebsd-bugs@FreeBSD.ORG Sat Apr 10 11:10:02 2010 Return-Path: Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F1E6C1065673 for ; Sat, 10 Apr 2010 11:10:02 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id B43058FC16 for ; Sat, 10 Apr 2010 11:10:02 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id o3ABA2S8029324 for ; Sat, 10 Apr 2010 11:10:02 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id o3ABA274029323; Sat, 10 Apr 2010 11:10:02 GMT (envelope-from gnats) Resent-Date: Sat, 10 Apr 2010 11:10:02 GMT Resent-Message-Id: <201004101110.o3ABA274029323@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-bugs@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Richard Scheffenegger Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B9D72106564A for ; Sat, 10 Apr 2010 11:05:33 +0000 (UTC) (envelope-from nobody@FreeBSD.org) Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21]) by mx1.freebsd.org (Postfix) with ESMTP id A832E8FC08 for ; Sat, 10 Apr 2010 11:05:33 +0000 (UTC) Received: from www.freebsd.org (localhost [127.0.0.1]) by www.freebsd.org (8.14.3/8.14.3) with ESMTP id o3AB5WRb008860 for ; Sat, 10 Apr 2010 11:05:32 GMT (envelope-from nobody@www.freebsd.org) Received: (from nobody@localhost) by www.freebsd.org (8.14.3/8.14.3/Submit) id o3AB5Wnc008859; Sat, 10 Apr 2010 11:05:32 GMT (envelope-from nobody) Message-Id: <201004101105.o3AB5Wnc008859@www.freebsd.org> Date: Sat, 10 Apr 2010 11:05:32 GMT From: Richard Scheffenegger To: freebsd-gnats-submit@FreeBSD.org X-Send-Pr-Version: www-3.1 Cc: Subject: kern/145600: TCP/ECN behaves different to CE/CWR than ns2 reference X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 10 Apr 2010 11:10:03 -0000 >Number: 145600 >Category: kern >Synopsis: TCP/ECN behaves different to CE/CWR than ns2 reference >Confidential: no >Severity: serious >Priority: low >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Sat Apr 10 11:10:02 UTC 2010 >Closed-Date: >Last-Modified: >Originator: Richard Scheffenegger >Release: 8.0-RC2 >Organization: NetApp >Environment: FreeBSD rsFreeBSD.vie.demo 8.0-RC2 FreeBSD 8.0-RC2 #16: Sat Nov 14 22:25:28 CET 2009 root@rsFreeBSD.vie.demo:/usr/obj/usr/src/sys/GENERIC i386 >Description: Used TBIT (www.icir.org/tbit/) to verify the RFC3168 compliance of the FreeBSD ECN implementiation. First, the stock IP stack filers out the CE and ECT(1) codepoints - even when TCP negotiates successfully for ECN. This can be seen by "netstat -sp tcp": >How-To-Repeat: Run NewECN test of TBIT against an ECN activated freebsd stack (with deactivated CE/ECT(1) filtering in the IP stack). SImilar, injecting a CWR/CE data segment into an ECN-enabled TCP stream can demonstrate the problem. >Fix: rsFreeBSD# diff -U 10 netinet.orig/tcp_input.c netinet/tcp_input.c --- netinet.orig/tcp_input.c 2009-10-25 02:10:29.000000000 +0100 +++ netinet/tcp_input.c 2010-04-10 10:30:12.000000000 +0200 @@ -1127,36 +1127,37 @@ /* * Unscale the window into a 32-bit value. * For the SYN_SENT state the scale is zero. */ tiwin = th->th_win << tp->snd_scale; /* * TCP ECN processing. */ if (tp->t_flags & TF_ECN_PERMIT) { + + if (thflags & TH_CWR) + tp->t_flags &= ~TF_ECN_SND_ECE; + switch (iptos & IPTOS_ECN_MASK) { case IPTOS_ECN_CE: tp->t_flags |= TF_ECN_SND_ECE; TCPSTAT_INC(tcps_ecn_ce); break; case IPTOS_ECN_ECT0: TCPSTAT_INC(tcps_ecn_ect0); break; case IPTOS_ECN_ECT1: TCPSTAT_INC(tcps_ecn_ect1); break; } - if (thflags & TH_CWR) - tp->t_flags &= ~TF_ECN_SND_ECE; - /* * Congestion experienced. * Ignore if we are already trying to recover. */ if ((thflags & TH_ECE) && SEQ_LEQ(th->th_ack, tp->snd_recover)) { TCPSTAT_INC(tcps_ecn_rcwnd); tcp_congestion_exp(tp); } } Patch attached with submission follows: rsFreeBSD# diff -U 10 netinet.orig/tcp_input.c netinet/tcp_input.c --- netinet.orig/tcp_input.c 2009-10-25 02:10:29.000000000 +0100 +++ netinet/tcp_input.c 2010-04-10 10:30:12.000000000 +0200 @@ -1127,36 +1127,37 @@ /* * Unscale the window into a 32-bit value. * For the SYN_SENT state the scale is zero. */ tiwin = th->th_win << tp->snd_scale; /* * TCP ECN processing. */ if (tp->t_flags & TF_ECN_PERMIT) { + + if (thflags & TH_CWR) + tp->t_flags &= ~TF_ECN_SND_ECE; + switch (iptos & IPTOS_ECN_MASK) { case IPTOS_ECN_CE: tp->t_flags |= TF_ECN_SND_ECE; TCPSTAT_INC(tcps_ecn_ce); break; case IPTOS_ECN_ECT0: TCPSTAT_INC(tcps_ecn_ect0); break; case IPTOS_ECN_ECT1: TCPSTAT_INC(tcps_ecn_ect1); break; } - if (thflags & TH_CWR) - tp->t_flags &= ~TF_ECN_SND_ECE; - /* * Congestion experienced. * Ignore if we are already trying to recover. */ if ((thflags & TH_ECE) && SEQ_LEQ(th->th_ack, tp->snd_recover)) { TCPSTAT_INC(tcps_ecn_rcwnd); tcp_congestion_exp(tp); } } >Release-Note: >Audit-Trail: >Unformatted: >> 0 packets with ECN CE bit set 154704 packets with ECN ECT(0) bit set >> 0 packets with ECN ECT(1) bit set 7 successful ECN handshakes 0 times ECN reduced the congestion window these two lines will stay zero even when a CE / ECT(1) frame is deliberately sent (as is the case in the TBIT NewECN test). After activating PF (with red scheduling), these two codepoints get delivered by IP to TCP. Further testing revealed, that another test case - the behavior when a CE/CWD segment is received, deviates from the reference implementation in ns2 by the original authors of RFC3168. Reference implementation (from tcp-sink.cc in ns2): if ( (sf != 0 && sf->cong_action()) || of->cong_action() ) // Sender has responsed to congestion. acker_->update_ecn_unacked(0); if ( (sf != 0 && sf->ect() && sf->ce()) || (of->ect() && of->ce()) ) // New report of congestion. acker_->update_ecn_unacked(1); if ( (sf != 0 && sf->ect()) || of->ect() ) // Set EcnEcho bit. nf->ecnecho() = acker_->ecn_unacked(); Basically, CWR is checked first, and ECE cleared; if that segment also contains the CE codepoint again, ECE is set anew. Implementation in tcp_input.c: /* * TCP ECN processing. */ if (tp->t_flags & TF_ECN_PERMIT) { switch (iptos & IPTOS_ECN_MASK) { case IPTOS_ECN_CE: tp->t_flags |= TF_ECN_SND_ECE; TCPSTAT_INC(tcps_ecn_ce); break; case IPTOS_ECN_ECT0: TCPSTAT_INC(tcps_ecn_ect0); break; case IPTOS_ECN_ECT1: TCPSTAT_INC(tcps_ecn_ect1); break; } if (thflags & TH_CWR) tp->t_flags &= ~TF_ECN_SND_ECE; /* * Congestion experienced. * Ignore if we are already trying to recover. */ if ((thflags & TH_ECE) && SEQ_LEQ(th->th_ack, tp->snd_recover)) { TCPSTAT_INC(tcps_ecn_rcwnd); tcp_congestion_exp(tp); } } Here, CE is checked first, and CWR later - resulting in a normal ACK returned for a CE/CWR, instead of a ECE/ACK. This issue is problematic, as it precludes further (currently investigated) enhancements of ECN senders to react more informed about ECNs.