From owner-svn-src-stable@freebsd.org Mon Mar 5 06:52:28 2018 Return-Path: Delivered-To: svn-src-stable@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 034B0F362D0; Mon, 5 Mar 2018 06:52:28 +0000 (UTC) (envelope-from eadler@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A721D86866; Mon, 5 Mar 2018 06:52:27 +0000 (UTC) (envelope-from eadler@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id A1EF2208BB; Mon, 5 Mar 2018 06:52:27 +0000 (UTC) (envelope-from eadler@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id w256qRbk071214; Mon, 5 Mar 2018 06:52:27 GMT (envelope-from eadler@FreeBSD.org) Received: (from eadler@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id w256qQvu071205; Mon, 5 Mar 2018 06:52:26 GMT (envelope-from eadler@FreeBSD.org) Message-Id: <201803050652.w256qQvu071205@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: eadler set sender to eadler@FreeBSD.org using -f From: Eitan Adler Date: Mon, 5 Mar 2018 06:52:26 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-11@freebsd.org Subject: svn commit: r330445 - in stable/11/sys/netinet: . cc tcp_stacks X-SVN-Group: stable-11 X-SVN-Commit-Author: eadler X-SVN-Commit-Paths: in stable/11/sys/netinet: . cc tcp_stacks X-SVN-Commit-Revision: 330445 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-stable@freebsd.org X-Mailman-Version: 2.1.25 Precedence: list List-Id: SVN commit messages for all the -stable branches of the src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 05 Mar 2018 06:52:28 -0000 Author: eadler Date: Mon Mar 5 06:52:26 2018 New Revision: 330445 URL: https://svnweb.freebsd.org/changeset/base/330445 Log: MFC r307901,r308180: FreeBSD tcp stack used to inform respective congestion control module about the loss event but not use or obay the recommendations i.e. values set by it in some cases. Here is an attempt to solve that confusion by following relevant RFCs/drafts. Stack only sets congestion window/slow start threshold values when there is no CC module availalbe to take that action. All CC modules are inspected and updated when needed to take appropriate action on loss. tcp_stacks/fastpath module has been updated to adapt these changes. Note: Probably, the most significant change would be to not bring congestion window down to 1MSS on a loss signaled by 3-duplicate acks and letting respective CC decide that value. Modified: stable/11/sys/netinet/cc/cc_cdg.c stable/11/sys/netinet/cc/cc_chd.c stable/11/sys/netinet/cc/cc_cubic.c stable/11/sys/netinet/cc/cc_dctcp.c stable/11/sys/netinet/cc/cc_htcp.c stable/11/sys/netinet/cc/cc_newreno.c stable/11/sys/netinet/tcp_input.c stable/11/sys/netinet/tcp_stacks/fastpath.c Directory Properties: stable/11/ (props changed) Modified: stable/11/sys/netinet/cc/cc_cdg.c ============================================================================== --- stable/11/sys/netinet/cc/cc_cdg.c Mon Mar 5 06:47:28 2018 (r330444) +++ stable/11/sys/netinet/cc/cc_cdg.c Mon Mar 5 06:52:26 2018 (r330445) @@ -431,7 +431,12 @@ static void cdg_cong_signal(struct cc_var *ccv, uint32_t signal_type) { struct cdg *cdg_data = ccv->cc_data; + uint32_t cwin; + u_int mss; + cwin = CCV(ccv, snd_cwnd); + mss = CCV(ccv, t_maxseg); + switch(signal_type) { case CC_CDG_DELAY: CCV(ccv, snd_ssthresh) = cdg_window_decrease(ccv, @@ -448,7 +453,7 @@ cdg_cong_signal(struct cc_var *ccv, uint32_t signal_ty */ if (IN_CONGRECOVERY(CCV(ccv, t_flags)) || cdg_data->queue_state < CDG_Q_FULL) { - CCV(ccv, snd_ssthresh) = CCV(ccv, snd_cwnd); + CCV(ccv, snd_ssthresh) = cwin; CCV(ccv, snd_recover) = CCV(ccv, snd_max); } else { /* @@ -461,12 +466,18 @@ cdg_cong_signal(struct cc_var *ccv, uint32_t signal_ty cdg_data->shadow_w, RENO_BETA); CCV(ccv, snd_ssthresh) = ulmax(cdg_data->shadow_w, - cdg_window_decrease(ccv, CCV(ccv, snd_cwnd), - V_cdg_beta_loss)); + cdg_window_decrease(ccv, cwin, V_cdg_beta_loss)); + CCV(ccv, snd_cwnd) = CCV(ccv, snd_ssthresh); cdg_data->window_incr = cdg_data->rtt_count = 0; } ENTER_RECOVERY(CCV(ccv, t_flags)); + break; + case CC_RTO: + CCV(ccv, snd_ssthresh) = + max((CCV(ccv, snd_max) - CCV(ccv, snd_una)) / 2 / mss, 2) + * mss; + CCV(ccv, snd_cwnd) = mss; break; default: newreno_cc_algo.cong_signal(ccv, signal_type); Modified: stable/11/sys/netinet/cc/cc_chd.c ============================================================================== --- stable/11/sys/netinet/cc/cc_chd.c Mon Mar 5 06:47:28 2018 (r330444) +++ stable/11/sys/netinet/cc/cc_chd.c Mon Mar 5 06:52:26 2018 (r330445) @@ -330,10 +330,12 @@ chd_cong_signal(struct cc_var *ccv, uint32_t signal_ty struct ertt *e_t; struct chd *chd_data; int qdly; + u_int mss; e_t = khelp_get_osd(CCV(ccv, osd), ertt_id); chd_data = ccv->cc_data; qdly = imax(e_t->rtt, chd_data->maxrtt_in_rtt) - e_t->minrtt; + mss = CCV(ccv, t_maxseg); switch(signal_type) { case CC_CHD_DELAY: @@ -372,6 +374,12 @@ chd_cong_signal(struct cc_var *ccv, uint32_t signal_ty CCV(ccv, t_maxseg) / 2, 2) * CCV(ccv, t_maxseg); } ENTER_FASTRECOVERY(CCV(ccv, t_flags)); + break; + case CC_RTO: + CCV(ccv, snd_ssthresh) = + max((CCV(ccv, snd_max) - CCV(ccv, snd_una)) / 2 / mss, 2) + * mss; + CCV(ccv, snd_cwnd) = mss; break; default: Modified: stable/11/sys/netinet/cc/cc_cubic.c ============================================================================== --- stable/11/sys/netinet/cc/cc_cubic.c Mon Mar 5 06:47:28 2018 (r330444) +++ stable/11/sys/netinet/cc/cc_cubic.c Mon Mar 5 06:52:26 2018 (r330445) @@ -225,8 +225,12 @@ static void cubic_cong_signal(struct cc_var *ccv, uint32_t type) { struct cubic *cubic_data; + uint32_t cwin; + u_int mss; cubic_data = ccv->cc_data; + cwin = CCV(ccv, snd_cwnd); + mss = CCV(ccv, t_maxseg); switch (type) { case CC_NDUPACK: @@ -235,7 +239,8 @@ cubic_cong_signal(struct cc_var *ccv, uint32_t type) cubic_ssthresh_update(ccv); cubic_data->num_cong_events++; cubic_data->prev_max_cwnd = cubic_data->max_cwnd; - cubic_data->max_cwnd = CCV(ccv, snd_cwnd); + cubic_data->max_cwnd = cwin; + CCV(ccv, snd_cwnd) = CCV(ccv, snd_ssthresh); } ENTER_RECOVERY(CCV(ccv, t_flags)); } @@ -246,7 +251,7 @@ cubic_cong_signal(struct cc_var *ccv, uint32_t type) cubic_ssthresh_update(ccv); cubic_data->num_cong_events++; cubic_data->prev_max_cwnd = cubic_data->max_cwnd; - cubic_data->max_cwnd = CCV(ccv, snd_cwnd); + cubic_data->max_cwnd = cwin; cubic_data->t_last_cong = ticks; CCV(ccv, snd_cwnd) = CCV(ccv, snd_ssthresh); ENTER_CONGRECOVERY(CCV(ccv, t_flags)); @@ -261,9 +266,13 @@ cubic_cong_signal(struct cc_var *ccv, uint32_t type) * chance the first one is a false alarm and may not indicate * congestion. */ - if (CCV(ccv, t_rxtshift) >= 2) + if (CCV(ccv, t_rxtshift) >= 2) { cubic_data->num_cong_events++; cubic_data->t_last_cong = ticks; + cubic_ssthresh_update(ccv); + cubic_data->max_cwnd = cwin; + CCV(ccv, snd_cwnd) = mss; + } break; } } Modified: stable/11/sys/netinet/cc/cc_dctcp.c ============================================================================== --- stable/11/sys/netinet/cc/cc_dctcp.c Mon Mar 5 06:47:28 2018 (r330444) +++ stable/11/sys/netinet/cc/cc_dctcp.c Mon Mar 5 06:52:26 2018 (r330445) @@ -230,27 +230,29 @@ static void dctcp_cong_signal(struct cc_var *ccv, uint32_t type) { struct dctcp *dctcp_data; - u_int win, mss; + uint32_t cwin, ssthresh_on_loss; + u_int mss; dctcp_data = ccv->cc_data; - win = CCV(ccv, snd_cwnd); + cwin = CCV(ccv, snd_cwnd); mss = CCV(ccv, t_maxseg); + ssthresh_on_loss = + max((CCV(ccv, snd_max) - CCV(ccv, snd_una)) / 2 / mss, 2) + * mss; switch (type) { case CC_NDUPACK: if (!IN_FASTRECOVERY(CCV(ccv, t_flags))) { if (!IN_CONGRECOVERY(CCV(ccv, t_flags))) { - CCV(ccv, snd_ssthresh) = mss * - max(win / 2 / mss, 2); + CCV(ccv, snd_ssthresh) = ssthresh_on_loss; dctcp_data->num_cong_events++; } else { /* cwnd has already updated as congestion * recovery. Reverse cwnd value using * snd_cwnd_prev and recalculate snd_ssthresh */ - win = CCV(ccv, snd_cwnd_prev); - CCV(ccv, snd_ssthresh) = - max(win / 2 / mss, 2) * mss; + cwin = CCV(ccv, snd_cwnd_prev); + CCV(ccv, snd_ssthresh) = ssthresh_on_loss; } ENTER_RECOVERY(CCV(ccv, t_flags)); } @@ -260,18 +262,17 @@ dctcp_cong_signal(struct cc_var *ccv, uint32_t type) * Save current snd_cwnd when the host encounters both * congestion recovery and fast recovery. */ - CCV(ccv, snd_cwnd_prev) = win; + CCV(ccv, snd_cwnd_prev) = cwin; if (!IN_CONGRECOVERY(CCV(ccv, t_flags))) { if (V_dctcp_slowstart && dctcp_data->num_cong_events++ == 0) { - CCV(ccv, snd_ssthresh) = - mss * max(win / 2 / mss, 2); + CCV(ccv, snd_ssthresh) = ssthresh_on_loss; dctcp_data->alpha = MAX_ALPHA_VALUE; dctcp_data->bytes_ecn = 0; dctcp_data->bytes_total = 0; dctcp_data->save_sndnxt = CCV(ccv, snd_nxt); } else - CCV(ccv, snd_ssthresh) = max((win - ((win * + CCV(ccv, snd_ssthresh) = max((cwin - ((cwin * dctcp_data->alpha) >> 11)) / mss, 2) * mss; CCV(ccv, snd_cwnd) = CCV(ccv, snd_ssthresh); ENTER_CONGRECOVERY(CCV(ccv, t_flags)); @@ -284,6 +285,8 @@ dctcp_cong_signal(struct cc_var *ccv, uint32_t type) dctcp_update_alpha(ccv); dctcp_data->save_sndnxt += CCV(ccv, t_maxseg); dctcp_data->num_cong_events++; + CCV(ccv, snd_ssthresh) = ssthresh_on_loss; + CCV(ccv, snd_cwnd) = mss; } break; } Modified: stable/11/sys/netinet/cc/cc_htcp.c ============================================================================== --- stable/11/sys/netinet/cc/cc_htcp.c Mon Mar 5 06:47:28 2018 (r330444) +++ stable/11/sys/netinet/cc/cc_htcp.c Mon Mar 5 06:52:26 2018 (r330445) @@ -271,8 +271,12 @@ static void htcp_cong_signal(struct cc_var *ccv, uint32_t type) { struct htcp *htcp_data; + uint32_t cwin; + u_int mss; htcp_data = ccv->cc_data; + cwin = CCV(ccv, snd_cwnd); + mss = CCV(ccv, t_maxseg); switch (type) { case CC_NDUPACK: @@ -287,8 +291,9 @@ htcp_cong_signal(struct cc_var *ccv, uint32_t type) (htcp_data->maxrtt - htcp_data->minrtt) * 95) / 100; htcp_ssthresh_update(ccv); + CCV(ccv, snd_cwnd) = CCV(ccv, snd_ssthresh); htcp_data->t_last_cong = ticks; - htcp_data->prev_cwnd = CCV(ccv, snd_cwnd); + htcp_data->prev_cwnd = cwin; } ENTER_RECOVERY(CCV(ccv, t_flags)); } @@ -305,7 +310,7 @@ htcp_cong_signal(struct cc_var *ccv, uint32_t type) htcp_ssthresh_update(ccv); CCV(ccv, snd_cwnd) = CCV(ccv, snd_ssthresh); htcp_data->t_last_cong = ticks; - htcp_data->prev_cwnd = CCV(ccv, snd_cwnd); + htcp_data->prev_cwnd = cwin; ENTER_CONGRECOVERY(CCV(ccv, t_flags)); } break; @@ -320,6 +325,10 @@ htcp_cong_signal(struct cc_var *ccv, uint32_t type) */ if (CCV(ccv, t_rxtshift) >= 2) htcp_data->t_last_cong = ticks; + CCV(ccv, snd_ssthresh) = + max((CCV(ccv, snd_max) - CCV(ccv, snd_una)) / 2 / mss, 2) + * mss; + CCV(ccv, snd_cwnd) = mss; break; } } @@ -511,6 +520,10 @@ htcp_ssthresh_update(struct cc_var *ccv) CCV(ccv, snd_ssthresh) = (CCV(ccv, snd_cwnd) * htcp_data->beta) >> HTCP_SHIFT; } + + /* Align ssthresh to MSS boundary */ + CCV(ccv, snd_ssthresh) = (CCV(ccv, snd_ssthresh) / CCV(ccv, t_maxseg)) + * CCV(ccv, t_maxseg); } Modified: stable/11/sys/netinet/cc/cc_newreno.c ============================================================================== --- stable/11/sys/netinet/cc/cc_newreno.c Mon Mar 5 06:47:28 2018 (r330444) +++ stable/11/sys/netinet/cc/cc_newreno.c Mon Mar 5 06:52:26 2018 (r330445) @@ -181,29 +181,41 @@ newreno_after_idle(struct cc_var *ccv) static void newreno_cong_signal(struct cc_var *ccv, uint32_t type) { - u_int win; + uint32_t cwin, ssthresh_on_loss; + u_int mss; + cwin = CCV(ccv, snd_cwnd); + mss = CCV(ccv, t_maxseg); + ssthresh_on_loss = + max((CCV(ccv, snd_max) - CCV(ccv, snd_una)) / 2 / mss, 2) + * mss; + /* Catch algos which mistakenly leak private signal types. */ KASSERT((type & CC_SIGPRIVMASK) == 0, ("%s: congestion signal type 0x%08x is private\n", __func__, type)); - win = max(CCV(ccv, snd_cwnd) / 2 / CCV(ccv, t_maxseg), 2) * - CCV(ccv, t_maxseg); + cwin = max(cwin / 2 / mss, 2) * mss; switch (type) { case CC_NDUPACK: if (!IN_FASTRECOVERY(CCV(ccv, t_flags))) { - if (!IN_CONGRECOVERY(CCV(ccv, t_flags))) - CCV(ccv, snd_ssthresh) = win; + if (!IN_CONGRECOVERY(CCV(ccv, t_flags))) { + CCV(ccv, snd_ssthresh) = ssthresh_on_loss; + CCV(ccv, snd_cwnd) = cwin; + } ENTER_RECOVERY(CCV(ccv, t_flags)); } break; case CC_ECN: if (!IN_CONGRECOVERY(CCV(ccv, t_flags))) { - CCV(ccv, snd_ssthresh) = win; - CCV(ccv, snd_cwnd) = win; + CCV(ccv, snd_ssthresh) = ssthresh_on_loss; + CCV(ccv, snd_cwnd) = cwin; ENTER_CONGRECOVERY(CCV(ccv, t_flags)); } + break; + case CC_RTO: + CCV(ccv, snd_ssthresh) = ssthresh_on_loss; + CCV(ccv, snd_cwnd) = mss; break; } } Modified: stable/11/sys/netinet/tcp_input.c ============================================================================== --- stable/11/sys/netinet/tcp_input.c Mon Mar 5 06:47:28 2018 (r330444) +++ stable/11/sys/netinet/tcp_input.c Mon Mar 5 06:52:26 2018 (r330445) @@ -429,9 +429,16 @@ cc_cong_signal(struct tcpcb *tp, struct tcphdr *th, ui tp->t_dupacks = 0; tp->t_bytes_acked = 0; EXIT_RECOVERY(tp->t_flags); - tp->snd_ssthresh = max(2, min(tp->snd_wnd, tp->snd_cwnd) / 2 / - maxseg) * maxseg; - tp->snd_cwnd = maxseg; + if (CC_ALGO(tp)->cong_signal == NULL) { + /* + * RFC5681 Section 3.1 + * ssthresh = max (FlightSize / 2, 2*SMSS) eq (4) + */ + tp->snd_ssthresh = + max((tp->snd_max - tp->snd_una) / 2 / maxseg, 2) + * maxseg; + tp->snd_cwnd = maxseg; + } break; case CC_RTO_ERR: TCPSTAT_INC(tcps_sndrexmitbad); @@ -2592,6 +2599,15 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, stru if (awnd < tp->snd_ssthresh) { tp->snd_cwnd += maxseg; + /* + * RFC5681 Section 3.2 talks about cwnd + * inflation on additional dupacks and + * deflation on recovering from loss. + * + * We keep cwnd into check so that + * we don't have to 'deflate' it when we + * get out of recovery. + */ if (tp->snd_cwnd > tp->snd_ssthresh) tp->snd_cwnd = tp->snd_ssthresh; } @@ -2630,19 +2646,22 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, stru TCPSTAT_INC( tcps_sack_recovery_episode); tp->sack_newdata = tp->snd_nxt; - tp->snd_cwnd = maxseg; + if (CC_ALGO(tp)->cong_signal == NULL) + tp->snd_cwnd = maxseg; (void) tp->t_fb->tfb_tcp_output(tp); goto drop; } tp->snd_nxt = th->th_ack; - tp->snd_cwnd = maxseg; + if (CC_ALGO(tp)->cong_signal == NULL) + tp->snd_cwnd = maxseg; (void) tp->t_fb->tfb_tcp_output(tp); KASSERT(tp->snd_limited <= 2, ("%s: tp->snd_limited too big", __func__)); - tp->snd_cwnd = tp->snd_ssthresh + - maxseg * - (tp->t_dupacks - tp->snd_limited); + if (CC_ALGO(tp)->cong_signal == NULL) + tp->snd_cwnd = tp->snd_ssthresh + + maxseg * + (tp->t_dupacks - tp->snd_limited); if (SEQ_GT(onxt, tp->snd_nxt)) tp->snd_nxt = onxt; goto drop; Modified: stable/11/sys/netinet/tcp_stacks/fastpath.c ============================================================================== --- stable/11/sys/netinet/tcp_stacks/fastpath.c Mon Mar 5 06:47:28 2018 (r330444) +++ stable/11/sys/netinet/tcp_stacks/fastpath.c Mon Mar 5 06:52:26 2018 (r330445) @@ -1035,6 +1035,15 @@ tcp_do_slowpath(struct mbuf *m, struct tcphdr *th, str if (awnd < tp->snd_ssthresh) { tp->snd_cwnd += tp->t_maxseg; + /* + * RFC5681 Section 3.2 talks about cwnd + * inflation on additional dupacks and + * deflation on recovering from loss. + * + * We keep cwnd into check so that + * we don't have to 'deflate' it when we + * get out of recovery. + */ if (tp->snd_cwnd > tp->snd_ssthresh) tp->snd_cwnd = tp->snd_ssthresh; } @@ -1073,19 +1082,22 @@ tcp_do_slowpath(struct mbuf *m, struct tcphdr *th, str TCPSTAT_INC( tcps_sack_recovery_episode); tp->sack_newdata = tp->snd_nxt; - tp->snd_cwnd = tp->t_maxseg; + if (CC_ALGO(tp)->cong_signal == NULL) + tp->snd_cwnd = tp->t_maxseg; (void) tp->t_fb->tfb_tcp_output(tp); goto drop; } tp->snd_nxt = th->th_ack; - tp->snd_cwnd = tp->t_maxseg; + if (CC_ALGO(tp)->cong_signal == NULL) + tp->snd_cwnd = tp->t_maxseg; (void) tp->t_fb->tfb_tcp_output(tp); KASSERT(tp->snd_limited <= 2, ("%s: tp->snd_limited too big", __func__)); - tp->snd_cwnd = tp->snd_ssthresh + - tp->t_maxseg * - (tp->t_dupacks - tp->snd_limited); + if (CC_ALGO(tp)->cong_signal == NULL) + tp->snd_cwnd = tp->snd_ssthresh + + tp->t_maxseg * + (tp->t_dupacks - tp->snd_limited); if (SEQ_GT(onxt, tp->snd_nxt)) tp->snd_nxt = onxt; goto drop;