Date: Thu, 1 Jun 2023 15:55:58 GMT
Message-Id: <202306011555.351Ftwsk050366@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
From: Cheng Cui
Subject: git: a3aa6f652904 - main - cc_cubic: Use units of micro seconds (usecs) instead of ticks in rtt.

The branch main has been updated by cc:

URL: https://cgit.FreeBSD.org/src/commit/?id=a3aa6f65290482cedf4aeda1d0875ca6433c7f04

commit a3aa6f65290482cedf4aeda1d0875ca6433c7f04
Author:     Cheng Cui
AuthorDate: 2023-06-01 11:48:07 +0000
Commit:     Cheng Cui
CommitDate: 2023-06-01 11:55:01 +0000

    cc_cubic: Use units of micro seconds (usecs) instead of ticks in rtt.

    This improves the TCP-friendly cwnd calculation on low-latency networks
    with high drop rates. Tests show +42% and +37% better performance in the
    1Gbps and 10Gbps cases, respectively.

    Reported by: Bhaskar Pardeshi from VMware.
    Reviewed by: rscheff, tuexen
    Approved by: rscheff (mentor), tuexen (mentor)
---
 sys/netinet/cc/cc_cubic.c | 60 +++++++++++++++++++++++++----------------------
 sys/netinet/cc/cc_cubic.h | 33 ++++++++++++++------------
 2 files changed, 50 insertions(+), 43 deletions(-)

diff --git a/sys/netinet/cc/cc_cubic.c b/sys/netinet/cc/cc_cubic.c
index 8992b9beba13..be9bd9859122 100644
--- a/sys/netinet/cc/cc_cubic.c
+++ b/sys/netinet/cc/cc_cubic.c
@@ -240,7 +240,7 @@ cubic_ack_received(struct cc_var *ccv, uint16_t type)
 {
 	struct cubic *cubic_data;
 	unsigned long w_tf, w_cubic_next;
-	int ticks_since_cong;
+	int usecs_since_cong;
 
 	cubic_data = ccv->cc_data;
 	cubic_record_rtt(ccv);
@@ -253,7 +253,7 @@ cubic_ack_received(struct cc_var *ccv, uint16_t type)
 	    (ccv->flags & CCF_CWND_LIMITED)) {
 		/* Use the logic in NewReno ack_received() for slow start. */
 		if (CCV(ccv, snd_cwnd) <= CCV(ccv, snd_ssthresh) ||
-		    cubic_data->min_rtt_ticks == TCPTV_SRTTBASE) {
+		    cubic_data->min_rtt_usecs == TCPTV_SRTTBASE) {
 			cubic_does_slow_start(ccv, cubic_data);
 		} else {
 			if (cubic_data->flags & CUBICFLAG_HYSTART_IN_CSS) {
@@ -282,12 +282,12 @@ cubic_ack_received(struct cc_var *ccv, uint16_t type)
 				cubic_data->K = cubic_k(cubic_data->max_cwnd /
 				    CCV(ccv, t_maxseg));
 			}
-			if ((ticks_since_cong =
-			    ticks - cubic_data->t_last_cong) < 0) {
+			usecs_since_cong = (ticks - cubic_data->t_last_cong) * tick;
+			if (usecs_since_cong < 0) {
 				/*
 				 * dragging t_last_cong along
 				 */
-				ticks_since_cong = INT_MAX;
+				usecs_since_cong = INT_MAX;
 				cubic_data->t_last_cong = ticks - INT_MAX;
 			}
 			/*
@@ -297,13 +297,14 @@ cubic_ack_received(struct cc_var *ccv, uint16_t type)
 			 * RTT is dominated by network buffering rather than
 			 * propagation delay.
 			 */
-			w_tf = tf_cwnd(ticks_since_cong,
-			    cubic_data->mean_rtt_ticks, cubic_data->max_cwnd,
-			    CCV(ccv, t_maxseg));
+			w_tf = tf_cwnd(usecs_since_cong, cubic_data->mean_rtt_usecs,
+			    cubic_data->max_cwnd, CCV(ccv, t_maxseg));
 
-			w_cubic_next = cubic_cwnd(ticks_since_cong +
-			    cubic_data->mean_rtt_ticks, cubic_data->max_cwnd,
-			    CCV(ccv, t_maxseg), cubic_data->K);
+			w_cubic_next = cubic_cwnd(usecs_since_cong +
+			    cubic_data->mean_rtt_usecs,
+			    cubic_data->max_cwnd,
+			    CCV(ccv, t_maxseg),
+			    cubic_data->K);
 
 			ccv->flags &= ~CCF_ABC_SENTAWND;
 
@@ -397,8 +398,8 @@ cubic_cb_init(struct cc_var *ccv, void *ptr)
 
 	/* Init some key variables with sensible defaults. */
 	cubic_data->t_last_cong = ticks;
-	cubic_data->min_rtt_ticks = TCPTV_SRTTBASE;
-	cubic_data->mean_rtt_ticks = 1;
+	cubic_data->min_rtt_usecs = TCPTV_SRTTBASE;
+	cubic_data->mean_rtt_usecs = 1;
 
 	ccv->cc_data = cubic_data;
 	cubic_data->flags = CUBICFLAG_HYSTART_ENABLED;
@@ -549,13 +550,13 @@ cubic_post_recovery(struct cc_var *ccv)
 
 	/* Calculate the average RTT between congestion epochs. */
 	if (cubic_data->epoch_ack_count > 0 &&
-	    cubic_data->sum_rtt_ticks >= cubic_data->epoch_ack_count) {
-		cubic_data->mean_rtt_ticks = (int)(cubic_data->sum_rtt_ticks /
+	    cubic_data->sum_rtt_usecs >= cubic_data->epoch_ack_count) {
+		cubic_data->mean_rtt_usecs = (int)(cubic_data->sum_rtt_usecs /
 		    cubic_data->epoch_ack_count);
 	}
 
 	cubic_data->epoch_ack_count = 0;
-	cubic_data->sum_rtt_ticks = 0;
+	cubic_data->sum_rtt_usecs = 0;
 }
 
 /*
@@ -565,13 +566,13 @@ static void
 cubic_record_rtt(struct cc_var *ccv)
 {
 	struct cubic *cubic_data;
-	int t_srtt_ticks;
+	uint32_t t_srtt_usecs;
 
 	/* Ignore srtt until a min number of samples have been taken. */
 	if (CCV(ccv, t_rttupdated) >= CUBIC_MIN_RTT_SAMPLES) {
 		cubic_data = ccv->cc_data;
-		t_srtt_ticks = tcp_get_srtt(ccv->ccvc.tcp,
-		    TCP_TMR_GRANULARITY_TICKS);
+		t_srtt_usecs = tcp_get_srtt(ccv->ccvc.tcp,
+		    TCP_TMR_GRANULARITY_USEC);
 		/*
 		 * Record the current SRTT as our minrtt if it's the smallest
 		 * we've seen or minrtt is currently equal to its initialised
@@ -579,24 +580,27 @@ cubic_record_rtt(struct cc_var *ccv)
 		 *
 		 * XXXLAS: Should there be some hysteresis for minrtt?
 		 */
-		if ((t_srtt_ticks < cubic_data->min_rtt_ticks ||
-		    cubic_data->min_rtt_ticks == TCPTV_SRTTBASE)) {
-			cubic_data->min_rtt_ticks = max(1, t_srtt_ticks);
+		if ((t_srtt_usecs < cubic_data->min_rtt_usecs ||
+		    cubic_data->min_rtt_usecs == TCPTV_SRTTBASE)) {
+			/* A minimal rtt is a single unshifted tick of a ticks
+			 * timer. */
+			cubic_data->min_rtt_usecs = max(tick >> TCP_RTT_SHIFT,
+			    t_srtt_usecs);
 
 			/*
 			 * If the connection is within its first congestion
-			 * epoch, ensure we prime mean_rtt_ticks with a
+			 * epoch, ensure we prime mean_rtt_usecs with a
 			 * reasonable value until the epoch average RTT is
 			 * calculated in cubic_post_recovery().
 			 */
-			if (cubic_data->min_rtt_ticks >
-			    cubic_data->mean_rtt_ticks)
-				cubic_data->mean_rtt_ticks =
-				    cubic_data->min_rtt_ticks;
+			if (cubic_data->min_rtt_usecs >
+			    cubic_data->mean_rtt_usecs)
+				cubic_data->mean_rtt_usecs =
+				    cubic_data->min_rtt_usecs;
 		}
 
 		/* Sum samples for epoch average RTT calculation. */
-		cubic_data->sum_rtt_ticks += t_srtt_ticks;
+		cubic_data->sum_rtt_usecs += t_srtt_usecs;
 		cubic_data->epoch_ack_count++;
 	}
 }
diff --git a/sys/netinet/cc/cc_cubic.h b/sys/netinet/cc/cc_cubic.h
index 0749a9ebbc1a..3d408154c1a5 100644
--- a/sys/netinet/cc/cc_cubic.h
+++ b/sys/netinet/cc/cc_cubic.h
@@ -91,8 +91,8 @@ struct cubic {
 
 	/* CUBIC K in fixed point form with CUBIC_SHIFT worth of precision. */
 	int64_t		K;
-	/* Sum of RTT samples across an epoch in ticks. */
-	int64_t		sum_rtt_ticks;
+	/* Sum of RTT samples across an epoch in usecs. */
+	int64_t		sum_rtt_usecs;
 	/* cwnd at the most recent congestion event. */
 	unsigned long	max_cwnd;
 	/* cwnd at the previous congestion event. */
@@ -101,10 +101,10 @@ struct cubic {
 	unsigned long	prev_max_cwnd_cp;
 	/* various flags */
 	uint32_t	flags;
-	/* Minimum observed rtt in ticks. */
-	int		min_rtt_ticks;
+	/* Minimum observed rtt in usecs. */
+	int		min_rtt_usecs;
 	/* Mean observed rtt between congestion epochs. */
-	int		mean_rtt_ticks;
+	int		mean_rtt_usecs;
 	/* ACKs since last congestion event. */
 	int		epoch_ack_count;
 	/* Timestamp (in ticks) of arriving in congestion avoidance from last
@@ -222,14 +222,15 @@ cubic_k(unsigned long wmax_pkts)
  * XXXLAS: Characterise bounds for overflow.
  */
 static __inline unsigned long
-cubic_cwnd(int ticks_since_cong, unsigned long wmax, uint32_t smss, int64_t K)
+cubic_cwnd(int usecs_since_cong, unsigned long wmax, uint32_t smss, int64_t K)
 {
 	int64_t cwnd;
 
 	/* K is in fixed point form with CUBIC_SHIFT worth of precision. */
 
 	/* t - K, with CUBIC_SHIFT worth of precision. */
-	cwnd = (((int64_t)ticks_since_cong << CUBIC_SHIFT) - (K * hz)) / hz;
+	cwnd = (((int64_t)usecs_since_cong << CUBIC_SHIFT) - (K * hz * tick)) /
+	       (hz * tick);
 
 	if (cwnd > CUBED_ROOT_MAX_ULONG)
 		return INT_MAX;
@@ -255,15 +256,17 @@ cubic_cwnd(int ticks_since_cong, unsigned long wmax, uint32_t smss, int64_t K)
 }
 
 /*
- * Compute an approximation of the NewReno cwnd some number of ticks after a
+ * Compute an approximation of the NewReno cwnd some number of usecs after a
  * congestion event. RTT should be the average RTT estimate for the path
  * measured over the previous congestion epoch and wmax is the value of cwnd at
  * the last congestion event. The "TCP friendly" concept in the CUBIC I-D is
  * rather tricky to understand and it turns out this function is not required.
  * It is left here for reference.
+ *
+ * XXX: Not used
  */
 static __inline unsigned long
-reno_cwnd(int ticks_since_cong, int rtt_ticks, unsigned long wmax,
+reno_cwnd(int usecs_since_cong, int rtt_usecs, unsigned long wmax,
     uint32_t smss)
 {
 
@@ -272,26 +275,26 @@ reno_cwnd(int ticks_since_cong, int rtt_ticks, unsigned long wmax,
 	 * W_tcp(t) deals with cwnd/wmax in pkts, so because our cwnd is in
 	 * bytes, we have to multiply by smss.
 	 */
-	return (((wmax * RENO_BETA) + (((ticks_since_cong * smss)
-	    << CUBIC_SHIFT) / rtt_ticks)) >> CUBIC_SHIFT);
+	return (((wmax * RENO_BETA) + (((usecs_since_cong * smss)
+	    << CUBIC_SHIFT) / rtt_usecs)) >> CUBIC_SHIFT);
 }
 
 /*
- * Compute an approximation of the "TCP friendly" cwnd some number of ticks
+ * Compute an approximation of the "TCP friendly" cwnd some number of usecs
  * after a congestion event that is designed to yield the same average cwnd as
  * NewReno while using CUBIC's beta of 0.7. RTT should be the average RTT
  * estimate for the path measured over the previous congestion epoch and wmax is
  * the value of cwnd at the last congestion event.
  */
 static __inline unsigned long
-tf_cwnd(int ticks_since_cong, int rtt_ticks, unsigned long wmax,
+tf_cwnd(int usecs_since_cong, int rtt_usecs, unsigned long wmax,
     uint32_t smss)
 {
 
 	/* Equation 4 of I-D. */
 	return (((wmax * CUBIC_BETA) +
-	    (((THREE_X_PT3 * (unsigned long)ticks_since_cong *
-	    (unsigned long)smss) << CUBIC_SHIFT) / (TWO_SUB_PT3 * rtt_ticks)))
+	    (((THREE_X_PT3 * (unsigned long)usecs_since_cong *
+	    (unsigned long)smss) << CUBIC_SHIFT) / (TWO_SUB_PT3 * rtt_usecs)))
 	    >> CUBIC_SHIFT);
 }
 
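In cubic_ack_received(), usecs_since_cong is still derived from the ticks counter: the tick difference is multiplied by tick, the kernel's microseconds-per-tick value (1000000 / hz). The user-space sketch below illustrates that conversion; the helper name elapsed_usecs() and its explicit hz parameter are hypothetical. Unlike the diff, it widens to int64_t before multiplying, because a plain int product exceeds INT_MAX once about 2,147,483 ticks (roughly 36 minutes at hz = 1000) have elapsed. It then clamps to INT_MAX, in the spirit of the diff's "dragging t_last_cong along" branch.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from cc_cubic.c): convert an elapsed tick count
 * into microseconds for a timer running at hz.  tick_usecs mirrors the
 * kernel's `tick`.  The product is formed in 64 bits and clamped to INT_MAX
 * so a caller storing the result in an int never sees a wrapped value.
 */
static int
elapsed_usecs(int elapsed_ticks, int hz)
{
	int64_t usecs;
	int tick_usecs;

	assert(hz > 0 && hz <= 1000000);
	tick_usecs = 1000000 / hz;

	/* A negative difference means the counter wrapped: treat as "long ago". */
	if (elapsed_ticks < 0)
		return (INT_MAX);
	usecs = (int64_t)elapsed_ticks * tick_usecs;
	return (usecs > INT_MAX ? INT_MAX : (int)usecs);
}
```

For example, elapsed_usecs(1500, 1000) returns 1500000, while 3000000 ticks at hz = 1000 saturates to INT_MAX instead of wrapping.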
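In cubic_cwnd(), the elapsed time is converted to seconds in CUBIC_SHIFT fixed point before K is subtracted. The old code divided ticks by hz (ticks per second); the new code divides microseconds by hz * tick, which is one second's worth of microseconds. The sketch below evaluates both forms side by side. It assumes CUBIC_SHIFT = 8 (the value is not shown in this diff) and uses hypothetical function names; it is an illustration of the arithmetic, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point precision; the real CUBIC_SHIFT lives in cc_cubic.h. */
#define SK_CUBIC_SHIFT	8

/* Hypothetical: (t - K) in fixed point with t in ticks (pre-change form). */
static int64_t
t_minus_k_ticks(int ticks_since_cong, int64_t K, int hz)
{
	return ((((int64_t)ticks_since_cong << SK_CUBIC_SHIFT) - (K * hz)) / hz);
}

/* Hypothetical: (t - K) in fixed point with t in usecs (post-change form). */
static int64_t
t_minus_k_usecs(int usecs_since_cong, int64_t K, int hz)
{
	int64_t tick_usecs = 1000000 / hz;	/* mirrors the kernel's `tick` */

	return ((((int64_t)usecs_since_cong << SK_CUBIC_SHIFT) -
	    (K * hz * tick_usecs)) / (hz * tick_usecs));
}
```

With hz = 1000 and K = 256 (1 s in 8-bit fixed point), 1500 ticks and 1500000 usecs both give 128, i.e. 0.5 s. Note that hz * tick equals exactly 1,000,000 only when hz divides 1,000,000; for hz = 1024, tick is 976 and hz * tick is 999,424.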
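tf_cwnd() evaluates Equation 4 of the CUBIC I-D, W_est = W_max * beta + 3 * (1 - beta) / (1 + beta) * t / RTT, scaled by smss because cwnd is kept in bytes. Since t and RTT only appear as a ratio, moving both to microseconds does not change the formula; what changes is resolution. A tick-based RTT could not drop below one tick (the old code used max(1, t_srtt_ticks) and primed mean_rtt_ticks from it), so on a path with, say, a 200-usec RTT and hz = 1000, t / RTT was computed against 1000 usecs and understated by a factor of 5. The sketch reproduces the fixed-point expression with assumed constants (8 fractional bits, beta ~ 179/256, 3 * (1 - beta) ~ 231/256, 1 + beta ~ 435/256) to compare the two unit choices; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point constants with 8 fractional bits (not from this diff). */
#define SK_SHIFT	8
#define SK_BETA		179	/* ~0.7 */
#define SK_THREE_X_PT3	231	/* ~3 * 0.3 */
#define SK_TWO_SUB_PT3	435	/* ~1.7, i.e. 1 + beta */

/*
 * Hypothetical TCP-friendly window estimate in bytes.  `elapsed` and `rtt`
 * must share a unit (ticks or usecs); only their ratio matters.
 */
static uint64_t
tf_cwnd_sketch(uint64_t elapsed, uint64_t rtt, uint64_t wmax, uint64_t smss)
{
	assert(rtt > 0);
	return (((wmax * SK_BETA) +
	    (((SK_THREE_X_PT3 * elapsed * smss) << SK_SHIFT) /
	    (SK_TWO_SUB_PT3 * rtt))) >> SK_SHIFT);
}
```

With wmax = 100 segments of 1448 bytes, 10 ms after a congestion event on a 200-usec path, the tick-based call (t = 10, RTT clamped to 1 tick) yields 108936 bytes, while the usec-based call (t = 10000, RTT = 200) yields 139693 bytes.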
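The RTT bookkeeping in cubic_record_rtt() and cubic_post_recovery() keeps its shape in microseconds. The minimum SRTT is tracked with a floor, the first-epoch mean is primed from the minimum, and the mean is later replaced by the per-epoch average. The floor changes from 1 tick to tick >> TCP_RTT_SHIFT microseconds; assuming hz = 1000 and TCP_RTT_SHIFT = 5 (its value is not shown here), that is 1000 >> 5 = 31 usecs. The struct, functions and SK_SRTTBASE marker below are hypothetical stand-ins, and the sketch omits the CUBIC_MIN_RTT_SAMPLES gate.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed "no estimate yet" marker, playing the role of TCPTV_SRTTBASE. */
#define SK_SRTTBASE	0

/* Hypothetical subset of struct cubic's RTT fields. */
struct rtt_state {
	int	min_rtt_usecs;
	int	mean_rtt_usecs;
	int64_t	sum_rtt_usecs;
	int	epoch_ack_count;
};

static void
rtt_state_init(struct rtt_state *s)
{
	s->min_rtt_usecs = SK_SRTTBASE;
	s->mean_rtt_usecs = 1;
	s->sum_rtt_usecs = 0;
	s->epoch_ack_count = 0;
}

/* Record one SRTT sample; floor_usecs stands in for tick >> TCP_RTT_SHIFT. */
static void
rtt_record(struct rtt_state *s, int srtt_usecs, int floor_usecs)
{
	if (srtt_usecs < s->min_rtt_usecs || s->min_rtt_usecs == SK_SRTTBASE) {
		s->min_rtt_usecs =
		    srtt_usecs > floor_usecs ? srtt_usecs : floor_usecs;
		/* Prime the mean until the first epoch average is available. */
		if (s->min_rtt_usecs > s->mean_rtt_usecs)
			s->mean_rtt_usecs = s->min_rtt_usecs;
	}
	s->sum_rtt_usecs += srtt_usecs;
	s->epoch_ack_count++;
}

/* At the end of a congestion epoch, replace the mean by the epoch average. */
static void
rtt_end_epoch(struct rtt_state *s)
{
	if (s->epoch_ack_count > 0 && s->sum_rtt_usecs >= s->epoch_ack_count)
		s->mean_rtt_usecs = (int)(s->sum_rtt_usecs / s->epoch_ack_count);
	s->epoch_ack_count = 0;
	s->sum_rtt_usecs = 0;
}
```

Feeding samples of 10 and 200 usecs with a 31-usec floor leaves min_rtt_usecs at 31 (the 10-usec sample is raised to the floor), and rtt_end_epoch() then sets mean_rtt_usecs to the raw average of 105.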