From: biggy pardeshi <biggy.pardeshi@gmail.com>
Date: Thu, 27 Apr 2023 14:30:33 +0530
Subject: FreeBSD CUBIC - Extremely low performance for short RTT setups experiencing congestion
To: freebsd-transport@freebsd.org
Cc: rscheff@freebsd.org
List-Archive: https://lists.freebsd.org/archives/freebsd-transport

Hi team,

I am a networking datapath engineer at VMware, working primarily on the ESX TCP/IP stack. We have been experimenting with FreeBSD's CUBIC congestion control algorithm (CCA) for a while. In most of our performance tests CUBIC performs fine. However, we saw CUBIC deliver far lower throughput than NewReno (a gap of around 100-150%, i.e. NewReno achieving roughly 2-2.5x the throughput) in setups where the following conditions are met:

  1. sender and receiver are back-to-back connected (short RTT)
  2. the back-to-back connection/link experiences congestion

We saw that with CUBIC the congestion window (cwnd) does not grow fast enough after a congestion event. On the same setup, NewReno performs very well and grows the cwnd quickly after a congestion event.

https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-cubic-06 - CUBIC Internet-Draft (I-D)

The I-D notes that on short-RTT (or, more generally, low-BDP) networks, standard TCP CCAs perform better than CUBIC. In such cases the TCP-friendly window size (equation 4) always comes out greater than the cubic window size (equation 1), so the discussion below focuses on the TCP-friendly window size computed during the congestion avoidance phase.

Theoretically, CUBIC should perform at least as well as standard TCP on short-RTT setups. In practice it does not, and we find that the granularity of the system's ticks value prevents CUBIC from growing the cwnd quickly. An example:

-------

Consider a short-RTT setup where the RTT is 0.05 ms, and assume the system's tick frequency is 1000 Hz, i.e. the ticks value increments once every 1 ms (FreeBSD's default kern.hz is 1000, giving a 1 ms tick).

Now consider a period of 1 s. During that second the sender receives around 1 s / 0.05 ms = 1000 ms / 0.05 ms = 20000 ACKs, and the cc_ack_received() callback is invoked for each of them. With NewReno, cwnd is increased on each of those ACKs until the flow becomes limited by the receiver's window. With CUBIC, even though cubic_ack_received() is invoked for each of those 20000 ACKs, cwnd is not increased on each of them. The culprit is the "ticks" value used to compute the time elapsed since the last congestion event: in FreeBSD, ticks stays constant for a full 1 ms, during which we receive 1 ms / 0.05 ms = 20 ACKs. For all 20 of those ACKs the ticks value is the same, so the computed time since the last congestion event is the same, and therefore the TCP-friendly window estimate is the same. Hence cwnd does not increase at all during that 1 ms. On a system with a coarser tick period (e.g. 10 ms), cwnd stays flat for that entire period.

Of the 20000 ACKs, NewReno tries to increase cwnd on almost every one, while CUBIC can increase it on at most 1000 of them (once per tick), spread across the whole second. I hope that makes the issue clear.
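To make the tick quantization concrete, here is a minimal userland sketch (entirely ours, not the kernel code; the constants match the example above) that walks one second of per-RTT ACK arrivals and counts how many of them actually observe a new ticks value:

#include <stdio.h>

int
main(void)
{
        const double rtt_us = 50.0;     /* 0.05 ms RTT */
        const double tick_us = 1000.0;  /* 1 ms tick, i.e. hz = 1000 */
        int acks = 0, growth_chances = 0, last_ticks = 0;
        double now_us;

        /* One ACK per RTT, for one second. */
        for (now_us = 0.0; now_us < 1000000.0; now_us += rtt_us) {
                int t = (int)(now_us / tick_us);        /* quantized clock */

                acks++;
                if (t != last_ticks) {
                        /*
                         * Only when ticks advances does the computed
                         * "time since last congestion event" change,
                         * and only then can the TCP-friendly window
                         * estimate (and hence cwnd) move.
                         */
                        growth_chances++;
                        last_ticks = t;
                }
        }
        /* Prints: 20000 ACKs, 999 chances for cwnd to grow */
        printf("%d ACKs, %d chances for cwnd to grow\n", acks, growth_chances);
        return (0);
}

NewReno, by contrast, gets a growth opportunity on every one of the 20000 ACKs.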
----

We wanted to discuss a fix for this. Our current idea is to fall back to NewReno-style congestion avoidance for short-RTT connections: if the mean RTT value maintained in CUBIC's private data is less than or equal to 1 tick, we use NewReno's per-ACK increase as the TCP-friendly window estimate; otherwise (non-short RTT) we use equation 4 of the I-D. A rough sketch follows.
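The change we have in mind looks roughly like this. It is only a sketch with hypothetical names (struct cubic_state, mean_rtt_ticks and the helper function are ours, not the actual sys/netinet/cc/cc_cubic.c code); beta = 0.7 from the I-D is kept in fixed point, and cwnd/wmax are in bytes while equation 4 counts segments, hence the multiplication by mss:

/*
 * Sketch only: hypothetical names, not the actual cc_cubic.c code.
 * beta = 0.7, per the I-D, as the fixed-point ratio 7/10.
 */
#define CUBIC_BETA_NUM  7
#define CUBIC_BETA_DEN  10

struct cubic_state {
        unsigned long   wmax;           /* cwnd before last congestion event (bytes) */
        int             t_ticks;        /* ticks since last congestion event */
        int             mean_rtt_ticks; /* smoothed RTT, in ticks */
};

static unsigned long
tcp_friendly_cwnd(const struct cubic_state *cs, unsigned long cwnd,
    unsigned int mss)
{
        unsigned long w_est;

        if (cs->mean_rtt_ticks <= 1) {
                /*
                 * Short RTT: ticks is too coarse for equation 4, so
                 * fall back to NewReno-style congestion avoidance,
                 * roughly cwnd += mss^2/cwnd per ACK (about one mss
                 * of growth per RTT).
                 */
                return (cwnd + (unsigned long)mss * mss / cwnd);
        }

        /*
         * Equation 4 of the I-D:
         *   W_est = W_max*beta + [3*(1-beta)/(1+beta)] * (t/RTT)
         * With beta = 0.7 the factor is 9/17, about 0.53.
         */
        w_est = cs->wmax * CUBIC_BETA_NUM / CUBIC_BETA_DEN +
            (3UL * (CUBIC_BETA_DEN - CUBIC_BETA_NUM) * cs->t_ticks * mss) /
            ((CUBIC_BETA_DEN + CUBIC_BETA_NUM) * cs->mean_rtt_ticks);
        return (w_est);
}

The mean_rtt_ticks <= 1 branch is exactly the RTT dependence we are unsure about; see below.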
This will resolve the issue we are seeing. However, our one concern with this approach is that it makes CUBIC RTT-dependent on short-RTT networks, whereas the I-D says the following:
   Another notable feature of CUBIC is that its window increase rate is
   mostly independent of RTT, and follows a (cubic) function of the
   elapsed time from the beginning of congestion avoidance.
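For reference, the two window equations from the I-D, restated here in its ASCII notation (worth double-checking against the draft text):

   W_cubic(t) = C*(t-K)^3 + W_max                              (Eq. 1)

   W_est(t) = W_max*beta_cubic +
              [3*(1-beta_cubic)/(1+beta_cubic)] * (t/RTT)      (Eq. 4)

Note that equation 1 (the cubic region) has no RTT term at all, while equation 4 already divides the elapsed time t by the RTT. So the TCP-friendly region is inherently RTT-dependent, and the "mostly independent of RTT" statement seems to refer mainly to the cubic region.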
So, we are not sure whether this solution is logically sound. We are also not sure what other implications the change might have for CUBIC.

----
Adding Richard to the thread directly, as I have been following his work on CUBIC for some time.

Thanks,
Bhaskar Pardeshi (bpardeshi@vmware.com)
VMware, Inc.