From: Alan Somers
Date: Wed, 17 Jul 2024 14:00:31 -0600
Subject: TCP Success Story (was Re: TCP_RACK, TCP_BBR, and firewalls)
To: Michael Tuexen
Cc: Alan Somers, FreeBSD Net
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net
On Sat, Jul 13, 2024 at 1:50 AM Michael Tuexen wrote:
>
> > On 13. Jul 2024, at 01:43, Alan Somers wrote:
> >
> > I've been experimenting with RACK and BBR.  In my environment, they
> > can dramatically improve single-stream TCP performance, which is
> > awesome.  But pf interferes.  I have to disable pf in order for them
> > to work at all.
> >
> > Is this a known limitation?  If not, I will experiment some more to
> > determine exactly what aspect of my pf configuration is responsible.
> > If so, can anybody suggest what changes would have to happen to make
> > the two compatible?
> A problem with the same symptoms was already reported and fixed in
> https://reviews.freebsd.org/D43769
>
> Which version are you using?
>
> Best regards
> Michael
> >
> > -Alan

TL;DR: tcp_rack is good, cc_chd is better, and tcp_bbr is best.

I want to follow up with the list to post my conclusions.  Firstly,
tuexen@ helped me solve my problem: in FreeBSD 14.0 there is a 3-way
incompatibility between (tcp_bbr || tcp_rack) && lro && pf.  I can
confirm that tcp_bbr works for me if I either disable LRO, disable pf,
or switch to a 14.1 server.

Here's the real problem: on multiple production servers, downloading
large files (or ZFS send/recv streams) was slow.  After ruling out many
possible causes, Wireshark revealed that the connection was suffering
about 0.05% packet loss.  I don't know the source of that packet loss,
but I don't believe it to be congestion-related.  Along with a 54 ms
RTT, that's a fatal combination for the throughput of loss-based
congestion control algorithms.  According to the Mathis Formula [1], I
could only expect 1.1 MBps over such a connection.  That's actually
worse than what I saw: with default settings (cc_cubic) I averaged
5.6 MBps.  Probably Mathis's assumptions are outdated, but that's still
pretty close for such a simple formula that's 27 years old.
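As a sanity check on that 1.1 MBps figure: the back-of-the-envelope
form of the formula, plugging in a typical 1448-byte MSS purely as an
illustrative value, works out to

    throughput <= (MSS / RTT) * (C / sqrt(p))
               ~= (1448 B / 0.054 s) * (1 / sqrt(0.0005))
               ~= 1.2 MB/s with C = 1

The constant C depends on the derivation (roughly 0.87 with delayed
ACKs, 1.22 without), so the estimate lands in the 1.0-1.5 MB/s range
either way, in the same ballpark as the 1.1 MBps above.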
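Swapping algorithms is cheap, which is what made this comparison
practical.  For anyone who wants to repeat it, the knobs look roughly
like this (a sketch, not our exact procedure; module and interface
names are examples, and whether the RACK/BBR modules are prebuilt
depends on your release and kernel config):

    # See what's available and what's currently in use
    sysctl net.inet.tcp.cc.available
    sysctl net.inet.tcp.cc.algorithm

    # Load and select a different congestion control module, e.g. CHD
    kldload cc_chd
    sysctl net.inet.tcp.cc.algorithm=chd

    # RACK and BBR are alternate TCP stacks rather than cc modules
    kldload tcp_rack tcp_bbr
    sysctl net.inet.tcp.functions_default=bbr

    # On 14.0, per the incompatibility above, also disable LRO or pf
    ifconfig ix0 -lro    # example interface name
    pfctl -d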
So I benchmarked all available congestion control algorithms for
single download streams.  The results are summarized in the table
below.

Algo     Packet Loss Rate  Average Throughput
vegas    0.05%             2.0 MBps
newreno  0.05%             3.2 MBps
cubic    0.05%             5.6 MBps
hd       0.05%             8.6 MBps
cdg      0.05%             13.5 MBps
rack     0.04%             14 MBps
htcp     0.05%             15 MBps
dctcp    0.05%             15 MBps
chd      0.05%             17.3 MBps
bbr      0.05%             29.2 MBps
cubic    10%               159 kBps
chd      10%               208 kBps
bbr      10%               5.7 MBps

RACK seemed to achieve about the same maximum bandwidth as BBR, though
it took a lot longer to get there.  Also, with RACK, Wireshark reported
about 10x as many retransmissions as dropped packets, which is
suspicious.

At one point, something went haywire and packet loss briefly spiked to
the neighborhood of 10%.  I took advantage of the chaos to repeat my
measurements.  As the table shows, all algorithms sucked under those
conditions, but BBR sucked impressively less than the others.

Disclaimer: there was significant run-to-run variation; the presented
results are averages.  And I did not attempt to measure packet loss
exactly for most runs; 0.05% is merely an average of a few selected
runs.  These measurements were taken on a production server running a
real workload, which introduces noise.  Soon I hope to have the
opportunity to repeat the experiment on an idle server in the same
environment.

In conclusion, while we'd like to use BBR, we really can't until we
upgrade to 14.1, which hopefully will be soon.  So in the meantime
we've switched all relevant servers from cubic to chd, and we'll
reevaluate BBR after the upgrade.

[1]: https://www.slac.stanford.edu/comp/net/wan-mon/thru-vs-loss.html

-Alan