Date:      Fri, 24 Apr 2026 22:30:36 +0000
From:      Colin Percival <cperciva@FreeBSD.org>
To:        src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
Subject:   git: 0f7b8f79f67b - main - ena: Budget rx descriptors, not packets
Message-ID:  <69ebef0c.428ac.36a81d7c@gitrepo.freebsd.org>

The branch main has been updated by cperciva:

URL: https://cgit.FreeBSD.org/src/commit/?id=0f7b8f79f67b25cb0727c7b7d604eb1eec91fef1

commit 0f7b8f79f67b25cb0727c7b7d604eb1eec91fef1
Author:     Colin Percival <cperciva@FreeBSD.org>
AuthorDate: 2026-04-17 17:40:00 +0000
Commit:     Colin Percival <cperciva@FreeBSD.org>
CommitDate: 2026-04-24 22:30:13 +0000

    ena: Budget rx descriptors, not packets
    
    We had ENA_RX_BUDGET = 256 in order to allow up to 256 received
    packets to be processed before we do other cleanups (handling tx
    packets and, critically, refilling the rx buffer ring).  Since the
    ring holds 1024 buffers by default, this was fine for normal packets:
    We refill the ring when it falls below 7/8 full, so even if a large
    burst of incoming packets drains another 1/4 of the ring (256 of 1024
    buffers) before we next consider refilling, the ring is still
    7/8 - 1/4 = 5/8 full.
    
    With jumbos, the story is different: A 9k jumbo (as is used by default
    within the EC2 network) consumes 3 descriptors, so a single rx cleanup
    pass can consume 3/4 of the default-sized rx ring; if the rx buffer
    ring wasn't completely full before a packet burst arrives, this puts
    us perilously close to running out of rx buffers.
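
The arithmetic behind the two cases above can be checked with a short C sketch. The helper `worst_case_descs_per_pass` and the constants `RING_SIZE` and `PKT_BUDGET` are illustrative names, not driver identifiers; the values 1024, 256 and 3 are the ones quoted in the description.

```c
#include <assert.h>

/* Illustrative constants taken from the commit message (not driver names). */
#define RING_SIZE   1024   /* default rx ring size, in buffers */
#define PKT_BUDGET  256    /* old per-pass budget, counted in packets */

/*
 * Worst-case number of rx descriptors one cleanup pass can consume when
 * the budget counts packets and every packet spans descs_per_pkt
 * descriptors.
 */
static int
worst_case_descs_per_pass(int descs_per_pkt)
{
	assert(descs_per_pkt > 0);
	return (PKT_BUDGET * descs_per_pkt);
}
```

Under these assumptions, `worst_case_descs_per_pass(1)` is 256 (1/4 of `RING_SIZE`) and `worst_case_descs_per_pass(3)` is 768 (3/4 of `RING_SIZE`).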
    
    This precise failure mode has been observed on some EC2 instance types
    within a Cluster Placement Group, resulting in the nominal 10 Gbps
    single-flow throughput between instances dropping to ~100 Mbps as a
    result of repeated rx overruns causing packet loss and ultimately
    retransmission timeouts.
    
    To correct this, switch from processing up to ENA_RX_BUDGET (256)
    packets to processing up to ENA_RX_DESC_BUDGET (256) descriptors (or
    slightly more, if we hit the limit in the middle of a packet).  This
    ensures that, even with jumbos, we refill the ring before processing
    most of a ring worth of descriptors, and returns the throughput to
    expected levels.
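
A simplified model of the descriptor-based budget may help illustrate the "slightly more" caveat. This is a sketch, not the driver code: `descs_consumed_per_pass` and `DESC_BUDGET` are hypothetical names, every packet is assumed to use the same number of descriptors, and the device is assumed to always have another packet ready.

```c
#include <assert.h>

#define DESC_BUDGET 256   /* mirrors the value of ENA_RX_DESC_BUDGET */

/*
 * Model of one rx cleanup pass with a descriptor budget.  Returns the
 * number of descriptors consumed; *pkts_out receives the number of
 * packets processed.  The budget is only checked after a whole packet,
 * so the pass can overshoot by up to descs_per_pkt - 1 descriptors.
 */
static int
descs_consumed_per_pass(int descs_per_pkt, int *pkts_out)
{
	int budget = DESC_BUDGET;
	int consumed = 0;
	int pkts = 0;

	assert(descs_per_pkt > 0);
	do {
		consumed += descs_per_pkt;	/* process one packet */
		pkts++;
		budget -= descs_per_pkt;	/* charge its descriptors */
	} while (budget > 0);

	*pkts_out = pkts;
	return (consumed);
}
```

In this model, 3 descriptors per packet gives 86 packets and 258 descriptors per pass, roughly 1/4 of a 1024-entry ring, while 1 descriptor per packet gives exactly 256 packets and 256 descriptors, matching the old behavior for normal packets.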
    
    Note that theoretically up to ENA_PKT_MAX_BUFS (19) descriptors can be
    used for a single packet, in which case even 54 packets would exhaust
    the default rx buffer ring; it's not clear if this ever occurs in
    practice, but this fix will address that case as well.
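
The 54-packet figure, and the worst-case overshoot under the new budget, follow from two small calculations, sketched here in C. The helper names `pkts_to_exhaust_ring` and `max_descs_per_pass` are hypothetical; the inputs 1024, 256 and 19 are the values quoted in this message.

```c
#include <assert.h>

/*
 * Smallest number of packets, each using descs_per_pkt descriptors,
 * that consumes at least ring_size descriptors (ceiling division).
 */
static int
pkts_to_exhaust_ring(int ring_size, int descs_per_pkt)
{
	assert(ring_size > 0 && descs_per_pkt > 0);
	return ((ring_size + descs_per_pkt - 1) / descs_per_pkt);
}

/*
 * Upper bound on descriptors consumed in one pass with a descriptor
 * budget: the loop continues only while at least 1 unit of budget
 * remains, so at most desc_budget - 1 descriptors are consumed before
 * the final packet, which may add up to max_descs_per_pkt more.
 */
static int
max_descs_per_pass(int desc_budget, int max_descs_per_pkt)
{
	assert(desc_budget > 0 && max_descs_per_pkt > 0);
	return (desc_budget - 1 + max_descs_per_pkt);
}
```

With these inputs, `pkts_to_exhaust_ring(1024, 19)` is 54, and `max_descs_per_pass(256, 19)` is 274, which stays well under the 1024-entry default ring.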
    
    Reviewed by:    akiyano
    Sponsored by:   Amazon
    MFC after:      6 days
    Differential Revision:  https://reviews.freebsd.org/D56479
---
 sys/dev/ena/ena.h          |  4 ++--
 sys/dev/ena/ena_datapath.c | 13 ++++++++++---
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/sys/dev/ena/ena.h b/sys/dev/ena/ena.h
index f67c7002327d..b2156437f847 100644
--- a/sys/dev/ena/ena.h
+++ b/sys/dev/ena/ena.h
@@ -99,8 +99,8 @@
  *  of TCP retransmissions.
  */
 #define ENA_TX_BUDGET	128
-/* RX cleanup budget. -1 stands for infinity. */
-#define ENA_RX_BUDGET	256
+/* RX cleanup budget, in descriptors. -1 stands for infinity. */
+#define ENA_RX_DESC_BUDGET	256
 /*
  * How many times we can repeat cleanup in the io irq handling routine if the
  * RX or TX budget was depleted.
diff --git a/sys/dev/ena/ena_datapath.c b/sys/dev/ena/ena_datapath.c
index 57148d8ef81f..91e3e3b6e4cd 100644
--- a/sys/dev/ena/ena_datapath.c
+++ b/sys/dev/ena/ena_datapath.c
@@ -571,7 +571,7 @@ ena_rx_cleanup(struct ena_ring *rx_ring)
 	uint32_t do_if_input = 0;
 	unsigned int qid;
 	int rc, i;
-	int budget = ENA_RX_BUDGET;
+	int budget = (ENA_RX_DESC_BUDGET == -1) ? INT_MAX : ENA_RX_DESC_BUDGET;
 #ifdef DEV_NETMAP
 	int done;
 #endif /* DEV_NETMAP */
@@ -680,7 +680,14 @@ ena_rx_cleanup(struct ena_ring *rx_ring)
 		counter_u64_add_protected(rx_ring->rx_stats.cnt, 1);
 		counter_u64_add_protected(adapter->hw_stats.rx_packets, 1);
 		counter_exit();
-	} while (--budget);
+
+		/*
+		 * Adjust our budget; note that we count descriptors, not
+		 * packets, since we need to ensure we don't run out of rx
+		 * buffers when receiving jumbos.
+		 */
+		budget -= ena_rx_ctx.descs;
+	} while (budget > 0);
 
 	rx_ring->next_to_clean = next_to_clean;
 
@@ -695,7 +702,7 @@ ena_rx_cleanup(struct ena_ring *rx_ring)
 
 	tcp_lro_flush_all(&rx_ring->lro);
 
-	return (budget == 0);
+	return (budget <= 0);
 }
 
 static void

