Date: Wed, 19 Oct 2022 23:42:21 GMT
Message-Id: <202210192342.29JNgLK5045330@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
From: Eric Joyner
Subject: git: baa97013121a - stable/13 - iflib: Introduce v2 of TX Queue Select Functionality
List-Id: Commits to the stable branches of the FreeBSD src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-branches
X-Git-Committer: erj
X-Git-Repository: src
X-Git-Refname: refs/heads/stable/13
X-Git-Reftype: branch
X-Git-Commit: baa97013121a915057ee54dfcb2cb87e541f7d7f

The branch stable/13 has been updated by erj:

URL: https://cgit.FreeBSD.org/src/commit/?id=baa97013121a915057ee54dfcb2cb87e541f7d7f

commit baa97013121a915057ee54dfcb2cb87e541f7d7f
Author:     Eric Joyner
AuthorDate: 2022-10-17 21:52:20 +0000
Commit:     Eric Joyner
CommitDate: 2022-10-19 23:38:09 +0000

    iflib: Introduce v2 of TX Queue Select Functionality
    
    For v2, iflib will parse packet headers before queueing a packet.
    
    This commit also adds a new field to the structure that holds parsed
    header information from packets; it stores the IP ToS/traffic class
    field found in the IPv4/IPv6 header.
    
    To limit the overhead, iflib only partially parses the packet headers
    before queueing, using a new header parsing function that does less
    work than the existing one: to get the IP ToS information it needs
    only the Ethernet header and the minimal IP header, so it does not
    pull up any more data.
    
    For now, v1 and v2 coexist in this patch; v1 still offers a less
    invasive method in which iflib parses none of the packet before
    queueing it.
    
    This also bumps __FreeBSD_version in sys/param.h.
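For illustration only, the sketch below shows what a driver-side v2 selector might do with the new ToS field: keep non-IP traffic on queue 0 and spread IP traffic across TX queues by DSCP (the upper six bits of the ToS/traffic class byte). It is a user-space stand-in, not code from this commit or from any Intel driver. `struct fake_pkt_info`, `struct fake_softc`, the `FAKE_IPI_TX_*` flags, and `example_txq_select_v2()` are hypothetical; unlike the real `ift_txq_select_v2` callback, it omits the `struct mbuf *` argument and returns a plain `uint16_t` rather than `qidx_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values standing in for IPI_TX_IPV4 / IPI_TX_IPV6. */
#define FAKE_IPI_TX_IPV4	0x1
#define FAKE_IPI_TX_IPV6	0x2

/* Hypothetical subset of if_pkt_info: only the fields this sketch reads. */
struct fake_pkt_info {
	uint8_t		ipi_ip_tos;	/* IPv4 ToS or IPv6 traffic class */
	uint32_t	ipi_flags;	/* FAKE_IPI_TX_IPV4/IPV6 when the packet is IP */
};

/* Hypothetical driver softc holding the number of TX queues. */
struct fake_softc {
	uint16_t	num_tx_queues;
};

/*
 * Hypothetical v2-style selector: non-IP packets use queue 0, IP packets
 * are spread across the TX queues by DSCP (ToS >> 2).
 */
static uint16_t
example_txq_select_v2(void *arg, const struct fake_pkt_info *pi)
{
	const struct fake_softc *sc = arg;
	unsigned int dscp;

	assert(sc->num_tx_queues > 0);	/* avoid modulo by zero */
	if ((pi->ipi_flags & (FAKE_IPI_TX_IPV4 | FAKE_IPI_TX_IPV6)) == 0)
		return (0);
	dscp = pi->ipi_ip_tos >> 2;
	return ((uint16_t)(dscp % sc->num_tx_queues));
}
```

A real driver would install its selector as `ift_txq_select_v2` in its `struct if_txrx`, presumably guarded by comparing `__FreeBSD_version` against `IFLIB_FEATURE_QUEUE_SELECT_V2`. The DSCP-modulo mapping is an arbitrary choice made here to keep the example short.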

    Signed-off-by: Eric Joyner
    Tested by: IntelNetworking
    Sponsored by: Intel Corporation
    Differential Revision: https://reviews.freebsd.org/D34742
    
    (cherry picked from commit 9c950139051298831ce19d01ea5fb33ec6ea7f89)
---
 sys/net/iflib.c | 196 +++++++++++++++++++++++++++++++++++++++++++++++++-------
 sys/net/iflib.h |  12 +++-
 sys/sys/param.h |   2 +-
 3 files changed, 185 insertions(+), 25 deletions(-)

diff --git a/sys/net/iflib.c b/sys/net/iflib.c
index eeeadd9c3f48..39072eedc0bb 100644
--- a/sys/net/iflib.c
+++ b/sys/net/iflib.c
@@ -210,6 +210,7 @@ struct iflib_ctx {
 #define isc_rxd_flush		ifc_txrx.ift_rxd_flush
 #define isc_legacy_intr		ifc_txrx.ift_legacy_intr
 #define isc_txq_select		ifc_txrx.ift_txq_select
+#define isc_txq_select_v2	ifc_txrx.ift_txq_select_v2
 	eventhandler_tag ifc_vlan_attach_event;
 	eventhandler_tag ifc_vlan_detach_event;
 	struct ether_addr ifc_mac;
@@ -3163,32 +3164,24 @@ print_pkt(if_pkt_info_t pi)
 #define IS_TSO6(pi) ((pi)->ipi_csum_flags & CSUM_IP6_TSO)
 #define IS_TX_OFFLOAD6(pi) ((pi)->ipi_csum_flags & (CSUM_IP6_TCP | CSUM_IP6_TSO))
 
+/**
+ * Parses out ethernet header information in the given mbuf.
+ * Returns in pi: ipi_etype (EtherType) and ipi_ehdrlen (Ethernet header length)
+ *
+ * This will account for the VLAN header if present.
+ *
+ * XXX: This doesn't handle QinQ, which could prevent TX offloads for those
+ * types of packets.
+ */
 static int
-iflib_parse_header(iflib_txq_t txq, if_pkt_info_t pi, struct mbuf **mp)
+iflib_parse_ether_header(if_pkt_info_t pi, struct mbuf **mp, uint64_t *pullups)
 {
-	if_shared_ctx_t sctx = txq->ift_ctx->ifc_sctx;
 	struct ether_vlan_header *eh;
 	struct mbuf *m;
 
 	m = *mp;
-	if ((sctx->isc_flags & IFLIB_NEED_SCRATCH) &&
-	    M_WRITABLE(m) == 0) {
-		if ((m = m_dup(m, M_NOWAIT)) == NULL) {
-			return (ENOMEM);
-		} else {
-			m_freem(*mp);
-			DBG_COUNTER_INC(tx_frees);
-			*mp = m;
-		}
-	}
-
-	/*
-	 * Determine where frame payload starts.
-	 * Jump over vlan headers if already present,
-	 * helpful for QinQ too.
-	 */
 	if (__predict_false(m->m_len < sizeof(*eh))) {
-		txq->ift_pullups++;
+		(*pullups)++;
 		if (__predict_false((m = m_pullup(m, sizeof(*eh))) == NULL))
 			return (ENOMEM);
 	}
@@ -3200,6 +3193,143 @@ iflib_parse_header(iflib_txq_t txq, if_pkt_info_t pi, struct mbuf **mp)
 		pi->ipi_etype = ntohs(eh->evl_encap_proto);
 		pi->ipi_ehdrlen = ETHER_HDR_LEN;
 	}
+	*mp = m;
+
+	return (0);
+}
+
+/**
+ * Parse up to the L3 header and extract IPv4/IPv6 header information into pi.
+ * Currently this information includes: IP ToS value, IP header version/presence
+ *
+ * This is missing some checks and doesn't edit the packet content as it goes,
+ * unlike iflib_parse_header(), in order to keep the amount of code here minimal.
+ */
+static int
+iflib_parse_header_partial(if_pkt_info_t pi, struct mbuf **mp, uint64_t *pullups)
+{
+	struct mbuf *m;
+	int err;
+
+	*pullups = 0;
+	m = *mp;
+	if (!M_WRITABLE(m)) {
+		if ((m = m_dup(m, M_NOWAIT)) == NULL) {
+			return (ENOMEM);
+		} else {
+			m_freem(*mp);
+			DBG_COUNTER_INC(tx_frees);
+			*mp = m;
+		}
+	}
+
+	/* Fills out pi->ipi_etype */
+	err = iflib_parse_ether_header(pi, mp, pullups);
+	if (err)
+		return (err);
+	m = *mp;
+
+	switch (pi->ipi_etype) {
+#ifdef INET
+	case ETHERTYPE_IP:
+	{
+		struct mbuf *n;
+		struct ip *ip = NULL;
+		int miniplen;
+
+		miniplen = min(m->m_pkthdr.len, pi->ipi_ehdrlen + sizeof(*ip));
+		if (__predict_false(m->m_len < miniplen)) {
+			/*
+			 * Check for common case where the first mbuf only contains
+			 * the Ethernet header
+			 */
+			if (m->m_len == pi->ipi_ehdrlen) {
+				n = m->m_next;
+				MPASS(n);
+				/* If next mbuf contains at least the minimal IP header, then stop */
+				if (n->m_len >= sizeof(*ip)) {
+					ip = (struct ip *)n->m_data;
+				} else {
+					(*pullups)++;
+					if (__predict_false((m = m_pullup(m, miniplen)) == NULL))
+						return (ENOMEM);
+					ip = (struct ip *)(m->m_data + pi->ipi_ehdrlen);
+				}
+			} else {
+				(*pullups)++;
+				if (__predict_false((m = m_pullup(m, miniplen)) == NULL))
+					return (ENOMEM);
+				ip = (struct ip *)(m->m_data + pi->ipi_ehdrlen);
+			}
+		} else {
+			ip = (struct ip *)(m->m_data + pi->ipi_ehdrlen);
+		}
+
+		/* Have the IPv4 header w/ no options here */
+		pi->ipi_ip_hlen = ip->ip_hl << 2;
+		pi->ipi_ipproto = ip->ip_p;
+		pi->ipi_ip_tos = ip->ip_tos;
+		pi->ipi_flags |= IPI_TX_IPV4;
+
+		break;
+	}
+#endif
+#ifdef INET6
+	case ETHERTYPE_IPV6:
+	{
+		struct ip6_hdr *ip6;
+
+		if (__predict_false(m->m_len < pi->ipi_ehdrlen + sizeof(struct ip6_hdr))) {
+			(*pullups)++;
+			if (__predict_false((m = m_pullup(m, pi->ipi_ehdrlen + sizeof(struct ip6_hdr))) == NULL))
+				return (ENOMEM);
+		}
+		ip6 = (struct ip6_hdr *)(m->m_data + pi->ipi_ehdrlen);
+
+		/* Have the IPv6 fixed header here */
+		pi->ipi_ip_hlen = sizeof(struct ip6_hdr);
+		pi->ipi_ipproto = ip6->ip6_nxt;
+		pi->ipi_ip_tos = IPV6_TRAFFIC_CLASS(ip6);
+		pi->ipi_flags |= IPI_TX_IPV6;
+
+		break;
+	}
+#endif
+	default:
+		pi->ipi_csum_flags &= ~CSUM_OFFLOAD;
+		pi->ipi_ip_hlen = 0;
+		break;
+	}
+	*mp = m;
+
+	return (0);
+
+}
+
+static int
+iflib_parse_header(iflib_txq_t txq, if_pkt_info_t pi, struct mbuf **mp)
+{
+	if_shared_ctx_t sctx = txq->ift_ctx->ifc_sctx;
+	struct mbuf *m;
+	int err;
+
+	m = *mp;
+	if ((sctx->isc_flags & IFLIB_NEED_SCRATCH) &&
+	    M_WRITABLE(m) == 0) {
+		if ((m = m_dup(m, M_NOWAIT)) == NULL) {
+			return (ENOMEM);
+		} else {
+			m_freem(*mp);
+			DBG_COUNTER_INC(tx_frees);
+			*mp = m;
+		}
+	}
+
+	/* Fills out pi->ipi_etype */
+	err = iflib_parse_ether_header(pi, mp, &txq->ift_pullups);
+	if (__predict_false(err))
+		return (err);
+	m = *mp;
 
 	switch (pi->ipi_etype) {
 #ifdef INET
@@ -3244,6 +3374,7 @@ iflib_parse_header(iflib_txq_t txq, if_pkt_info_t pi, struct mbuf **mp)
 		}
 		pi->ipi_ip_hlen = ip->ip_hl << 2;
 		pi->ipi_ipproto = ip->ip_p;
+		pi->ipi_ip_tos = ip->ip_tos;
 		pi->ipi_flags |= IPI_TX_IPV4;
 
 		/* TCP checksum offload may require TCP header length */
@@ -3297,6 +3428,7 @@ iflib_parse_header(iflib_txq_t txq, if_pkt_info_t pi, struct mbuf **mp)
 
 		/* XXX-BZ this will go badly in case of ext hdrs. */
 		pi->ipi_ipproto = ip6->ip6_nxt;
+		pi->ipi_ip_tos = IPV6_TRAFFIC_CLASS(ip6);
 		pi->ipi_flags |= IPI_TX_IPV6;
 
 		/* TCP checksum offload may require TCP header length */
@@ -4112,11 +4244,10 @@ iflib_if_init(void *arg)
 static int
 iflib_if_transmit(if_t ifp, struct mbuf *m)
 {
-	if_ctx_t	ctx = if_getsoftc(ifp);
-
+	if_ctx_t ctx = if_getsoftc(ifp);
 	iflib_txq_t txq;
 	int err, qidx;
-	int abdicate = ctx->ifc_sysctl_tx_abdicate;
+	int abdicate;
 
 	if (__predict_false((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0 || !LINK_ACTIVE(ctx))) {
 		DBG_COUNTER_INC(tx_frees);
@@ -4128,7 +4259,24 @@ iflib_if_transmit(if_t ifp, struct mbuf *m)
 	/* ALTQ-enabled interfaces always use queue 0. */
 	qidx = 0;
 	/* Use driver-supplied queue selection method if it exists */
-	if (ctx->isc_txq_select)
+	if (ctx->isc_txq_select_v2) {
+		struct if_pkt_info pi;
+		uint64_t early_pullups = 0;
+		pkt_info_zero(&pi);
+
+		err = iflib_parse_header_partial(&pi, &m, &early_pullups);
+		if (__predict_false(err != 0)) {
+			/* Assign pullups for bad pkts to default queue */
+			ctx->ifc_txqs[0].ift_pullups += early_pullups;
+			DBG_COUNTER_INC(encap_txd_encap_fail);
+			return (err);
+		}
+		/* Let driver make queueing decision */
+		qidx = ctx->isc_txq_select_v2(ctx->ifc_softc, m, &pi);
+		ctx->ifc_txqs[qidx].ift_pullups += early_pullups;
+	}
+	/* Backwards compatibility w/ simpler queue select */
+	else if (ctx->isc_txq_select)
 		qidx = ctx->isc_txq_select(ctx->ifc_softc, m);
 	/* If not, use iflib's standard method */
 	else if ((NTXQSETS(ctx) > 1) && M_HASHTYPE_GET(m) && !ALTQ_IS_ENABLED(&ifp->if_snd))
@@ -4173,6 +4321,8 @@ iflib_if_transmit(if_t ifp, struct mbuf *m)
 	}
 #endif
 	DBG_COUNTER_INC(tx_seen);
+	abdicate = ctx->ifc_sysctl_tx_abdicate;
+
 	err = ifmp_ring_enqueue(txq->ift_br, (void **)&m, 1, TX_BATCH_SIZE, abdicate);
 
 	if (abdicate)
diff --git a/sys/net/iflib.h b/sys/net/iflib.h
index 42cab766a2fa..3f6b98347ca3 100644
--- a/sys/net/iflib.h
+++ b/sys/net/iflib.h
@@ -131,7 +131,9 @@ typedef struct if_pkt_info {
 	uint8_t			ipi_mflags;	/* packet mbuf flags */
 
 	uint32_t		ipi_tcp_seq;	/* tcp seqno */
-	uint32_t		__spare0__;
+	uint8_t			ipi_ip_tos;	/* IP ToS field data */
+	uint8_t			__spare0__;
+	uint16_t		__spare1__;
 } *if_pkt_info_t;
 
 typedef struct if_irq {
@@ -188,6 +190,7 @@ typedef struct if_txrx {
 	void (*ift_rxd_flush) (void *, uint16_t qsidx, uint8_t flidx, qidx_t pidx);
 	int (*ift_legacy_intr) (void *);
 	qidx_t (*ift_txq_select) (void *, struct mbuf *);
+	qidx_t (*ift_txq_select_v2) (void *, struct mbuf *, if_pkt_info_t);
 } *if_txrx_t;
 
 typedef struct if_softc_ctx {
@@ -406,6 +409,13 @@ typedef enum {
  * as ift_txq_select in struct if_txrx
  */
 #define IFLIB_FEATURE_QUEUE_SELECT	1300527
+/*
+ * Driver can set its own TX queue selection function
+ * as ift_txq_select_v2 in struct if_txrx. This includes
+ * having iflib send L3+ extra header information to the
+ * function.
+ */
+#define IFLIB_FEATURE_QUEUE_SELECT_V2	1301509
 
 /*
  * These enum values are used in iflib_needs_restart to indicate to iflib
diff --git a/sys/sys/param.h b/sys/sys/param.h
index 5a468436d903..e417fed4149b 100644
--- a/sys/sys/param.h
+++ b/sys/sys/param.h
@@ -60,7 +60,7 @@
  * in the range 5 to 9.
  */
 #undef __FreeBSD_version
-#define __FreeBSD_version 1301508	/* Master, propagated to newvers */
+#define __FreeBSD_version 1301509	/* Master, propagated to newvers */
 
 /*
  * __FreeBSD_kernel__ indicates that this system uses the kernel of FreeBSD,
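The partial parse in `iflib_parse_header_partial()` reads only the Ethernet header (plus one 802.1Q tag, as `iflib_parse_ether_header()` does) and the fixed part of the IPv4 or IPv6 header. The following user-space sketch applies the same idea to a single contiguous byte buffer. It is a simplification under stated assumptions: there are no mbuf chains or `m_pullup()` calls, short frames are rejected rather than clamped with `min()`, and `ex_parse_partial()`, `struct ex_l3_info`, and the `EX_*` constants are hypothetical names. For IPv6 it takes the traffic class from the 8 bits that follow the 4-bit version field, which is the value the patch stores via `IPV6_TRAFFIC_CLASS()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants mirroring the standard wire values. */
#define EX_ETHERTYPE_IP		0x0800
#define EX_ETHERTYPE_IPV6	0x86dd
#define EX_ETHERTYPE_VLAN	0x8100
#define EX_ETHER_HDR_LEN	14	/* dst MAC + src MAC + EtherType */
#define EX_VLAN_HDR_LEN		18	/* Ethernet header + 4-byte 802.1Q tag */

/* Hypothetical result struct, loosely modeled on the if_pkt_info fields. */
struct ex_l3_info {
	uint16_t	etype;		/* like pi->ipi_etype */
	uint8_t		ehdrlen;	/* like pi->ipi_ehdrlen */
	uint8_t		tos;		/* like pi->ipi_ip_tos */
	int		ipver;		/* 4, 6, or 0 when not IP */
};

/* Read a big-endian (network order) 16-bit value. */
static uint16_t
be16(const uint8_t *p)
{
	return ((uint16_t)((p[0] << 8) | p[1]));
}

/* Returns 0 on success, -1 if the buffer is too short for the needed headers. */
static int
ex_parse_partial(const uint8_t *buf, size_t len, struct ex_l3_info *out)
{
	assert(buf != NULL && out != NULL);
	if (len < EX_ETHER_HDR_LEN)
		return (-1);
	out->etype = be16(buf + 12);
	out->ehdrlen = EX_ETHER_HDR_LEN;
	/* Skip a single VLAN tag; QinQ is not handled, as in iflib. */
	if (out->etype == EX_ETHERTYPE_VLAN) {
		if (len < EX_VLAN_HDR_LEN)
			return (-1);
		out->etype = be16(buf + 16);
		out->ehdrlen = EX_VLAN_HDR_LEN;
	}
	out->tos = 0;
	out->ipver = 0;
	switch (out->etype) {
	case EX_ETHERTYPE_IP:
		/* Minimal IPv4 header is 20 bytes; ToS is byte 1. */
		if (len < (size_t)out->ehdrlen + 20)
			return (-1);
		out->tos = buf[out->ehdrlen + 1];
		out->ipver = 4;
		break;
	case EX_ETHERTYPE_IPV6:
		/* Fixed IPv6 header is 40 bytes; traffic class = bits 4..11. */
		if (len < (size_t)out->ehdrlen + 40)
			return (-1);
		out->tos = (uint8_t)(((buf[out->ehdrlen] & 0x0f) << 4) |
		    (buf[out->ehdrlen + 1] >> 4));
		out->ipver = 6;
		break;
	default:
		/* Not IP: nothing more to extract. */
		break;
	}
	return (0);
}
```

For example, an IPv6 header whose first 32-bit word is `0x6b800000` (version 6, traffic class `0xb8`, flow label 0) yields `tos == 0xb8`, the same as `(word >> 20) & 0xff`.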