Message-Id: <201904241332.x3ODW5LJ047347@repo.freebsd.org>
From: Andrew Gallatin
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: svn commit: r346632 - head/sys/net
X-SVN-Group: head
X-SVN-Commit-Author: gallatin
X-SVN-Commit-Paths: head/sys/net
X-SVN-Commit-Revision: 346632
X-SVN-Commit-Repository: base
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)"
Date: Tue, 03 Sep 2019 14:08:21 -0000
X-Original-Date: Wed, 24 Apr 2019 13:32:05 +0000 (UTC)
X-List-Received-Date: Tue, 03 Sep 2019 14:08:21 -0000

Author: gallatin
Date: Wed Apr 24 13:32:04 2019
New Revision: 346632
URL: https://svnweb.freebsd.org/changeset/base/346632

Log:
  iflib: Add pfil hooks

  As with mlx5en, the idea is to drop unwanted traffic as early
  in receive as possible, before mbufs are allocated and anything
  is passed up the stack.  This can save considerable CPU time
  when a machine is under a flooding style DOS attack.

  The major change here is to remove the unneeded abstraction where
  callers of rxd_frag_to_sd() get back a pointer to the mbuf ring, and
  are responsible for NULL'ing that mbuf themselves. Now this happens
  directly in rxd_frag_to_sd(), and it returns an mbuf.  This allows us
  to use the decision (and potentially mbuf) returned by the pfil
  hooks.  The driver can now recycle mbufs to avoid re-allocation when
  packets are dropped.
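
The recycling described in the log can be illustrated with a small userland sketch. This is not iflib or pfil(9) code: the names struct ring, ring_refill(), ring_rx() and the V_* verdicts are hypothetical stand-ins, and the PFIL_REALLOCED case is left out for brevity. It only shows the ownership rule the commit appears to introduce: a slot keeps its buffer when the filter drops or consumes the packet, and the refill path allocates only for empty slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical verdicts, loosely modelled on the PFIL_* values in the diff. */
enum verdict { V_PASS, V_DROPPED, V_CONSUMED };

#define RING_SIZE 8		/* power of two, so the index can be masked */

struct buf { int unused; };	/* stand-in for struct mbuf */

struct ring {
	struct buf *slot[RING_SIZE];	/* per-descriptor buffer, like ifsd_m[] */
	unsigned cidx;			/* consumer index */
	unsigned allocs;		/* allocations performed so far */
};

/* Allocate only for empty slots, so buffers left in place are reused. */
static int
ring_refill(struct ring *r)
{
	for (unsigned i = 0; i < RING_SIZE; i++) {
		if (r->slot[i] != NULL)
			continue;	/* recycled buffer, nothing to do */
		if ((r->slot[i] = malloc(sizeof(struct buf))) == NULL)
			return (-1);
		r->allocs++;
	}
	return (0);
}

/*
 * Consume one descriptor.  Returns the buffer to hand up the stack, or
 * NULL when the filter dropped or consumed the packet, in which case
 * the slot keeps its buffer for the next refill.
 */
static struct buf *
ring_rx(struct ring *r, enum verdict v)
{
	unsigned cidx = r->cidx;
	struct buf *b = r->slot[cidx];

	switch (v) {
	case V_DROPPED:
	case V_CONSUMED:
		b = NULL;		/* nothing goes up; slot is untouched */
		break;
	case V_PASS:
		r->slot[cidx] = NULL;	/* ownership moves to the caller */
		break;
	default:
		assert(0);
		break;
	}
	r->cidx = (cidx + 1) & (RING_SIZE - 1);
	return (b);
}
```

Under these assumptions, a dropped packet followed by ring_refill() performs no new allocation, while a passed packet leaves one empty slot and therefore costs exactly one allocation on the next refill.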

  Reviewed by:	marius  (shurd and erj also provided feedback)
  Sponsored by:	Netflix
  Differential Revision:	https://reviews.freebsd.org/D19645

Modified:
  head/sys/net/iflib.c

Modified: head/sys/net/iflib.c
==============================================================================
--- head/sys/net/iflib.c	Wed Apr 24 13:15:56 2019	(r346631)
+++ head/sys/net/iflib.c	Wed Apr 24 13:32:04 2019	(r346632)
@@ -59,6 +59,7 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#include
 #include

 #include
@@ -432,6 +433,7 @@ struct iflib_rxq {
 	if_ctx_t	ifr_ctx;
 	iflib_fl_t	ifr_fl;
 	uint64_t	ifr_rx_irq;
+	struct pfil_head	*pfil;
 	uint16_t	ifr_id;
 	uint8_t		ifr_lro_enabled;
 	uint8_t		ifr_nfl;
@@ -451,7 +453,6 @@ struct iflib_rxq {

 typedef struct if_rxsd {
 	caddr_t *ifsd_cl;
-	struct mbuf **ifsd_m;
 	iflib_fl_t ifsd_fl;
 	qidx_t ifsd_cidx;
 } *if_rxsd_t;
@@ -652,7 +653,6 @@ static int iflib_fast_intrs;
 static int iflib_rx_unavail;
 static int iflib_rx_ctx_inactive;
 static int iflib_rx_if_input;
-static int iflib_rx_mbuf_null;
 static int iflib_rxd_flush;

 static int iflib_verbose_debug;
@@ -669,8 +669,6 @@ SYSCTL_INT(_net_iflib, OID_AUTO, rx_ctx_inactive, CTLF
 		   &iflib_rx_ctx_inactive, 0, "# times rxeof called with inactive context");
 SYSCTL_INT(_net_iflib, OID_AUTO, rx_if_input, CTLFLAG_RD,
 		   &iflib_rx_if_input, 0, "# times rxeof called if_input");
-SYSCTL_INT(_net_iflib, OID_AUTO, rx_mbuf_null, CTLFLAG_RD,
-		   &iflib_rx_mbuf_null, 0, "# times rxeof got null mbuf");
 SYSCTL_INT(_net_iflib, OID_AUTO, rxd_flush, CTLFLAG_RD,
 		   &iflib_rxd_flush, 0, "# times rxd_flush called");
 SYSCTL_INT(_net_iflib, OID_AUTO, verbose_debug, CTLFLAG_RW,
@@ -689,7 +687,7 @@ iflib_debug_reset(void)
 	iflib_task_fn_rxs = iflib_rx_intr_enables = iflib_fast_intrs =
 		iflib_rx_unavail =
 		iflib_rx_ctx_inactive = iflib_rx_if_input =
-		iflib_rx_mbuf_null = iflib_rxd_flush = 0;
+		iflib_rxd_flush = 0;
 }

 #else
@@ -2002,11 +2000,12 @@ _iflib_fl_refill(if_ctx_t ctx, iflib_fl_t fl, int coun
 		bus_dmamap_sync(fl->ifl_buf_tag, sd_map[frag_idx],
 		    BUS_DMASYNC_PREREAD);

-		MPASS(sd_m[frag_idx] == NULL);
-		if ((m = m_gethdr(M_NOWAIT, MT_NOINIT)) == NULL) {
-			break;
+		if (sd_m[frag_idx] == NULL) {
+			if ((m = m_gethdr(M_NOWAIT, MT_NOINIT)) == NULL) {
+				break;
+			}
+			sd_m[frag_idx] = m;
 		}
-		sd_m[frag_idx] = m;
 		bit_set(fl->ifl_rx_bitmap, frag_idx);
 #if MEMORY_LOGGING
 		fl->ifl_m_enqueued++;
@@ -2483,13 +2482,15 @@ prefetch_pkts(iflib_fl_t fl, int cidx)
 	prefetch(fl->ifl_sds.ifsd_cl[(cidx + 4) & (nrxd-1)]);
 }

-static void
-rxd_frag_to_sd(iflib_rxq_t rxq, if_rxd_frag_t irf, int unload, if_rxsd_t sd)
+static struct mbuf *
+rxd_frag_to_sd(iflib_rxq_t rxq, if_rxd_frag_t irf, bool unload, if_rxsd_t sd,
+    int *pf_rv, if_rxd_info_t ri)
 {
-	int flid, cidx;
 	bus_dmamap_t map;
 	iflib_fl_t fl;
-	int next;
+	caddr_t payload;
+	struct mbuf *m;
+	int flid, cidx, len, next;

 	map = NULL;
 	flid = irf->irf_flid;
@@ -2497,7 +2498,7 @@ rxd_frag_to_sd(iflib_rxq_t rxq, if_rxd_frag_t irf, int
 	fl = &rxq->ifr_fl[flid];
 	sd->ifsd_fl = fl;
 	sd->ifsd_cidx = cidx;
-	sd->ifsd_m = &fl->ifl_sds.ifsd_m[cidx];
+	m = fl->ifl_sds.ifsd_m[cidx];
 	sd->ifsd_cl = &fl->ifl_sds.ifsd_cl[cidx];
 	fl->ifl_credits--;
 #if MEMORY_LOGGING
@@ -2513,39 +2514,89 @@ rxd_frag_to_sd(iflib_rxq_t rxq, if_rxd_frag_t irf, int
 	/* not valid assert if bxe really does SGE from non-contiguous elements */
 	MPASS(fl->ifl_cidx == cidx);
 	bus_dmamap_sync(fl->ifl_buf_tag, map, BUS_DMASYNC_POSTREAD);
+
+	if (rxq->pfil != NULL && PFIL_HOOKED_IN(rxq->pfil) && pf_rv != NULL) {
+		payload = *sd->ifsd_cl;
+		payload += ri->iri_pad;
+		len = ri->iri_len - ri->iri_pad;
+		*pf_rv = pfil_run_hooks(rxq->pfil, payload, ri->iri_ifp,
+		    len | PFIL_MEMPTR | PFIL_IN, NULL);
+		switch (*pf_rv) {
+		case PFIL_DROPPED:
+		case PFIL_CONSUMED:
+			/*
+			 * The filter ate it.  Everything is recycled.
+			 */
+			m = NULL;
+			unload = 0;
+			break;
+		case PFIL_REALLOCED:
+			/*
+			 * The filter copied it.  Everything is recycled.
+			 */
+			m = pfil_mem2mbuf(payload);
+			unload = 0;
+			break;
+		case PFIL_PASS:
+			/*
+			 * Filter said it was OK, so receive like
+			 * normal
+			 */
+			fl->ifl_sds.ifsd_m[cidx] = NULL;
+			break;
+		default:
+			MPASS(0);
+		}
+	} else {
+		fl->ifl_sds.ifsd_m[cidx] = NULL;
+		*pf_rv = PFIL_PASS;
+	}
+
 	if (unload)
 		bus_dmamap_unload(fl->ifl_buf_tag, map);
 	fl->ifl_cidx = (fl->ifl_cidx + 1) & (fl->ifl_size-1);
 	if (__predict_false(fl->ifl_cidx == 0))
 		fl->ifl_gen = 0;
 	bit_clear(fl->ifl_rx_bitmap, cidx);
+	return (m);
 }

 static struct mbuf *
-assemble_segments(iflib_rxq_t rxq, if_rxd_info_t ri, if_rxsd_t sd)
+assemble_segments(iflib_rxq_t rxq, if_rxd_info_t ri, if_rxsd_t sd, int *pf_rv)
 {
-	int i, padlen , flags;
 	struct mbuf *m, *mh, *mt;
 	caddr_t cl;
+	int *pf_rv_ptr, flags, i, padlen;
+	bool consumed;

 	i = 0;
 	mh = NULL;
+	consumed = false;
+	*pf_rv = PFIL_PASS;
+	pf_rv_ptr = pf_rv;
 	do {
-		rxd_frag_to_sd(rxq, &ri->iri_frags[i], TRUE, sd);
+		m = rxd_frag_to_sd(rxq, &ri->iri_frags[i], !consumed, sd,
+		    pf_rv_ptr, ri);

 		MPASS(*sd->ifsd_cl != NULL);
-		MPASS(*sd->ifsd_m != NULL);

-		/* Don't include zero-length frags */
-		if (ri->iri_frags[i].irf_len == 0) {
+		/*
+		 * Exclude zero-length frags & frags from
+		 * packets the filter has consumed or dropped
+		 */
+		if (ri->iri_frags[i].irf_len == 0 || consumed ||
+		    *pf_rv == PFIL_CONSUMED || *pf_rv == PFIL_DROPPED) {
+			if (mh == NULL) {
+				/* everything saved here */
+				consumed = true;
+				pf_rv_ptr = NULL;
+				continue;
+			}
 			/* XXX we can save the cluster here, but not the mbuf */
-			m_init(*sd->ifsd_m, M_NOWAIT, MT_DATA, 0);
-			m_free(*sd->ifsd_m);
-			*sd->ifsd_m = NULL;
+			m_init(m, M_NOWAIT, MT_DATA, 0);
+			m_free(m);
 			continue;
 		}
-		m = *sd->ifsd_m;
-		*sd->ifsd_m = NULL;
 		if (mh == NULL) {
 			flags = M_PKTHDR|M_EXT;
 			mh = mt = m;
@@ -2582,22 +2633,28 @@ iflib_rxd_pkt_get(iflib_rxq_t rxq, if_rxd_info_t ri)
 {
 	struct if_rxsd sd;
 	struct mbuf *m;
+	int pf_rv;

 	/* should I merge this back in now that the two paths are basically duplicated? */
 	if (ri->iri_nfrags == 1 &&
 	    ri->iri_frags[0].irf_len <= MIN(IFLIB_RX_COPY_THRESH, MHLEN)) {
-		rxd_frag_to_sd(rxq, &ri->iri_frags[0], FALSE, &sd);
-		m = *sd.ifsd_m;
-		*sd.ifsd_m = NULL;
-		m_init(m, M_NOWAIT, MT_DATA, M_PKTHDR);
+		m = rxd_frag_to_sd(rxq, &ri->iri_frags[0], false, &sd,
+		    &pf_rv, ri);
+		if (pf_rv != PFIL_PASS && pf_rv != PFIL_REALLOCED)
+			return (m);
+		if (pf_rv == PFIL_PASS) {
+			m_init(m, M_NOWAIT, MT_DATA, M_PKTHDR);
 #ifndef __NO_STRICT_ALIGNMENT
-		if (!IP_ALIGNED(m))
-			m->m_data += 2;
+			if (!IP_ALIGNED(m))
+				m->m_data += 2;
 #endif
-		memcpy(m->m_data, *sd.ifsd_cl, ri->iri_len);
-		m->m_len = ri->iri_frags[0].irf_len;
-	} else {
-		m = assemble_segments(rxq, ri, &sd);
+			memcpy(m->m_data, *sd.ifsd_cl, ri->iri_len);
+			m->m_len = ri->iri_frags[0].irf_len;
+		}
+	} else {
+		m = assemble_segments(rxq, ri, &sd, &pf_rv);
+		if (pf_rv != PFIL_PASS && pf_rv != PFIL_REALLOCED)
+			return (m);
 	}
 	m->m_pkthdr.len = ri->iri_len;
 	m->m_pkthdr.rcvif = ri->iri_ifp;
@@ -2694,6 +2751,8 @@ iflib_rxeof(iflib_rxq_t rxq, qidx_t budget)
 		return (false);
 	}

+	/* pfil needs the vnet to be set */
+	CURVNET_SET_QUIET(ifp->if_vnet);
 	for (budget_left = budget; budget_left > 0 && avail > 0;) {
 		if (__predict_false(!CTX_ACTIVE(ctx))) {
 			DBG_COUNTER_INC(rx_ctx_inactive);
@@ -2711,6 +2770,8 @@ iflib_rxeof(iflib_rxq_t rxq, qidx_t budget)

 		if (err)
 			goto err;
+		rx_pkts += 1;
+		rx_bytes += ri.iri_len;
 		if (sctx->isc_flags & IFLIB_HAS_RXCQ) {
 			*cidxp = ri.iri_cidx;
 			/* Update our consumer index */
@@ -2733,10 +2794,9 @@ iflib_rxeof(iflib_rxq_t rxq, qidx_t budget)
 		if (avail == 0 && budget_left)
 			avail = iflib_rxd_avail(ctx, rxq, *cidxp, budget_left);

-		if (__predict_false(m == NULL)) {
-			DBG_COUNTER_INC(rx_mbuf_null);
+		if (__predict_false(m == NULL))
 			continue;
-		}
+
 		/* imm_pkt: -- cxgb */
 		if (mh == NULL)
 			mh = mt = m;
@@ -2745,6 +2805,7 @@ iflib_rxeof(iflib_rxq_t rxq, qidx_t budget)
 			mt = m;
 		}
 	}
+	CURVNET_RESTORE();
 	/* make sure that we can refill faster than drain */
 	for (i = 0, fl = &rxq->ifr_fl[0]; i < sctx->isc_nfl; i++, fl++)
 		__iflib_fl_refill_lt(ctx, fl, budget + 8);
@@ -4366,6 +4427,40 @@ iflib_reset_qvalues(if_ctx_t ctx)
 	}
 }

+static void
+iflib_add_pfil(if_ctx_t ctx)
+{
+	struct pfil_head *pfil;
+	struct pfil_head_args pa;
+	iflib_rxq_t rxq;
+	int i;
+
+	pa.pa_version = PFIL_VERSION;
+	pa.pa_flags = PFIL_IN;
+	pa.pa_type = PFIL_TYPE_ETHERNET;
+	pa.pa_headname = ctx->ifc_ifp->if_xname;
+	pfil = pfil_head_register(&pa);
+
+	for (i = 0, rxq = ctx->ifc_rxqs; i < NRXQSETS(ctx); i++, rxq++) {
+		rxq->pfil = pfil;
+	}
+}
+
+static void
+iflib_rem_pfil(if_ctx_t ctx)
+{
+	struct pfil_head *pfil;
+	iflib_rxq_t rxq;
+	int i;
+
+	rxq = ctx->ifc_rxqs;
+	pfil = rxq->pfil;
+	for (i = 0; i < NRXQSETS(ctx); i++, rxq++) {
+		rxq->pfil = NULL;
+	}
+	pfil_head_unregister(pfil);
+}
+
 int
 iflib_device_register(device_t dev, void *sc, if_shared_ctx_t sctx, if_ctx_t *ctxp)
 {
@@ -4569,6 +4664,7 @@ iflib_device_register(device_t dev, void *sc, if_share

 	if_setgetcounterfn(ctx->ifc_ifp, iflib_if_get_counter);
 	iflib_add_device_sysctl_post(ctx);
+	iflib_add_pfil(ctx);
 	ctx->ifc_flags |= IFC_INIT_DONE;
 	CTX_UNLOCK(ctx);
 	return (0);
@@ -4903,6 +4999,7 @@ iflib_device_deregister(if_ctx_t ctx)

 	iflib_netmap_detach(ifp);
 	ether_ifdetach(ifp);
+	iflib_rem_pfil(ctx);
 	if (ctx->ifc_led_dev != NULL)
 		led_destroy(ctx->ifc_led_dev);
 	/* XXX drain any dependent tasks */