From owner-svn-src-head@freebsd.org Tue Oct 27 21:53:33 2020 Return-Path: Delivered-To: svn-src-head@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id C2FB945B633; Tue, 27 Oct 2020 21:53:33 +0000 (UTC) (envelope-from vmaffione@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4CLQSY4ZtXz4GYg; Tue, 27 Oct 2020 21:53:33 +0000 (UTC) (envelope-from vmaffione@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 817D312604; Tue, 27 Oct 2020 21:53:33 +0000 (UTC) (envelope-from vmaffione@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id 09RLrXp8080555; Tue, 27 Oct 2020 21:53:33 GMT (envelope-from vmaffione@FreeBSD.org) Received: (from vmaffione@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id 09RLrXpr080554; Tue, 27 Oct 2020 21:53:33 GMT (envelope-from vmaffione@FreeBSD.org) Message-Id: <202010272153.09RLrXpr080554@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: vmaffione set sender to vmaffione@FreeBSD.org using -f From: Vincenzo Maffione Date: Tue, 27 Oct 2020 21:53:33 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r367093 - head/sys/net X-SVN-Group: head X-SVN-Commit-Author: vmaffione X-SVN-Commit-Paths: head/sys/net X-SVN-Commit-Revision: 367093 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.33 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Oct 2020 21:53:33 -0000 Author: vmaffione Date: Tue Oct 27 21:53:33 2020 New Revision: 367093 URL: https://svnweb.freebsd.org/changeset/base/367093 Log: iflib: add per-tx-queue netmap timer The way netmap TX is handled in iflib when TX interrupts are not used (IFC_NETMAP_TX_IRQ not set) has some issues: - The netmap_tx_irq() function gets called by iflib_timer(), which gets scheduled with tick granularity (hz). This is not frequent enough for 10Gbps NICs and beyond (e.g., ixgbe or ixl). The end result is that the transmitting netmap application is not woken up fast enough to saturate the link with small packets. - The iflib_timer() functions also calls isc_txd_credits_update() to ask for more TX completion updates. However, this violates the netmap requirement that only txsync can access the TX queue for datapath operations. Only netmap_tx_irq() may be called out of the txsync context. This change introduces per-tx-queue netmap timers, using microsecond granularity to ensure that netmap_tx_irq() can be called often enough to allow for maximum packet rate. The timer routine simply calls netmap_tx_irq() to wake up the netmap application. The latter will wake up and call txsync to collect TX completion updates. This change brings back line rate speed with small packets for ixgbe. For the time being, timer expiration is hardcoded to 90 microseconds, in order to avoid introducing a new sysctl. We may eventually implement an adaptive expiration period or use another deferred work mechanism in place of timers. Also, fix the timers usage to make sure that each queue is serviced by a different CPU. PR: 248652 Reported by: sg@efficientip.com MFC after: 2 weeks Modified: head/sys/net/iflib.c Modified: head/sys/net/iflib.c ============================================================================== --- head/sys/net/iflib.c Tue Oct 27 20:13:33 2020 (r367092) +++ head/sys/net/iflib.c Tue Oct 27 21:53:33 2020 (r367093) @@ -346,6 +346,9 @@ struct iflib_txq { qidx_t ift_size; uint16_t ift_id; struct callout ift_timer; +#ifdef DEV_NETMAP + struct callout ift_netmap_timer; +#endif /* DEV_NETMAP */ if_txsd_vec_t ift_sds; uint8_t ift_qstatus; @@ -753,6 +756,7 @@ iflib_num_tx_descs(if_ctx_t ctx) MODULE_DEPEND(iflib, netmap, 1, 1, 1); static int netmap_fl_refill(iflib_rxq_t rxq, struct netmap_kring *kring, bool init); +static void iflib_netmap_timer(void *arg); /* * device-specific sysctl variables: @@ -918,6 +922,8 @@ netmap_fl_refill(iflib_rxq_t rxq, struct netmap_kring return (0); } +#define NETMAP_TX_TIMER_US 90 + /* * Reconcile kernel and user view of the transmit ring. * @@ -1047,9 +1053,8 @@ iflib_netmap_txsync(struct netmap_kring *kring, int fl * Second part: reclaim buffers for completed transmissions. * * If there are unclaimed buffers, attempt to reclaim them. - * If none are reclaimed, and TX IRQs are not in use, do an initial - * minimal delay, then trigger the tx handler which will spin in the - * group task queue. + * If we don't manage to reclaim them all, and TX IRQs are not in use, + * trigger a per-tx-queue timer to try again later. */ if (kring->nr_hwtail != nm_prev(kring->nr_hwcur, lim)) { if (iflib_tx_credits_update(ctx, txq)) { @@ -1058,11 +1063,13 @@ iflib_netmap_txsync(struct netmap_kring *kring, int fl kring->nr_hwtail = nm_prev(netmap_idx_n2k(kring, nic_i), lim); } } + if (!(ctx->ifc_flags & IFC_NETMAP_TX_IRQ)) if (kring->nr_hwtail != nm_prev(kring->nr_hwcur, lim)) { - callout_reset_on(&txq->ift_timer, hz < 2000 ? 1 : hz / 1000, - iflib_timer, txq, txq->ift_timer.c_cpu); - } + callout_reset_sbt(&txq->ift_netmap_timer, + NETMAP_TX_TIMER_US * SBT_1US, SBT_1US, + iflib_netmap_timer, txq, txq->ift_netmap_timer.c_cpu); + } return (0); } @@ -1263,28 +1270,16 @@ iflib_netmap_rxq_init(if_ctx_t ctx, iflib_rxq_t rxq) } static void -iflib_netmap_timer_adjust(if_ctx_t ctx, iflib_txq_t txq, uint32_t *reset_on) +iflib_netmap_timer(void *arg) { - struct netmap_kring *kring; - uint16_t txqid; + iflib_txq_t txq = arg; + if_ctx_t ctx = txq->ift_ctx; - txqid = txq->ift_id; - kring = netmap_kring_on(NA(ctx->ifc_ifp), txqid, NR_TX); - if (kring == NULL) - return; - - if (kring->nr_hwcur != nm_next(kring->nr_hwtail, kring->nkr_num_slots - 1)) { - bus_dmamap_sync(txq->ift_ifdi->idi_tag, txq->ift_ifdi->idi_map, - BUS_DMASYNC_POSTREAD); - if (ctx->isc_txd_credits_update(ctx->ifc_softc, txqid, false)) - netmap_tx_irq(ctx->ifc_ifp, txqid); - if (!(ctx->ifc_flags & IFC_NETMAP_TX_IRQ)) { - if (hz < 2000) - *reset_on = 1; - else - *reset_on = hz / 1000; - } - } + /* + * Wake up the netmap application, to give it a chance to + * call txsync and reclaim more completed TX buffers. + */ + netmap_tx_irq(ctx->ifc_ifp, txq->ift_id); } #define iflib_netmap_detach(ifp) netmap_detach(ifp) @@ -1296,8 +1291,6 @@ iflib_netmap_timer_adjust(if_ctx_t ctx, iflib_txq_t tx #define iflib_netmap_attach(ctx) (0) #define netmap_rx_irq(ifp, qid, budget) (0) -#define netmap_tx_irq(ifp, qid) do {} while (0) -#define iflib_netmap_timer_adjust(ctx, txq, reset_on) #endif #if defined(__i386__) || defined(__amd64__) @@ -2287,7 +2280,6 @@ iflib_timer(void *arg) if_ctx_t ctx = txq->ift_ctx; if_softc_ctx_t sctx = &ctx->ifc_softc_ctx; uint64_t this_tick = ticks; - uint32_t reset_on = hz / 2; if (!(if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING)) return; @@ -2312,17 +2304,13 @@ iflib_timer(void *arg) } txq->ift_cleaned_prev = txq->ift_cleaned; } -#ifdef DEV_NETMAP - if (if_getcapenable(ctx->ifc_ifp) & IFCAP_NETMAP) - iflib_netmap_timer_adjust(ctx, txq, &reset_on); -#endif /* handle any laggards */ if (txq->ift_db_pending) GROUPTASK_ENQUEUE(&txq->ift_task); sctx->isc_pause_frames = 0; if (if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING) - callout_reset_on(&txq->ift_timer, reset_on, iflib_timer, txq, txq->ift_timer.c_cpu); + callout_reset_on(&txq->ift_timer, hz / 2, iflib_timer, txq, txq->ift_timer.c_cpu); return; hung: @@ -2396,6 +2384,9 @@ iflib_init_locked(if_ctx_t ctx) for (i = 0, txq = ctx->ifc_txqs; i < sctx->isc_ntxqsets; i++, txq++) { CALLOUT_LOCK(txq); callout_stop(&txq->ift_timer); +#ifdef DEV_NETMAP + callout_stop(&txq->ift_netmap_timer); +#endif /* DEV_NETMAP */ CALLOUT_UNLOCK(txq); iflib_netmap_txq_init(ctx, txq); } @@ -2485,6 +2476,9 @@ iflib_stop(if_ctx_t ctx) CALLOUT_LOCK(txq); callout_stop(&txq->ift_timer); +#ifdef DEV_NETMAP + callout_stop(&txq->ift_netmap_timer); +#endif /* DEV_NETMAP */ CALLOUT_UNLOCK(txq); /* clean any enqueued buffers */ @@ -3882,7 +3876,6 @@ _task_fn_admin(void *context) iflib_txq_t txq; int i; bool oactive, running, do_reset, do_watchdog, in_detach; - uint32_t reset_on = hz / 2; STATE_LOCK(ctx); running = (if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING); @@ -3910,12 +3903,8 @@ _task_fn_admin(void *context) } IFDI_UPDATE_ADMIN_STATUS(ctx); for (txq = ctx->ifc_txqs, i = 0; i < sctx->isc_ntxqsets; i++, txq++) { -#ifdef DEV_NETMAP - reset_on = hz / 2; - if (if_getcapenable(ctx->ifc_ifp) & IFCAP_NETMAP) - iflib_netmap_timer_adjust(ctx, txq, &reset_on); -#endif - callout_reset_on(&txq->ift_timer, reset_on, iflib_timer, txq, txq->ift_timer.c_cpu); + callout_reset_on(&txq->ift_timer, hz / 2, iflib_timer, txq, + txq->ift_timer.c_cpu); } IFDI_LINK_INTR_ENABLE(ctx); if (do_reset) @@ -5088,6 +5077,9 @@ iflib_pseudo_deregister(if_ctx_t ctx) tqg = qgroup_if_io_tqg; for (txq = ctx->ifc_txqs, i = 0; i < NTXQSETS(ctx); i++, txq++) { callout_drain(&txq->ift_timer); +#ifdef DEV_NETMAP + callout_drain(&txq->ift_netmap_timer); +#endif /* DEV_NETMAP */ if (txq->ift_task.gt_uniq != NULL) taskqgroup_detach(tqg, &txq->ift_task); } @@ -5174,6 +5166,9 @@ iflib_device_deregister(if_ctx_t ctx) tqg = qgroup_if_io_tqg; for (txq = ctx->ifc_txqs, i = 0; i < NTXQSETS(ctx); i++, txq++) { callout_drain(&txq->ift_timer); +#ifdef DEV_NETMAP + callout_drain(&txq->ift_netmap_timer); +#endif /* DEV_NETMAP */ if (txq->ift_task.gt_uniq != NULL) taskqgroup_detach(tqg, &txq->ift_task); } @@ -5583,8 +5578,6 @@ iflib_queues_alloc(if_ctx_t ctx) } else { txq->ift_br_offset = 0; } - /* XXX fix this */ - txq->ift_timer.c_cpu = cpu; if (iflib_txsd_alloc(txq)) { device_printf(dev, "Critical Failure setting up TX buffers\n"); @@ -5597,6 +5590,11 @@ iflib_queues_alloc(if_ctx_t ctx) device_get_nameunit(dev), txq->ift_id); mtx_init(&txq->ift_mtx, txq->ift_mtx_name, NULL, MTX_DEF); callout_init_mtx(&txq->ift_timer, &txq->ift_mtx, 0); + txq->ift_timer.c_cpu = cpu; +#ifdef DEV_NETMAP + callout_init_mtx(&txq->ift_netmap_timer, &txq->ift_mtx, 0); + txq->ift_netmap_timer.c_cpu = cpu; +#endif /* DEV_NETMAP */ err = ifmp_ring_alloc(&txq->ift_br, 2048, txq, iflib_txq_drain, iflib_txq_can_drain, M_IFLIB, M_WAITOK);