Date: Sat, 6 Jun 2015 09:28:41 +0000 (UTC) From: Navdeep Parhar <np@FreeBSD.org> To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-10@freebsd.org Subject: svn commit: r284052 - in stable/10/sys: arm/conf conf dev/cxgbe modules modules/cxgbe/if_cxgbe powerpc/conf Message-ID: <201506060928.t569SfjD020849@svn.freebsd.org>
next in thread | raw e-mail | index | archive | help
Author: np Date: Sat Jun 6 09:28:40 2015 New Revision: 284052 URL: https://svnweb.freebsd.org/changeset/base/284052 Log: MFC r276480, r276485, r276498, r277225, r277226, r277227, r277230, r277637, and r283149 (by emaste@). r276485 is the real change here, the rest deal with the fallout of mp_ring's reliance on 64b atomics. Use the incorrectly spelled 'eigth' from struct pkthdr in this branch instead of MFC'ing r261733, which would have renamed the field of a public structure in a -STABLE branch. --- r276480: Temporarily unplug cxgbe(4) from !amd64 builds. r276485: cxgbe(4): major tx rework. a) Front load as much work as possible in if_transmit, before any driver lock or software queue has to get involved. b) Replace buf_ring with a brand new mp_ring (multiproducer ring). This is specifically for the tx multiqueue model where one of the if_transmit producer threads becomes the consumer and other producers carry on as usual. mp_ring is implemented as standalone code and it should be possible to use it in any driver with tx multiqueue. It also has: - the ability to enqueue/dequeue multiple items. This might become significant if packet batching is ever implemented. - an abdication mechanism to allow a thread to give up writing tx descriptors and have another if_transmit thread take over. A thread that's writing tx descriptors can end up doing so for an unbounded time period if a) there are other if_transmit threads continuously feeding the sofware queue, and b) the chip keeps up with whatever the thread is throwing at it. - accurate statistics about interesting events even when the stats come at the expense of additional branches/conditional code. The NIC txq lock is uncontested on the fast path at this point. I've left it there for synchronization with the control events (interface up/down, modload/unload). c) Add support for "type 1" coalescing work request in the normal NIC tx path. This work request is optimized for frames with a single item in the DMA gather list. These are very common when forwarding packets. Note that netmap tx in cxgbe already uses these "type 1" work requests. d) Do not request automatic cidx updates every 32 descriptors. Instead, request updates via bits in individual work requests (still every 32 descriptors approximately). Also, request an automatic final update when the queue idles after activity. This means NIC tx reclaim is still performed lazily but it will catch up quickly as soon as the queue idles. This seems to be the best middle ground and I'll probably do something similar for netmap tx as well. e) Implement a faster tx path for WRQs (used by TOE tx and control queues, _not_ by the normal NIC tx). Allow work requests to be written directly to the hardware descriptor ring if room is available. I will convert t4_tom and iw_cxgbe modules to this faster style gradually. r276498: cxgbe(4): remove buf_ring specific restriction on the txq size. r277225: Make cxgbe(4) buildable with the gcc in base. r277226: Allow cxgbe(4) to be built on i386. Driver attach will succeed only on a subset of i386 systems. r277227: Plug cxgbe(4) back into !powerpc && !arm builds, instead of building it on amd64 only. r277230: Build cxgbe(4) on powerpc64 too. r277637: Make sure the compiler flag to get cxgbe(4) to compile with gcc is used only when gcc is being used. This is what r277225 should have been. Added: stable/10/sys/dev/cxgbe/t4_mp_ring.c - copied, changed from r276485, head/sys/dev/cxgbe/t4_mp_ring.c stable/10/sys/dev/cxgbe/t4_mp_ring.h - copied unchanged from r276485, head/sys/dev/cxgbe/t4_mp_ring.h Modified: stable/10/sys/arm/conf/NOTES stable/10/sys/conf/NOTES stable/10/sys/conf/files stable/10/sys/dev/cxgbe/adapter.h stable/10/sys/dev/cxgbe/t4_l2t.c stable/10/sys/dev/cxgbe/t4_main.c stable/10/sys/dev/cxgbe/t4_sge.c stable/10/sys/modules/Makefile stable/10/sys/modules/cxgbe/if_cxgbe/Makefile stable/10/sys/powerpc/conf/NOTES Directory Properties: stable/10/ (props changed) Modified: stable/10/sys/arm/conf/NOTES ============================================================================== --- stable/10/sys/arm/conf/NOTES Sat Jun 6 06:12:14 2015 (r284051) +++ stable/10/sys/arm/conf/NOTES Sat Jun 6 09:28:40 2015 (r284052) @@ -82,6 +82,7 @@ nodevice snake_saver nodevice star_saver nodevice warp_saver +nodevice cxgbe nodevice pcii nodevice snd_cmi nodevice tnt4882 Modified: stable/10/sys/conf/NOTES ============================================================================== --- stable/10/sys/conf/NOTES Sat Jun 6 06:12:14 2015 (r284051) +++ stable/10/sys/conf/NOTES Sat Jun 6 09:28:40 2015 (r284052) @@ -1907,8 +1907,8 @@ device xmphy # XaQti XMAC II # cas: Sun Cassini/Cassini+ and National Semiconductor DP83065 Saturn # cm: Arcnet SMC COM90c26 / SMC COM90c56 # (and SMC COM90c66 in '56 compatibility mode) adapters. -# cxgbe: Support for PCI express 10Gb/1Gb adapters based on the Chelsio T4 -# (Terminator 4) ASIC. +# cxgb: Chelsio T3 based 1GbE/10GbE PCIe Ethernet adapters. +# cxgbe:Chelsio T4 and T5 based 1GbE/10GbE/40GbE PCIe Ethernet adapters. # dc: Support for PCI fast ethernet adapters based on the DEC/Intel 21143 # and various workalikes including: # the ADMtek AL981 Comet and AN985 Centaur, the ASIX Electronics @@ -2057,6 +2057,7 @@ device bge # Broadcom BCM570xx Gigabit device cas # Sun Cassini/Cassini+ and NS DP83065 Saturn device cxgb # Chelsio T3 10 Gigabit Ethernet device cxgb_t3fw # Chelsio T3 10 Gigabit Ethernet firmware +device cxgbe # Chelsio T4 and T5 1GbE/10GbE/40GbE device dc # DEC/Intel 21143 and various workalikes device et # Agere ET1310 10/100/Gigabit Ethernet device fxp # Intel EtherExpress PRO/100B (82557, 82558) @@ -2085,7 +2086,6 @@ device wb # Winbond W89C840F device xl # 3Com 3c90x (``Boomerang'', ``Cyclone'') # PCI Ethernet NICs. -device cxgbe # Chelsio T4 10GbE PCIe adapter device de # DEC/Intel DC21x4x (``Tulip'') device em # Intel Pro/1000 Gigabit Ethernet device igb # Intel Pro/1000 PCIE Gigabit Ethernet Modified: stable/10/sys/conf/files ============================================================================== --- stable/10/sys/conf/files Sat Jun 6 06:12:14 2015 (r284051) +++ stable/10/sys/conf/files Sat Jun 6 09:28:40 2015 (r284052) @@ -1135,6 +1135,8 @@ dev/cxgb/sys/uipc_mvec.c optional cxgb p compile-with "${NORMAL_C} -I$S/dev/cxgb" dev/cxgb/cxgb_t3fw.c optional cxgb cxgb_t3fw \ compile-with "${NORMAL_C} -I$S/dev/cxgb" +dev/cxgbe/t4_mp_ring.c optional cxgbe pci \ + compile-with "${NORMAL_C} -I$S/dev/cxgbe ${GCC_MS_EXTENSIONS}" dev/cxgbe/t4_main.c optional cxgbe pci \ compile-with "${NORMAL_C} -I$S/dev/cxgbe" dev/cxgbe/t4_netmap.c optional cxgbe pci \ Modified: stable/10/sys/dev/cxgbe/adapter.h ============================================================================== --- stable/10/sys/dev/cxgbe/adapter.h Sat Jun 6 06:12:14 2015 (r284051) +++ stable/10/sys/dev/cxgbe/adapter.h Sat Jun 6 09:28:40 2015 (r284052) @@ -146,7 +146,8 @@ enum { CL_METADATA_SIZE = CACHE_LINE_SIZE, SGE_MAX_WR_NDESC = SGE_MAX_WR_LEN / EQ_ESIZE, /* max WR size in desc */ - TX_SGL_SEGS = 36, + TX_SGL_SEGS = 39, + TX_SGL_SEGS_TSO = 38, TX_WR_FLITS = SGE_MAX_WR_LEN / 8 }; @@ -266,6 +267,7 @@ struct port_info { struct timeval last_refreshed; struct port_stats stats; + u_int tx_parse_error; eventhandler_tag vlan_c; @@ -301,23 +303,9 @@ struct tx_desc { __be64 flit[8]; }; -struct tx_map { - struct mbuf *m; - bus_dmamap_t map; -}; - -/* DMA maps used for tx */ -struct tx_maps { - struct tx_map *maps; - uint32_t map_total; /* # of DMA maps */ - uint32_t map_pidx; /* next map to be used */ - uint32_t map_cidx; /* reclaimed up to this index */ - uint32_t map_avail; /* # of available maps */ -}; - struct tx_sdesc { + struct mbuf *m; /* m_nextpkt linked chain of frames */ uint8_t desc_used; /* # of hardware descriptors used by the WR */ - uint8_t credits; /* NIC txq: # of frames sent out in the WR */ }; @@ -371,16 +359,12 @@ struct sge_iq { enum { EQ_CTRL = 1, EQ_ETH = 2, -#ifdef TCP_OFFLOAD EQ_OFLD = 3, -#endif /* eq flags */ - EQ_TYPEMASK = 7, /* 3 lsbits hold the type */ - EQ_ALLOCATED = (1 << 3), /* firmware resources allocated */ - EQ_DOOMED = (1 << 4), /* about to be destroyed */ - EQ_CRFLUSHED = (1 << 5), /* expecting an update from SGE */ - EQ_STALLED = (1 << 6), /* out of hw descriptors or dmamaps */ + EQ_TYPEMASK = 0x3, /* 2 lsbits hold the type (see above) */ + EQ_ALLOCATED = (1 << 2), /* firmware resources allocated */ + EQ_ENABLED = (1 << 3), /* open for business */ }; /* Listed in order of preference. Update t4_sysctls too if you change these */ @@ -395,32 +379,25 @@ enum {DOORBELL_UDB, DOORBELL_WCWR, DOORB struct sge_eq { unsigned int flags; /* MUST be first */ unsigned int cntxt_id; /* SGE context id for the eq */ - bus_dma_tag_t desc_tag; - bus_dmamap_t desc_map; - char lockname[16]; struct mtx eq_lock; struct tx_desc *desc; /* KVA of descriptor ring */ - bus_addr_t ba; /* bus address of descriptor ring */ - struct sge_qstat *spg; /* status page, for convenience */ uint16_t doorbells; volatile uint32_t *udb; /* KVA of doorbell (lies within BAR2) */ u_int udb_qid; /* relative qid within the doorbell page */ - uint16_t cap; /* max # of desc, for convenience */ - uint16_t avail; /* available descriptors, for convenience */ - uint16_t qsize; /* size (# of entries) of the queue */ + uint16_t sidx; /* index of the entry with the status page */ uint16_t cidx; /* consumer idx (desc idx) */ uint16_t pidx; /* producer idx (desc idx) */ - uint16_t pending; /* # of descriptors used since last doorbell */ + uint16_t equeqidx; /* EQUEQ last requested at this pidx */ + uint16_t dbidx; /* pidx of the most recent doorbell */ uint16_t iqid; /* iq that gets egr_update for the eq */ uint8_t tx_chan; /* tx channel used by the eq */ - struct task tx_task; - struct callout tx_callout; + volatile u_int equiq; /* EQUIQ outstanding */ - /* stats */ - - uint32_t egr_update; /* # of SGE_EGR_UPDATE notifications for eq */ - uint32_t unstalled; /* recovered from stall */ + bus_dma_tag_t desc_tag; + bus_dmamap_t desc_map; + bus_addr_t ba; /* bus address of descriptor ring */ + char lockname[16]; }; struct sw_zone_info { @@ -492,18 +469,19 @@ struct sge_fl { struct cluster_layout cll_alt; /* alternate refill zone, layout */ }; +struct mp_ring; + /* txq: SGE egress queue + what's needed for Ethernet NIC */ struct sge_txq { struct sge_eq eq; /* MUST be first */ struct ifnet *ifp; /* the interface this txq belongs to */ - bus_dma_tag_t tx_tag; /* tag for transmit buffers */ - struct buf_ring *br; /* tx buffer ring */ + struct mp_ring *r; /* tx software ring */ struct tx_sdesc *sdesc; /* KVA of software descriptor ring */ - struct mbuf *m; /* held up due to temporary resource shortage */ - - struct tx_maps txmaps; + struct sglist *gl; + __be32 cpl_ctrl0; /* for convenience */ + struct task tx_reclaim_task; /* stats for common events first */ uint64_t txcsum; /* # of times hardware assisted with checksum */ @@ -512,13 +490,12 @@ struct sge_txq { uint64_t imm_wrs; /* # of work requests with immediate data */ uint64_t sgl_wrs; /* # of work requests with direct SGL */ uint64_t txpkt_wrs; /* # of txpkt work requests (not coalesced) */ - uint64_t txpkts_wrs; /* # of coalesced tx work requests */ - uint64_t txpkts_pkts; /* # of frames in coalesced tx work requests */ + uint64_t txpkts0_wrs; /* # of type0 coalesced tx work requests */ + uint64_t txpkts1_wrs; /* # of type1 coalesced tx work requests */ + uint64_t txpkts0_pkts; /* # of frames in type0 coalesced tx WRs */ + uint64_t txpkts1_pkts; /* # of frames in type1 coalesced tx WRs */ /* stats for not-that-common events */ - - uint32_t no_dmamap; /* no DMA map to load the mbuf */ - uint32_t no_desc; /* out of hardware descriptors */ } __aligned(CACHE_LINE_SIZE); /* rxq: SGE ingress queue + SGE free list + miscellaneous items */ @@ -567,7 +544,13 @@ struct wrqe { STAILQ_ENTRY(wrqe) link; struct sge_wrq *wrq; int wr_len; - uint64_t wr[] __aligned(16); + char wr[] __aligned(16); +}; + +struct wrq_cookie { + TAILQ_ENTRY(wrq_cookie) link; + int ndesc; + int pidx; }; /* @@ -578,17 +561,32 @@ struct sge_wrq { struct sge_eq eq; /* MUST be first */ struct adapter *adapter; + struct task wrq_tx_task; + + /* Tx desc reserved but WR not "committed" yet. */ + TAILQ_HEAD(wrq_incomplete_wrs , wrq_cookie) incomplete_wrs; - /* List of WRs held up due to lack of tx descriptors */ + /* List of WRs ready to go out as soon as descriptors are available. */ STAILQ_HEAD(, wrqe) wr_list; + u_int nwr_pending; + u_int ndesc_needed; /* stats for common events first */ - uint64_t tx_wrs; /* # of tx work requests */ + uint64_t tx_wrs_direct; /* # of WRs written directly to desc ring. */ + uint64_t tx_wrs_ss; /* # of WRs copied from scratch space. */ + uint64_t tx_wrs_copied; /* # of WRs queued and copied to desc ring. */ /* stats for not-that-common events */ - uint32_t no_desc; /* out of hardware descriptors */ + /* + * Scratch space for work requests that wrap around after reaching the + * status page, and some infomation about the last WR that used it. + */ + uint16_t ss_pidx; + uint16_t ss_len; + uint8_t ss[SGE_MAX_WR_LEN]; + } __aligned(CACHE_LINE_SIZE); @@ -737,7 +735,7 @@ struct adapter { struct sge sge; int lro_timeout; - struct taskqueue *tq[NCHAN]; /* taskqueues that flush data out */ + struct taskqueue *tq[NCHAN]; /* General purpose taskqueues */ struct port_info *port[MAX_NPORTS]; uint8_t chan_map[NCHAN]; @@ -970,12 +968,11 @@ static inline int tx_resume_threshold(struct sge_eq *eq) { - return (eq->qsize / 4); + /* not quite the same as qsize / 4, but this will do. */ + return (eq->sidx / 4); } /* t4_main.c */ -void t4_tx_task(void *, int); -void t4_tx_callout(void *); int t4_os_find_pci_capability(struct adapter *, int); int t4_os_pci_save_state(struct adapter *); int t4_os_pci_restore_state(struct adapter *); @@ -1016,16 +1013,15 @@ int t4_setup_adapter_queues(struct adapt int t4_teardown_adapter_queues(struct adapter *); int t4_setup_port_queues(struct port_info *); int t4_teardown_port_queues(struct port_info *); -int t4_alloc_tx_maps(struct tx_maps *, bus_dma_tag_t, int, int); -void t4_free_tx_maps(struct tx_maps *, bus_dma_tag_t); void t4_intr_all(void *); void t4_intr(void *); void t4_intr_err(void *); void t4_intr_evt(void *); void t4_wrq_tx_locked(struct adapter *, struct sge_wrq *, struct wrqe *); -int t4_eth_tx(struct ifnet *, struct sge_txq *, struct mbuf *); void t4_update_fl_bufsize(struct ifnet *); -int can_resume_tx(struct sge_eq *); +int parse_pkt(struct mbuf **); +void *start_wrq_wr(struct sge_wrq *, int, struct wrq_cookie *); +void commit_wrq_wr(struct sge_wrq *, void *, struct wrq_cookie *); int tnl_cong(struct port_info *); /* t4_tracer.c */ Modified: stable/10/sys/dev/cxgbe/t4_l2t.c ============================================================================== --- stable/10/sys/dev/cxgbe/t4_l2t.c Sat Jun 6 06:12:14 2015 (r284051) +++ stable/10/sys/dev/cxgbe/t4_l2t.c Sat Jun 6 09:28:40 2015 (r284052) @@ -112,16 +112,15 @@ found: int t4_write_l2e(struct adapter *sc, struct l2t_entry *e, int sync) { - struct wrqe *wr; + struct wrq_cookie cookie; struct cpl_l2t_write_req *req; int idx = e->idx + sc->vres.l2t.start; mtx_assert(&e->lock, MA_OWNED); - wr = alloc_wrqe(sizeof(*req), &sc->sge.mgmtq); - if (wr == NULL) + req = start_wrq_wr(&sc->sge.mgmtq, howmany(sizeof(*req), 16), &cookie); + if (req == NULL) return (ENOMEM); - req = wrtod(wr); INIT_TP_WR(req, 0); OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_L2T_WRITE_REQ, idx | @@ -131,7 +130,7 @@ t4_write_l2e(struct adapter *sc, struct req->vlan = htons(e->vlan); memcpy(req->dst_mac, e->dmac, sizeof(req->dst_mac)); - t4_wrq_tx(sc, wr); + commit_wrq_wr(&sc->sge.mgmtq, req, &cookie); if (sync && e->state != L2T_STATE_SWITCHING) e->state = L2T_STATE_SYNC_WRITE; Modified: stable/10/sys/dev/cxgbe/t4_main.c ============================================================================== --- stable/10/sys/dev/cxgbe/t4_main.c Sat Jun 6 06:12:14 2015 (r284051) +++ stable/10/sys/dev/cxgbe/t4_main.c Sat Jun 6 09:28:40 2015 (r284052) @@ -36,6 +36,8 @@ __FBSDID("$FreeBSD$"); #include <sys/priv.h> #include <sys/kernel.h> #include <sys/bus.h> +#include <sys/systm.h> +#include <sys/counter.h> #include <sys/module.h> #include <sys/malloc.h> #include <sys/queue.h> @@ -66,6 +68,7 @@ __FBSDID("$FreeBSD$"); #include "common/t4_regs_values.h" #include "t4_ioctl.h" #include "t4_l2t.h" +#include "t4_mp_ring.h" /* T4 bus driver interface */ static int t4_probe(device_t); @@ -377,7 +380,8 @@ static void build_medialist(struct port_ static int cxgbe_init_synchronized(struct port_info *); static int cxgbe_uninit_synchronized(struct port_info *); static int setup_intr_handlers(struct adapter *); -static void quiesce_eq(struct adapter *, struct sge_eq *); +static void quiesce_txq(struct adapter *, struct sge_txq *); +static void quiesce_wrq(struct adapter *, struct sge_wrq *); static void quiesce_iq(struct adapter *, struct sge_iq *); static void quiesce_fl(struct adapter *, struct sge_fl *); static int t4_alloc_irq(struct adapter *, struct irq *, int rid, @@ -433,7 +437,6 @@ static int sysctl_tx_rate(SYSCTL_HANDLER static int sysctl_ulprx_la(SYSCTL_HANDLER_ARGS); static int sysctl_wcwr_stats(SYSCTL_HANDLER_ARGS); #endif -static inline void txq_start(struct ifnet *, struct sge_txq *); static uint32_t fconf_to_mode(uint32_t); static uint32_t mode_to_fconf(uint32_t); static uint32_t fspec_to_fconf(struct t4_filter_specification *); @@ -665,6 +668,14 @@ t4_attach(device_t dev) goto done; } +#if defined(__i386__) + if ((cpu_feature & CPUID_CX8) == 0) { + device_printf(dev, "64 bit atomics not available.\n"); + rc = ENOTSUP; + goto done; + } +#endif + /* Prepare the firmware for operation */ rc = prep_firmware(sc); if (rc != 0) @@ -1386,67 +1397,36 @@ cxgbe_transmit(struct ifnet *ifp, struct { struct port_info *pi = ifp->if_softc; struct adapter *sc = pi->adapter; - struct sge_txq *txq = &sc->sge.txq[pi->first_txq]; - struct buf_ring *br; + struct sge_txq *txq; + void *items[1]; int rc; M_ASSERTPKTHDR(m); + MPASS(m->m_nextpkt == NULL); /* not quite ready for this yet */ if (__predict_false(pi->link_cfg.link_ok == 0)) { m_freem(m); return (ENETDOWN); } - /* check if flowid is set */ - if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) - txq += ((m->m_pkthdr.flowid % (pi->ntxq - pi->rsrv_noflowq)) - + pi->rsrv_noflowq); - br = txq->br; - - if (TXQ_TRYLOCK(txq) == 0) { - struct sge_eq *eq = &txq->eq; - - /* - * It is possible that t4_eth_tx finishes up and releases the - * lock between the TRYLOCK above and the drbr_enqueue here. We - * need to make sure that this mbuf doesn't just sit there in - * the drbr. - */ - - rc = drbr_enqueue(ifp, br, m); - if (rc == 0 && callout_pending(&eq->tx_callout) == 0 && - !(eq->flags & EQ_DOOMED)) - callout_reset(&eq->tx_callout, 1, t4_tx_callout, eq); + rc = parse_pkt(&m); + if (__predict_false(rc != 0)) { + MPASS(m == NULL); /* was freed already */ + atomic_add_int(&pi->tx_parse_error, 1); /* rare, atomic is ok */ return (rc); } - /* - * txq->m is the mbuf that is held up due to a temporary shortage of - * resources and it should be put on the wire first. Then what's in - * drbr and finally the mbuf that was just passed in to us. - * - * Return code should indicate the fate of the mbuf that was passed in - * this time. - */ - - TXQ_LOCK_ASSERT_OWNED(txq); - if (drbr_needs_enqueue(ifp, br) || txq->m) { - - /* Queued for transmission. */ - - rc = drbr_enqueue(ifp, br, m); - m = txq->m ? txq->m : drbr_dequeue(ifp, br); - (void) t4_eth_tx(ifp, txq, m); - TXQ_UNLOCK(txq); - return (rc); - } + /* Select a txq. */ + txq = &sc->sge.txq[pi->first_txq]; + if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) + txq += ((m->m_pkthdr.flowid % (pi->ntxq - pi->rsrv_noflowq)) + + pi->rsrv_noflowq); - /* Direct transmission. */ - rc = t4_eth_tx(ifp, txq, m); - if (rc != 0 && txq->m) - rc = 0; /* held, will be transmitted soon (hopefully) */ + items[0] = m; + rc = mp_ring_enqueue(txq->r, items, 1, 4096); + if (__predict_false(rc != 0)) + m_freem(m); - TXQ_UNLOCK(txq); return (rc); } @@ -1456,17 +1436,17 @@ cxgbe_qflush(struct ifnet *ifp) struct port_info *pi = ifp->if_softc; struct sge_txq *txq; int i; - struct mbuf *m; /* queues do not exist if !PORT_INIT_DONE. */ if (pi->flags & PORT_INIT_DONE) { for_each_txq(pi, i, txq) { TXQ_LOCK(txq); - m_freem(txq->m); - txq->m = NULL; - while ((m = buf_ring_dequeue_sc(txq->br)) != NULL) - m_freem(m); + txq->eq.flags &= ~EQ_ENABLED; TXQ_UNLOCK(txq); + while (!mp_ring_is_idle(txq->r)) { + mp_ring_check_drainage(txq->r, 0); + pause("qflush", 1); + } } } if_qflush(ifp); @@ -3135,7 +3115,8 @@ cxgbe_init_synchronized(struct port_info { struct adapter *sc = pi->adapter; struct ifnet *ifp = pi->ifp; - int rc = 0; + int rc = 0, i; + struct sge_txq *txq; ASSERT_SYNCHRONIZED_OP(sc); @@ -3164,6 +3145,17 @@ cxgbe_init_synchronized(struct port_info } /* + * Can't fail from this point onwards. Review cxgbe_uninit_synchronized + * if this changes. + */ + + for_each_txq(pi, i, txq) { + TXQ_LOCK(txq); + txq->eq.flags |= EQ_ENABLED; + TXQ_UNLOCK(txq); + } + + /* * The first iq of the first port to come up is used for tracing. */ if (sc->traceq < 0) { @@ -3196,7 +3188,8 @@ cxgbe_uninit_synchronized(struct port_in { struct adapter *sc = pi->adapter; struct ifnet *ifp = pi->ifp; - int rc; + int rc, i; + struct sge_txq *txq; ASSERT_SYNCHRONIZED_OP(sc); @@ -3213,6 +3206,12 @@ cxgbe_uninit_synchronized(struct port_in return (rc); } + for_each_txq(pi, i, txq) { + TXQ_LOCK(txq); + txq->eq.flags &= ~EQ_ENABLED; + TXQ_UNLOCK(txq); + } + clrbit(&sc->open_device_map, pi->port_id); PORT_LOCK(pi); ifp->if_drv_flags &= ~IFF_DRV_RUNNING; @@ -3443,15 +3442,17 @@ port_full_uninit(struct port_info *pi) if (pi->flags & PORT_INIT_DONE) { - /* Need to quiesce queues. XXX: ctrl queues? */ + /* Need to quiesce queues. */ + + quiesce_wrq(sc, &sc->sge.ctrlq[pi->port_id]); for_each_txq(pi, i, txq) { - quiesce_eq(sc, &txq->eq); + quiesce_txq(sc, txq); } #ifdef TCP_OFFLOAD for_each_ofld_txq(pi, i, ofld_txq) { - quiesce_eq(sc, &ofld_txq->eq); + quiesce_wrq(sc, ofld_txq); } #endif @@ -3476,23 +3477,39 @@ port_full_uninit(struct port_info *pi) } static void -quiesce_eq(struct adapter *sc, struct sge_eq *eq) +quiesce_txq(struct adapter *sc, struct sge_txq *txq) { - EQ_LOCK(eq); - eq->flags |= EQ_DOOMED; + struct sge_eq *eq = &txq->eq; + struct sge_qstat *spg = (void *)&eq->desc[eq->sidx]; - /* - * Wait for the response to a credit flush if one's - * pending. - */ - while (eq->flags & EQ_CRFLUSHED) - mtx_sleep(eq, &eq->eq_lock, 0, "crflush", 0); - EQ_UNLOCK(eq); + (void) sc; /* unused */ - callout_drain(&eq->tx_callout); /* XXX: iffy */ - pause("callout", 10); /* Still iffy */ +#ifdef INVARIANTS + TXQ_LOCK(txq); + MPASS((eq->flags & EQ_ENABLED) == 0); + TXQ_UNLOCK(txq); +#endif - taskqueue_drain(sc->tq[eq->tx_chan], &eq->tx_task); + /* Wait for the mp_ring to empty. */ + while (!mp_ring_is_idle(txq->r)) { + mp_ring_check_drainage(txq->r, 0); + pause("rquiesce", 1); + } + + /* Then wait for the hardware to finish. */ + while (spg->cidx != htobe16(eq->pidx)) + pause("equiesce", 1); + + /* Finally, wait for the driver to reclaim all descriptors. */ + while (eq->cidx != eq->pidx) + pause("dquiesce", 1); +} + +static void +quiesce_wrq(struct adapter *sc, struct sge_wrq *wrq) +{ + + /* XXXTX */ } static void @@ -4287,7 +4304,7 @@ cxgbe_refresh_stats(struct adapter *sc, drops = s->tx_drop; for_each_txq(pi, i, txq) - drops += txq->br->br_drops; + drops += counter_u64_fetch(txq->r->drops); ifp->if_snd.ifq_drops = drops; ifp->if_oerrors = s->tx_error_frames; @@ -4818,6 +4835,9 @@ cxgbe_sysctls(struct port_info *pi) oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, "stats", CTLFLAG_RD, NULL, "port statistics"); children = SYSCTL_CHILDREN(oid); + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_parse_error", CTLFLAG_RD, + &pi->tx_parse_error, 0, + "# of tx packets with invalid length or # of segments"); #define SYSCTL_ADD_T4_REG64(pi, name, desc, reg) \ SYSCTL_ADD_OID(ctx, children, OID_AUTO, name, \ @@ -5177,8 +5197,7 @@ sysctl_qsize_txq(SYSCTL_HANDLER_ARGS) if (rc != 0 || req->newptr == NULL) return (rc); - /* bufring size must be powerof2 */ - if (qsize < 128 || !powerof2(qsize)) + if (qsize < 128 || qsize > 65536) return (EINVAL); rc = begin_synchronized_op(sc, pi, HOLD_LOCK | SLEEP_OK | INTR_OK, @@ -6876,74 +6895,6 @@ sysctl_wcwr_stats(SYSCTL_HANDLER_ARGS) } #endif -static inline void -txq_start(struct ifnet *ifp, struct sge_txq *txq) -{ - struct buf_ring *br; - struct mbuf *m; - - TXQ_LOCK_ASSERT_OWNED(txq); - - br = txq->br; - m = txq->m ? txq->m : drbr_dequeue(ifp, br); - if (m) - t4_eth_tx(ifp, txq, m); -} - -void -t4_tx_callout(void *arg) -{ - struct sge_eq *eq = arg; - struct adapter *sc; - - if (EQ_TRYLOCK(eq) == 0) - goto reschedule; - - if (eq->flags & EQ_STALLED && !can_resume_tx(eq)) { - EQ_UNLOCK(eq); -reschedule: - if (__predict_true(!(eq->flags && EQ_DOOMED))) - callout_schedule(&eq->tx_callout, 1); - return; - } - - EQ_LOCK_ASSERT_OWNED(eq); - - if (__predict_true((eq->flags & EQ_DOOMED) == 0)) { - - if ((eq->flags & EQ_TYPEMASK) == EQ_ETH) { - struct sge_txq *txq = arg; - struct port_info *pi = txq->ifp->if_softc; - - sc = pi->adapter; - } else { - struct sge_wrq *wrq = arg; - - sc = wrq->adapter; - } - - taskqueue_enqueue(sc->tq[eq->tx_chan], &eq->tx_task); - } - - EQ_UNLOCK(eq); -} - -void -t4_tx_task(void *arg, int count) -{ - struct sge_eq *eq = arg; - - EQ_LOCK(eq); - if ((eq->flags & EQ_TYPEMASK) == EQ_ETH) { - struct sge_txq *txq = arg; - txq_start(txq->ifp, txq); - } else { - struct sge_wrq *wrq = arg; - t4_wrq_tx_locked(wrq->adapter, wrq, NULL); - } - EQ_UNLOCK(eq); -} - static uint32_t fconf_to_mode(uint32_t fconf) { @@ -7373,9 +7324,9 @@ static int set_filter_wr(struct adapter *sc, int fidx) { struct filter_entry *f = &sc->tids.ftid_tab[fidx]; - struct wrqe *wr; struct fw_filter_wr *fwr; unsigned int ftid; + struct wrq_cookie cookie; ASSERT_SYNCHRONIZED_OP(sc); @@ -7394,12 +7345,10 @@ set_filter_wr(struct adapter *sc, int fi ftid = sc->tids.ftid_base + fidx; - wr = alloc_wrqe(sizeof(*fwr), &sc->sge.mgmtq); - if (wr == NULL) + fwr = start_wrq_wr(&sc->sge.mgmtq, howmany(sizeof(*fwr), 16), &cookie); + if (fwr == NULL) return (ENOMEM); - - fwr = wrtod(wr); - bzero(fwr, sizeof (*fwr)); + bzero(fwr, sizeof(*fwr)); fwr->op_pkd = htobe32(V_FW_WR_OP(FW_FILTER_WR)); fwr->len16_pkd = htobe32(FW_LEN16(*fwr)); @@ -7468,7 +7417,7 @@ set_filter_wr(struct adapter *sc, int fi f->pending = 1; sc->tids.ftids_in_use++; - t4_wrq_tx(sc, wr); + commit_wrq_wr(&sc->sge.mgmtq, fwr, &cookie); return (0); } @@ -7476,22 +7425,21 @@ static int del_filter_wr(struct adapter *sc, int fidx) { struct filter_entry *f = &sc->tids.ftid_tab[fidx]; - struct wrqe *wr; struct fw_filter_wr *fwr; unsigned int ftid; + struct wrq_cookie cookie; ftid = sc->tids.ftid_base + fidx; - wr = alloc_wrqe(sizeof(*fwr), &sc->sge.mgmtq); - if (wr == NULL) + fwr = start_wrq_wr(&sc->sge.mgmtq, howmany(sizeof(*fwr), 16), &cookie); + if (fwr == NULL) return (ENOMEM); - fwr = wrtod(wr); bzero(fwr, sizeof (*fwr)); t4_mk_filtdelwr(ftid, fwr, sc->sge.fwq.abs_id); f->pending = 1; - t4_wrq_tx(sc, wr); + commit_wrq_wr(&sc->sge.mgmtq, fwr, &cookie); return (0); } @@ -8091,6 +8039,7 @@ t4_ioctl(struct cdev *dev, unsigned long /* MAC stats */ t4_clr_port_stats(sc, pi->tx_chan); + pi->tx_parse_error = 0; if (pi->flags & PORT_INIT_DONE) { struct sge_rxq *rxq; @@ -8113,24 +8062,24 @@ t4_ioctl(struct cdev *dev, unsigned long txq->imm_wrs = 0; txq->sgl_wrs = 0; txq->txpkt_wrs = 0; - txq->txpkts_wrs = 0; - txq->txpkts_pkts = 0; - txq->br->br_drops = 0; - txq->no_dmamap = 0; - txq->no_desc = 0; + txq->txpkts0_wrs = 0; + txq->txpkts1_wrs = 0; + txq->txpkts0_pkts = 0; + txq->txpkts1_pkts = 0; + mp_ring_reset_stats(txq->r); } #ifdef TCP_OFFLOAD /* nothing to clear for each ofld_rxq */ for_each_ofld_txq(pi, i, wrq) { - wrq->tx_wrs = 0; - wrq->no_desc = 0; + wrq->tx_wrs_direct = 0; + wrq->tx_wrs_copied = 0; } #endif wrq = &sc->sge.ctrlq[pi->port_id]; - wrq->tx_wrs = 0; - wrq->no_desc = 0; + wrq->tx_wrs_direct = 0; + wrq->tx_wrs_copied = 0; } break; } Copied and modified: stable/10/sys/dev/cxgbe/t4_mp_ring.c (from r276485, head/sys/dev/cxgbe/t4_mp_ring.c) ============================================================================== --- head/sys/dev/cxgbe/t4_mp_ring.c Wed Dec 31 23:19:16 2014 (r276485, copy source) +++ stable/10/sys/dev/cxgbe/t4_mp_ring.c Sat Jun 6 09:28:40 2015 (r284052) @@ -38,6 +38,11 @@ __FBSDID("$FreeBSD$"); #include "t4_mp_ring.h" +#if defined(__i386__) +#define atomic_cmpset_acq_64 atomic_cmpset_64 +#define atomic_cmpset_rel_64 atomic_cmpset_64 +#endif + union ring_state { struct { uint16_t pidx_head; Copied: stable/10/sys/dev/cxgbe/t4_mp_ring.h (from r276485, head/sys/dev/cxgbe/t4_mp_ring.h) ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ stable/10/sys/dev/cxgbe/t4_mp_ring.h Sat Jun 6 09:28:40 2015 (r284052, copy of r276485, head/sys/dev/cxgbe/t4_mp_ring.h) @@ -0,0 +1,68 @@ +/*- + * Copyright (c) 2014 Chelsio Communications, Inc. + * All rights reserved. + * Written by: Navdeep Parhar <np@FreeBSD.org> + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD$ + * + */ + +#ifndef __CXGBE_MP_RING_H +#define __CXGBE_MP_RING_H + +#ifndef _KERNEL +#error "no user-serviceable parts inside" +#endif + +struct mp_ring; +typedef u_int (*ring_drain_t)(struct mp_ring *, u_int, u_int); +typedef u_int (*ring_can_drain_t)(struct mp_ring *); + +struct mp_ring { + volatile uint64_t state __aligned(CACHE_LINE_SIZE); + + int size __aligned(CACHE_LINE_SIZE); + void * cookie; + struct malloc_type * mt; + ring_drain_t drain; + ring_can_drain_t can_drain; /* cheap, may be unreliable */ + counter_u64_t enqueues; + counter_u64_t drops; + counter_u64_t starts; + counter_u64_t stalls; + counter_u64_t restarts; /* recovered after stalling */ + counter_u64_t abdications; + + void * volatile items[] __aligned(CACHE_LINE_SIZE); +}; + +int mp_ring_alloc(struct mp_ring **, int, void *, ring_drain_t, + ring_can_drain_t, struct malloc_type *, int); +void mp_ring_free(struct mp_ring *); +int mp_ring_enqueue(struct mp_ring *, void **, int, int); +void mp_ring_check_drainage(struct mp_ring *, int); +void mp_ring_reset_stats(struct mp_ring *); +int mp_ring_is_idle(struct mp_ring *); + +#endif Modified: stable/10/sys/dev/cxgbe/t4_sge.c ============================================================================== --- stable/10/sys/dev/cxgbe/t4_sge.c Sat Jun 6 06:12:14 2015 (r284051) +++ stable/10/sys/dev/cxgbe/t4_sge.c Sat Jun 6 09:28:40 2015 (r284052) @@ -35,12 +35,12 @@ __FBSDID("$FreeBSD$"); #include <sys/mbuf.h> #include <sys/socket.h> #include <sys/kernel.h> -#include <sys/kdb.h> #include <sys/malloc.h> #include <sys/queue.h> #include <sys/sbuf.h> #include <sys/taskqueue.h> #include <sys/time.h> +#include <sys/sglist.h> #include <sys/sysctl.h> #include <sys/smp.h> #include <sys/counter.h> @@ -67,6 +67,7 @@ __FBSDID("$FreeBSD$"); #include "common/t4_regs.h" #include "common/t4_regs_values.h" #include "common/t4_msg.h" +#include "t4_mp_ring.h" #ifdef T4_PKT_TIMESTAMP #define RX_COPY_THRESHOLD (MINCLSIZE - 8) @@ -146,19 +147,17 @@ TUNABLE_INT("hw.cxgbe.largest_rx_cluster static int safest_rx_cluster = PAGE_SIZE; TUNABLE_INT("hw.cxgbe.safest_rx_cluster", &safest_rx_cluster); -/* Used to track coalesced tx work request */ struct txpkts { - uint64_t *flitp; /* ptr to flit where next pkt should start */ - uint8_t npkt; /* # of packets in this work request */ - uint8_t nflits; /* # of flits used by this work request */ - uint16_t plen; /* total payload (sum of all packets) */ + u_int wr_type; /* type 0 or type 1 */ + u_int npkt; /* # of packets in this work request */ + u_int plen; /* total payload (sum of all packets) */ + u_int len16; /* # of 16B pieces used by this work request */ }; /* A packet's SGL. This + m_pkthdr has all info needed for tx */ struct sgl { - int nsegs; /* # of segments in the SGL, 0 means imm. tx */ - int nflits; /* # of flits needed for the SGL */ - bus_dma_segment_t seg[TX_SGL_SEGS]; + struct sglist sg; + struct sglist_seg seg[TX_SGL_SEGS]; }; static int service_iq(struct sge_iq *, int); @@ -220,26 +219,31 @@ static void find_best_refill_source(stru static void find_safe_refill_source(struct adapter *, struct sge_fl *); static void add_fl_to_sfl(struct adapter *, struct sge_fl *); -static int get_pkt_sgl(struct sge_txq *, struct mbuf **, struct sgl *, int); -static int free_pkt_sgl(struct sge_txq *, struct sgl *); -static int write_txpkt_wr(struct port_info *, struct sge_txq *, struct mbuf *, - struct sgl *); -static int add_to_txpkts(struct port_info *, struct sge_txq *, struct txpkts *, - struct mbuf *, struct sgl *); -static void write_txpkts_wr(struct sge_txq *, struct txpkts *); -static inline void write_ulp_cpl_sgl(struct port_info *, struct sge_txq *, - struct txpkts *, struct mbuf *, struct sgl *); -static int write_sgl_to_txd(struct sge_eq *, struct sgl *, caddr_t *); +static inline void get_pkt_gl(struct mbuf *, struct sglist *); +static inline u_int txpkt_len16(u_int, u_int); +static inline u_int txpkts0_len16(u_int); +static inline u_int txpkts1_len16(void); +static u_int write_txpkt_wr(struct sge_txq *, struct fw_eth_tx_pkt_wr *, + struct mbuf *, u_int); +static int try_txpkts(struct mbuf *, struct mbuf *, struct txpkts *, u_int); +static int add_to_txpkts(struct mbuf *, struct txpkts *, u_int); +static u_int write_txpkts_wr(struct sge_txq *, struct fw_eth_tx_pkts_wr *, + struct mbuf *, const struct txpkts *, u_int); +static void write_gl_to_txd(struct sge_txq *, struct mbuf *, caddr_t *, int); static inline void copy_to_txd(struct sge_eq *, caddr_t, caddr_t *, int); -static inline void ring_eq_db(struct adapter *, struct sge_eq *); -static inline int reclaimable(struct sge_eq *); -static int reclaim_tx_descs(struct sge_txq *, int, int); -static void write_eqflush_wr(struct sge_eq *); -static __be64 get_flit(bus_dma_segment_t *, int, int); +static inline void ring_eq_db(struct adapter *, struct sge_eq *, u_int); +static inline uint16_t read_hw_cidx(struct sge_eq *); +static inline u_int reclaimable_tx_desc(struct sge_eq *); +static inline u_int total_available_tx_desc(struct sge_eq *); +static u_int reclaim_tx_descs(struct sge_txq *, u_int); +static void tx_reclaim(void *, int); +static __be64 get_flit(struct sglist_seg *, int, int); static int handle_sge_egr_update(struct sge_iq *, const struct rss_header *, struct mbuf *); static int handle_fw_msg(struct sge_iq *, const struct rss_header *, struct mbuf *); +static void wrq_tx_drain(void *, int); +static void drain_wrq_wr_list(struct adapter *, struct sge_wrq *); static int sysctl_uint16(SYSCTL_HANDLER_ARGS); static int sysctl_bufsizes(SYSCTL_HANDLER_ARGS); *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201506060928.t569SfjD020849>