Date: Tue, 20 Jan 2015 20:33:38 -0500
From: Pedro Giffuni <pfg@FreeBSD.org>
To: Navdeep Parhar <np@FreeBSD.org>, Luigi Rizzo <rizzo@iet.unipi.it>, "src-committers@freebsd.org" <src-committers@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>, "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>
Subject: Re: svn commit: r276485 - in head/sys: conf dev/cxgbe modules/cxgbe/if_cxgbe
Message-ID: <54BF01F2.4030502@FreeBSD.org>
In-Reply-To: <54BEFB79.6090806@FreeBSD.org>
References: <201412312319.sBVNJHca031041@svn.freebsd.org> <CA%2BhQ2%2Bh29RObCONCd8Nu_W92CnJ9jHMZdRBqiU9hu78D3SwUDA@mail.gmail.com> <20150106203344.GB26068@ox> <54BEE07A.3070207@FreeBSD.org> <54BEE305.6020905@FreeBSD.org> <54BEF7CF.9030505@FreeBSD.org> <54BEFB79.6090806@FreeBSD.org>

On 01/20/15 20:06, Navdeep Parhar wrote:
> On 01/20/15 16:50, Pedro Giffuni wrote:
>>
>> On 01/20/15 18:21, Navdeep Parhar wrote:
>>> The problem reported by Luigi has been fixed in r277225 already.
>>>
>>> Regards,
>>> Navdeep
>>>
>>
>> But the fix is rather ugly, isn't it? I would personally prefer to
>> just kill the older gcc but in the meantime updating it so that it
>> behaves like the updated gcc/clang would be better. IMHO.
>
> I'm not sure why you think the fix is ugly. Modifying the base
> compiler to deal with minor stuff like this seems excessive and I
> never even considered that.
>

"Modifying the base compiler to deal with minor stuff like this" is
actually called "an update" since upstream already did it:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676

You could also call it "making it more compatible with clang and newer
gcc".

The base compiler is as ugly as it can be but that's upstream's fault,
not the fault of those of us that were once condemned to add bandaids.
Happily I am not planning to touch it anymore ;).

Pedro.

> Regards,
> Navdeep
>
>>
>> Pedro.
>>
>>> On 01/20/15 15:10, Pedro Giffuni wrote:
>>>> Hi;
>>>>
>>>> I got this patch from the OpenBSD-tech list[1].
>>>> Perhaps this fixes the gcc issue?
>>>>
>>>> Apparently it's required for mesa too.
>>>>
>>>> Pedro.
>>>>
>>>> [1] http://article.gmane.org/gmane.os.openbsd.tech/40604
>>>>
>>>> On 01/06/15 15:33, Navdeep Parhar wrote:
>>>>> On Tue, Jan 06, 2015 at 07:58:34PM +0100, Luigi Rizzo wrote:
>>>>>>
>>>>>> On Thu, Jan 1, 2015 at 12:19 AM, Navdeep Parhar <np@freebsd.org>
>>>>>> wrote:
>>>>>>
>>>>>> Author: np
>>>>>> Date: Wed Dec 31 23:19:16 2014
>>>>>> New Revision: 276485
>>>>>> URL: https://svnweb.freebsd.org/changeset/base/276485
>>>>>>
>>>>>> Log:
>>>>>>   cxgbe(4): major tx rework.
>>>>>>
>>>>>>
>>>>>> FYI, this commit has some unnamed unions (eg. in t4_mp_ring.c)
>>>>>> which prevent the kernel from compiling with our stock gcc
>>>>>> and its standard kernel build flags (specifically -std=...).
>>>>>>
>>>>>> Adding the following in the kernel config
>>>>>>
>>>>>> makeoptions COPTFLAGS="-fms-extensions"
>>>>>>
>>>>>> seems to do the job
>>>>>>
>>>>>> I know it is unavoidable that we'll end up with gcc not working,
>>>>>> but maybe we can still avoid unnamed unions.
>>>>> There are two unresolved issues with mp_ring and I had to make the
>>>>> driver amd64-only while I consider my options.
>>>>>
>>>>> - platforms where gcc is the default (and our version has problems
>>>>>   with unnamed unions). This is simple to fix but reduces the
>>>>>   readability of the code. But sure, if building head with gcc is
>>>>>   popular then that trumps readability. I wonder if adding
>>>>>   -fms-extensions just to the driver's build flags would be an
>>>>>   acceptable compromise.
>>>>> - platforms without the acq/rel versions of 64b cmpset. I think it
>>>>>   would be simple to add acq/rel variants to i386/pc98 and others
>>>>>   that already have 64b cmpset. The driver will be permanently
>>>>>   unplugged from whatever remains (only 32 bit powerpc I think).
>>>>>
>>>>> I'll try to sort all this out within the next couple of weeks.
>>>>>
>>>>> Regards,
>>>>> Navdeep
>>>>>
>>>>>> cheers
>>>>>> luigi
>>>>>>
>>>>>>
>>>>>>   a) Front load as much work as possible in if_transmit, before
>>>>>>   any driver lock or software queue has to get involved.
>>>>>>
>>>>>>   b) Replace buf_ring with a brand new mp_ring (multiproducer
>>>>>>   ring). This is specifically for the tx multiqueue model where
>>>>>>   one of the if_transmit producer threads becomes the consumer
>>>>>>   and other producers carry on as usual. mp_ring is implemented
>>>>>>   as standalone code and it should be possible to use it in any
>>>>>>   driver with tx multiqueue. It also has:
>>>>>>   - the ability to enqueue/dequeue multiple items. This might
>>>>>>     become significant if packet batching is ever implemented.
>>>>>>   - an abdication mechanism to allow a thread to give up writing
>>>>>>     tx descriptors and have another if_transmit thread take over.
>>>>>>     A thread that's writing tx descriptors can end up doing so
>>>>>>     for an unbounded time period if a) there are other
>>>>>>     if_transmit threads continuously feeding the software queue,
>>>>>>     and b) the chip keeps up with whatever the thread is throwing
>>>>>>     at it.
>>>>>>   - accurate statistics about interesting events even when the
>>>>>>     stats come at the expense of additional branches/conditional
>>>>>>     code.
>>>>>>
>>>>>>   The NIC txq lock is uncontested on the fast path at this
>>>>>>   point. I've left it there for synchronization with the control
>>>>>>   events (interface up/down, modload/unload).
>>>>>>
>>>>>>   c) Add support for "type 1" coalescing work request in the
>>>>>>   normal NIC tx path. This work request is optimized for frames
>>>>>>   with a single item in the DMA gather list. These are very
>>>>>>   common when forwarding packets. Note that netmap tx in cxgbe
>>>>>>   already uses these "type 1" work requests.
>>>>>>
>>>>>>   d) Do not request automatic cidx updates every 32 descriptors.
>>>>>>   Instead, request updates via bits in individual work requests
>>>>>>   (still every 32 descriptors approximately). Also, request an
>>>>>>   automatic final update when the queue idles after activity.
>>>>>>   This means NIC tx reclaim is still performed lazily but it
>>>>>>   will catch up quickly as soon as the queue idles. This seems
>>>>>>   to be the best middle ground and I'll probably do something
>>>>>>   similar for netmap tx as well.
>>>>>>
>>>>>>   e) Implement a faster tx path for WRQs (used by TOE tx and
>>>>>>   control queues, _not_ by the normal NIC tx).
Allow work requests to >>>>>> be written >>>>>> directly to the hardware descriptor ring if room is >>>>>> available. I will >>>>>> convert t4_tom and iw_cxgbe modules to this faster style >>>>>> gradually. >>>>>> >>>>>> MFC after: 2 months >>>>>> >>>>>> Added: >>>>>> head/sys/dev/cxgbe/t4_mp_ring.c (contents, props changed) >>>>>> head/sys/dev/cxgbe/t4_mp_ring.h (contents, props changed) >>>>>> Modified: >>>>>> head/sys/conf/files >>>>>> head/sys/dev/cxgbe/adapter.h >>>>>> head/sys/dev/cxgbe/t4_l2t.c >>>>>> head/sys/dev/cxgbe/t4_main.c >>>>>> head/sys/dev/cxgbe/t4_sge.c >>>>>> head/sys/modules/cxgbe/if_cxgbe/Makefile >>>>>> >>>>>> Modified: head/sys/conf/files >>>>>> >>>>>> =========================================================================== >>>>>> >>>>>> >>>>>> >>>>>> === >>>>>> --- head/sys/conf/files Wed Dec 31 22:52:43 2014 (r276484) >>>>>> +++ head/sys/conf/files Wed Dec 31 23:19:16 2014 (r276485) >>>>>> @@ -1142,6 +1142,8 @@ dev/cxgb/sys/uipc_mvec.c optional cxgb p >>>>>> compile-with "${NORMAL_C} -I$S/dev/cxgb" >>>>>> dev/cxgb/cxgb_t3fw.c optional cxgb cxgb_t3fw \ >>>>>> compile-with "${NORMAL_C} -I$S/dev/cxgb" >>>>>> +dev/cxgbe/t4_mp_ring.c optional cxgbe pci \ >>>>>> + compile-with "${NORMAL_C} -I$S/dev/cxgbe" >>>>>> dev/cxgbe/t4_main.c optional cxgbe pci \ >>>>>> compile-with "${NORMAL_C} -I$S/dev/cxgbe" >>>>>> dev/cxgbe/t4_netmap.c optional cxgbe pci \ >>>>>> >>>>>> Modified: head/sys/dev/cxgbe/adapter.h >>>>>> >>>>>> =========================================================================== >>>>>> >>>>>> >>>>>> >>>>>> === >>>>>> --- head/sys/dev/cxgbe/adapter.h Wed Dec 31 22:52:43 >>>>>> 2014 >>>>>> (r276484) >>>>>> +++ head/sys/dev/cxgbe/adapter.h Wed Dec 31 23:19:16 >>>>>> 2014 >>>>>> (r276485) >>>>>> @@ -152,7 +152,8 @@ enum { >>>>>> CL_METADATA_SIZE = CACHE_LINE_SIZE, >>>>>> >>>>>> SGE_MAX_WR_NDESC = SGE_MAX_WR_LEN / EQ_ESIZE, /* max WR >>>>>> size in >>>>>> desc */ >>>>>> - TX_SGL_SEGS = 36, >>>>>> + TX_SGL_SEGS = 39, >>>>>> + 
TX_SGL_SEGS_TSO = 38, >>>>>> TX_WR_FLITS = SGE_MAX_WR_LEN / 8 >>>>>> }; >>>>>> >>>>>> @@ -273,6 +274,7 @@ struct port_info { >>>>>> struct timeval last_refreshed; >>>>>> struct port_stats stats; >>>>>> u_int tnl_cong_drops; >>>>>> + u_int tx_parse_error; >>>>>> >>>>>> eventhandler_tag vlan_c; >>>>>> >>>>>> @@ -308,23 +310,9 @@ struct tx_desc { >>>>>> __be64 flit[8]; >>>>>> }; >>>>>> >>>>>> -struct tx_map { >>>>>> - struct mbuf *m; >>>>>> - bus_dmamap_t map; >>>>>> -}; >>>>>> - >>>>>> -/* DMA maps used for tx */ >>>>>> -struct tx_maps { >>>>>> - struct tx_map *maps; >>>>>> - uint32_t map_total; /* # of DMA maps */ >>>>>> - uint32_t map_pidx; /* next map to be used */ >>>>>> - uint32_t map_cidx; /* reclaimed up to this >>>>>> index */ >>>>>> - uint32_t map_avail; /* # of available maps */ >>>>>> -}; >>>>>> - >>>>>> struct tx_sdesc { >>>>>> + struct mbuf *m; /* m_nextpkt linked chain of >>>>>> frames */ >>>>>> uint8_t desc_used; /* # of hardware descriptors >>>>>> used by the WR >>>>>> */ >>>>>> - uint8_t credits; /* NIC txq: # of frames sent >>>>>> out >>>>>> in the WR >>>>>> */ >>>>>> }; >>>>>> >>>>>> >>>>>> @@ -378,16 +366,12 @@ struct sge_iq { >>>>>> enum { >>>>>> EQ_CTRL = 1, >>>>>> EQ_ETH = 2, >>>>>> -#ifdef TCP_OFFLOAD >>>>>> EQ_OFLD = 3, >>>>>> -#endif >>>>>> >>>>>> /* eq flags */ >>>>>> - EQ_TYPEMASK = 7, /* 3 lsbits hold the >>>>>> type */ >>>>>> - EQ_ALLOCATED = (1 << 3), /* firmware resources >>>>>> allocated */ >>>>>> - EQ_DOOMED = (1 << 4), /* about to be >>>>>> destroyed */ >>>>>> - EQ_CRFLUSHED = (1 << 5), /* expecting an update >>>>>> from SGE */ >>>>>> - EQ_STALLED = (1 << 6), /* out of hw >>>>>> descriptors >>>>>> or dmamaps >>>>>> */ >>>>>> + EQ_TYPEMASK = 0x3, /* 2 lsbits hold the >>>>>> type (see >>>>>> above) */ >>>>>> + EQ_ALLOCATED = (1 << 2), /* firmware resources >>>>>> allocated */ >>>>>> + EQ_ENABLED = (1 << 3), /* open for business */ >>>>>> }; >>>>>> >>>>>> /* Listed in order of preference. 
Update t4_sysctls too if >>>>>> you >>>>>> change >>>>>> these */ >>>>>> @@ -402,32 +386,25 @@ enum {DOORBELL_UDB, DOORBELL_WCWR, DOORB >>>>>> struct sge_eq { >>>>>> unsigned int flags; /* MUST be first */ >>>>>> unsigned int cntxt_id; /* SGE context id for the eq */ >>>>>> - bus_dma_tag_t desc_tag; >>>>>> - bus_dmamap_t desc_map; >>>>>> - char lockname[16]; >>>>>> struct mtx eq_lock; >>>>>> >>>>>> struct tx_desc *desc; /* KVA of descriptor ring */ >>>>>> - bus_addr_t ba; /* bus address of descriptor >>>>>> ring */ >>>>>> - struct sge_qstat *spg; /* status page, for >>>>>> convenience */ >>>>>> uint16_t doorbells; >>>>>> volatile uint32_t *udb; /* KVA of doorbell (lies within >>>>>> BAR2) */ >>>>>> u_int udb_qid; /* relative qid within the >>>>>> doorbell page */ >>>>>> - uint16_t cap; /* max # of desc, for >>>>>> convenience */ >>>>>> - uint16_t avail; /* available descriptors, for >>>>>> convenience * >>>>>> / >>>>>> - uint16_t qsize; /* size (# of entries) of the >>>>>> queue */ >>>>>> + uint16_t sidx; /* index of the entry with the >>>>>> status page >>>>>> */ >>>>>> uint16_t cidx; /* consumer idx (desc idx) */ >>>>>> uint16_t pidx; /* producer idx (desc idx) */ >>>>>> - uint16_t pending; /* # of descriptors used since >>>>>> last >>>>>> doorbell */ >>>>>> + uint16_t equeqidx; /* EQUEQ last requested at this >>>>>> pidx */ >>>>>> + uint16_t dbidx; /* pidx of the most recent >>>>>> doorbell */ >>>>>> uint16_t iqid; /* iq that gets egr_update for >>>>>> the eq */ >>>>>> uint8_t tx_chan; /* tx channel used by the eq */ >>>>>> - struct task tx_task; >>>>>> - struct callout tx_callout; >>>>>> + volatile u_int equiq; /* EQUIQ outstanding */ >>>>>> >>>>>> - /* stats */ >>>>>> - >>>>>> - uint32_t egr_update; /* # of SGE_EGR_UPDATE >>>>>> notifications for eq >>>>>> */ >>>>>> - uint32_t unstalled; /* recovered from stall */ >>>>>> + bus_dma_tag_t desc_tag; >>>>>> + bus_dmamap_t desc_map; >>>>>> + bus_addr_t ba; /* bus address of descriptor >>>>>> ring */ >>>>>> + char 
lockname[16]; >>>>>> }; >>>>>> >>>>>> struct sw_zone_info { >>>>>> @@ -499,18 +476,19 @@ struct sge_fl { >>>>>> struct cluster_layout cll_alt; /* alternate refill >>>>>> zone, layout */ >>>>>> }; >>>>>> >>>>>> +struct mp_ring; >>>>>> + >>>>>> /* txq: SGE egress queue + what's needed for Ethernet NIC */ >>>>>> struct sge_txq { >>>>>> struct sge_eq eq; /* MUST be first */ >>>>>> >>>>>> struct ifnet *ifp; /* the interface this txq >>>>>> belongs to */ >>>>>> - bus_dma_tag_t tx_tag; /* tag for transmit buffers */ >>>>>> - struct buf_ring *br; /* tx buffer ring */ >>>>>> + struct mp_ring *r; /* tx software ring */ >>>>>> struct tx_sdesc *sdesc; /* KVA of software descriptor >>>>>> ring */ >>>>>> - struct mbuf *m; /* held up due to temporary >>>>>> resource >>>>>> shortage */ >>>>>> - >>>>>> - struct tx_maps txmaps; >>>>>> + struct sglist *gl; >>>>>> + __be32 cpl_ctrl0; /* for convenience */ >>>>>> >>>>>> + struct task tx_reclaim_task; >>>>>> /* stats for common events first */ >>>>>> >>>>>> uint64_t txcsum; /* # of times hardware assisted >>>>>> with >>>>>> checksum */ >>>>>> @@ -519,13 +497,12 @@ struct sge_txq { >>>>>> uint64_t imm_wrs; /* # of work requests with >>>>>> immediate data * >>>>>> / >>>>>> uint64_t sgl_wrs; /* # of work requests with >>>>>> direct SGL */ >>>>>> uint64_t txpkt_wrs; /* # of txpkt work requests >>>>>> (not >>>>>> coalesced) >>>>>> */ >>>>>> - uint64_t txpkts_wrs; /* # of coalesced tx work >>>>>> requests */ >>>>>> - uint64_t txpkts_pkts; /* # of frames in coalesced tx >>>>>> work >>>>>> requests */ >>>>>> + uint64_t txpkts0_wrs; /* # of type0 coalesced tx work >>>>>> requests */ >>>>>> + uint64_t txpkts1_wrs; /* # of type1 coalesced tx work >>>>>> requests */ >>>>>> + uint64_t txpkts0_pkts; /* # of frames in type0 >>>>>> coalesced tx WRs */ >>>>>> + uint64_t txpkts1_pkts; /* # of frames in type1 >>>>>> coalesced tx WRs */ >>>>>> >>>>>> /* stats for not-that-common events */ >>>>>> - >>>>>> - uint32_t no_dmamap; /* no DMA map to load the 
>>>>>> mbuf */ >>>>>> - uint32_t no_desc; /* out of hardware >>>>>> descriptors */ >>>>>> } __aligned(CACHE_LINE_SIZE); >>>>>> >>>>>> /* rxq: SGE ingress queue + SGE free list + miscellaneous >>>>>> items */ >>>>>> @@ -574,7 +551,13 @@ struct wrqe { >>>>>> STAILQ_ENTRY(wrqe) link; >>>>>> struct sge_wrq *wrq; >>>>>> int wr_len; >>>>>> - uint64_t wr[] __aligned(16); >>>>>> + char wr[] __aligned(16); >>>>>> +}; >>>>>> + >>>>>> +struct wrq_cookie { >>>>>> + TAILQ_ENTRY(wrq_cookie) link; >>>>>> + int ndesc; >>>>>> + int pidx; >>>>>> }; >>>>>> >>>>>> /* >>>>>> @@ -585,17 +568,32 @@ struct sge_wrq { >>>>>> struct sge_eq eq; /* MUST be first */ >>>>>> >>>>>> struct adapter *adapter; >>>>>> + struct task wrq_tx_task; >>>>>> + >>>>>> + /* Tx desc reserved but WR not "committed" yet. */ >>>>>> + TAILQ_HEAD(wrq_incomplete_wrs , wrq_cookie) >>>>>> incomplete_wrs; >>>>>> >>>>>> - /* List of WRs held up due to lack of tx descriptors */ >>>>>> + /* List of WRs ready to go out as soon as >>>>>> descriptors are >>>>>> available. */ >>>>>> STAILQ_HEAD(, wrqe) wr_list; >>>>>> + u_int nwr_pending; >>>>>> + u_int ndesc_needed; >>>>>> >>>>>> /* stats for common events first */ >>>>>> >>>>>> - uint64_t tx_wrs; /* # of tx work requests */ >>>>>> + uint64_t tx_wrs_direct; /* # of WRs written directly to >>>>>> desc ring. >>>>>> */ >>>>>> + uint64_t tx_wrs_ss; /* # of WRs copied from scratch >>>>>> space. */ >>>>>> + uint64_t tx_wrs_copied; /* # of WRs queued and >>>>>> copied to >>>>>> desc ring. >>>>>> */ >>>>>> >>>>>> /* stats for not-that-common events */ >>>>>> >>>>>> - uint32_t no_desc; /* out of hardware >>>>>> descriptors */ >>>>>> + /* >>>>>> + * Scratch space for work requests that wrap around >>>>>> after reaching >>>>>> the >>>>>> + * status page, and some infomation about the last WR >>>>>> that used it. 
>>>>>> + */ >>>>>> + uint16_t ss_pidx; >>>>>> + uint16_t ss_len; >>>>>> + uint8_t ss[SGE_MAX_WR_LEN]; >>>>>> + >>>>>> } __aligned(CACHE_LINE_SIZE); >>>>>> >>>>>> >>>>>> @@ -744,7 +742,7 @@ struct adapter { >>>>>> struct sge sge; >>>>>> int lro_timeout; >>>>>> >>>>>> - struct taskqueue *tq[NCHAN]; /* taskqueues that >>>>>> flush >>>>>> data out * >>>>>> / >>>>>> + struct taskqueue *tq[NCHAN]; /* General purpose >>>>>> taskqueues */ >>>>>> struct port_info *port[MAX_NPORTS]; >>>>>> uint8_t chan_map[NCHAN]; >>>>>> >>>>>> @@ -978,12 +976,11 @@ static inline int >>>>>> tx_resume_threshold(struct sge_eq *eq) >>>>>> { >>>>>> >>>>>> - return (eq->qsize / 4); >>>>>> + /* not quite the same as qsize / 4, but this will >>>>>> do. */ >>>>>> + return (eq->sidx / 4); >>>>>> } >>>>>> >>>>>> /* t4_main.c */ >>>>>> -void t4_tx_task(void *, int); >>>>>> -void t4_tx_callout(void *); >>>>>> int t4_os_find_pci_capability(struct adapter *, int); >>>>>> int t4_os_pci_save_state(struct adapter *); >>>>>> int t4_os_pci_restore_state(struct adapter *); >>>>>> @@ -1024,16 +1021,15 @@ int t4_setup_adapter_queues(struct >>>>>> adapt >>>>>> int t4_teardown_adapter_queues(struct adapter *); >>>>>> int t4_setup_port_queues(struct port_info *); >>>>>> int t4_teardown_port_queues(struct port_info *); >>>>>> -int t4_alloc_tx_maps(struct tx_maps *, bus_dma_tag_t, int, >>>>>> int); >>>>>> -void t4_free_tx_maps(struct tx_maps *, bus_dma_tag_t); >>>>>> void t4_intr_all(void *); >>>>>> void t4_intr(void *); >>>>>> void t4_intr_err(void *); >>>>>> void t4_intr_evt(void *); >>>>>> void t4_wrq_tx_locked(struct adapter *, struct sge_wrq *, >>>>>> struct wrqe *); >>>>>> -int t4_eth_tx(struct ifnet *, struct sge_txq *, struct mbuf >>>>>> *); >>>>>> void t4_update_fl_bufsize(struct ifnet *); >>>>>> -int can_resume_tx(struct sge_eq *); >>>>>> +int parse_pkt(struct mbuf **); >>>>>> +void *start_wrq_wr(struct sge_wrq *, int, struct wrq_cookie >>>>>> *); >>>>>> +void commit_wrq_wr(struct sge_wrq *, void *, struct 
>>>>>> wrq_cookie *); >>>>>> >>>>>> /* t4_tracer.c */ >>>>>> struct t4_tracer; >>>>>> >>>>>> Modified: head/sys/dev/cxgbe/t4_l2t.c >>>>>> >>>>>> =========================================================================== >>>>>> >>>>>> >>>>>> >>>>>> === >>>>>> --- head/sys/dev/cxgbe/t4_l2t.c Wed Dec 31 22:52:43 2014 >>>>>> (r276484) >>>>>> +++ head/sys/dev/cxgbe/t4_l2t.c Wed Dec 31 23:19:16 2014 >>>>>> (r276485) >>>>>> @@ -113,16 +113,15 @@ found: >>>>>> int >>>>>> t4_write_l2e(struct adapter *sc, struct l2t_entry *e, int >>>>>> sync) >>>>>> { >>>>>> - struct wrqe *wr; >>>>>> + struct wrq_cookie cookie; >>>>>> struct cpl_l2t_write_req *req; >>>>>> int idx = e->idx + sc->vres.l2t.start; >>>>>> >>>>>> mtx_assert(&e->lock, MA_OWNED); >>>>>> >>>>>> - wr = alloc_wrqe(sizeof(*req), &sc->sge.mgmtq); >>>>>> - if (wr == NULL) >>>>>> + req = start_wrq_wr(&sc->sge.mgmtq, >>>>>> howmany(sizeof(*req), >>>>>> 16), & >>>>>> cookie); >>>>>> + if (req == NULL) >>>>>> return (ENOMEM); >>>>>> - req = wrtod(wr); >>>>>> >>>>>> INIT_TP_WR(req, 0); >>>>>> OPCODE_TID(req) = >>>>>> htonl(MK_OPCODE_TID(CPL_L2T_WRITE_REQ, >>>>>> idx | >>>>>> @@ -132,7 +131,7 @@ t4_write_l2e(struct adapter *sc, struct >>>>>> req->vlan = htons(e->vlan); >>>>>> memcpy(req->dst_mac, e->dmac, sizeof(req->dst_mac)); >>>>>> >>>>>> - t4_wrq_tx(sc, wr); >>>>>> + commit_wrq_wr(&sc->sge.mgmtq, req, &cookie); >>>>>> >>>>>> if (sync && e->state != L2T_STATE_SWITCHING) >>>>>> e->state = L2T_STATE_SYNC_WRITE; >>>>>> >>>>>> Modified: head/sys/dev/cxgbe/t4_main.c >>>>>> >>>>>> =========================================================================== >>>>>> >>>>>> >>>>>> >>>>>> === >>>>>> --- head/sys/dev/cxgbe/t4_main.c Wed Dec 31 22:52:43 >>>>>> 2014 >>>>>> (r276484) >>>>>> +++ head/sys/dev/cxgbe/t4_main.c Wed Dec 31 23:19:16 >>>>>> 2014 >>>>>> (r276485) >>>>>> @@ -66,6 +66,7 @@ __FBSDID("$FreeBSD$"); >>>>>> #include "common/t4_regs_values.h" >>>>>> #include "t4_ioctl.h" >>>>>> #include "t4_l2t.h" >>>>>> +#include 
"t4_mp_ring.h" >>>>>> >>>>>> /* T4 bus driver interface */ >>>>>> static int t4_probe(device_t); >>>>>> @@ -378,7 +379,8 @@ static void build_medialist(struct port_ >>>>>> static int cxgbe_init_synchronized(struct port_info *); >>>>>> static int cxgbe_uninit_synchronized(struct port_info *); >>>>>> static int setup_intr_handlers(struct adapter *); >>>>>> -static void quiesce_eq(struct adapter *, struct sge_eq *); >>>>>> +static void quiesce_txq(struct adapter *, struct sge_txq *); >>>>>> +static void quiesce_wrq(struct adapter *, struct sge_wrq *); >>>>>> static void quiesce_iq(struct adapter *, struct sge_iq *); >>>>>> static void quiesce_fl(struct adapter *, struct sge_fl *); >>>>>> static int t4_alloc_irq(struct adapter *, struct irq *, int >>>>>> rid, >>>>>> @@ -434,7 +436,6 @@ static int sysctl_tx_rate(SYSCTL_HANDLER >>>>>> static int sysctl_ulprx_la(SYSCTL_HANDLER_ARGS); >>>>>> static int sysctl_wcwr_stats(SYSCTL_HANDLER_ARGS); >>>>>> #endif >>>>>> -static inline void txq_start(struct ifnet *, struct sge_txq >>>>>> *); >>>>>> static uint32_t fconf_to_mode(uint32_t); >>>>>> static uint32_t mode_to_fconf(uint32_t); >>>>>> static uint32_t fspec_to_fconf(struct t4_filter_specification >>>>>> *); >>>>>> @@ -1429,67 +1430,36 @@ cxgbe_transmit(struct ifnet *ifp, >>>>>> struct >>>>>> { >>>>>> struct port_info *pi = ifp->if_softc; >>>>>> struct adapter *sc = pi->adapter; >>>>>> - struct sge_txq *txq = &sc->sge.txq[pi->first_txq]; >>>>>> - struct buf_ring *br; >>>>>> + struct sge_txq *txq; >>>>>> + void *items[1]; >>>>>> int rc; >>>>>> >>>>>> M_ASSERTPKTHDR(m); >>>>>> + MPASS(m->m_nextpkt == NULL); /* not quite ready for >>>>>> this yet */ >>>>>> >>>>>> if (__predict_false(pi->link_cfg.link_ok == 0)) { >>>>>> m_freem(m); >>>>>> return (ENETDOWN); >>>>>> } >>>>>> >>>>>> - /* check if flowid is set */ >>>>>> - if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) >>>>>> - txq += ((m->m_pkthdr.flowid % (pi->ntxq - pi-> >>>>>> rsrv_noflowq)) >>>>>> - + pi->rsrv_noflowq); >>>>>> - br = 
txq->br; >>>>>> - >>>>>> - if (TXQ_TRYLOCK(txq) == 0) { >>>>>> - struct sge_eq *eq = &txq->eq; >>>>>> - >>>>>> - /* >>>>>> - * It is possible that t4_eth_tx finishes up >>>>>> and >>>>>> releases >>>>>> the >>>>>> - * lock between the TRYLOCK above and the >>>>>> drbr_enqueue >>>>>> here. We >>>>>> - * need to make sure that this mbuf doesn't >>>>>> just >>>>>> sit there >>>>>> in >>>>>> - * the drbr. >>>>>> - */ >>>>>> - >>>>>> - rc = drbr_enqueue(ifp, br, m); >>>>>> - if (rc == 0 && callout_pending(&eq->tx_callout) >>>>>> == 0 && >>>>>> - !(eq->flags & EQ_DOOMED)) >>>>>> - callout_reset(&eq->tx_callout, 1, >>>>>> t4_tx_callout, >>>>>> eq); >>>>>> + rc = parse_pkt(&m); >>>>>> + if (__predict_false(rc != 0)) { >>>>>> + MPASS(m == NULL); /* was >>>>>> freed >>>>>> already */ >>>>>> + atomic_add_int(&pi->tx_parse_error, 1); /* rare, >>>>>> atomic is >>>>>> ok */ >>>>>> return (rc); >>>>>> } >>>>>> >>>>>> - /* >>>>>> - * txq->m is the mbuf that is held up due to a >>>>>> temporary >>>>>> shortage >>>>>> of >>>>>> - * resources and it should be put on the wire first. >>>>>> Then what's >>>>>> in >>>>>> - * drbr and finally the mbuf that was just passed in >>>>>> to us. >>>>>> - * >>>>>> - * Return code should indicate the fate of the mbuf >>>>>> that >>>>>> was passed >>>>>> in >>>>>> - * this time. >>>>>> - */ >>>>>> - >>>>>> - TXQ_LOCK_ASSERT_OWNED(txq); >>>>>> - if (drbr_needs_enqueue(ifp, br) || txq->m) { >>>>>> - >>>>>> - /* Queued for transmission. */ >>>>>> - >>>>>> - rc = drbr_enqueue(ifp, br, m); >>>>>> - m = txq->m ? txq->m : drbr_dequeue(ifp, br); >>>>>> - (void) t4_eth_tx(ifp, txq, m); >>>>>> - TXQ_UNLOCK(txq); >>>>>> - return (rc); >>>>>> - } >>>>>> + /* Select a txq. */ >>>>>> + txq = &sc->sge.txq[pi->first_txq]; >>>>>> + if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE) >>>>>> + txq += ((m->m_pkthdr.flowid % (pi->ntxq - pi-> >>>>>> rsrv_noflowq)) + >>>>>> + pi->rsrv_noflowq); >>>>>> >>>>>> - /* Direct transmission. 
*/ >>>>>> - rc = t4_eth_tx(ifp, txq, m); >>>>>> - if (rc != 0 && txq->m) >>>>>> - rc = 0; /* held, will be transmitted soon >>>>>> (hopefully) */ >>>>>> + items[0] = m; >>>>>> + rc = mp_ring_enqueue(txq->r, items, 1, 4096); >>>>>> + if (__predict_false(rc != 0)) >>>>>> + m_freem(m); >>>>>> >>>>>> - TXQ_UNLOCK(txq); >>>>>> return (rc); >>>>>> } >>>>>> >>>>>> @@ -1499,17 +1469,17 @@ cxgbe_qflush(struct ifnet *ifp) >>>>>> struct port_info *pi = ifp->if_softc; >>>>>> struct sge_txq *txq; >>>>>> int i; >>>>>> - struct mbuf *m; >>>>>> >>>>>> /* queues do not exist if !PORT_INIT_DONE. */ >>>>>> if (pi->flags & PORT_INIT_DONE) { >>>>>> for_each_txq(pi, i, txq) { >>>>>> TXQ_LOCK(txq); >>>>>> - m_freem(txq->m); >>>>>> - txq->m = NULL; >>>>>> - while ((m = >>>>>> buf_ring_dequeue_sc(txq->br)) != NULL) >>>>>> - m_freem(m); >>>>>> + txq->eq.flags &= ~EQ_ENABLED; >>>>>> TXQ_UNLOCK(txq); >>>>>> + while (!mp_ring_is_idle(txq->r)) { >>>>>> + mp_ring_check_drainage(txq->r, 0); >>>>>> + pause("qflush", 1); >>>>>> + } >>>>>> } >>>>>> } >>>>>> if_qflush(ifp); >>>>>> @@ -1564,7 +1534,7 @@ cxgbe_get_counter(struct ifnet *ifp, ift >>>>>> struct sge_txq *txq; >>>>>> >>>>>> for_each_txq(pi, i, txq) >>>>>> - drops += txq->br->br_drops; >>>>>> + drops += >>>>>> counter_u64_fetch(txq->r->drops); >>>>>> } >>>>>> >>>>>> return (drops); >>>>>> @@ -3236,7 +3206,8 @@ cxgbe_init_synchronized(struct port_info >>>>>> { >>>>>> struct adapter *sc = pi->adapter; >>>>>> struct ifnet *ifp = pi->ifp; >>>>>> - int rc = 0; >>>>>> + int rc = 0, i; >>>>>> + struct sge_txq *txq; >>>>>> >>>>>> ASSERT_SYNCHRONIZED_OP(sc); >>>>>> >>>>>> @@ -3265,6 +3236,17 @@ cxgbe_init_synchronized(struct port_info >>>>>> } >>>>>> >>>>>> /* >>>>>> + * Can't fail from this point onwards. Review >>>>>> cxgbe_uninit_synchronized >>>>>> + * if this changes. 
>>>>>> + */ >>>>>> + >>>>>> + for_each_txq(pi, i, txq) { >>>>>> + TXQ_LOCK(txq); >>>>>> + txq->eq.flags |= EQ_ENABLED; >>>>>> + TXQ_UNLOCK(txq); >>>>>> + } >>>>>> + >>>>>> + /* >>>>>> * The first iq of the first port to come up is used >>>>>> for >>>>>> tracing. >>>>>> */ >>>>>> if (sc->traceq < 0) { >>>>>> @@ -3297,7 +3279,8 @@ cxgbe_uninit_synchronized(struct port_in >>>>>> { >>>>>> struct adapter *sc = pi->adapter; >>>>>> struct ifnet *ifp = pi->ifp; >>>>>> - int rc; >>>>>> + int rc, i; >>>>>> + struct sge_txq *txq; >>>>>> >>>>>> ASSERT_SYNCHRONIZED_OP(sc); >>>>>> >>>>>> @@ -3314,6 +3297,12 @@ cxgbe_uninit_synchronized(struct port_in >>>>>> return (rc); >>>>>> } >>>>>> >>>>>> + for_each_txq(pi, i, txq) { >>>>>> + TXQ_LOCK(txq); >>>>>> + txq->eq.flags &= ~EQ_ENABLED; >>>>>> + TXQ_UNLOCK(txq); >>>>>> + } >>>>>> + >>>>>> clrbit(&sc->open_device_map, pi->port_id); >>>>>> PORT_LOCK(pi); >>>>>> ifp->if_drv_flags &= ~IFF_DRV_RUNNING; >>>>>> @@ -3543,15 +3532,17 @@ port_full_uninit(struct port_info *pi) >>>>>> >>>>>> if (pi->flags & PORT_INIT_DONE) { >>>>>> >>>>>> - /* Need to quiesce queues. XXX: ctrl >>>>>> queues? */ >>>>>> + /* Need to quiesce queues. 
*/ >>>>>> + >>>>>> + quiesce_wrq(sc, &sc->sge.ctrlq[pi->port_id]); >>>>>> >>>>>> for_each_txq(pi, i, txq) { >>>>>> - quiesce_eq(sc, &txq->eq); >>>>>> + quiesce_txq(sc, txq); >>>>>> } >>>>>> >>>>>> #ifdef TCP_OFFLOAD >>>>>> for_each_ofld_txq(pi, i, ofld_txq) { >>>>>> - quiesce_eq(sc, &ofld_txq->eq); >>>>>> + quiesce_wrq(sc, ofld_txq); >>>>>> } >>>>>> #endif >>>>>> >>>>>> @@ -3576,23 +3567,39 @@ port_full_uninit(struct port_info *pi) >>>>>> } >>>>>> >>>>>> static void >>>>>> -quiesce_eq(struct adapter *sc, struct sge_eq *eq) >>>>>> +quiesce_txq(struct adapter *sc, struct sge_txq *txq) >>>>>> { >>>>>> - EQ_LOCK(eq); >>>>>> - eq->flags |= EQ_DOOMED; >>>>>> + struct sge_eq *eq = &txq->eq; >>>>>> + struct sge_qstat *spg = (void *)&eq->desc[eq->sidx]; >>>>>> >>>>>> - /* >>>>>> - * Wait for the response to a credit flush if one's >>>>>> - * pending. >>>>>> - */ >>>>>> - while (eq->flags & EQ_CRFLUSHED) >>>>>> - mtx_sleep(eq, &eq->eq_lock, 0, "crflush", 0); >>>>>> - EQ_UNLOCK(eq); >>>>>> + (void) sc; /* unused */ >>>>>> >>>>>> - callout_drain(&eq->tx_callout); /* XXX: iffy */ >>>>>> - pause("callout", 10); /* Still iffy */ >>>>>> +#ifdef INVARIANTS >>>>>> + TXQ_LOCK(txq); >>>>>> + MPASS((eq->flags & EQ_ENABLED) == 0); >>>>>> + TXQ_UNLOCK(txq); >>>>>> +#endif >>>>>> >>>>>> - taskqueue_drain(sc->tq[eq->tx_chan], &eq->tx_task); >>>>>> + /* Wait for the mp_ring to empty. */ >>>>>> + while (!mp_ring_is_idle(txq->r)) { >>>>>> + mp_ring_check_drainage(txq->r, 0); >>>>>> + pause("rquiesce", 1); >>>>>> + } >>>>>> + >>>>>> + /* Then wait for the hardware to finish. */ >>>>>> + while (spg->cidx != htobe16(eq->pidx)) >>>>>> + pause("equiesce", 1); >>>>>> + >>>>>> + /* Finally, wait for the driver to reclaim all >>>>>> descriptors. 
*/ >>>>>> + while (eq->cidx != eq->pidx) >>>>>> + pause("dquiesce", 1); >>>>>> +} >>>>>> + >>>>>> +static void >>>>>> +quiesce_wrq(struct adapter *sc, struct sge_wrq *wrq) >>>>>> +{ >>>>>> + >>>>>> + /* XXXTX */ >>>>>> } >>>>>> >>>>>> static void >>>>>> @@ -4892,6 +4899,9 @@ cxgbe_sysctls(struct port_info *pi) >>>>>> oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, "stats", >>>>>> CTLFLAG_RD, >>>>>> NULL, "port statistics"); >>>>>> children = SYSCTL_CHILDREN(oid); >>>>>> + SYSCTL_ADD_UINT(ctx, children, OID_AUTO, >>>>>> "tx_parse_error", >>>>>> CTLFLAG_RD, >>>>>> + &pi->tx_parse_error, 0, >>>>>> + "# of tx packets with invalid length or # of >>>>>> segments"); >>>>>> >>>>>> #define SYSCTL_ADD_T4_REG64(pi, name, desc, reg) \ >>>>>> SYSCTL_ADD_OID(ctx, children, OID_AUTO, name, \ >>>>>> @@ -6947,74 +6957,6 @@ sysctl_wcwr_stats(SYSCTL_HANDLER_ARGS) >>>>>> } >>>>>> #endif >>>>>> >>>>>> -static inline void >>>>>> -txq_start(struct ifnet *ifp, struct sge_txq *txq) >>>>>> -{ >>>>>> - struct buf_ring *br; >>>>>> - struct mbuf *m; >>>>>> - >>>>>> - TXQ_LOCK_ASSERT_OWNED(txq); >>>>>> - >>>>>> - br = txq->br; >>>>>> - m = txq->m ? 
txq->m : drbr_dequeue(ifp, br); >>>>>> - if (m) >>>>>> - t4_eth_tx(ifp, txq, m); >>>>>> -} >>>>>> - >>>>>> -void >>>>>> -t4_tx_callout(void *arg) >>>>>> -{ >>>>>> - struct sge_eq *eq = arg; >>>>>> - struct adapter *sc; >>>>>> - >>>>>> - if (EQ_TRYLOCK(eq) == 0) >>>>>> - goto reschedule; >>>>>> - >>>>>> - if (eq->flags & EQ_STALLED && !can_resume_tx(eq)) { >>>>>> - EQ_UNLOCK(eq); >>>>>> -reschedule: >>>>>> - if (__predict_true(!(eq->flags && EQ_DOOMED))) >>>>>> - callout_schedule(&eq->tx_callout, 1); >>>>>> - return; >>>>>> - } >>>>>> - >>>>>> - EQ_LOCK_ASSERT_OWNED(eq); >>>>>> - >>>>>> - if (__predict_true((eq->flags & EQ_DOOMED) == 0)) { >>>>>> - >>>>>> - if ((eq->flags & EQ_TYPEMASK) == EQ_ETH) { >>>>>> - struct sge_txq *txq = arg; >>>>>> - struct port_info *pi = >>>>>> txq->ifp->if_softc; >>>>>> - >>>>>> - sc = pi->adapter; >>>>>> - } else { >>>>>> - struct sge_wrq *wrq = arg; >>>>>> - >>>>>> - sc = wrq->adapter; >>>>>> - } >>>>>> - >>>>>> - taskqueue_enqueue(sc->tq[eq->tx_chan], >>>>>> &eq->tx_task); >>>>>> - } >>>>>> - >>>>>> - EQ_UNLOCK(eq); >>>>>> -} >>>>>> - >>>>>> -void >>>>>> -t4_tx_task(void *arg, int count) >>>>>> -{ >>>>>> - struct sge_eq *eq = arg; >>>>>> - >>>>>> - EQ_LOCK(eq); >>>>>> - if ((eq->flags & EQ_TYPEMASK) == EQ_ETH) { >>>>>> - struct sge_txq *txq = arg; >>>>>> - txq_start(txq->ifp, txq); >>>>>> - } else { >>>>>> - struct sge_wrq *wrq = arg; >>>>>> - t4_wrq_tx_locked(wrq->adapter, wrq, NULL); >>>>>> - } >>>>>> - EQ_UNLOCK(eq); >>>>>> -} >>>>>> - >>>>>> static uint32_t >>>>>> fconf_to_mode(uint32_t fconf) >>>>>> { >>>>>> @@ -7452,9 +7394,9 @@ static int >>>>>> set_filter_wr(struct adapter *sc, int fidx) >>>>>> { >>>>>> struct filter_entry *f = &sc->tids.ftid_tab[fidx]; >>>>>> - struct wrqe *wr; >>>>>> struct fw_filter_wr *fwr; >>>>>> unsigned int ftid; >>>>>> + struct wrq_cookie cookie; >>>>>> >>>>>> ASSERT_SYNCHRONIZED_OP(sc); >>>>>> >>>>>> @@ -7473,12 +7415,10 @@ set_filter_wr(struct adapter *sc, >>>>>> int fi >>>>>> >>>>>> ftid = 
sc->tids.ftid_base + fidx; >>>>>> >>>>>> - wr = alloc_wrqe(sizeof(*fwr), &sc->sge.mgmtq); >>>>>> - if (wr == NULL) >>>>>> + fwr = start_wrq_wr(&sc->sge.mgmtq, >>>>>> howmany(sizeof(*fwr), >>>>>> 16), & >>>>>> cookie); >>>>>> + if (fwr == NULL) >>>>>> return (ENOMEM); >>>>>> - >>>>>> - fwr = wrtod(wr); >>>>>> - bzero(fwr, sizeof (*fwr)); >>>>>> + bzero(fwr, sizeof(*fwr)); >>>>>> >>>>>> fwr->op_pkd = htobe32(V_FW_WR_OP(FW_FILTER_WR)); >>>>>> fwr->len16_pkd = htobe32(FW_LEN16(*fwr)); >>>>>> @@ -7547,7 +7487,7 @@ set_filter_wr(struct adapter *sc, int fi >>>>>> f->pending = 1; >>>>>> sc->tids.ftids_in_use++; >>>>>> >>>>>> - t4_wrq_tx(sc, wr); >>>>>> + commit_wrq_wr(&sc->sge.mgmtq, fwr, &cookie); >>>>>> return (0); >>>>>> } >>>>>> >>>>>> @@ -7555,22 +7495,21 @@ static int >>>>>> del_filter_wr(struct adapter *sc, int fidx) >>>>>> { >>>>>> struct filter_entry *f = &sc->tids.ftid_tab[fidx]; >>>>>> - struct wrqe *wr; >>>>>> struct fw_filter_wr *fwr; >>>>>> unsigned int ftid; >>>>>> + struct wrq_cookie cookie; >>>>>> >>>>>> ftid = sc->tids.ftid_base + fidx; >>>>>> >>>>>> - wr = alloc_wrqe(sizeof(*fwr), &sc->sge.mgmtq); >>>>>> - if (wr == NULL) >>>>>> + fwr = start_wrq_wr(&sc->sge.mgmtq, >>>>>> howmany(sizeof(*fwr), >>>>>> 16), & >>>>>> cookie); >>>>>> + if (fwr == NULL) >>>>>> return (ENOMEM); >>>>>> - fwr = wrtod(wr); >>>>>> bzero(fwr, sizeof (*fwr)); >>>>>> >>>>>> t4_mk_filtdelwr(ftid, fwr, sc->sge.fwq.abs_id); >>>>>> >>>>>> f->pending = 1; >>>>>> - t4_wrq_tx(sc, wr); >>>>>> + commit_wrq_wr(&sc->sge.mgmtq, fwr, &cookie); >>>>>> return (0); >>>>>> } >>>>>> >>>>>> @@ -8170,6 +8109,7 @@ t4_ioctl(struct cdev *dev, unsigned long >>>>>> >>>>>> /* MAC stats */ >>>>>> t4_clr_port_stats(sc, pi->tx_chan); >>>>>> + pi->tx_parse_error = 0; >>>>>> >>>>>> if (pi->flags & PORT_INIT_DONE) { >>>>>> struct sge_rxq *rxq; >>>>>> @@ -8192,24 +8132,24 @@ t4_ioctl(struct cdev *dev, unsigned >>>>>> long >>>>>> txq->imm_wrs = 0; >>>>>> txq->sgl_wrs = 0; >>>>>> txq->txpkt_wrs = 0; >>>>>> - 
txq->txpkts_wrs = 0; >>>>>> - txq->txpkts_pkts = 0; >>>>>> - txq->br->br_drops = 0; >>>>>> - txq->no_dmamap = 0; >>>>>> - txq->no_desc = 0; >>>>>> + txq->txpkts0_wrs = 0; >>>>>> + txq->txpkts1_wrs = 0; >>>>>> + txq->txpkts0_pkts = 0; >>>>>> + txq->txpkts1_pkts = 0; >>>>>> + mp_ring_reset_stats(txq->r); >>>>>> } >>>>>> >>>>>> #ifdef TCP_OFFLOAD >>>>>> /* nothing to clear for each >>>>>> ofld_rxq */ >>>>>> >>>>>> for_each_ofld_txq(pi, i, wrq) { >>>>>> - wrq->tx_wrs = 0; >>>>>> - wrq->no_desc = 0; >>>>>> + wrq->tx_wrs_direct = 0; >>>>>> + wrq->tx_wrs_copied = 0; >>>>>> } >>>>>> #endif >>>>>> wrq = &sc->sge.ctrlq[pi->port_id]; >>>>>> - wrq->tx_wrs = 0; >>>>>> - wrq->no_desc = 0; >>>>>> + wrq->tx_wrs_direct = 0; >>>>>> + wrq->tx_wrs_copied = 0; >>>>>> } >>>>>> break; >>>>>> } >>>>>> >>>>>> Added: head/sys/dev/cxgbe/t4_mp_ring.c >>>>>> >>>>>> =========================================================================== >>>>>> >>>>>> >>>>>> >>>>>> === >>>>>> --- /dev/null 00:00:00 1970 (empty, because file is newly >>>>>> added) >>>>>> +++ head/sys/dev/cxgbe/t4_mp_ring.c Wed Dec 31 23:19:16 >>>>>> 2014 >>>>>> (r276485) >>>>>> @@ -0,0 +1,364 @@ >>>>>> +/*- >>>>>> + * Copyright (c) 2014 Chelsio Communications, Inc. >>>>>> + * All rights reserved. >>>>>> + * Written by: Navdeep Parhar <np@FreeBSD.org> >>>>>> + * >>>>>> + * Redistribution and use in source and binary forms, with or >>>>>> without >>>>>> + * modification, are permitted provided that the following >>>>>> conditions >>>>>> + * are met: >>>>>> + * 1. Redistributions of source code must retain the above >>>>>> copyright >>>>>> + * notice, this list of conditions and the following >>>>>> disclaimer. >>>>>> + * 2. Redistributions in binary form must reproduce the above >>>>>> copyright >>>>>> + * notice, this list of conditions and the following >>>>>> disclaimer in the >>>>>> + * documentation and/or other materials provided with the >>>>>> distribution. 
>>>>>> + * >>>>>> + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS >>>>>> ``AS IS'' AND >>>>>> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT >>>>>> LIMITED TO, THE >>>>>> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A >>>>>> PARTICULAR >>>>>> PURPOSE >>>>>> + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR >>>>>> CONTRIBUTORS BE LIABLE >>>>>> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, >>>>>> EXEMPLARY, OR >>>>>> CONSEQUENTIAL >>>>>> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF >>>>>> SUBSTITUTE GOODS >>>>>> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS >>>>>> INTERRUPTION) >>>>>> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN >>>>>> CONTRACT, >>>>>> STRICT >>>>>> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) >>>>>> ARISING IN ANY >>>>>> WAY >>>>>> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE >>>>>> POSSIBILITY OF >>>>>> + * SUCH DAMAGE. >>>>>> + */ >>>>>> + >>>>>> +#include <sys/cdefs.h> >>>>>> +__FBSDID("$FreeBSD$"); >>>>>> + >>>>>> +#include <sys/types.h> >>>>>> +#include <sys/param.h> >>>>>> +#include <sys/systm.h> >>>>>> +#include <sys/counter.h> >>>>>> +#include <sys/lock.h> >>>>>> +#include <sys/malloc.h> >>>>>> +#include <machine/cpu.h> >>>>>> + >>>>>> +#include "t4_mp_ring.h" >>>>>> + >>>>>> +union ring_state { >>>>>> + struct { >>>>>> + uint16_t pidx_head; >>>>>> + uint16_t pidx_tail; >>>>>> + uint16_t cidx; >>>>>> + uint16_t flags; >>>>>> + }; >>>>>> + uint64_t state; >>>>>> +}; >>>>>> + >>>>>> +enum { >>>>>> + IDLE = 0, /* consumer ran to completion, nothing >>>>>> more to do. >>>>>> */ >>>>>> + BUSY, /* consumer is running already, or will >>>>>> be shortly. >>>>>> */ >>>>>> + STALLED, /* consumer stopped due to lack of >>>>>> resources. */ >>>>>> + ABDICATED, /* consumer stopped even though there >>>>>> was work to >>>>>> be >>>>>> + done because it wants another thread >>>>>> to take >>>>>> over. 
*/ >>>>>> +}; >>>>>> + >>>>>> +static inline uint16_t >>>>>> +space_available(struct mp_ring *r, union ring_state s) >>>>>> +{ >>>>>> + uint16_t x = r->size - 1; >>>>>> + >>>>>> + if (s.cidx == s.pidx_head) >>>>>> + return (x); >>>>>> + else if (s.cidx > s.pidx_head) >>>>>> + return (s.cidx - s.pidx_head - 1); >>>>>> + else >>>>>> + return (x - s.pidx_head + s.cidx); >>>>>> +} >>>>>> + >>>>>> +static inline uint16_t >>>>>> +increment_idx(struct mp_ring *r, uint16_t idx, uint16_t n) >>>>>> +{ >>>>>> + int x = r->size - idx; >>>>>> + >>>>>> + MPASS(x > 0); >>>>>> + return (x > n ? idx + n : n - x); >>>>>> +} >>>>>> + >>>>>> +/* Consumer is about to update the ring's state to s */ >>>>>> +static inline uint16_t >>>>>> +state_to_flags(union ring_state s, int abdicate) >>>>>> +{ >>>>>> + >>>>>> + if (s.cidx == s.pidx_tail) >>>>>> + return (IDLE); >>>>>> + else if (abdicate && s.pidx_tail != s.pidx_head) >>>>>> + return (ABDICATED); >>>>>> + >>>>>> + return (BUSY); >>>>>> +} >>>>>> + >>>>>> +/* >>>>>> + * Caller passes in a state, with a guarantee that there is >>>>>> work to do and >>>>>> that >>>>>> + * all items up to the pidx_tail in the state are visible. >>>>>> + */ >>>>>> +static void >>>>>> +drain_ring(struct mp_ring *r, union ring_state os, uint16_t >>>>>> prev, int >>>>>> budget) >>>>>> +{ >>>>>> + union ring_state ns; >>>>>> + int n, pending, total; >>>>>> + uint16_t cidx = os.cidx; >>>>>> + uint16_t pidx = os.pidx_tail; >>>>>> + >>>>>> + MPASS(os.flags == BUSY); >>>>>> + MPASS(cidx != pidx); >>>>>> + >>>>>> + if (prev == IDLE) >>>>>> + counter_u64_add(r->starts, 1); >>>>>> + pending = 0; >>>>>> + total = 0; >>>>>> + >>>>>> + while (cidx != pidx) { >>>>>> + >>>>>> + /* Items from cidx to pidx are available for >>>>>> consumption. 
* >>>>>> / >>>>>> + n = r->drain(r, cidx, pidx); >>>>>> + if (n == 0) { >>>>>> + critical_enter(); >>>>>> + do { >>>>>> + os.state = ns.state = r->state; >>>>>> + ns.cidx = cidx; >>>>>> + ns.flags = STALLED; >>>>>> + } while (atomic_cmpset_64(&r->state, >>>>>> os.state, >>>>>> + ns.state) == 0); >>>>>> + critical_exit(); >>>>>> + if (prev != STALLED) >>>>>> + counter_u64_add(r->stalls, 1); >>>>>> + else if (total > 0) { >>>>>> + counter_u64_add(r->restarts, 1); >>>>>> + counter_u64_add(r->stalls, 1); >>>>>> + } >>>>>> + break; >>>>>> + } >>>>>> + cidx = increment_idx(r, cidx, n); >>>>>> + pending += n; >>>>>> + total += n; >>>>>> + >>>>>> + /* >>>>>> + * We update the cidx only if we've caught up >>>>>> with the >>>>>> pidx, the >>>>>> + * real cidx is getting too far ahead of the >>>>>> one >>>>>> visible to >>>>>> + * everyone else, or we have exceeded our >>>>>> budget. >>>>>> + */ >>>>>> + if (cidx != pidx && pending < 64 && total < >>>>>> budget) >>>>>> + continue; >>>>>> + critical_enter(); >>>>>> + do { >>>>>> + os.state = ns.state = r->state; >>>>>> + ns.cidx = cidx; >>>>>> + ns.flags = state_to_flags(ns, total >= >>>>>> budget); >>>>>> + } while (atomic_cmpset_acq_64(&r->state, >>>>>> os.state, >>>>>> ns.state) == 0); >>>>>> + critical_exit(); >>>>>> + >>>>>> + if (ns.flags == ABDICATED) >>>>>> + counter_u64_add(r->abdications, 1); >>>>>> + if (ns.flags != BUSY) { >>>>>> + /* Wrong loop exit if we're going to >>>>>> stall. */ >>>>>> + MPASS(ns.flags != STALLED); >>>>>> + if (prev == STALLED) { >>>>>> + MPASS(total > 0); >>>>>> + counter_u64_add(r->restarts, 1); >>>>>> + } >>>>>> + break; >>>>>> + } >>>>>> + >>>>>> + /* >>>>>> + * The acquire style atomic above guarantees >>>>>> visibility of >>>>>> items >>>>>> + * associated with any pidx change that we >>>>>> notice here. 
>>>>>> + */ >>>>>> + pidx = ns.pidx_tail; >>>>>> + pending = 0; >>>>>> + } >>>>>> +} >>>>>> + >>>>>> +int >>>>>> +mp_ring_alloc(struct mp_ring **pr, int size, void *cookie, >>>>>> ring_drain_t >>>>>> drain, >>>>>> + ring_can_drain_t can_drain, struct malloc_type *mt, int >>>>>> flags) >>>>>> +{ >>>>>> + struct mp_ring *r; >>>>>> + >>>>>> + /* All idx are 16b so size can be 65536 at most */ >>>>>> + if (pr == NULL || size < 2 || size > 65536 || drain == >>>>>> NULL || >>>>>> + can_drain == NULL) >>>>>> + return (EINVAL); >>>>>> + *pr = NULL; >>>>>> + flags &= M_NOWAIT | M_WAITOK; >>>>>> + MPASS(flags != 0); >>>>>> + >>>>>> + r = malloc(__offsetof(struct mp_ring, items[size]), mt, >>>>>> flags | >>>>>> M_ZERO); >>>>>> + if (r == NULL) >>>>>> + return (ENOMEM); >>>>>> + r->size = size; >>>>>> + r->cookie = cookie; >>>>>> + r->mt = mt; >>>>>> + r->drain = drain; >>>>>> + r->can_drain = can_drain; >>>>>> + r->enqueues = counter_u64_alloc(flags); >>>>>> + r->drops = counter_u64_alloc(flags); >>>>>> + r->starts = counter_u64_alloc(flags); >>>>>> + r->stalls = counter_u64_alloc(flags); >>>>>> + r->restarts = counter_u64_alloc(flags); >>>>>> + r->abdications = counter_u64_alloc(flags); >>>>>> + if (r->enqueues == NULL || r->drops == NULL || >>>>>> r->starts >>>>>> == NULL || >>>>>> + r->stalls == NULL || r->restarts == NULL || >>>>>> + r->abdications == NULL) { >>>>>> + mp_ring_free(r); >>>>>> + return (ENOMEM); >>>>>> + } >>>>>> + >>>>>> + *pr = r; >>>>>> + return (0); >>>>>> +} >>>>>> + >>>>>> +void >>>>>> + >>>>>> +mp_ring_free(struct mp_ring *r) >>>>>> +{ >>>>>> + >>>>>> + if (r == NULL) >>>>>> + return; >>>>>> + >>>>>> + if (r->enqueues != NULL) >>>>>> + counter_u64_free(r->enqueues); >>>>>> + if (r->drops != NULL) >>>>>> + counter_u64_free(r->drops); >>>>>> + if (r->starts != NULL) >>>>>> + counter_u64_free(r->starts); >>>>>> + if (r->stalls != NULL) >>>>>> + counter_u64_free(r->stalls); >>>>>> + if (r->restarts != NULL) >>>>>> + counter_u64_free(r->restarts); >>>>>> 
+ if (r->abdications != NULL) >>>>>> + counter_u64_free(r->abdications); >>>>>> + >>>>>> + free(r, r->mt); >>>>>> +} >>>>>> + >>>>>> +/* >>>>>> + * Enqueue n items and maybe drain the ring for some time. >>>>>> + * >>>>>> + * Returns an errno. >>>>>> + */ >>>>>> +int >>>>>> +mp_ring_enqueue(struct mp_ring *r, void **items, int n, int >>>>>> budget) >>>>>> +{ >>>>>> + union ring_state os, ns; >>>>>> + uint16_t pidx_start, pidx_stop; >>>>>> + int i; >>>>>> + >>>>>> + MPASS(items != NULL); >>>>>> + MPASS(n > 0); >>>>>> + >>>>>> >>>>>> *** DIFF OUTPUT TRUNCATED AT 1000 LINES *** >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -----------------------------------------+------------------------------- >>>>>> >>>>>> >>>>>> >>>>>> Prof. Luigi RIZZO, rizzo@iet.unipi.it . Dip. di Ing. >>>>>> dell'Informazione >>>>>> http://www.iet.unipi.it/~luigi/ . Universita` di Pisa >>>>>> TEL +39-050-2211611 . via Diotisalvi 2 >>>>>> Mobile +39-338-6809875 . 56122 PISA (Italy) >>>>>> -----------------------------------------+------------------------------- >>>>>> >>>>>> >>>>>> >>>> >>> >>> >> >
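For anyone skimming the quoted t4_mp_ring.c hunk: the index arithmetic in space_available() and increment_idx() can be tried outside the kernel. The following is only a userspace sketch, not the committed code. struct ring_sketch and the sketch_ prefixed names are hypothetical stand-ins, assert() replaces MPASS(), and the indices are passed directly instead of through union ring_state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the one struct mp_ring field used here. */
struct ring_sketch {
	int size;	/* number of slots; the quoted code allows 2..65536 */
};

/*
 * Slots a producer may still claim.  One slot is always left empty so
 * that cidx == pidx_head can only mean "ring is empty".
 */
static inline uint16_t
sketch_space_available(const struct ring_sketch *r, uint16_t cidx,
    uint16_t pidx_head)
{
	uint16_t x = r->size - 1;

	if (cidx == pidx_head)
		return (x);
	else if (cidx > pidx_head)
		return (cidx - pidx_head - 1);
	else
		return (x - pidx_head + cidx);
}

/* Advance idx by n slots, wrapping at r->size without a modulo. */
static inline uint16_t
sketch_increment_idx(const struct ring_sketch *r, uint16_t idx, uint16_t n)
{
	int x = r->size - idx;	/* slots left before the wrap point */

	assert(x > 0);
	return (x > n ? idx + n : n - x);
}
```

With an 8-slot ring, an empty ring reports 7 free slots, and advancing index 6 by 3 wraps around to 1. Avoiding the modulo means the same code works for ring sizes that are not a power of two.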