From: Slava Shwartsman <slavash@FreeBSD.org>
Date: Wed, 5 Dec 2018 14:25:03 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: svn commit: r341586 - head/sys/dev/mlx5/mlx5_en
Message-Id: <201812051425.wB5EP38T004562@repo.freebsd.org>

Author: slavash
Date: Wed Dec  5 14:25:03 2018
New Revision: 341586
URL: https://svnweb.freebsd.org/changeset/base/341586

Log:
  mlx5en: Implement backpressure indication.

  The backpressure indication is implemented using an unlimited-rate
  type of mbuf send tag. When the upper layers, typically the socket
  layer, have obtained such a tag, they can query the destination
  driver queue for the current amount of space available in the send
  queue.

  A single mbuf send tag may be referenced multiple times, and a
  refcount has been added to the mlx5e_priv structure to track its
  usage. Because the send tag resides in the mlx5e_channel structure,
  there is no need to wait for refcounts to reach zero until the
  mlx5en(4) driver is detached. The channels structure is persistent
  during the lifetime of the mlx5en(4) driver it belongs to and can
  therefore be accessed without any synchronization.
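  To make the mechanism concrete, here is a minimal, hypothetical
  consumer-side sketch (not part of this commit): it allocates an
  unlimited send tag through the ifnet methods wired up below and reads
  back the queue fill level. The function name and the way flowid and
  flowtype are obtained are assumptions for illustration only.

  #include <sys/param.h>
  #include <sys/mbuf.h>
  #include <sys/socket.h>
  #include <net/if.h>
  #include <net/if_var.h>

  static int
  example_query_queue_level(struct ifnet *ifp, uint32_t flowid,
      uint32_t flowtype, uint32_t *plevel)
  {
  	/* request an unlimited-rate tag for this flow */
  	union if_snd_tag_alloc_params aparams = {
  		.hdr.type = IF_SND_TAG_TYPE_UNLIMITED,
  		.hdr.flowid = flowid,
  		.hdr.flowtype = flowtype,
  	};
  	union if_snd_tag_query_params qparams;
  	struct m_snd_tag *mst;
  	int error;

  	error = ifp->if_snd_tag_alloc(ifp, &aparams, &mst);
  	if (error != 0)
  		return (error);

  	/* ask the driver how full the selected send queue is */
  	error = ifp->if_snd_tag_query(mst, &qparams);
  	if (error == 0) {
  		/* scaled by the driver from 0 to IF_SND_QUEUE_LEVEL_MAX */
  		*plevel = qparams.unlimited.queue_level;
  	}

  	ifp->if_snd_tag_free(mst);
  	return (error);
  }

  A real consumer would cache the tag on the connection instead of
  allocating one per query, since every successful allocation takes a
  channel reference via mlx5e_ref_channel().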
  The mlx5e_snd_tag structure was extended to contain a type field,
  because there are now two different tag types ending up in the
  driver which need to be distinguished.

  Submitted by:	hselasky@
  Approved by:	hselasky (mentor)
  MFC after:	1 week
  Sponsored by:	Mellanox Technologies

Modified:
  head/sys/dev/mlx5/mlx5_en/en.h
  head/sys/dev/mlx5/mlx5_en/en_rl.h
  head/sys/dev/mlx5/mlx5_en/mlx5_en_main.c
  head/sys/dev/mlx5/mlx5_en/mlx5_en_rl.c
  head/sys/dev/mlx5/mlx5_en/mlx5_en_tx.c

Modified: head/sys/dev/mlx5/mlx5_en/en.h
==============================================================================
--- head/sys/dev/mlx5/mlx5_en/en.h	Wed Dec  5 14:24:33 2018	(r341585)
+++ head/sys/dev/mlx5/mlx5_en/en.h	Wed Dec  5 14:25:03 2018	(r341586)
@@ -580,6 +580,11 @@ enum {
 	MLX5E_SQ_FULL
 };
 
+struct mlx5e_snd_tag {
+	struct m_snd_tag m_snd_tag;	/* send tag */
+	u32	type;			/* tag type */
+};
+
 struct mlx5e_sq {
 	/* data path */
 	struct	mtx lock;
@@ -640,11 +645,27 @@ mlx5e_sq_has_room_for(struct mlx5e_sq *sq, u16 n)
 	return ((sq->wq.sz_m1 & (cc - pc)) >= n || cc == pc);
 }
 
+static inline u32
+mlx5e_sq_queue_level(struct mlx5e_sq *sq)
+{
+	u16 cc;
+	u16 pc;
+
+	if (sq == NULL)
+		return (0);
+
+	cc = sq->cc;
+	pc = sq->pc;
+
+	return (((sq->wq.sz_m1 & (pc - cc)) *
+	    IF_SND_QUEUE_LEVEL_MAX) / sq->wq.sz_m1);
+}
+
 struct mlx5e_channel {
 	/* data path */
 	struct mlx5e_rq rq;
+	struct mlx5e_snd_tag tag;
 	struct mlx5e_sq sq[MLX5E_MAX_TX_NUM_TC];
-	struct ifnet *ifp;
 	u32	mkey_be;
 	u8	num_tc;
@@ -770,6 +791,7 @@ struct mlx5e_priv {
 	u32	pdn;
 	u32	tdn;
 	struct mlx5_core_mr mr;
+	volatile unsigned int channel_refs;
 	u32	tisn[MLX5E_MAX_TX_NUM_TC];
 	u32	rqtn;
@@ -907,6 +929,24 @@ mlx5e_cq_arm(struct mlx5e_cq *cq, spinlock_t *dblock)
 	mcq = &cq->mcq;
 	mlx5_cq_arm(mcq, MLX5_CQ_DB_REQ_NOT, mcq->uar->map, dblock, cq->wq.cc);
+}
+
+static inline void
+mlx5e_ref_channel(struct mlx5e_priv *priv)
+{
+
+	KASSERT(priv->channel_refs < INT_MAX,
+	    ("Channel refs will overflow"));
+	atomic_fetchadd_int(&priv->channel_refs, 1);
+}
+
+static inline void
+mlx5e_unref_channel(struct mlx5e_priv *priv)
+{
+
+	KASSERT(priv->channel_refs > 0,
+	    ("Channel refs is not greater than zero"));
+	atomic_fetchadd_int(&priv->channel_refs, -1);
 }
 
 extern const struct ethtool_ops mlx5e_ethtool_ops;

Modified: head/sys/dev/mlx5/mlx5_en/en_rl.h
==============================================================================
--- head/sys/dev/mlx5/mlx5_en/en_rl.h	Wed Dec  5 14:24:33 2018	(r341585)
+++ head/sys/dev/mlx5/mlx5_en/en_rl.h	Wed Dec  5 14:25:03 2018	(r341586)
@@ -129,7 +129,7 @@ struct mlx5e_rl_channel_param {
 };
 
 struct mlx5e_rl_channel {
-	struct m_snd_tag m_snd_tag;
+	struct mlx5e_snd_tag tag;
 	STAILQ_ENTRY(mlx5e_rl_channel) entry;
 	struct mlx5e_sq * volatile sq;
 	struct mlx5e_rl_worker *worker;

Modified: head/sys/dev/mlx5/mlx5_en/mlx5_en_main.c
==============================================================================
--- head/sys/dev/mlx5/mlx5_en/mlx5_en_main.c	Wed Dec  5 14:24:33 2018	(r341585)
+++ head/sys/dev/mlx5/mlx5_en/mlx5_en_main.c	Wed Dec  5 14:25:03 2018	(r341586)
@@ -886,7 +886,7 @@ mlx5e_create_rq(struct mlx5e_channel *c,
 
 	wq_sz = mlx5_wq_ll_get_size(&rq->wq);
 
-	err = -tcp_lro_init_args(&rq->lro, c->ifp, TCP_LRO_ENTRIES, wq_sz);
+	err = -tcp_lro_init_args(&rq->lro, c->tag.m_snd_tag.ifp, TCP_LRO_ENTRIES, wq_sz);
 	if (err)
 		goto err_rq_wq_destroy;
 
@@ -916,7 +916,7 @@ mlx5e_create_rq(struct mlx5e_channel *c,
 #endif
 	}
 
-	rq->ifp = c->ifp;
+	rq->ifp = c->tag.m_snd_tag.ifp;
 	rq->channel = c;
 	rq->ix = c->ix;
 
@@ -1778,7 +1778,9 @@ mlx5e_open_channel(struct mlx5e_priv *priv, int ix,
 	c->priv = priv;
 	c->ix = ix;
-	c->ifp = priv->ifp;
+	/* setup send tag */
+	c->tag.m_snd_tag.ifp = priv->ifp;
+	c->tag.type = IF_SND_TAG_TYPE_UNLIMITED;
 	c->mkey_be = cpu_to_be32(priv->mr.key);
 	c->num_tc = priv->num_tc;
 
@@ -2011,7 +2013,6 @@ mlx5e_open_channels(struct mlx5e_priv *priv)
 		if (err)
 			goto err_close_channels;
 	}
-
 	return (0);
 
 err_close_channels:
@@ -3525,6 +3526,141 @@ mlx5e_setup_pauseframes(struct mlx5e_priv *priv)
 	PRIV_UNLOCK(priv);
 }
 
+static int
+mlx5e_ul_snd_tag_alloc(struct ifnet *ifp,
+    union if_snd_tag_alloc_params *params,
+    struct m_snd_tag **ppmt)
+{
+	struct mlx5e_priv *priv;
+	struct mlx5e_channel *pch;
+
+	priv = ifp->if_softc;
+
+	if (unlikely(priv->gone || params->hdr.flowtype == M_HASHTYPE_NONE)) {
+		return (EOPNOTSUPP);
+	} else {
+		/* keep this code synced with mlx5e_select_queue() */
+		u32 ch = priv->params.num_channels;
+#ifdef RSS
+		u32 temp;
+
+		if (rss_hash2bucket(params->hdr.flowid,
+		    params->hdr.flowtype, &temp) == 0)
+			ch = temp % ch;
+		else
+#endif
+			ch = (params->hdr.flowid % 128) % ch;
+
+		/*
+		 * NOTE: The channels array is only freed at detach
+		 * and it is safe to return a pointer to the send tag
+		 * inside the channels structure as long as we
+		 * reference the priv.
+		 */
+		pch = priv->channel + ch;
+
+		/* check if send queue is not running */
+		if (unlikely(pch->sq[0].running == 0))
+			return (ENXIO);
+		mlx5e_ref_channel(priv);
+		*ppmt = &pch->tag.m_snd_tag;
+		return (0);
+	}
+}
+
+static int
+mlx5e_ul_snd_tag_query(struct m_snd_tag *pmt, union if_snd_tag_query_params *params)
+{
+	struct mlx5e_channel *pch =
+	    container_of(pmt, struct mlx5e_channel, tag.m_snd_tag);
+
+	params->unlimited.max_rate = -1ULL;
+	params->unlimited.queue_level = mlx5e_sq_queue_level(&pch->sq[0]);
+	return (0);
+}
+
+static void
+mlx5e_ul_snd_tag_free(struct m_snd_tag *pmt)
+{
+	struct mlx5e_channel *pch =
+	    container_of(pmt, struct mlx5e_channel, tag.m_snd_tag);
+
+	mlx5e_unref_channel(pch->priv);
+}
+
+static int
+mlx5e_snd_tag_alloc(struct ifnet *ifp,
+    union if_snd_tag_alloc_params *params,
+    struct m_snd_tag **ppmt)
+{
+
+	switch (params->hdr.type) {
+#ifdef RATELIMIT
+	case IF_SND_TAG_TYPE_RATE_LIMIT:
+		return (mlx5e_rl_snd_tag_alloc(ifp, params, ppmt));
+#endif
+	case IF_SND_TAG_TYPE_UNLIMITED:
+		return (mlx5e_ul_snd_tag_alloc(ifp, params, ppmt));
+	default:
+		return (EOPNOTSUPP);
+	}
+}
+
+static int
+mlx5e_snd_tag_modify(struct m_snd_tag *pmt, union if_snd_tag_modify_params *params)
+{
+	struct mlx5e_snd_tag *tag =
+	    container_of(pmt, struct mlx5e_snd_tag, m_snd_tag);
+
+	switch (tag->type) {
+#ifdef RATELIMIT
+	case IF_SND_TAG_TYPE_RATE_LIMIT:
+		return (mlx5e_rl_snd_tag_modify(pmt, params));
+#endif
+	case IF_SND_TAG_TYPE_UNLIMITED:
+	default:
+		return (EOPNOTSUPP);
+	}
+}
+
+static int
+mlx5e_snd_tag_query(struct m_snd_tag *pmt, union if_snd_tag_query_params *params)
+{
+	struct mlx5e_snd_tag *tag =
+	    container_of(pmt, struct mlx5e_snd_tag, m_snd_tag);
+
+	switch (tag->type) {
+#ifdef RATELIMIT
+	case IF_SND_TAG_TYPE_RATE_LIMIT:
+		return (mlx5e_rl_snd_tag_query(pmt, params));
+#endif
+	case IF_SND_TAG_TYPE_UNLIMITED:
+		return (mlx5e_ul_snd_tag_query(pmt, params));
+	default:
+		return (EOPNOTSUPP);
+	}
+}
+
+static void
+mlx5e_snd_tag_free(struct m_snd_tag *pmt)
+{
+	struct mlx5e_snd_tag *tag =
+	    container_of(pmt, struct mlx5e_snd_tag, m_snd_tag);
+
+	switch (tag->type) {
+#ifdef RATELIMIT
+	case IF_SND_TAG_TYPE_RATE_LIMIT:
+		mlx5e_rl_snd_tag_free(pmt);
+		break;
+#endif
+	case IF_SND_TAG_TYPE_UNLIMITED:
+		mlx5e_ul_snd_tag_free(pmt);
+		break;
+	default:
+		break;
+ } +} + static void * mlx5e_create_ifp(struct mlx5_core_dev *mdev) { @@ -3578,13 +3714,11 @@ mlx5e_create_ifp(struct mlx5_core_dev *mdev) ifp->if_capabilities |= IFCAP_LRO; ifp->if_capabilities |= IFCAP_TSO | IFCAP_VLAN_HWTSO; ifp->if_capabilities |= IFCAP_HWSTATS | IFCAP_HWRXTSTMP; -#ifdef RATELIMIT ifp->if_capabilities |= IFCAP_TXRTLMT; - ifp->if_snd_tag_alloc = mlx5e_rl_snd_tag_alloc; - ifp->if_snd_tag_free = mlx5e_rl_snd_tag_free; - ifp->if_snd_tag_modify = mlx5e_rl_snd_tag_modify; - ifp->if_snd_tag_query = mlx5e_rl_snd_tag_query; -#endif + ifp->if_snd_tag_alloc = mlx5e_snd_tag_alloc; + ifp->if_snd_tag_free = mlx5e_snd_tag_free; + ifp->if_snd_tag_modify = mlx5e_snd_tag_modify; + ifp->if_snd_tag_query = mlx5e_snd_tag_query; /* set TSO limits so that we don't have to drop TX packets */ ifp->if_hw_tsomax = MLX5E_MAX_TX_PAYLOAD_SIZE - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN); @@ -3838,6 +3972,13 @@ mlx5e_destroy_ifp(struct mlx5_core_dev *mdev, void *vp PRIV_LOCK(priv); mlx5e_close_locked(ifp); PRIV_UNLOCK(priv); + + /* wait for all unlimited send tags to go away */ + while (priv->channel_refs != 0) { + if_printf(priv->ifp, "Waiting for all unlimited connections " + "to terminate\n"); + pause("W", hz); + } /* unregister device */ ifmedia_removeall(&priv->media); Modified: head/sys/dev/mlx5/mlx5_en/mlx5_en_rl.c ============================================================================== --- head/sys/dev/mlx5/mlx5_en/mlx5_en_rl.c Wed Dec 5 14:24:33 2018 (r341585) +++ head/sys/dev/mlx5/mlx5_en/mlx5_en_rl.c Wed Dec 5 14:25:03 2018 (r341586) @@ -841,7 +841,8 @@ mlx5e_rl_init(struct mlx5e_priv *priv) for (i = 0; i < rl->param.tx_channels_per_worker_def; i++) { struct mlx5e_rl_channel *channel = rlw->channels + i; channel->worker = rlw; - channel->m_snd_tag.ifp = priv->ifp; + channel->tag.m_snd_tag.ifp = priv->ifp; + channel->tag.type = IF_SND_TAG_TYPE_RATE_LIMIT; STAILQ_INSERT_TAIL(&rlw->index_list_head, channel, entry); } MLX5E_RL_WORKER_UNLOCK(rlw); @@ -1038,17 +1039,21 @@ mlx5e_rl_modify(struct mlx5e_rl_worker *rlw, struct ml } static int -mlx5e_rl_query(struct mlx5e_rl_worker *rlw, struct mlx5e_rl_channel *channel, uint64_t *prate) +mlx5e_rl_query(struct mlx5e_rl_worker *rlw, struct mlx5e_rl_channel *channel, + union if_snd_tag_query_params *params) { int retval; MLX5E_RL_WORKER_LOCK(rlw); switch (channel->state) { case MLX5E_RL_ST_USED: - *prate = channel->last_rate; + params->rate_limit.max_rate = channel->last_rate; + params->rate_limit.queue_level = mlx5e_sq_queue_level(channel->sq); retval = 0; break; case MLX5E_RL_ST_MODIFY: + params->rate_limit.max_rate = channel->last_rate; + params->rate_limit.queue_level = mlx5e_sq_queue_level(channel->sq); retval = EBUSY; break; default: @@ -1120,7 +1125,7 @@ mlx5e_rl_snd_tag_alloc(struct ifnet *ifp, } /* store pointer to mbuf tag */ - *ppmt = &channel->m_snd_tag; + *ppmt = &channel->tag.m_snd_tag; done: return (error); } @@ -1130,7 +1135,7 @@ int mlx5e_rl_snd_tag_modify(struct m_snd_tag *pmt, union if_snd_tag_modify_params *params) { struct mlx5e_rl_channel *channel = - container_of(pmt, struct mlx5e_rl_channel, m_snd_tag); + container_of(pmt, struct mlx5e_rl_channel, tag.m_snd_tag); return (mlx5e_rl_modify(channel->worker, channel, params->rate_limit.max_rate)); } @@ -1139,16 +1144,16 @@ int mlx5e_rl_snd_tag_query(struct m_snd_tag *pmt, union if_snd_tag_query_params *params) { struct mlx5e_rl_channel *channel = - container_of(pmt, struct mlx5e_rl_channel, m_snd_tag); + container_of(pmt, struct mlx5e_rl_channel, tag.m_snd_tag); - return 
(mlx5e_rl_query(channel->worker, channel, ¶ms->rate_limit.max_rate)); + return (mlx5e_rl_query(channel->worker, channel, params)); } void mlx5e_rl_snd_tag_free(struct m_snd_tag *pmt) { struct mlx5e_rl_channel *channel = - container_of(pmt, struct mlx5e_rl_channel, m_snd_tag); + container_of(pmt, struct mlx5e_rl_channel, tag.m_snd_tag); mlx5e_rl_free(channel->worker, channel); } Modified: head/sys/dev/mlx5/mlx5_en/mlx5_en_tx.c ============================================================================== --- head/sys/dev/mlx5/mlx5_en/mlx5_en_tx.c Wed Dec 5 14:24:33 2018 (r341585) +++ head/sys/dev/mlx5/mlx5_en/mlx5_en_tx.c Wed Dec 5 14:25:03 2018 (r341586) @@ -78,6 +78,47 @@ SYSINIT(mlx5e_hash_init, SI_SUB_RANDOM, SI_ORDER_ANY, #endif static struct mlx5e_sq * +mlx5e_select_queue_by_send_tag(struct ifnet *ifp, struct mbuf *mb) +{ + struct mlx5e_snd_tag *ptag; + struct mlx5e_sq *sq; + + /* check for route change */ + if (mb->m_pkthdr.snd_tag->ifp != ifp) + return (NULL); + + /* get pointer to sendqueue */ + ptag = container_of(mb->m_pkthdr.snd_tag, + struct mlx5e_snd_tag, m_snd_tag); + + switch (ptag->type) { +#ifdef RATELIMIT + case IF_SND_TAG_TYPE_RATE_LIMIT: + sq = container_of(ptag, + struct mlx5e_rl_channel, tag)->sq; + break; +#endif + case IF_SND_TAG_TYPE_UNLIMITED: + sq = &container_of(ptag, + struct mlx5e_channel, tag)->sq[0]; + KASSERT(({ + struct mlx5e_priv *priv = ifp->if_softc; + priv->channel_refs > 0; }), + ("mlx5e_select_queue: Channel refs are zero for unlimited tag")); + break; + default: + sq = NULL; + break; + } + + /* check if valid */ + if (sq != NULL && READ_ONCE(sq->running) != 0) + return (sq); + + return (NULL); +} + +static struct mlx5e_sq * mlx5e_select_queue(struct ifnet *ifp, struct mbuf *mb) { struct mlx5e_priv *priv = ifp->if_softc; @@ -96,25 +137,6 @@ mlx5e_select_queue(struct ifnet *ifp, struct mbuf *mb) ch = priv->params.num_channels; -#ifdef RATELIMIT - if (mb->m_pkthdr.snd_tag != NULL) { - struct mlx5e_sq *sq; - - /* check for route change */ - if (mb->m_pkthdr.snd_tag->ifp != ifp) - return (NULL); - - /* get pointer to sendqueue */ - sq = container_of(mb->m_pkthdr.snd_tag, - struct mlx5e_rl_channel, m_snd_tag)->sq; - - /* check if valid */ - if (sq != NULL && sq->running != 0) - return (sq); - - /* FALLTHROUGH */ - } -#endif /* check if flowid is set */ if (M_HASHTYPE_GET(mb) != M_HASHTYPE_NONE) { #ifdef RSS @@ -587,27 +609,33 @@ mlx5e_xmit(struct ifnet *ifp, struct mbuf *mb) struct mlx5e_sq *sq; int ret; - sq = mlx5e_select_queue(ifp, mb); - if (unlikely(sq == NULL)) { -#ifdef RATELIMIT - /* Check for route change */ - if (mb->m_pkthdr.snd_tag != NULL && - mb->m_pkthdr.snd_tag->ifp != ifp) { + if (mb->m_pkthdr.snd_tag != NULL) { + sq = mlx5e_select_queue_by_send_tag(ifp, mb); + if (unlikely(sq == NULL)) { + /* Check for route change */ + if (mb->m_pkthdr.snd_tag->ifp != ifp) { + /* Free mbuf */ + m_freem(mb); + + /* + * Tell upper layers about route + * change and to re-transmit this + * packet: + */ + return (EAGAIN); + } + goto select_queue; + } + } else { +select_queue: + sq = mlx5e_select_queue(ifp, mb); + if (unlikely(sq == NULL)) { /* Free mbuf */ m_freem(mb); - /* - * Tell upper layers about route change and to - * re-transmit this packet: - */ - return (EAGAIN); + /* Invalid send queue */ + return (ENXIO); } -#endif - /* Free mbuf */ - m_freem(mb); - - /* Invalid send queue */ - return (ENXIO); } mtx_lock(&sq->lock);