Date: Thu, 1 Dec 2016 23:38:53 +0000 (UTC)
From: John Baldwin <jhb@FreeBSD.org>
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-10@freebsd.org
Subject: svn commit: r309378 - in stable/10: contrib/ofed/libcxgb4 contrib/ofed/libcxgb4/src contrib/ofed/usr.lib contrib/ofed/usr.lib/libcxgb4 sys/dev/cxgb/ulp/iw_cxgb sys/dev/cxgbe/iw_cxgbe sys/ofed/drive...
Message-ID: <201612012338.uB1NcrBF087312@repo.freebsd.org>
Author: jhb
Date: Thu Dec 1 23:38:52 2016
New Revision: 309378
URL: https://svnweb.freebsd.org/changeset/base/309378

Log:
  MFC 273806,289103,289201,289338,289578,293185,294474,294610,297124,297368,
  297406,300875,300888,301158,301896,301897,304838:

  Pull in most of the Chelsio and iWARP related changes from stable/11 into
  stable/10. A few changes from 278886 (OFED 1.2) were also included,
  though the full merge is not included:
  - The find_gid_port() function in infiniband/core/cma.c.
  - Addition of the 'ord' and 'ird' fields to 'struct iw_cm_event'.

  273806:
  Userspace library for Chelsio's Terminator 5 based iWARP RNICs (pretty much
  every T5 card that does _not_ have "-SO" in its name is RDMA capable).

  This plugs into the OFED verbs framework and allows userspace RDMA
  applications to work over T5 RNICs. Tested with rping.

  289103:
  iw_cxgbe: fix for page fault in cm_close_handler().

  This is roughly the iw_cxgbe equivalent of
  https://github.com/torvalds/linux/commit/be13b2dff8c4e41846477b22cc5c164ea5a6ac2e
  -----------------
  RDMA/cxgb4: Connect_request_upcall fixes

  When processing an MPA Start Request, if the listening endpoint is
  DEAD, then abort the connection.

  If the IWCM returns an error, then we must abort the connection and
  release resources. Also abort_connection() should not post a CLOSE
  event, so clean that up too.

  Signed-off-by: Steve Wise <swise@opengridcomputing.com>
  Signed-off-by: Roland Dreier <roland@purestorage.com>
  -----------------

  289201:
  iw_cxgbe: MPA v2 is always available.

  289338:
  iw_cxgbe: use correct RFC number.

  289578:
  Merge LinuxKPI changes from DragonflyBSD:
  - Define the kref structure identical to the one found in Linux.
  - Update clients referring inside the kref structure.
  - Implement kref_sub() for FreeBSD.

  293185:
  iw_cxgbe: Shut down the socket but do not close the fd in case of error.
  The fd is closed later in this case. This fixes a "SS_NOFDREF on enter"
  panic.

  294474:
  iw_cxgbe: fix a couple of problems in the RDMA_TERMINATE handler.
  a) Look for the CPL in the payload buffer instead of the descriptor.
  b) Retrieve the socket associated with the tid with the inpcb lock held.

  294610:
  Fix for iWARP servers that listen on INADDR_ANY.

  The iWARP Connection Manager (CM) on FreeBSD creates a TCP socket to
  represent an iWARP endpoint when the connection is over TCP. For
  servers the current approach is to invoke create_listen callback for
  each iWARP RNIC registered with the CM. This doesn't work too well for
  INADDR_ANY because a listen on any TCP socket already notifies all
  hardware TOEs/RNICs of the new listener. This patch fixes the server
  side of things for FreeBSD. We've tried to keep all these modifications
  in the iWARP/TCP specific parts of the OFED infrastructure as much as
  possible.

  297124:
  iw_cxgbe/libcxgb4: Pull in many applicable fixes from the upstream
  Linux iWARP driver and userspace library to the FreeBSD iw_cxgbe and
  libcxgb4.

  This commit includes internal changesets 6785 8111 8149 8478 8617 8648
  8650 9110 9143 9440 9511 9894 10164 10261 10450 10980 10981 10982 11730
  11792 12218 12220 12222 12223 12225 12226 12227 12228 12229 12654.

  297368:
  cxgbe/iw_cxgbe: Fix for stray "start_ep_timer timer already started!"
  messages.

  297406:
  Remove unnecessary dequeue_mutex (added in r294610) from the iWARP
  connection manager. Examining so_comp without synchronization with
  iw_so_event_handler is a harmless race.

  300875:
  iw_cxgbe: Use vmem(9) to manage PBL and RQT allocations.

  300888:
  iw_cxgbe: Plug a lock leak in process_mpa_request().

  If the parent is DEAD or connect_request_upcall() fails, the parent
  mutex is left locked. This leads to a hang when process_mpa_request()
  is called again for another child of the listening endpoint.

  301158:
  iw_cxgbe: Fix panic that occurs when c4iw_ev_handler tries to acquire
  comp_handler_lock but c4iw_destroy_cq has already freed the CQ memory
  (which is where the lock resides).

  301896:
  Fix bug in iwcm that caused a panic in iw_cm_wq when krping is run
  repeatedly in a tight loop.

  301897:
  iw_cxgbe: Make sure that send_abort results in a TCP RST and not a FIN.
  Release the hold on ep->com immediately after sending the RST. This
  fixes a bug that sometimes leaves userspace iWARP tools hung when the
  user presses ^C.

  304838:
  Do not free an uninitialized pointer on soaccept failure in the iWARP
  connection manager.

  Submitted by: Krishnamraju Eraparaju @ Chelsio (original patch)
  Sponsored by: Chelsio Communications

Added:
  stable/10/contrib/ofed/libcxgb4/
     - copied from r273806, head/contrib/ofed/libcxgb4/
  stable/10/contrib/ofed/usr.lib/libcxgb4/
     - copied from r273806, head/contrib/ofed/usr.lib/libcxgb4/
Modified:
  stable/10/contrib/ofed/libcxgb4/src/cq.c
  stable/10/contrib/ofed/libcxgb4/src/dev.c
  stable/10/contrib/ofed/libcxgb4/src/libcxgb4.h
  stable/10/contrib/ofed/libcxgb4/src/qp.c
  stable/10/contrib/ofed/libcxgb4/src/t4.h
  stable/10/contrib/ofed/libcxgb4/src/verbs.c
  stable/10/contrib/ofed/usr.lib/Makefile
  stable/10/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb.h
  stable/10/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c
  stable/10/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.h
  stable/10/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_provider.c
  stable/10/sys/dev/cxgbe/iw_cxgbe/cm.c
  stable/10/sys/dev/cxgbe/iw_cxgbe/cq.c
  stable/10/sys/dev/cxgbe/iw_cxgbe/ev.c
  stable/10/sys/dev/cxgbe/iw_cxgbe/iw_cxgbe.h
  stable/10/sys/dev/cxgbe/iw_cxgbe/mem.c
  stable/10/sys/dev/cxgbe/iw_cxgbe/provider.c
  stable/10/sys/dev/cxgbe/iw_cxgbe/qp.c
  stable/10/sys/dev/cxgbe/iw_cxgbe/resource.c
  stable/10/sys/dev/cxgbe/iw_cxgbe/t4.h
  stable/10/sys/dev/cxgbe/iw_cxgbe/user.h
  stable/10/sys/ofed/drivers/infiniband/core/cma.c
  stable/10/sys/ofed/drivers/infiniband/core/iwcm.c
  stable/10/sys/ofed/include/linux/kref.h
  stable/10/sys/ofed/include/rdma/iw_cm.h
  stable/10/sys/ofed/include/rdma/rdma_cm.h
Directory Properties:
  stable/10/ (props changed)

Modified: stable/10/contrib/ofed/libcxgb4/src/cq.c
==============================================================================
--- head/contrib/ofed/libcxgb4/src/cq.c Wed Oct 29 01:15:48 2014 (r273806)
+++ stable/10/contrib/ofed/libcxgb4/src/cq.c Thu Dec 1 23:38:52 2016 (r309378)
@@ -697,7 +697,7 @@ static int c4iw_poll_cq_one(struct c4iw_
 		default:
 			PDBG("Unexpected cqe_status 0x%x for QPID=0x%0x\n",
 			     CQE_STATUS(&cqe), CQE_QPID(&cqe));
-			ret = -EINVAL;
+			wc->status = IBV_WC_FATAL_ERR;
 		}
 	}
 	if (wc->status && wc->status != IBV_WC_WR_FLUSH_ERR)

Modified: stable/10/contrib/ofed/libcxgb4/src/dev.c
==============================================================================
--- head/contrib/ofed/libcxgb4/src/dev.c Wed Oct 29 01:15:48 2014 (r273806)
+++ stable/10/contrib/ofed/libcxgb4/src/dev.c Thu Dec 1 23:38:52 2016 (r309378)
@@ -54,7 +54,6 @@
 	struct { \
 		unsigned vendor; \
 		unsigned device; \
-		unsigned chip_version; \
 	} hca_table[] = {
 
 #define CH_PCI_DEVICE_ID_FUNCTION \
@@ -64,7 +63,6 @@
 		{ \
 			.vendor = PCI_VENDOR_ID_CHELSIO, \
 			.device = (__DeviceID), \
-			.chip_version = CHELSIO_PCI_ID_CHIP_VERSION(__DeviceID), \
 		}
 
 #define CH_PCI_DEVICE_ID_TABLE_DEFINE_END \
@@ -493,7 +491,8 @@ found:
 	}
 
 	PDBG("%s found vendor %d device %d type %d\n",
-	     __FUNCTION__, vendor, device, hca_table[i].chip_version);
+	     __FUNCTION__, vendor, device,
+	     CHELSIO_PCI_ID_CHIP_VERSION(hca_table[i].device));
 
 	dev = calloc(1, sizeof *dev);
 	if (!dev) {
@@ -502,7 +501,7 @@ found:
 
 	pthread_spin_init(&dev->lock, PTHREAD_PROCESS_PRIVATE);
 	dev->ibv_dev.ops = c4iw_dev_ops;
-	dev->chip_version = hca_table[i].chip_version;
+	dev->chip_version = CHELSIO_PCI_ID_CHIP_VERSION(hca_table[i].device);
 	dev->abi_version = abi_version;
 
 	PDBG("%s device claimed\n", __FUNCTION__);

Modified: stable/10/contrib/ofed/libcxgb4/src/libcxgb4.h
==============================================================================
--- head/contrib/ofed/libcxgb4/src/libcxgb4.h Wed Oct 29 01:15:48 2014 (r273806)
+++ stable/10/contrib/ofed/libcxgb4/src/libcxgb4.h Thu Dec 1 23:38:52 2016 (r309378)
@@ -69,6 +69,11 @@ static inline int dev_is_t5(struct c4iw_
 	return dev->chip_version == CHELSIO_T5;
 }
 
+static inline int dev_is_t4(struct c4iw_dev *dev)
+{
+	return dev->chip_version == CHELSIO_T4;
+}
+
 struct c4iw_context {
 	struct ibv_context ibv_ctx;
 	struct t4_dev_status_page *status_page;

Modified: stable/10/contrib/ofed/libcxgb4/src/qp.c
==============================================================================
--- head/contrib/ofed/libcxgb4/src/qp.c Wed Oct 29 01:15:48 2014 (r273806)
+++ stable/10/contrib/ofed/libcxgb4/src/qp.c Thu Dec 1 23:38:52 2016 (r309378)
@@ -362,7 +362,7 @@ int c4iw_post_send(struct ibv_qp *ibqp,
 			err = build_rdma_read(wqe, wr, &len16);
 			if (err)
 				break;
-			swsqe->read_len = wr->sg_list[0].length;
+			swsqe->read_len = wr->sg_list ? wr->sg_list[0].length : 0;
 			if (!qhp->wq.sq.oldest_read)
 				qhp->wq.sq.oldest_read = swsqe;
 			break;

Modified: stable/10/contrib/ofed/libcxgb4/src/t4.h
==============================================================================
--- head/contrib/ofed/libcxgb4/src/t4.h Wed Oct 29 01:15:48 2014 (r273806)
+++ stable/10/contrib/ofed/libcxgb4/src/t4.h Thu Dec 1 23:38:52 2016 (r309378)
@@ -328,6 +328,7 @@ struct t4_sq {
 	volatile u32 *udb;
 	size_t memsize;
 	u32 qid;
+	u32 bar2_qid;
 	void *ma_sync;
 	u16 in_use;
 	u16 size;
@@ -336,6 +337,7 @@ struct t4_sq {
 	u16 wq_pidx;
 	u16 flags;
 	short flush_cidx;
+	int wc_reg_available;
 };
 
 struct t4_swrqe {
@@ -348,6 +350,7 @@ struct t4_rq {
 	volatile u32 *udb;
 	size_t memsize;
 	u32 qid;
+	u32 bar2_qid;
 	u32 msn;
 	u32 rqt_hwaddr;
 	u16 rqt_size;
@@ -356,6 +359,7 @@ struct t4_rq {
 	u16 cidx;
 	u16 pidx;
 	u16 wq_pidx;
+	int wc_reg_available;
 };
 
 struct t4_wq {
@@ -485,14 +489,14 @@ static inline void t4_ring_sq_db(struct
 {
 	wc_wmb();
 	if (t5) {
-		if (t5_en_wc && inc == 1) {
+		if (t5_en_wc && inc == 1 && wq->sq.wc_reg_available) {
 			PDBG("%s: WC wq->sq.pidx = %d; len16=%d\n", __func__,
 			     wq->sq.pidx, len16);
 			copy_wqe_to_udb(wq->sq.udb + 14, wqe);
 		} else {
 			PDBG("%s: DB wq->sq.pidx = %d; len16=%d\n", __func__,
 			     wq->sq.pidx, len16);
-			writel(V_PIDX_T5(inc), wq->sq.udb);
+			writel(V_QID(wq->sq.bar2_qid) | V_PIDX_T5(inc), wq->sq.udb);
 		}
 		wc_wmb();
 		return;
@@ -518,14 +522,14 @@ static inline void t4_ring_rq_db(struct
 {
 	wc_wmb();
 	if (t5) {
-		if (t5_en_wc && inc == 1) {
+		if (t5_en_wc && inc == 1 && wq->sq.wc_reg_available) {
 			PDBG("%s: WC wq->rq.pidx = %d; len16=%d\n", __func__,
 			     wq->rq.pidx, len16);
 			copy_wqe_to_udb(wq->rq.udb + 14, wqe);
 		} else {
 			PDBG("%s: DB wq->rq.pidx = %d; len16=%d\n", __func__,
 			     wq->rq.pidx, len16);
-			writel(V_PIDX_T5(inc), wq->rq.udb);
+			writel(V_QID(wq->rq.bar2_qid) | V_PIDX_T5(inc), wq->rq.udb);
 		}
 		wc_wmb();
 		return;

Modified: stable/10/contrib/ofed/libcxgb4/src/verbs.c
==============================================================================
--- head/contrib/ofed/libcxgb4/src/verbs.c Wed Oct 29 01:15:48 2014 (r273806)
+++ stable/10/contrib/ofed/libcxgb4/src/verbs.c Thu Dec 1 23:38:52 2016 (r309378)
@@ -213,7 +213,7 @@ struct ibv_cq *c4iw_create_cq(struct ibv
 		goto err3;
 
 	if (dev_is_t5(chp->rhp))
-		chp->cq.ugts += 3;
+		chp->cq.ugts += 5;
 	else
 		chp->cq.ugts += 1;
 	chp->cq.sw_queue = calloc(chp->cq.size, sizeof *chp->cq.queue);
@@ -460,8 +460,14 @@ static struct ibv_qp *create_qp(struct i
 		goto err3;
 	}
 	qhp->wq.sq.udb = dbva;
-	if (dev_is_t5(qhp->rhp)) {
-		qhp->wq.sq.udb += (128*(qhp->wq.sq.qid & qhp->wq.qid_mask))/4;
+	if (!dev_is_t4(qhp->rhp)) {
+		unsigned long segment_offset = 128 * (qhp->wq.sq.qid & qhp->wq.qid_mask);
+
+		if (segment_offset < c4iw_page_size) {
+			qhp->wq.sq.udb += segment_offset / 4;
+			qhp->wq.sq.wc_reg_available = 1;
+		} else
+			qhp->wq.sq.bar2_qid = qhp->wq.sq.qid & qhp->wq.qid_mask;
 		qhp->wq.sq.udb += 2;
 	}
 
@@ -479,8 +485,14 @@ static struct ibv_qp *create_qp(struct i
 	if (dbva == MAP_FAILED)
 		goto err5;
 	qhp->wq.rq.udb = dbva;
-	if (dev_is_t5(qhp->rhp)) {
-		qhp->wq.rq.udb += (128*(qhp->wq.rq.qid & qhp->wq.qid_mask))/4;
+	if (!dev_is_t4(qhp->rhp)) {
+		unsigned long segment_offset = 128 * (qhp->wq.rq.qid & qhp->wq.qid_mask);
+
+		if (segment_offset < c4iw_page_size) {
+			qhp->wq.rq.udb += segment_offset / 4;
+			qhp->wq.rq.wc_reg_available = 1;
+		} else
+			qhp->wq.rq.bar2_qid = qhp->wq.rq.qid & qhp->wq.qid_mask;
 		qhp->wq.rq.udb += 2;
 	}
 	qhp->wq.rq.queue = mmap(NULL, qhp->wq.rq.memsize,

Modified: stable/10/contrib/ofed/usr.lib/Makefile
==============================================================================
--- stable/10/contrib/ofed/usr.lib/Makefile Thu Dec 1 23:37:17 2016 (r309377)
+++ stable/10/contrib/ofed/usr.lib/Makefile Thu Dec 1 23:38:52 2016 (r309378)
@@ -1,6 +1,4 @@
 SUBDIR= libibcommon libibmad libibumad libibverbs libmlx4 libmthca \
-	libopensm libosmcomp libosmvendor libibcm librdmacm libsdp
-
-SUBDIR_PARALLEL=
+	libopensm libosmcomp libosmvendor libibcm librdmacm libsdp libcxgb4
 
 .include <bsd.subdir.mk>

Modified: stable/10/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb.h
==============================================================================
--- stable/10/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb.h Thu Dec 1 23:37:17 2016 (r309377)
+++ stable/10/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb.h Thu Dec 1 23:38:52 2016 (r309378)
@@ -174,4 +174,5 @@ static inline void remove_handle(struct
 }
 
 void iwch_ev_dispatch(struct iwch_dev *, struct mbuf *);
+void process_newconn(struct iw_cm_id *parent_cm_id, struct socket *child_so);
 #endif

Modified: stable/10/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c
==============================================================================
--- stable/10/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c Thu Dec 1 23:37:17 2016 (r309377)
+++ stable/10/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c Thu Dec 1 23:38:52 2016 (r309378)
@@ -267,7 +267,6 @@ alloc_ep(int size, int flags)
 void __free_ep(struct iwch_ep_common *epc)
 {
 	CTR3(KTR_IW_CXGB, "%s ep %p state %s", __FUNCTION__, epc, states[state_read(epc)]);
-	KASSERT(!epc->so, ("%s warning ep->so %p \n", __FUNCTION__, epc->so));
 	KASSERT(!epc->entry.tqe_prev, ("%s epc %p still on req list!\n", __FUNCTION__, epc));
 	free(epc, M_DEVBUF);
 }
@@ -1374,7 +1373,7 @@ out:
} int -iwch_create_listen(struct iw_cm_id *cm_id, int backlog) +iwch_create_listen_ep(struct iw_cm_id *cm_id, int backlog) { int err = 0; struct iwch_listen_ep *ep; @@ -1394,35 +1393,22 @@ iwch_create_listen(struct iw_cm_id *cm_i state_set(&ep->com, LISTEN); ep->com.so = cm_id->so; - err = init_sock(&ep->com); - if (err) - goto fail; - - err = solisten(ep->com.so, ep->backlog, ep->com.thread); - if (!err) { - cm_id->provider_data = ep; - goto out; - } - close_socket(&ep->com, 0); -fail: - cm_id->rem_ref(cm_id); - put_ep(&ep->com); + cm_id->provider_data = ep; out: return err; } -int -iwch_destroy_listen(struct iw_cm_id *cm_id) +void +iwch_destroy_listen_ep(struct iw_cm_id *cm_id) { struct iwch_listen_ep *ep = to_listen_ep(cm_id); CTR2(KTR_IW_CXGB, "%s ep %p", __FUNCTION__, ep); state_set(&ep->com, DEAD); - close_socket(&ep->com, 0); cm_id->rem_ref(cm_id); put_ep(&ep->com); - return 0; + return; } int @@ -1539,54 +1525,32 @@ process_connected(struct iwch_ep *ep) } } -static struct socket * -dequeue_socket(struct socket *head, struct sockaddr_in **remote, struct iwch_ep *child_ep) -{ - struct socket *so; - - ACCEPT_LOCK(); - so = TAILQ_FIRST(&head->so_comp); - if (!so) { - ACCEPT_UNLOCK(); - return NULL; - } - TAILQ_REMOVE(&head->so_comp, so, so_list); - head->so_qlen--; - SOCK_LOCK(so); - so->so_qstate &= ~SQ_COMP; - so->so_head = NULL; - soref(so); - soupcall_set(so, SO_RCV, iwch_so_upcall, child_ep); - so->so_state |= SS_NBIO; - PANIC_IF(!(so->so_state & SS_ISCONNECTED)); - PANIC_IF(so->so_error); - SOCK_UNLOCK(so); - ACCEPT_UNLOCK(); - soaccept(so, (struct sockaddr **)remote); - return so; -} - -static void -process_newconn(struct iwch_ep *parent_ep) +void +process_newconn(struct iw_cm_id *parent_cm_id, struct socket *child_so) { - struct socket *child_so; struct iwch_ep *child_ep; + struct sockaddr_in *local; struct sockaddr_in *remote; + struct iwch_ep *parent_ep = parent_cm_id->provider_data; CTR3(KTR_IW_CXGB, "%s parent ep %p so %p", __FUNCTION__, parent_ep, 
parent_ep->com.so); + if (!child_so) { + log(LOG_ERR, "%s - invalid child socket!\n", __func__); + return; + } child_ep = alloc_ep(sizeof(*child_ep), M_NOWAIT); if (!child_ep) { log(LOG_ERR, "%s - failed to allocate ep entry!\n", __FUNCTION__); return; } - child_so = dequeue_socket(parent_ep->com.so, &remote, child_ep); - if (!child_so) { - log(LOG_ERR, "%s - failed to dequeue child socket!\n", - __FUNCTION__); - __free_ep(&child_ep->com); - return; - } + SOCKBUF_LOCK(&child_so->so_rcv); + soupcall_set(child_so, SO_RCV, iwch_so_upcall, child_ep); + SOCKBUF_UNLOCK(&child_so->so_rcv); + + in_getsockaddr(child_so, (struct sockaddr **)&local); + in_getpeeraddr(child_so, (struct sockaddr **)&remote); + CTR3(KTR_IW_CXGB, "%s remote addr %s port %d", __FUNCTION__, inet_ntoa(remote->sin_addr), ntohs(remote->sin_port)); child_ep->com.tdev = parent_ep->com.tdev; @@ -1603,9 +1567,9 @@ process_newconn(struct iwch_ep *parent_e child_ep->com.thread = parent_ep->com.thread; child_ep->parent_ep = parent_ep; + free(local, M_SONAME); free(remote, M_SONAME); get_ep(&parent_ep->com); - child_ep->parent_ep = parent_ep; callout_init(&child_ep->timer, TRUE); state_set(&child_ep->com, MPA_REQ_WAIT); start_ep_timer(child_ep); @@ -1643,7 +1607,10 @@ process_socket_event(struct iwch_ep *ep) } if (state == LISTEN) { - process_newconn(ep); + /* socket listening events are handled at IWCM */ + CTR3(KTR_IW_CXGB, "%s Invalid ep state:%u, ep:%p", __func__, + ep->com.state, ep); + BUG(); return; } Modified: stable/10/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.h ============================================================================== --- stable/10/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.h Thu Dec 1 23:37:17 2016 (r309377) +++ stable/10/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.h Thu Dec 1 23:38:52 2016 (r309378) @@ -231,8 +231,8 @@ iwch_wakeup(struct cv *cv, struct mtx *l /* CM prototypes */ int iwch_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param); -int iwch_create_listen(struct iw_cm_id 
*cm_id, int backlog); -int iwch_destroy_listen(struct iw_cm_id *cm_id); +int iwch_create_listen_ep(struct iw_cm_id *cm_id, int backlog); +void iwch_destroy_listen_ep(struct iw_cm_id *cm_id); int iwch_reject_cr(struct iw_cm_id *cm_id, const void *pdata, u8 pdata_len); int iwch_accept_cr(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param); int iwch_ep_disconnect(struct iwch_ep *ep, int abrupt, int flags); Modified: stable/10/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_provider.c ============================================================================== --- stable/10/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_provider.c Thu Dec 1 23:37:17 2016 (r309377) +++ stable/10/sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_provider.c Thu Dec 1 23:38:52 2016 (r309378) @@ -1131,8 +1131,9 @@ int iwch_register_device(struct iwch_dev dev->ibdev.iwcm->connect = iwch_connect; dev->ibdev.iwcm->accept = iwch_accept_cr; dev->ibdev.iwcm->reject = iwch_reject_cr; - dev->ibdev.iwcm->create_listen = iwch_create_listen; - dev->ibdev.iwcm->destroy_listen = iwch_destroy_listen; + dev->ibdev.iwcm->create_listen_ep = iwch_create_listen_ep; + dev->ibdev.iwcm->destroy_listen_ep = iwch_destroy_listen_ep; + dev->ibdev.iwcm->newconn = process_newconn; dev->ibdev.iwcm->add_ref = iwch_qp_add_ref; dev->ibdev.iwcm->rem_ref = iwch_qp_rem_ref; dev->ibdev.iwcm->get_qp = iwch_get_qp; Modified: stable/10/sys/dev/cxgbe/iw_cxgbe/cm.c ============================================================================== --- stable/10/sys/dev/cxgbe/iw_cxgbe/cm.c Thu Dec 1 23:37:17 2016 (r309377) +++ stable/10/sys/dev/cxgbe/iw_cxgbe/cm.c Thu Dec 1 23:38:52 2016 (r309378) @@ -1,5 +1,5 @@ /* - * Copyright (c) 2009-2013 Chelsio, Inc. All rights reserved. + * Copyright (c) 2009-2013, 2016 Chelsio, Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses. 
You may choose to be licensed under the terms of the GNU @@ -79,7 +79,7 @@ static spinlock_t timeout_lock; static void process_req(struct work_struct *ctx); static void start_ep_timer(struct c4iw_ep *ep); -static void stop_ep_timer(struct c4iw_ep *ep); +static int stop_ep_timer(struct c4iw_ep *ep); static int set_tcpinfo(struct c4iw_ep *ep); static enum c4iw_ep_state state_read(struct c4iw_ep_common *epc); static void __state_set(struct c4iw_ep_common *epc, enum c4iw_ep_state tostate); @@ -95,14 +95,14 @@ static void send_mpa_req(struct c4iw_ep static int send_mpa_reject(struct c4iw_ep *ep, const void *pdata, u8 plen); static int send_mpa_reply(struct c4iw_ep *ep, const void *pdata, u8 plen); static void close_complete_upcall(struct c4iw_ep *ep, int status); -static int abort_connection(struct c4iw_ep *ep); +static int send_abort(struct c4iw_ep *ep); static void peer_close_upcall(struct c4iw_ep *ep); static void peer_abort_upcall(struct c4iw_ep *ep); static void connect_reply_upcall(struct c4iw_ep *ep, int status); -static void connect_request_upcall(struct c4iw_ep *ep); +static int connect_request_upcall(struct c4iw_ep *ep); static void established_upcall(struct c4iw_ep *ep); -static void process_mpa_reply(struct c4iw_ep *ep); -static void process_mpa_request(struct c4iw_ep *ep); +static int process_mpa_reply(struct c4iw_ep *ep); +static int process_mpa_request(struct c4iw_ep *ep); static void process_peer_close(struct c4iw_ep *ep); static void process_conn_error(struct c4iw_ep *ep); static void process_close_complete(struct c4iw_ep *ep); @@ -110,8 +110,6 @@ static void ep_timeout(unsigned long arg static void init_sock(struct c4iw_ep_common *epc); static void process_data(struct c4iw_ep *ep); static void process_connected(struct c4iw_ep *ep); -static struct socket * dequeue_socket(struct socket *head, struct sockaddr_in **remote, struct c4iw_ep *child_ep); -static void process_newconn(struct c4iw_ep *parent_ep); static int c4iw_so_upcall(struct socket *so, void 
*arg, int waitflag); static void process_socket_event(struct c4iw_ep *ep); static void release_ep_resources(struct c4iw_ep *ep); @@ -124,11 +122,11 @@ static void release_ep_resources(struct } while (0) #define STOP_EP_TIMER(ep) \ - do { \ + ({ \ CTR3(KTR_IW_CXGBE, "stop_ep_timer (%s:%d) ep %p", \ __func__, __LINE__, (ep)); \ stop_ep_timer(ep); \ - } while (0) + }) #ifdef KTR static char *states[] = { @@ -148,6 +146,34 @@ static char *states[] = { }; #endif + +static void deref_cm_id(struct c4iw_ep_common *epc) +{ + epc->cm_id->rem_ref(epc->cm_id); + epc->cm_id = NULL; + set_bit(CM_ID_DEREFED, &epc->history); +} + +static void ref_cm_id(struct c4iw_ep_common *epc) +{ + set_bit(CM_ID_REFED, &epc->history); + epc->cm_id->add_ref(epc->cm_id); +} + +static void deref_qp(struct c4iw_ep *ep) +{ + c4iw_qp_rem_ref(&ep->com.qp->ibqp); + clear_bit(QP_REFERENCED, &ep->com.flags); + set_bit(QP_DEREFED, &ep->com.history); +} + +static void ref_qp(struct c4iw_ep *ep) +{ + set_bit(QP_REFERENCED, &ep->com.flags); + set_bit(QP_REFED, &ep->com.history); + c4iw_qp_add_ref(&ep->com.qp->ibqp); +} + static void process_req(struct work_struct *ctx) { @@ -307,9 +333,7 @@ process_peer_close(struct c4iw_ep *ep) disconnect = 0; STOP_EP_TIMER(ep); close_socket(&ep->com, 0); - ep->com.cm_id->rem_ref(ep->com.cm_id); - ep->com.cm_id = NULL; - ep->com.qp = NULL; + deref_cm_id(&ep->com); release = 1; break; @@ -474,7 +498,7 @@ process_conn_error(struct c4iw_ep *ep) if (state != ABORTING) { CTR2(KTR_IW_CXGBE, "%s:pce1 %p", __func__, ep); - close_socket(&ep->com, 1); + close_socket(&ep->com, 0); state_set(&ep->com, DEAD); c4iw_put_ep(&ep->com); } @@ -493,6 +517,7 @@ process_close_complete(struct c4iw_ep *e /* The cm_id may be null if we failed to connect */ mutex_lock(&ep->com.mutex); + set_bit(CLOSE_CON_RPL, &ep->com.history); switch (ep->com.state) { @@ -583,13 +608,14 @@ static void process_data(struct c4iw_ep *ep) { struct sockaddr_in *local, *remote; + int disconnect = 0; CTR5(KTR_IW_CXGBE, 
"%s: so %p, ep %p, state %s, sb_cc %d", __func__, ep->com.so, ep, states[ep->com.state], ep->com.so->so_rcv.sb_cc); switch (state_read(&ep->com)) { case MPA_REQ_SENT: - process_mpa_reply(ep); + disconnect = process_mpa_reply(ep); break; case MPA_REQ_WAIT: in_getsockaddr(ep->com.so, (struct sockaddr **)&local); @@ -598,7 +624,7 @@ process_data(struct c4iw_ep *ep) ep->com.remote_addr = *remote; free(local, M_SONAME); free(remote, M_SONAME); - process_mpa_request(ep); + disconnect = process_mpa_request(ep); break; default: if (ep->com.so->so_rcv.sb_cc) @@ -608,6 +634,9 @@ process_data(struct c4iw_ep *ep) ep->com.so->so_state, ep->com.so->so_rcv.sb_cc); break; } + if (disconnect) + c4iw_ep_disconnect(ep, disconnect == 2, GFP_KERNEL); + } static void @@ -624,40 +653,21 @@ process_connected(struct c4iw_ep *ep) } } -static struct socket * -dequeue_socket(struct socket *head, struct sockaddr_in **remote, - struct c4iw_ep *child_ep) -{ - struct socket *so; - - ACCEPT_LOCK(); - so = TAILQ_FIRST(&head->so_comp); - if (!so) { - ACCEPT_UNLOCK(); - return (NULL); - } - TAILQ_REMOVE(&head->so_comp, so, so_list); - head->so_qlen--; - SOCK_LOCK(so); - so->so_qstate &= ~SQ_COMP; - so->so_head = NULL; - soref(so); - soupcall_set(so, SO_RCV, c4iw_so_upcall, child_ep); - so->so_state |= SS_NBIO; - SOCK_UNLOCK(so); - ACCEPT_UNLOCK(); - soaccept(so, (struct sockaddr **)remote); - - return (so); -} - -static void -process_newconn(struct c4iw_ep *parent_ep) +void +process_newconn(struct iw_cm_id *parent_cm_id, struct socket *child_so) { - struct socket *child_so; struct c4iw_ep *child_ep; + struct sockaddr_in *local; struct sockaddr_in *remote; + struct c4iw_ep *parent_ep = parent_cm_id->provider_data; + if (!child_so) { + CTR4(KTR_IW_CXGBE, + "%s: parent so %p, parent ep %p, child so %p, invalid so", + __func__, parent_ep->com.so, parent_ep, child_so); + log(LOG_ERR, "%s: invalid child socket\n", __func__); + return; + } child_ep = alloc_ep(sizeof(*child_ep), M_NOWAIT); if (!child_ep) { 
CTR3(KTR_IW_CXGBE, "%s: parent so %p, parent ep %p, ENOMEM", @@ -665,23 +675,18 @@ process_newconn(struct c4iw_ep *parent_e log(LOG_ERR, "%s: failed to allocate ep entry\n", __func__); return; } - - child_so = dequeue_socket(parent_ep->com.so, &remote, child_ep); - if (!child_so) { - CTR4(KTR_IW_CXGBE, - "%s: parent so %p, parent ep %p, child ep %p, dequeue err", - __func__, parent_ep->com.so, parent_ep, child_ep); - log(LOG_ERR, "%s: failed to dequeue child socket\n", __func__); - __free_ep(&child_ep->com); - return; - - } + SOCKBUF_LOCK(&child_so->so_rcv); + soupcall_set(child_so, SO_RCV, c4iw_so_upcall, child_ep); + SOCKBUF_UNLOCK(&child_so->so_rcv); CTR5(KTR_IW_CXGBE, "%s: parent so %p, parent ep %p, child so %p, child ep %p", __func__, parent_ep->com.so, parent_ep, child_so, child_ep); - child_ep->com.local_addr = parent_ep->com.local_addr; + in_getsockaddr(child_so, (struct sockaddr **)&local); + in_getpeeraddr(child_so, (struct sockaddr **)&remote); + + child_ep->com.local_addr = *local; child_ep->com.remote_addr = *remote; child_ep->com.dev = parent_ep->com.dev; child_ep->com.so = child_so; @@ -689,15 +694,17 @@ process_newconn(struct c4iw_ep *parent_e child_ep->com.thread = parent_ep->com.thread; child_ep->parent_ep = parent_ep; + free(local, M_SONAME); free(remote, M_SONAME); + c4iw_get_ep(&parent_ep->com); - child_ep->parent_ep = parent_ep; init_timer(&child_ep->timer); state_set(&child_ep->com, MPA_REQ_WAIT); START_EP_TIMER(child_ep); /* maybe the request has already been queued up on the socket... 
*/ process_mpa_request(child_ep); + return; } static int @@ -739,7 +746,10 @@ process_socket_event(struct c4iw_ep *ep) } if (state == LISTEN) { - process_newconn(ep); + /* socket listening events are handled at IWCM */ + CTR3(KTR_IW_CXGBE, "%s Invalid ep state:%u, ep:%p", __func__, + ep->com.state, ep); + BUG(); return; } @@ -750,7 +760,7 @@ process_socket_event(struct c4iw_ep *ep) } /* peer close */ - if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && state < CLOSING) { + if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && state <= CLOSING) { process_peer_close(ep); return; } @@ -772,10 +782,10 @@ TUNABLE_INT("hw.iw_cxgbe.db_delay_usecs" SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, db_delay_usecs, CTLFLAG_RW, &db_delay_usecs, 0, "Usecs to delay awaiting db fifo to drain"); -static int dack_mode = 1; +static int dack_mode = 0; TUNABLE_INT("hw.iw_cxgbe.dack_mode", &dack_mode); SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, dack_mode, CTLFLAG_RW, &dack_mode, 0, - "Delayed ack mode (default = 1)"); + "Delayed ack mode (default = 0)"); int c4iw_max_read_depth = 8; TUNABLE_INT("hw.iw_cxgbe.c4iw_max_read_depth", &c4iw_max_read_depth); @@ -802,10 +812,10 @@ TUNABLE_INT("hw.iw_cxgbe.c4iw_debug", &c SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, c4iw_debug, CTLFLAG_RW, &c4iw_debug, 0, "Enable debug logging (default = 0)"); -static int peer2peer; +static int peer2peer = 1; TUNABLE_INT("hw.iw_cxgbe.peer2peer", &peer2peer); SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, peer2peer, CTLFLAG_RW, &peer2peer, 0, - "Support peer2peer ULPs (default = 0)"); + "Support peer2peer ULPs (default = 1)"); static int p2p_type = FW_RI_INIT_P2PTYPE_READ_REQ; TUNABLE_INT("hw.iw_cxgbe.p2p_type", &p2p_type); @@ -819,13 +829,8 @@ SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, ep_ti static int mpa_rev = 1; TUNABLE_INT("hw.iw_cxgbe.mpa_rev", &mpa_rev); -#ifdef IW_CM_MPAV2 -SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, mpa_rev, CTLFLAG_RW, &mpa_rev, 0, - "MPA Revision, 0 supports amso1100, 1 is RFC0544 spec compliant, 2 is IETF MPA Peer Connect Draft compliant (default = 1)"); 
-#else
 SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, mpa_rev, CTLFLAG_RW, &mpa_rev, 0,
-		"MPA Revision, 0 supports amso1100, 1 is RFC0544 spec compliant (default = 1)");
-#endif
+		"MPA Revision, 0 supports amso1100, 1 is RFC5044 spec compliant, 2 is IETF MPA Peer Connect Draft compliant (default = 1)");

 static int markers_enabled;
 TUNABLE_INT("hw.iw_cxgbe.markers_enabled", &markers_enabled);
@@ -870,14 +875,16 @@ start_ep_timer(struct c4iw_ep *ep)
 	add_timer(&ep->timer);
 }

-static void
+static int
 stop_ep_timer(struct c4iw_ep *ep)
 {

 	del_timer_sync(&ep->timer);
 	if (!test_and_set_bit(TIMEOUT, &ep->com.flags)) {
 		c4iw_put_ep(&ep->com);
+		return 0;
 	}
+	return 1;
 }

 static enum
@@ -941,9 +948,10 @@ void _c4iw_free_ep(struct kref *kref)

 	ep = container_of(kref, struct c4iw_ep, com.kref);
 	epc = &ep->com;
-	KASSERT(!epc->so, ("%s ep->so %p", __func__, epc->so));
 	KASSERT(!epc->entry.tqe_prev, ("%s epc %p still on req list",
 	    __func__, epc));
+	if (test_bit(QP_REFERENCED, &ep->com.flags))
+		deref_qp(ep);
 	kfree(ep);
 }

@@ -1219,25 +1227,35 @@ static void close_complete_upcall(struct

 		CTR2(KTR_IW_CXGBE, "%s:ccu1 %1", __func__, ep);
 		ep->com.cm_id->event_handler(ep->com.cm_id, &event);
-		ep->com.cm_id->rem_ref(ep->com.cm_id);
-		ep->com.cm_id = NULL;
-		ep->com.qp = NULL;
+		deref_cm_id(&ep->com);
 		set_bit(CLOSE_UPCALL, &ep->com.history);
 	}
 	CTR2(KTR_IW_CXGBE, "%s:ccuE %p", __func__, ep);
 }

-static int abort_connection(struct c4iw_ep *ep)
+static int send_abort(struct c4iw_ep *ep)
 {
 	int err;

 	CTR2(KTR_IW_CXGBE, "%s:abB %p", __func__, ep);
-	close_complete_upcall(ep, -ECONNRESET);
-	state_set(&ep->com, ABORTING);
 	abort_socket(ep);
-	err = close_socket(&ep->com, 0);
+
+	/*
+	 * Since socket options were set as l_onoff=1 and l_linger=0 in in
+	 * abort_socket, invoking soclose here sends a RST (reset) to the peer.
+	 */
+	err = close_socket(&ep->com, 1);
 	set_bit(ABORT_CONN, &ep->com.history);
 	CTR2(KTR_IW_CXGBE, "%s:abE %p", __func__, ep);
+
+	/*
+	 * TBD: iw_cgbe driver should receive ABORT reply for every ABORT
+	 * request it has sent. But the current TOE driver is not propagating
+	 * this ABORT reply event (via do_abort_rpl) to iw_cxgbe. So as a work-
+	 * around de-refer 'ep' (which was refered before sending ABORT request)
+	 * here instead of doing it in abort_rpl() handler of iw_cxgbe driver.
+	 */
+	c4iw_put_ep(&ep->com);
 	return err;
 }

@@ -1271,9 +1289,7 @@ static void peer_abort_upcall(struct c4i

 		CTR2(KTR_IW_CXGBE, "%s:pau1 %p", __func__, ep);
 		ep->com.cm_id->event_handler(ep->com.cm_id, &event);
-		ep->com.cm_id->rem_ref(ep->com.cm_id);
-		ep->com.cm_id = NULL;
-		ep->com.qp = NULL;
+		deref_cm_id(&ep->com);
 		set_bit(ABORT_UPCALL, &ep->com.history);
 	}
 	CTR2(KTR_IW_CXGBE, "%s:pauE %p", __func__, ep);
@@ -1327,17 +1343,16 @@ static void connect_reply_upcall(struct
 	if (status < 0) {

 		CTR3(KTR_IW_CXGBE, "%s:cru4 %p %d", __func__, ep, status);
-		ep->com.cm_id->rem_ref(ep->com.cm_id);
-		ep->com.cm_id = NULL;
-		ep->com.qp = NULL;
+		deref_cm_id(&ep->com);
 	}

 	CTR2(KTR_IW_CXGBE, "%s:cruE %p", __func__, ep);
 }

-static void connect_request_upcall(struct c4iw_ep *ep)
+static int connect_request_upcall(struct c4iw_ep *ep)
 {
 	struct iw_cm_event event;
+	int ret;

 	CTR3(KTR_IW_CXGBE, "%s: ep %p, mpa_v1 %d", __func__, ep,
 	    ep->tried_with_mpa_v1);
@@ -1351,10 +1366,8 @@ static void connect_request_upcall(struc

 	if (!ep->tried_with_mpa_v1) {
 		/* this means MPA_v2 is used */
-#ifdef IW_CM_MPAV2
 		event.ord = ep->ord;
 		event.ird = ep->ird;
-#endif
 		event.private_data_len = ep->plen -
 			sizeof(struct mpa_v2_conn_params);
 		event.private_data = ep->mpa_pkt + sizeof(struct mpa_message) +
@@ -1362,19 +1375,21 @@ static void connect_request_upcall(struc
 	} else {

 		/* this means MPA_v1 is used. Send max supported */
-#ifdef IW_CM_MPAV2
 		event.ord = c4iw_max_read_depth;
 		event.ird = c4iw_max_read_depth;
-#endif
 		event.private_data_len = ep->plen;
 		event.private_data = ep->mpa_pkt + sizeof(struct mpa_message);
 	}

 	c4iw_get_ep(&ep->com);
-	ep->parent_ep->com.cm_id->event_handler(ep->parent_ep->com.cm_id,
+	ret = ep->parent_ep->com.cm_id->event_handler(ep->parent_ep->com.cm_id,
 	    &event);
+	if(ret)
+		c4iw_put_ep(&ep->com);
+
 	set_bit(CONNREQ_UPCALL, &ep->com.history);
 	c4iw_put_ep(&ep->parent_ep->com);
+	return ret;
 }

 static void established_upcall(struct c4iw_ep *ep)
@@ -1384,10 +1399,9 @@ static void established_upcall(struct c4
 	CTR2(KTR_IW_CXGBE, "%s:euB %p", __func__, ep);
 	memset(&event, 0, sizeof(event));
 	event.event = IW_CM_EVENT_ESTABLISHED;
-#ifdef IW_CM_MPAV2
 	event.ird = ep->ird;
 	event.ord = ep->ord;
-#endif
+
 	if (ep->com.cm_id) {

 		CTR2(KTR_IW_CXGBE, "%s:eu1 %p", __func__, ep);
@@ -1398,8 +1412,19 @@ static void established_upcall(struct c4
 }


-
-static void process_mpa_reply(struct c4iw_ep *ep)
+/*
+ * process_mpa_reply - process streaming mode MPA reply
+ *
+ * Returns:
+ *
+ * 0 upon success indicating a connect request was delivered to the ULP
+ * or the mpa request is incomplete but valid so far.
+ *
+ * 1 if a failure requires the caller to close the connection.
+ *
+ * 2 if a failure requires the caller to abort the connection.
+ */
+static int process_mpa_reply(struct c4iw_ep *ep)
 {
 	struct mpa_message *mpa;
 	struct mpa_v2_conn_params *mpa_v2_params;
@@ -1412,17 +1437,17 @@ static void process_mpa_reply(struct c4i
 	struct mbuf *top, *m;
 	int flags = MSG_DONTWAIT;
 	struct uio uio;
+	int disconnect = 0;

 	CTR2(KTR_IW_CXGBE, "%s:pmrB %p", __func__, ep);

 	/*
-	 * Stop mpa timer. If it expired, then the state has
-	 * changed and we bail since ep_timeout already aborted
-	 * the connection.
+	 * Stop mpa timer. If it expired, then
+	 * we ignore the MPA reply. process_timeout()
+	 * will abort the connection.
 	 */
-	STOP_EP_TIMER(ep);
-	if (state_read(&ep->com) != MPA_REQ_SENT)
-		return;
+	if (STOP_EP_TIMER(ep))
+		return 0;

 	uio.uio_resid = 1000000;
 	uio.uio_td = ep->com.thread;
@@ -1434,7 +1459,7 @@ static void process_mpa_reply(struct c4i

 			CTR2(KTR_IW_CXGBE, "%s:pmr1 %p", __func__, ep);
 			START_EP_TIMER(ep);
-			return;
+			return 0;
 		}
 		err = -err;
 		CTR2(KTR_IW_CXGBE, "%s:pmr2 %p", __func__, ep);
@@ -1462,7 +1487,7 @@ static void process_mpa_reply(struct c4i
 			CTR3(KTR_IW_CXGBE, "%s:pmr5 %p %d", __func__, ep,
 			    ep->mpa_pkt_len + m->m_len);
 			err = (-EINVAL);
-			goto err;
+			goto err_stop_timer;
 		}

 		/*
@@ -1480,8 +1505,9 @@ static void process_mpa_reply(struct c4i
 	/*
 	 * if we don't even have the mpa message, then bail.
 	 */
-	if (ep->mpa_pkt_len < sizeof(*mpa))
-		return;
+	if (ep->mpa_pkt_len < sizeof(*mpa)) {
+		return 0;
+	}
 	mpa = (struct mpa_message *) ep->mpa_pkt;

 	/* Validate MPA header. */
@@ -1492,14 +1518,14 @@ static void process_mpa_reply(struct c4i
 		printk(KERN_ERR MOD "%s MPA version mismatch. Local = %d, "
 				" Received = %d\n", __func__, mpa_rev, mpa->revision);
 		err = -EPROTO;
-		goto err;
+		goto err_stop_timer;
 	}

 	if (memcmp(mpa->key, MPA_KEY_REP, sizeof(mpa->key))) {

 		CTR2(KTR_IW_CXGBE, "%s:pmr7 %p", __func__, ep);
 		err = -EPROTO;
-		goto err;
+		goto err_stop_timer;
 	}

 	plen = ntohs(mpa->private_data_size);
@@ -1511,7 +1537,7 @@ static void process_mpa_reply(struct c4i

 		CTR2(KTR_IW_CXGBE, "%s:pmr8 %p", __func__, ep);
 		err = -EPROTO;
-		goto err;
+		goto err_stop_timer;
 	}

 	/*
@@ -1520,8 +1546,9 @@ static void process_mpa_reply(struct c4i
 	if (ep->mpa_pkt_len > (sizeof(*mpa) + plen)) {

 		CTR2(KTR_IW_CXGBE, "%s:pmr9 %p", __func__, ep);
+		STOP_EP_TIMER(ep);
 		err = -EPROTO;
-		goto err;
+		goto err_stop_timer;
 	}

 	ep->plen = (u8) plen;
@@ -1533,14 +1560,14 @@ static void process_mpa_reply(struct c4i
 	if (ep->mpa_pkt_len < (sizeof(*mpa) + plen)) {

 		CTR2(KTR_IW_CXGBE, "%s:pmra %p", __func__, ep);
-		return;
+		return 0;
 	}

 	if (mpa->flags & MPA_REJECT) {

 		CTR2(KTR_IW_CXGBE, "%s:pmrb %p", __func__, ep);
 		err = -ECONNREFUSED;
-		goto err;
+		goto err_stop_timer;
 	}

 	/*
@@ -1683,6 +1710,7 @@ static void process_mpa_reply(struct c4i
 			err = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp,
 				C4IW_QP_ATTR_NEXT_STATE, &attrs, 0);
 			err = -ENOMEM;
+			disconnect = 1;
 			goto out;
 		}

@@ -1703,19 +1731,33 @@ static void process_mpa_reply(struct c4i
 			err = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp,
 				C4IW_QP_ATTR_NEXT_STATE, &attrs, 0);
 			err = -ENOMEM;
*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
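
Reader's note on the stop_ep_timer() change shown in the diff: the function now returns 0 when it stopped the timer before expiry (and dropped the timer's reference), and 1 when the TIMEOUT bit was already set, in which case process_mpa_reply() ignores the reply. The following is only a minimal user-space sketch of that contract, not the driver code. Every name in it (ep_model, model_test_and_set, model_stop_timer, model_process_reply, EP_TIMEOUT) is hypothetical, and a plain integer stands in for the kernel's atomic test_and_set_bit().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from iw_cxgbe. */
#define EP_TIMEOUT	(1u << 0)

struct ep_model {
	unsigned int flags;	/* plain word; the driver uses atomic bit ops */
	int refs;		/* models the reference held by the armed timer */
};

/* Return the previous state of the bit and leave it set. */
static int
model_test_and_set(unsigned int *word, unsigned int mask)
{
	int was_set = (*word & mask) != 0;

	*word |= mask;
	return (was_set);
}

/*
 * Models the new return contract:
 * 0 = bit was clear, so the timer's reference is dropped here;
 * 1 = bit was already set, so the caller must not continue.
 */
static int
model_stop_timer(struct ep_model *ep)
{
	assert(ep != NULL);
	if (!model_test_and_set(&ep->flags, EP_TIMEOUT)) {
		ep->refs--;
		return (0);
	}
	return (1);
}

/*
 * Caller pattern modelled on process_mpa_reply(): when the stop helper
 * reports 1, return 0 without parsing. *parsed is an illustrative
 * out-parameter that records whether parsing would have proceeded.
 */
static int
model_process_reply(struct ep_model *ep, int *parsed)
{
	if (model_stop_timer(ep))
		return (0);
	*parsed = 1;	/* MPA parsing would happen here */
	return (0);
}
```

In this model the first call on a fresh endpoint drops one reference and proceeds to parsing, while a second call finds the bit already set and returns without touching the reference count. That is the property the int return value lets the caller rely on instead of re-reading the endpoint state.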