From owner-dev-commits-src-all@freebsd.org Mon Jul 26 16:13:24 2021 Return-Path: Delivered-To: dev-commits-src-all@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 8E41266F5FD; Mon, 26 Jul 2021 16:13:24 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4GYQ2W5t5Qz3pYH; Mon, 26 Jul 2021 16:13:23 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id D7CDD1A313; Mon, 26 Jul 2021 16:13:22 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.16.1/8.16.1) with ESMTP id 16QGDMkG005360; Mon, 26 Jul 2021 16:13:22 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.16.1/8.16.1/Submit) id 16QGDM77005359; Mon, 26 Jul 2021 16:13:22 GMT (envelope-from git) Date: Mon, 26 Jul 2021 16:13:22 GMT Message-Id: <202107261613.16QGDM77005359@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org From: Hans Petter Selasky Subject: git: 06307dd6e0df - stable/13 - ibcore: Issue DREQ when receiving REQ/REP for stale QP. MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: hselasky X-Git-Repository: src X-Git-Refname: refs/heads/stable/13 X-Git-Reftype: branch X-Git-Commit: 06307dd6e0df91a867a6a42bb4da121f89085b54 Auto-Submitted: auto-generated X-BeenThere: dev-commits-src-all@freebsd.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Commit messages for all branches of the src repository List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Jul 2021 16:13:24 -0000 The branch stable/13 has been updated by hselasky: URL: https://cgit.FreeBSD.org/src/commit/?id=06307dd6e0df91a867a6a42bb4da121f89085b54 commit 06307dd6e0df91a867a6a42bb4da121f89085b54 Author: Hans Petter Selasky AuthorDate: 2021-06-16 13:01:38 +0000 Commit: Hans Petter Selasky CommitDate: 2021-07-26 16:04:29 +0000 ibcore: Issue DREQ when receiving REQ/REP for stale QP. From "InfiBand Architecture Specifications Volume 1": A QP is said to have a stale connection when only one side has connection information. A stale connection may result if the remote CM had dropped the connection and sent a DREQ but the DREQ was never received by the local CM. Alternatively the remote CM may have lost all record of past connections because its node crashed and rebooted, while the local CM did not become aware of the remote node's reboot and therefore did not clean up stale connections. And: A local CM may receive a REQ/REP for a stale connection. It shall abort the connection issuing REJ to the REQ/REP. It shall then issue DREQ with "DREQ:remote QPN" set to the remote QPN from the REQ/REP. This patch solves a problem with reuse of QPN. Current codebase, that is IPoIB, relies on a REAP-mechanism to do cleanup of the structures in CM. A problem with this is the timeconstants governing this mechanism; they are up to 768 seconds and the interface may look inresponsive in that period. Issuing a DREQ (and receiving a DREP) does the necessary cleanup and the interface comes up. Linux commit: 9315bc9a133011fdb084f2626b86db3ebb64661f Reviewed by: kib Sponsored by: Mellanox Technologies // NVIDIA Networking (cherry picked from commit 12a913d207d287733da75784b3aa35f5e48d0cef) --- sys/ofed/drivers/infiniband/core/ib_cm.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/sys/ofed/drivers/infiniband/core/ib_cm.c b/sys/ofed/drivers/infiniband/core/ib_cm.c index 7dea2620cb60..376dcb420f5b 100644 --- a/sys/ofed/drivers/infiniband/core/ib_cm.c +++ b/sys/ofed/drivers/infiniband/core/ib_cm.c @@ -1679,6 +1679,7 @@ static struct cm_id_private * cm_match_req(struct cm_work *work, struct cm_id_private *listen_cm_id_priv, *cur_cm_id_priv; struct cm_timewait_info *timewait_info; struct cm_req_msg *req_msg; + struct ib_cm_id *cm_id; req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad; @@ -1700,10 +1701,18 @@ static struct cm_id_private * cm_match_req(struct cm_work *work, timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info); if (timewait_info) { cm_cleanup_timewait(cm_id_priv->timewait_info); + cur_cm_id_priv = cm_get_id(timewait_info->work.local_id, + timewait_info->work.remote_id); + spin_unlock_irq(&cm.lock); cm_issue_rej(work->port, work->mad_recv_wc, IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ, NULL, 0); + if (cur_cm_id_priv) { + cm_id = &cur_cm_id_priv->id; + ib_send_cm_dreq(cm_id, NULL, 0); + cm_deref_id(cur_cm_id_priv); + } return NULL; } @@ -2084,6 +2093,9 @@ static int cm_rep_handler(struct cm_work *work) struct cm_id_private *cm_id_priv; struct cm_rep_msg *rep_msg; int ret; + struct cm_id_private *cur_cm_id_priv; + struct ib_cm_id *cm_id; + struct cm_timewait_info *timewait_info; rep_msg = (struct cm_rep_msg *)work->mad_recv_wc->recv_buf.mad; cm_id_priv = cm_acquire_id(rep_msg->remote_comm_id, 0); @@ -2118,16 +2130,26 @@ static int cm_rep_handler(struct cm_work *work) goto error; } /* Check for a stale connection. */ - if (cm_insert_remote_qpn(cm_id_priv->timewait_info)) { + timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info); + if (timewait_info) { rb_erase(&cm_id_priv->timewait_info->remote_id_node, &cm.remote_id_table); cm_id_priv->timewait_info->inserted_remote_id = 0; + cur_cm_id_priv = cm_get_id(timewait_info->work.local_id, + timewait_info->work.remote_id); + spin_unlock(&cm.lock); spin_unlock_irq(&cm_id_priv->lock); cm_issue_rej(work->port, work->mad_recv_wc, IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REP, NULL, 0); ret = -EINVAL; + if (cur_cm_id_priv) { + cm_id = &cur_cm_id_priv->id; + ib_send_cm_dreq(cm_id, NULL, 0); + cm_deref_id(cur_cm_id_priv); + } + goto error; } spin_unlock(&cm.lock);