From owner-dev-commits-src-all@freebsd.org Fri Jul 2 22:05:39 2021 Return-Path: Delivered-To: dev-commits-src-all@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 60D2A673299; Fri, 2 Jul 2021 22:05:39 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4GGq0323Qyz4Vbf; Fri, 2 Jul 2021 22:05:39 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 154631BEBD; Fri, 2 Jul 2021 22:05:39 +0000 (UTC) (envelope-from git@FreeBSD.org) Received: from gitrepo.freebsd.org ([127.0.1.44]) by gitrepo.freebsd.org (8.16.1/8.16.1) with ESMTP id 162M5c68010286; Fri, 2 Jul 2021 22:05:38 GMT (envelope-from git@gitrepo.freebsd.org) Received: (from git@localhost) by gitrepo.freebsd.org (8.16.1/8.16.1/Submit) id 162M5cbP010285; Fri, 2 Jul 2021 22:05:38 GMT (envelope-from git) Date: Fri, 2 Jul 2021 22:05:38 GMT Message-Id: <202107022205.162M5cbP010285@gitrepo.freebsd.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org From: Warner Losh Subject: git: aa0ab681ae75 - main - nvme: coherently read status of completion records MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Git-Committer: imp X-Git-Repository: src X-Git-Refname: refs/heads/main X-Git-Reftype: branch X-Git-Commit: aa0ab681ae755e01cd69435fab50f6852f248c42 Auto-Submitted: auto-generated X-BeenThere: dev-commits-src-all@freebsd.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Commit messages for all branches of the src repository List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Jul 2021 22:05:39 -0000 The branch main has been updated by imp: URL: https://cgit.FreeBSD.org/src/commit/?id=aa0ab681ae755e01cd69435fab50f6852f248c42 commit aa0ab681ae755e01cd69435fab50f6852f248c42 Author: Warner Losh AuthorDate: 2021-07-02 22:00:42 +0000 Commit: Warner Losh CommitDate: 2021-07-02 22:05:19 +0000 nvme: coherently read status of completion records Coherently read the phase bit of the status completion record. We loop over the completion record array, looking for all the transactions in the same phase that have been completed. In doing that, we have to be careful to read the status field first, and if it indicates a complete record, we need to read and process that record. Otherwise, the host might be overtaken by device when reading this completion record, leading to a mistaken belief that the record is in phase. This leads to the code using old values and looking at an already completed entry, which has no current tracker. To work around this problem, we read the status and make sure it is in phase, we then re-read the entire completion record guaranteeing it's complete, valid, and consistent . In addition we resync the dmatag to reflect changes since the prior loop for the bouncing dma case. Reviewed by: jrtc27@, chuck@ Found by: jrtc27 (this fix is based in part on her D30995 fix) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D31002 --- sys/dev/nvme/nvme_qpair.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-) diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c index 0726ca248442..12770f38d42e 100644 --- a/sys/dev/nvme/nvme_qpair.c +++ b/sys/dev/nvme/nvme_qpair.c @@ -583,13 +583,30 @@ nvme_qpair_process_completions(struct nvme_qpair *qpair) } while (1) { - cpl = qpair->cpl[qpair->cq_head]; + uint16_t status; - /* Convert to host endian */ + /* + * We need to do this dance to avoid a race between the host and + * the device where the device overtakes the host while the host + * is reading this record, leaving the status field 'new' and + * the sqhd and cid fields potentially stale. If the phase + * doesn't match, that means status hasn't yet been updated and + * we'll get any pending changes next time. It also means that + * the phase must be the same the second time. We have to sync + * before reading to ensure any bouncing completes. + */ + status = le16toh(qpair->cpl[qpair->cq_head].status); + if (NVME_STATUS_GET_P(status) != qpair->phase) + break; + + bus_dmamap_sync(qpair->dma_tag, qpair->queuemem_map, + BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); + cpl = qpair->cpl[qpair->cq_head]; nvme_completion_swapbytes(&cpl); - if (NVME_STATUS_GET_P(cpl.status) != qpair->phase) - break; + KASSERT( + NVME_STATUS_GET_P(status) == NVME_STATUS_GET_P(cpl.status), + ("Phase unexpectedly inconsistent")); tr = qpair->act_tr[cpl.cid];