Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 31 Jul 2021 00:21:39 GMT
From:      Warner Losh <imp@FreeBSD.org>
To:        src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
Subject:   git: f76c34659f9d - stable/12 - nvme: coherently read status of completion records
Message-ID:  <202107310021.16V0LdN3052270@gitrepo.freebsd.org>

next in thread | raw e-mail | index | archive | help
The branch stable/12 has been updated by imp:

URL: https://cgit.FreeBSD.org/src/commit/?id=f76c34659f9d94683e1161b8913b6466a2fdbaba

commit f76c34659f9d94683e1161b8913b6466a2fdbaba
Author:     Warner Losh <imp@FreeBSD.org>
AuthorDate: 2021-07-02 22:00:42 +0000
Commit:     Warner Losh <imp@FreeBSD.org>
CommitDate: 2021-07-31 00:02:53 +0000

    nvme: coherently read status of completion records
    
    Coherently read the phase bit of the status completion record. We loop
    over the completion record array, looking for all the transactions in
    the same phase that have been completed. In doing that, we have to be
    careful to read the status field first, and if it indicates a complete
    record, we need to read and process that record. Otherwise, the host
    might be overtaken by device when reading this completion record,
    leading to a mistaken belief that the record is in phase. This leads to
    the code using old values and looking at an already completed entry, which
    has no current tracker.
    
    To work around this problem, we read the status and make sure it is in
    phase, we then re-read the entire completion record guaranteeing it's
    complete, valid, and consistent . In addition we resync the dmatag to
    reflect changes since the prior loop for the bouncing dma case.
    
    Reviewed by:            jrtc27@, chuck@
    Found by:               jrtc27 (this fix is based in part on her D30995 fix)
    Sponsored by:           Netflix
    Differential Revision:  https://reviews.freebsd.org/D31002
    
    (cherry picked from commit aa0ab681ae755e01cd69435fab50f6852f248c42)
---
 sys/dev/nvme/nvme_qpair.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/sys/dev/nvme/nvme_qpair.c b/sys/dev/nvme/nvme_qpair.c
index e188ef9e3ebd..d126b9dbeb18 100644
--- a/sys/dev/nvme/nvme_qpair.c
+++ b/sys/dev/nvme/nvme_qpair.c
@@ -586,13 +586,30 @@ nvme_qpair_process_completions(struct nvme_qpair *qpair)
 	}
 
 	while (1) {
-		cpl = qpair->cpl[qpair->cq_head];
+		uint16_t status;
 
-		/* Convert to host endian */
+		/*
+		 * We need to do this dance to avoid a race between the host and
+		 * the device where the device overtakes the host while the host
+		 * is reading this record, leaving the status field 'new' and
+		 * the sqhd and cid fields potentially stale. If the phase
+		 * doesn't match, that means status hasn't yet been updated and
+		 * we'll get any pending changes next time. It also means that
+		 * the phase must be the same the second time. We have to sync
+		 * before reading to ensure any bouncing completes.
+		 */
+		status = le16toh(qpair->cpl[qpair->cq_head].status);
+		if (NVME_STATUS_GET_P(status) != qpair->phase)
+			break;
+
+		bus_dmamap_sync(qpair->dma_tag, qpair->queuemem_map,
+		    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
+		cpl = qpair->cpl[qpair->cq_head];
 		nvme_completion_swapbytes(&cpl);
 
-		if (NVME_STATUS_GET_P(cpl.status) != qpair->phase)
-			break;
+		KASSERT(
+		    NVME_STATUS_GET_P(status) == NVME_STATUS_GET_P(cpl.status),
+		    ("Phase unexpectedly inconsistent"));
 
 		tr = qpair->act_tr[cpl.cid];
 



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?202107310021.16V0LdN3052270>