From owner-svn-src-head@freebsd.org Sat Jun 1 15:37:45 2019 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 43BBD15BFE5F; Sat, 1 Jun 2019 15:37:45 +0000 (UTC) (envelope-from imp@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id DE3196D711; Sat, 1 Jun 2019 15:37:44 +0000 (UTC) (envelope-from imp@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id B6FC319BE2; Sat, 1 Jun 2019 15:37:44 +0000 (UTC) (envelope-from imp@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id x51FbidL057584; Sat, 1 Jun 2019 15:37:44 GMT (envelope-from imp@FreeBSD.org) Received: (from imp@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id x51FbixC057583; Sat, 1 Jun 2019 15:37:44 GMT (envelope-from imp@FreeBSD.org) Message-Id: <201906011537.x51FbixC057583@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: imp set sender to imp@FreeBSD.org using -f From: Warner Losh Date: Sat, 1 Jun 2019 15:37:44 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r348495 - head/sys/dev/nvme X-SVN-Group: head X-SVN-Commit-Author: imp X-SVN-Commit-Paths: head/sys/dev/nvme X-SVN-Commit-Revision: 348495 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: DE3196D711 X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-2.98 / 15.00]; local_wl_from(0.00)[FreeBSD.org]; NEURAL_HAM_MEDIUM(-1.00)[-0.999,0]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; NEURAL_HAM_SHORT(-0.98)[-0.976,0]; ASN(0.00)[asn:11403, ipnet:2610:1c1:1::/48, country:US] X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 01 Jun 2019 15:37:45 -0000 Author: imp Date: Sat Jun 1 15:37:44 2019 New Revision: 348495 URL: https://svnweb.freebsd.org/changeset/base/348495 Log: Since a fatal trap can happen at aribtrary times, don't panic when the completions are not in a consistent state. Cope with the different places the normal I/O completion polling thread can be interrupted and then re-entered during a kernel panic + dump. Reviewed by: jhb and markj (both prior versions) Differential Revision: https://reviews.freebsd.org/D20478 Modified: head/sys/dev/nvme/nvme_qpair.c Modified: head/sys/dev/nvme/nvme_qpair.c ============================================================================== --- head/sys/dev/nvme/nvme_qpair.c Sat Jun 1 14:57:42 2019 (r348494) +++ head/sys/dev/nvme/nvme_qpair.c Sat Jun 1 15:37:44 2019 (r348495) @@ -31,6 +31,8 @@ __FBSDID("$FreeBSD$"); #include #include +#include +#include #include @@ -308,7 +310,7 @@ get_status_string(uint16_t sct, uint16_t sc) } static void -nvme_qpair_print_completion(struct nvme_qpair *qpair, +nvme_qpair_print_completion(struct nvme_qpair *qpair, struct nvme_completion *cpl) { uint16_t sct, sc; @@ -479,18 +481,51 @@ nvme_qpair_process_completions(struct nvme_qpair *qpai struct nvme_tracker *tr; struct nvme_completion cpl; int done = 0; + bool in_panic = dumping || SCHEDULER_STOPPED(); qpair->num_intr_handler_calls++; + /* + * qpair is not enabled, likely because a controller reset is is in + * progress. Ignore the interrupt - any I/O that was associated with + * this interrupt will get retried when the reset is complete. + */ if (!qpair->is_enabled) - /* - * qpair is not enabled, likely because a controller reset is - * is in progress. Ignore the interrupt - any I/O that was - * associated with this interrupt will get retried when the - * reset is complete. - */ return (false); + /* + * A panic can stop the CPU this routine is running on at any point. If + * we're called during a panic, complete the sq_head wrap protocol for + * the case where we are interrupted just after the increment at 1 + * below, but before we can reset cq_head to zero at 2. Also cope with + * the case where we do the zero at 2, but may or may not have done the + * phase adjustment at step 3. The panic machinery flushes all pending + * memory writes, so we can make these strong ordering assumptions + * that would otherwise be unwise if we were racing in real time. + */ + if (__predict_false(in_panic)) { + if (qpair->cq_head == qpair->num_entries) { + /* + * Here we know that we need to zero cq_head and then negate + * the phase, which hasn't been assigned if cq_head isn't + * zero due to the atomic_store_rel. + */ + qpair->cq_head = 0; + qpair->phase = !qpair->phase; + } else if (qpair->cq_head == 0) { + /* + * In this case, we know that the assignment at 2 + * happened below, but we don't know if it 3 happened or + * not. To do this, we look at the last completion + * entry and set the phase to the opposite phase + * that it has. This gets us back in sync + */ + cpl = qpair->cpl[qpair->num_entries - 1]; + nvme_completion_swapbytes(&cpl); + qpair->phase = !NVME_STATUS_GET_P(cpl.status); + } + } + bus_dmamap_sync(qpair->dma_tag, qpair->queuemem_map, BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE); while (1) { @@ -508,17 +543,35 @@ nvme_qpair_process_completions(struct nvme_qpair *qpai nvme_qpair_complete_tracker(qpair, tr, &cpl, ERROR_PRINT_ALL); qpair->sq_head = cpl.sqhd; done++; - } else { - nvme_printf(qpair->ctrlr, + } else if (!in_panic) { + /* + * A missing tracker is normally an error. However, a + * panic can stop the CPU this routine is running on + * after completing an I/O but before updating + * qpair->cq_head at 1 below. Later, we re-enter this + * routine to poll I/O associated with the kernel + * dump. We find that the tr has been set to null before + * calling the completion routine. If it hasn't + * completed (or it triggers a panic), then '1' below + * won't have updated cq_head. Rather than panic again, + * ignore this condition because it's not unexpected. + */ + nvme_printf(qpair->ctrlr, "cpl does not map to outstanding cmd\n"); /* nvme_dump_completion expects device endianess */ nvme_dump_completion(&qpair->cpl[qpair->cq_head]); - KASSERT(0, ("received completion for unknown cmd\n")); + KASSERT(0, ("received completion for unknown cmd")); } - if (++qpair->cq_head == qpair->num_entries) { - qpair->cq_head = 0; - qpair->phase = !qpair->phase; + /* + * There's a number of races with the following (see above) when + * the system panics. We compensate for each one of them by + * using the atomic store to force strong ordering (at least when + * viewed in the aftermath of a panic). + */ + if (++qpair->cq_head == qpair->num_entries) { /* 1 */ + atomic_store_rel_int(&qpair->cq_head, 0); /* 2 */ + qpair->phase = !qpair->phase; /* 3 */ } nvme_mmio_write_4(qpair->ctrlr, doorbell[qpair->id].cq_hdbl,