Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 25 Feb 2025 16:48:18 GMT
From:      Warner Losh <imp@FreeBSD.org>
To:        src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
Subject:   git: dc95228d9847 - main - nvme: Fix hotplug on one of the amazon platforms
Message-ID:  <202502251648.51PGmISF013173@gitrepo.freebsd.org>

next in thread | raw e-mail | index | archive | help
The branch main has been updated by imp:

URL: https://cgit.FreeBSD.org/src/commit/?id=dc95228d98474aba940e3885164912b419c5579d

commit dc95228d98474aba940e3885164912b419c5579d
Author:     Warner Losh <imp@FreeBSD.org>
AuthorDate: 2025-02-25 16:29:14 +0000
Commit:     Warner Losh <imp@FreeBSD.org>
CommitDate: 2025-02-25 16:36:53 +0000

    nvme: Fix hotplug on one of the amazon platforms
    
    Amazon EC2 m7i cloud instances use PCI hotplug rather than ACPI
    hotplug. The card is removed and detach is called to remove the drive
    from the system. The hardware is no longer present at this point, but
    the bridge doesn't translate the now-missing hardware reads to all ff's
    leading us to conclude the hardware is there and we need to do a proper
    shutdown of it. Fix this oversight by asking the bridge if the device is
    still present as well. We need both tests since some systems one cane
    remove the card w/o a hotplug event and we want to fail-safe in those
    cases.
    
    Convert gone to a bool while I'm here and update a comment about
    shutting down the controller and why that's important.
    
    Tested by: cperciva
    Sponsored by: Netflix
---
 sys/dev/nvme/nvme_ctrlr.c | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/sys/dev/nvme/nvme_ctrlr.c b/sys/dev/nvme/nvme_ctrlr.c
index 98a9e62f851b..c47e7ece5a04 100644
--- a/sys/dev/nvme/nvme_ctrlr.c
+++ b/sys/dev/nvme/nvme_ctrlr.c
@@ -1612,7 +1612,8 @@ nvme_ctrlr_construct(struct nvme_controller *ctrlr, device_t dev)
 void
 nvme_ctrlr_destruct(struct nvme_controller *ctrlr, device_t dev)
 {
-	int	gone, i;
+	int	i;
+	bool	gone;
 
 	ctrlr->is_dying = true;
 
@@ -1622,10 +1623,16 @@ nvme_ctrlr_destruct(struct nvme_controller *ctrlr, device_t dev)
 		goto noadminq;
 
 	/*
-	 * Check whether it is a hot unplug or a clean driver detach.
-	 * If device is not there any more, skip any shutdown commands.
+	 * Check whether it is a hot unplug or a clean driver detach.  If device
+	 * is not there any more, skip any shutdown commands.  Some hotplug
+	 * bridges will return zeros instead of ff's when the device is
+	 * departing, so ask the bridge if the device is gone. Some systems can
+	 * remove the drive w/o the bridge knowing its gone (they don't really
+	 * do hotplug), so failsafe with detecting all ff's (impossible with
+	 * this hardware) as the device being gone.
 	 */
-	gone = (nvme_mmio_read_4(ctrlr, csts) == NVME_GONE);
+	gone = bus_child_present(dev) == 0 ||
+	    (nvme_mmio_read_4(ctrlr, csts) == NVME_GONE);
 	if (gone)
 		nvme_ctrlr_fail(ctrlr, true);
 	else
@@ -1653,17 +1660,17 @@ nvme_ctrlr_destruct(struct nvme_controller *ctrlr, device_t dev)
 	nvme_admin_qpair_destroy(&ctrlr->adminq);
 
 	/*
-	 *  Notify the controller of a shutdown, even though this is due to
-	 *   a driver unload, not a system shutdown (this path is not invoked
-	 *   during shutdown).  This ensures the controller receives a
-	 *   shutdown notification in case the system is shutdown before
-	 *   reloading the driver.
+	 * Notify the controller of a shutdown, even though this is due to a
+	 * driver unload, not a system shutdown (this path is not invoked uring
+	 * shutdown).  This ensures the controller receives a shutdown
+	 * notification in case the system is shutdown before reloading the
+	 * driver. Some NVMe drives need this to flush their cache to stable
+	 * media and consider it a safe shutdown in SMART stats.
 	 */
-	if (!gone)
+	if (!gone) {
 		nvme_ctrlr_shutdown(ctrlr);
-
-	if (!gone)
 		nvme_ctrlr_disable(ctrlr);
+	}
 
 noadminq:
 	if (ctrlr->taskqueue)



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?202502251648.51PGmISF013173>