Date: Tue, 25 Feb 2025 16:48:18 GMT From: Warner Losh <imp@FreeBSD.org> To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org Subject: git: dc95228d9847 - main - nvme: Fix hotplug on one of the amazon platforms Message-ID: <202502251648.51PGmISF013173@gitrepo.freebsd.org>
next in thread | raw e-mail | index | archive | help
The branch main has been updated by imp: URL: https://cgit.FreeBSD.org/src/commit/?id=dc95228d98474aba940e3885164912b419c5579d commit dc95228d98474aba940e3885164912b419c5579d Author: Warner Losh <imp@FreeBSD.org> AuthorDate: 2025-02-25 16:29:14 +0000 Commit: Warner Losh <imp@FreeBSD.org> CommitDate: 2025-02-25 16:36:53 +0000 nvme: Fix hotplug on one of the amazon platforms Amazon EC2 m7i cloud instances use PCI hotplug rather than ACPI hotplug. The card is removed and detach is called to remove the drive from the system. The hardware is no longer present at this point, but the bridge doesn't translate the now-missing hardware reads to all ff's leading us to conclude the hardware is there and we need to do a proper shutdown of it. Fix this oversight by asking the bridge if the device is still present as well. We need both tests since some systems one cane remove the card w/o a hotplug event and we want to fail-safe in those cases. Convert gone to a bool while I'm here and update a comment about shutting down the controller and why that's important. Tested by: cperciva Sponsored by: Netflix --- sys/dev/nvme/nvme_ctrlr.c | 31 +++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/sys/dev/nvme/nvme_ctrlr.c b/sys/dev/nvme/nvme_ctrlr.c index 98a9e62f851b..c47e7ece5a04 100644 --- a/sys/dev/nvme/nvme_ctrlr.c +++ b/sys/dev/nvme/nvme_ctrlr.c @@ -1612,7 +1612,8 @@ nvme_ctrlr_construct(struct nvme_controller *ctrlr, device_t dev) void nvme_ctrlr_destruct(struct nvme_controller *ctrlr, device_t dev) { - int gone, i; + int i; + bool gone; ctrlr->is_dying = true; @@ -1622,10 +1623,16 @@ nvme_ctrlr_destruct(struct nvme_controller *ctrlr, device_t dev) goto noadminq; /* - * Check whether it is a hot unplug or a clean driver detach. - * If device is not there any more, skip any shutdown commands. + * Check whether it is a hot unplug or a clean driver detach. If device + * is not there any more, skip any shutdown commands. Some hotplug + * bridges will return zeros instead of ff's when the device is + * departing, so ask the bridge if the device is gone. Some systems can + * remove the drive w/o the bridge knowing its gone (they don't really + * do hotplug), so failsafe with detecting all ff's (impossible with + * this hardware) as the device being gone. */ - gone = (nvme_mmio_read_4(ctrlr, csts) == NVME_GONE); + gone = bus_child_present(dev) == 0 || + (nvme_mmio_read_4(ctrlr, csts) == NVME_GONE); if (gone) nvme_ctrlr_fail(ctrlr, true); else @@ -1653,17 +1660,17 @@ nvme_ctrlr_destruct(struct nvme_controller *ctrlr, device_t dev) nvme_admin_qpair_destroy(&ctrlr->adminq); /* - * Notify the controller of a shutdown, even though this is due to - * a driver unload, not a system shutdown (this path is not invoked - * during shutdown). This ensures the controller receives a - * shutdown notification in case the system is shutdown before - * reloading the driver. + * Notify the controller of a shutdown, even though this is due to a + * driver unload, not a system shutdown (this path is not invoked uring + * shutdown). This ensures the controller receives a shutdown + * notification in case the system is shutdown before reloading the + * driver. Some NVMe drives need this to flush their cache to stable + * media and consider it a safe shutdown in SMART stats. */ - if (!gone) + if (!gone) { nvme_ctrlr_shutdown(ctrlr); - - if (!gone) nvme_ctrlr_disable(ctrlr); + } noadminq: if (ctrlr->taskqueue)
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?202502251648.51PGmISF013173>