Date:      Tue, 13 Jul 2021 01:55:34 GMT
From:      Alexander Motin <mav@FreeBSD.org>
To:        src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
Subject:   git: c38e334b175c - stable/12 - nvme(4): Report NPWA before NPWG as stripesize.
Message-ID:  <202107130155.16D1tYJY024098@gitrepo.freebsd.org>

The branch stable/12 has been updated by mav:

URL: https://cgit.FreeBSD.org/src/commit/?id=c38e334b175c244ad596def2a77852fb294aff1b

commit c38e334b175c244ad596def2a77852fb294aff1b
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2021-07-06 02:19:48 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2021-07-13 01:55:14 +0000

    nvme(4): Report NPWA before NPWG as stripesize.
    
    New Samsung 980 SSDs report a Namespace Preferred Write Alignment
    of 8 sectors (4KB) and a Namespace Preferred Write Granularity of
    32 sectors (16KB).  My quick tests show that 16KB is the minimal
    sequential write size at which the SSD reaches peak IOPS, so
    writing much less is very slow.  But writing slightly less or
    slightly more changes little, so NPWG seems to be not so much a
    size granularity as a minimum I/O size.
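    
    To make the 0-based field encoding concrete, here is a small
    arithmetic sketch (illustrative values only, assuming the 512-byte
    LBA format implied by the numbers above):
    
        uint32_t sector = 512;          /* LBA size of the namespace */
        uint32_t npwa = 7, npwg = 31;   /* 0-based fields as reported */
        uint32_t align = (npwa + 1) * sector;  /*  8 LBAs =  4096B,  4KB */
        uint32_t gran  = (npwg + 1) * sector;  /* 32 LBAs = 16384B, 16KB */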
    
    Thinking about the different stripesize consumers:
     - Partition alignment should be based on NPWA by definition.
     - ZFS ashift, as far as it forces alignment of all I/Os, should
    also be based on NPWA.  As far as it forces a size granularity, if
    really needed, it could be set to NPWG, but too big a value can
    make ZFS too space-inefficient, and 16KB is actually the biggest
    value supported there now.
     - ZFS recordsize/volblocksize could potentially be tuned up toward
    NPWG to act as an I/O size granularity, but enabled compression
    makes that too fuzzy.  And those are normally user-configurable
    settings.
     - ZFS I/O aggregation code could definitely use the Optimal Write
    Size value and maybe NPWG, but GEOM currently has no fields to
    report the minimal and optimal I/O sizes, and even the maximal one
    is not reported outside GEOM DISK for ZFS to use.
    
    MFC after:      1 week
    
    (cherry picked from commit e3bcd07d834def94dcf570ac7350ca2c454ebf10)
---
 sys/dev/nvme/nvme_ns.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/sys/dev/nvme/nvme_ns.c b/sys/dev/nvme/nvme_ns.c
index 9452f6460973..3bdf30e9264c 100644
--- a/sys/dev/nvme/nvme_ns.c
+++ b/sys/dev/nvme/nvme_ns.c
@@ -231,10 +231,15 @@ nvme_ns_get_data(struct nvme_namespace *ns)
 uint32_t
 nvme_ns_get_stripesize(struct nvme_namespace *ns)
 {
+	uint32_t ss;
 
 	if (((ns->data.nsfeat >> NVME_NS_DATA_NSFEAT_NPVALID_SHIFT) &
-	    NVME_NS_DATA_NSFEAT_NPVALID_MASK) != 0 && ns->data.npwg != 0) {
-		return ((ns->data.npwg + 1) * nvme_ns_get_sector_size(ns));
+	    NVME_NS_DATA_NSFEAT_NPVALID_MASK) != 0) {
+		ss = nvme_ns_get_sector_size(ns);
+		if (ns->data.npwa != 0)
+			return ((ns->data.npwa + 1) * ss);
+		else if (ns->data.npwg != 0)
+			return ((ns->data.npwg + 1) * ss);
 	}
 	return (ns->boundary);
 }
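
A minimal sketch of how a GEOM disk consumer might pick the value up
(the function below is hypothetical; nvd(4) does something similar when
it creates its disk):

	#include <sys/param.h>
	#include <geom/geom_disk.h>
	#include <dev/nvme/nvme.h>

	static void
	example_publish_geometry(struct disk *disk, struct nvme_namespace *ns)
	{
		disk->d_sectorsize = nvme_ns_get_sector_size(ns);
		/* With this change, NPWA is preferred over NPWG here. */
		disk->d_stripesize = nvme_ns_get_stripesize(ns);
		disk->d_stripeoffset = 0;
	}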


