From owner-dev-commits-src-all@freebsd.org Tue Jul 6 03:13:22 2021
Return-Path: <owner-dev-commits-src-all@freebsd.org>
Delivered-To: dev-commits-src-all@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mailman.nyi.freebsd.org (Postfix) with ESMTP id C2F8766F038;
	Tue, 6 Jul 2021 03:13:22 +0000 (UTC)
	(envelope-from git@FreeBSD.org)
Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256
	client-signature RSA-PSS (4096 bits) client-digest SHA256)
	(Client CN "mxrelay.nyi.freebsd.org", Issuer "R3" (verified OK))
	by mx1.freebsd.org (Postfix) with ESMTPS id 4GJngk55BCz3PJQ;
	Tue, 6 Jul 2021 03:13:22 +0000 (UTC)
	(envelope-from git@FreeBSD.org)
Received: from gitrepo.freebsd.org (gitrepo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:5])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(Client did not present a certificate)
	by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 959B711CED;
	Tue, 6 Jul 2021 03:13:22 +0000 (UTC)
	(envelope-from git@FreeBSD.org)
Received: from gitrepo.freebsd.org ([127.0.1.44])
	by gitrepo.freebsd.org (8.16.1/8.16.1) with ESMTP id 1663DMrH070036;
	Tue, 6 Jul 2021 03:13:22 GMT
	(envelope-from git@gitrepo.freebsd.org)
Received: (from git@localhost)
	by gitrepo.freebsd.org (8.16.1/8.16.1/Submit) id 1663DMSJ070035;
	Tue, 6 Jul 2021 03:13:22 GMT
	(envelope-from git)
Date: Tue, 6 Jul 2021 03:13:22 GMT
Message-Id: <202107060313.1663DMSJ070035@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org,
	dev-commits-src-main@FreeBSD.org
From: Alexander Motin <mav@FreeBSD.org>
Subject: git: e3bcd07d834d - main - nvme(4): Report NPWA before NPWG as stripesize.
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: mav
X-Git-Repository: src
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: e3bcd07d834def94dcf570ac7350ca2c454ebf10
Auto-Submitted: auto-generated
X-BeenThere: dev-commits-src-all@freebsd.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: Commit messages for all branches of the src repository
	<dev-commits-src-all.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/dev-commits-src-all>,
	<mailto:dev-commits-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/dev-commits-src-all/>
List-Post: <mailto:dev-commits-src-all@freebsd.org>
List-Help: <mailto:dev-commits-src-all-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/dev-commits-src-all>,
	<mailto:dev-commits-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 06 Jul 2021 03:13:22 -0000

The branch main has been updated by mav:

URL: https://cgit.FreeBSD.org/src/commit/?id=e3bcd07d834def94dcf570ac7350ca2c454ebf10

commit e3bcd07d834def94dcf570ac7350ca2c454ebf10
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2021-07-06 02:19:48 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2021-07-06 03:13:15 +0000

    nvme(4): Report NPWA before NPWG as stripesize.

    New Samsung 980 SSDs report Namespace Preferred Write Alignment of 8
    (4KB) and Namespace Preferred Write Granularity of 32 (16KB).  My quick
    tests show that 16KB is the minimum sequential write size at which the
    SSD reaches peak IOPS, so writing much less is very slow.  But writing
    slightly less or slightly more does not change much, so it seems to be
    not so much a size granularity as a minimum I/O size.

    Thinking about different stripesize consumers:
     - Partition alignment should be based on NPWA by definition.
     - ZFS ashift, insofar as it forces alignment of all I/Os, should also
       be based on NPWA.
       Insofar as it forces size granularity, it may be set to NPWG if
       really needed, but too big a value can make ZFS too
       space-inefficient, and 16KB is actually the biggest value supported
       there now.
     - ZFS recordsize/volblocksize could potentially be tuned up toward
       NPWG to work as I/O size granularity, but with compression enabled
       that becomes too fuzzy.  And those are normally user-configurable
       things.
     - ZFS I/O aggregation code could definitely use the Optimal Write Size
       value and maybe NPWG, but we don't have fields in GEOM now to report
       the minimum and optimal I/O sizes, and even the maximum is not
       reported outside GEOM DISK for ZFS to use.

    MFC after:      1 week
---
 sys/dev/nvme/nvme_ns.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/sys/dev/nvme/nvme_ns.c b/sys/dev/nvme/nvme_ns.c
index 82ab48efa826..8f97b08b88f4 100644
--- a/sys/dev/nvme/nvme_ns.c
+++ b/sys/dev/nvme/nvme_ns.c
@@ -231,10 +231,15 @@ nvme_ns_get_data(struct nvme_namespace *ns)
 uint32_t
 nvme_ns_get_stripesize(struct nvme_namespace *ns)
 {
+	uint32_t ss;
 
 	if (((ns->data.nsfeat >> NVME_NS_DATA_NSFEAT_NPVALID_SHIFT) &
-	    NVME_NS_DATA_NSFEAT_NPVALID_MASK) != 0 && ns->data.npwg != 0) {
-		return ((ns->data.npwg + 1) * nvme_ns_get_sector_size(ns));
+	    NVME_NS_DATA_NSFEAT_NPVALID_MASK) != 0) {
+		ss = nvme_ns_get_sector_size(ns);
+		if (ns->data.npwa != 0)
+			return ((ns->data.npwa + 1) * ss);
+		else if (ns->data.npwg != 0)
+			return ((ns->data.npwg + 1) * ss);
 	}
 	return (ns->boundary);
 }
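
For readers who want to try the new selection order outside the kernel, the following is a minimal userland sketch of the logic in the hunk above. It is not the driver code: struct sketch_ns and sketch_stripesize() are hypothetical stand-ins for struct nvme_namespace and nvme_ns_get_stripesize(), and the NPVALID bit is assumed to be already extracted from NSFEAT. As in the diff, the raw NPWA and NPWG fields are treated as 0-based counts of logical blocks, with 0 meaning "not reported".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the namespace fields the hunk consults. */
struct sketch_ns {
	uint8_t  npvalid;	/* NPVALID bit, assumed pre-extracted from NSFEAT */
	uint16_t npwa;		/* raw 0-based NPWA; 0 is treated as unreported */
	uint16_t npwg;		/* raw 0-based NPWG; 0 is treated as unreported */
	uint32_t sector_size;	/* logical block size in bytes */
	uint32_t boundary;	/* fallback value, as ns->boundary in the diff */
};

/* Prefer NPWA, then NPWG, then the boundary, mirroring the patched order. */
static uint32_t
sketch_stripesize(const struct sketch_ns *ns)
{
	assert(ns != NULL);
	if (ns->npvalid != 0) {
		if (ns->npwa != 0)
			return ((ns->npwa + 1) * ns->sector_size);
		else if (ns->npwg != 0)
			return ((ns->npwg + 1) * ns->sector_size);
	}
	return (ns->boundary);
}
```

Assuming a 512-byte sector size and raw field values of 7 and 31 (which would correspond to the 8-sector and 32-sector figures quoted in the commit message), the sketch returns 4096 where the pre-patch order would have returned 16384. If npvalid is clear, or both fields are 0, it falls back to the boundary value.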