From: Frank Behrens <frank@harz2023.behrens.de>
Date: Mon, 25 Sep 2023 15:21:12 +0200
Subject: Re: nvd->nda switch and blocksize changes for ZFS
To: stable@freebsd.org
Cc: Warner Losh
List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable

On 25.09.2023 at 13:58, Dimitry Andric wrote:
> # nvmecontrol identify nda0 and # nvmecontrol identify nvd0 (after
> hw.nvme.use_nvd="1" and reboot) give the same result:
>> Number of LBA Formats: 1
>> Current LBA Format: LBA Format #00
>> LBA Format #00: Data Size: 512 Metadata Size: 0 Performance: Best
>> ...
>> Optimal I/O Boundary: 0 blocks
>> NVM Capacity: 1000204886016 bytes
>> Preferred Write Granularity: 32 blocks
>> Preferred Write Alignment: 8 blocks
>> Preferred Deallocate Granul: 9600 blocks
>> Preferred Deallocate Align: 9600 blocks
>> Optimal Write Size: 256 blocks
>
> My guess is that the "Preferred Write Granularity" is the optimal size, in this case 32 'blocks' of 512 bytes, so 16 KiB. This also matches the stripe size reported by geom, as you showed.
>
> The "Preferred Write Alignment" is 8 * 512 = 4 KiB, so you should align partitions etc. to at least this. However, it cannot hurt to align everything to 16 KiB either, which is an integer multiple of 4 KiB.

Eugene gave me a tip, so I looked into the drivers.

dev/nvme/nvme_ns.c:

    uint32_t
    nvme_ns_get_stripesize(struct nvme_namespace *ns)
    {
            uint32_t ss;

            if (((ns->data.nsfeat >> NVME_NS_DATA_NSFEAT_NPVALID_SHIFT) &
                NVME_NS_DATA_NSFEAT_NPVALID_MASK) != 0) {
                    ss = nvme_ns_get_sector_size(ns);
                    if (ns->data.npwa != 0)
                            return ((ns->data.npwa + 1) * ss);
                    else if (ns->data.npwg != 0)
                            return ((ns->data.npwg + 1) * ss);
            }
            return (ns->boundary);
    }

cam/nvme/nvme_da.c:

            if (((nsd->nsfeat >> NVME_NS_DATA_NSFEAT_NPVALID_SHIFT) &
                NVME_NS_DATA_NSFEAT_NPVALID_MASK) != 0 && nsd->npwg != 0)
                    disk->d_stripesize = ((nsd->npwg + 1) * disk->d_sectorsize);
            else
                    disk->d_stripesize = nsd->noiob * disk->d_sectorsize;

So it seems that nvd uses "sectorsize * Write Alignment" as the stripe size, while nda uses "sectorsize * Write Granularity". My current interpretation is that the nvd driver reports the wrong value for maximum performance and reliability. I should make a backup and re-create the pool.

Maybe we should note in the 14.0 release notes that the switch to nda is not a "nop".

--
Frank Behrens
Osterwieck, Germany