Date: Mon, 25 Sep 2023 15:21:12 +0200
From: Frank Behrens <frank@harz2023.behrens.de>
To: stable@freebsd.org
Cc: Warner Losh <imp@FreeBSD.org>
Subject: Re: nvd->nda switch and blocksize changes for ZFS
Message-ID: <c71b9a6f-53a8-86ef-6595-3485749a9465@harz2023.behrens.de>
In-Reply-To: <E16E9C54-A552-4D86-9E59-71E0C68AC483@FreeBSD.org>
References: <1b6190d1-1d42-6c99-bef6-c6b77edd386a@harz2023.behrens.de>
 <D20AFDEE-45F4-40AF-A401-023E69A5C8A6@FreeBSD.org>
 <779546e4-1135-c808-372f-e77d347ecf65@aetern.org>
 <bae9c711-5cc9-7dca-f6aa-445166cc540e@harz2023.behrens.de>
 <E16E9C54-A552-4D86-9E59-71E0C68AC483@FreeBSD.org>
On 25.09.2023 at 13:58, Dimitry Andric wrote:
> # nvmecontrol identify nda0 and # nvmecontrol identify nvd0 (after
> hw.nvme.use_nvd="1" and reboot) give the same result:
>> Number of LBA Formats:       1
>> Current LBA Format:          LBA Format #00
>> LBA Format #00: Data Size:   512  Metadata Size:     0  Performance: Best
>> ...
>> Optimal I/O Boundary:        0 blocks
>> NVM Capacity:                1000204886016 bytes
>> Preferred Write Granularity: 32 blocks
>> Preferred Write Alignment:   8 blocks
>> Preferred Deallocate Granul: 9600 blocks
>> Preferred Deallocate Align:  9600 blocks
>> Optimal Write Size:          256 blocks
>
> My guess is that the "Preferred Write Granularity" is the optimal size,
> in this case 32 'blocks' of 512 bytes, so 16 kiB. This also matches the
> stripe size reported by geom, as you showed.
>
> The "Preferred Write Alignment" is 8 * 512 = 4 kiB, so you should align
> partitions etc. to at least this. However, it cannot hurt to align
> everything to 16 kiB either, which is an integer multiple of 4 kiB.

Eugene gave me a tip, so I looked into the drivers.

dev/nvme/nvme_ns.c:

    nvme_ns_get_stripesize(struct nvme_namespace *ns)
    {
        uint32_t ss;

        if (((ns->data.nsfeat >> NVME_NS_DATA_NSFEAT_NPVALID_SHIFT) &
            NVME_NS_DATA_NSFEAT_NPVALID_MASK) != 0) {
            ss = nvme_ns_get_sector_size(ns);
            if (ns->data.npwa != 0)
                return ((ns->data.npwa + 1) * ss);
            else if (ns->data.npwg != 0)
                return ((ns->data.npwg + 1) * ss);
        }
        return (ns->boundary);
    }

cam/nvme/nvme_da.c:

    if (((nsd->nsfeat >> NVME_NS_DATA_NSFEAT_NPVALID_SHIFT) &
        NVME_NS_DATA_NSFEAT_NPVALID_MASK) != 0 && nsd->npwg != 0)
        disk->d_stripesize = ((nsd->npwg + 1) * disk->d_sectorsize);
    else
        disk->d_stripesize = nsd->noiob * disk->d_sectorsize;

So it seems that nvd uses "sectorsize * Write Alignment" as the
stripesize, while nda uses "sectorsize * Write Granularity". My current
interpretation is that the nvd driver reports the wrong value for
maximum performance and reliability. I should make a backup and
re-create the pool.
Maybe we should note in the 14.0 release notes that the switch to nda
is not a "nop".

--
Frank Behrens
Osterwieck, Germany