From: Freddie Cash <fjwcash@gmail.com>
Date: Thu, 13 Apr 2023 09:43:31 -0700
Subject: Re: M2 NVME support
To: egoitz@ramattack.net, freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org, freebsd-hardware@freebsd.org
> On Thu, 13 Apr 2023 at 13:25:36 +0200, egoitz@ramattack.net <egoitz@ramattack.net> wrote:
>
> We are in the process of buying new hardware for use with FreeBSD and
> ZFS. We are deciding whether to buy M2 NVME disks or just SATA SSD disks
> (probably Samsung PM* ones). What is your experience with them? Do you
> recommend one over the other? Is support perhaps better for some of them
> from a specific version onward? Or do they perhaps work better with some
> specific disk controller?

There were issues in the past where NVMe drives were "too fast" for ZFS, and various bottlenecks were uncovered. Most (all?) of those have been fixed over the past couple of years. These issues were found on pools using all NVMe drives in various configurations for data storage (multiple raidz vdevs; multiple mirror vdevs). This was back when PCIe 3.0 NVMe drives were all the rage, or maybe when PCIe 4.0 drives first started appearing?

If you're running a recent release of FreeBSD (13.x) with the newer versions of OpenZFS 2.x, then you shouldn't have any issues using NVMe drives. The hard part will be finding drives that pair a good controller with MLC or 3D TLC NAND across multiple channels, a large SLC cache, and plenty of onboard RAM, in order to get consistent, strong write performance, especially when the drive is nearly full. Too many drives are moving to QLC NAND, or to DRAM-less controllers (which use system RAM as a buffer), in order to reduce cost. You'll want to research the technology used on a drive before buying it.

SATA SSDs will perform better than hard drives, but are limited by the SATA bus to around 550 MB/s of read/write throughput (SATA III is 6 Gbit/s on the wire; after 8b/10b encoding that works out to roughly 600 MB/s, so ~550 MB/s in practice). NVMe drives provide multiple GB/s of read/write throughput (depending on the drive and the PCIe bus version). Finding a motherboard that supports more than 2 M.2 slots will be very hard. If you want more than 2 drives, you'll have to look into PCIe add-in boards with M.2 slots. Really expensive ones include a PCIe switch onboard, so they'll work in pretty much any motherboard with a spare x16 slot (and maybe x8 slots, with reduced performance?). Less expensive add-in boards require PCIe bifurcation support in the BIOS, and will only work in specific slots on the motherboard.
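For what it's worth, once the kernel sees the drives, building a mirrored pool on them is the same as with any other disks. A minimal sketch; the device names (nda0/nda1) and pool name (zdata) are just for illustration, check yours with nvmecontrol first:

  # list the NVMe controllers and namespaces the kernel found
  nvmecontrol devlist

  # create a mirrored pool on the two drives; ashift=12 forces 4K sectors
  zpool create -o ashift=12 zdata mirror nda0 nda1

  # verify both drives are healthy members of the mirror
  zpool status zdata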
My home ZFS server uses an ASUS motherboard with PCIe bifurcation support, and has an ASUS Hyper M.2 expansion card in the second PCIe x16 slot with 2 WD Blue M.2 SSDs installed (the card supports 4 M.2 drives). These are used to create a root pool with a single mirror vdev; /, /usr, and /var are mounted from there. There are 6 hard drives in a separate data pool using multiple mirror vdevs, with /home mounted from there (this pool has been migrated from IDE drives to SATA, from FreeBSD to Linux, and from raidz to mirror vdevs at various points in the past, without losing any data so far; yay ZFS!).

At work, all our ZFS servers use 2.5" SATA SSDs for the root pool and for separate L2ARC/SLOG devices, with 24-90 SATA hard drives for the storage pool. These are all running FreeBSD 13.x.
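Adding those separate log and cache devices is a one-liner each. A minimal sketch, assuming a pool named "tank" and GPT-labelled SSD partitions (the labels are made up for illustration):

  # mirrored SLOG (separate ZFS intent log) on two SSD partitions
  zpool add tank log mirror gpt/slog0 gpt/slog1

  # L2ARC (second-level read cache); no redundancy needed, it is disposable
  zpool add tank cache gpt/l2arc0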
If you want the best performance, and money isn't a restriction, then you'll want to look into servers that have U.2 (or whatever the next-gen small-form-factor interface name is) slots and backplanes. The drives cost a lot more than regular M.2 SSDs, but provide a lot more performance. Especially in AMD EPYC servers with 128 PCIe lanes to play with. :)

--
Freddie Cash
fjwcash@gmail.com