Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 9 Apr 2022 17:15:37 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        Ravi Pokala <rpokala@freebsd.org>
Cc:        Warner Losh <imp@freebsd.org>, src-committers <src-committers@freebsd.org>,  "<dev-commits-src-all@freebsd.org>" <dev-commits-src-all@freebsd.org>, dev-commits-src-main@freebsd.org
Subject:   Re: 214df80a9cb3 - main - nvme: new define for size of host memory buffer sizes
Message-ID:  <CANCZdfq9t=3MUA03H3ANE_VU=BpWJqZgC8hgZZh0Q8FtfPm0Lw@mail.gmail.com>
In-Reply-To: <3BC728DB-39BB-43BC-BECE-720FECB5B20D@panasas.com>
References:  <202204090506.239567Ag038413@gitrepo.freebsd.org> <3BC728DB-39BB-43BC-BECE-720FECB5B20D@panasas.com>

next in thread | previous in thread | raw e-mail | index | archive | help
--00000000000033e8e205dc40e436
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sat, Apr 9, 2022 at 4:50 PM Ravi Pokala <rpokala@freebsd.org> wrote:

> -----Original Message-----
> From: <owner-src-committers@freebsd.org> on behalf of Warner Losh
> <imp@FreeBSD.org>
> Date: 2022-04-08, Friday at 22:06
> To: <src-committers@FreeBSD.org>, <dev-commits-src-all@FreeBSD.org>,
> <dev-commits-src-main@FreeBSD.org>
> Subject: git: 214df80a9cb3 - main - nvme: new define for size of host
> memory buffer sizes
>
>     The branch main has been updated by imp:
>
>     URL:
> https://cgit.FreeBSD.org/src/commit/?id=3D214df80a9cb3e95a140b13af7d19dee=
c2bbfae76
>
>     commit 214df80a9cb3e95a140b13af7d19deec2bbfae76
>     Author:     Warner Losh <imp@FreeBSD.org>
>     AuthorDate: 2022-04-09 05:01:06 +0000
>     Commit:     Warner Losh <imp@FreeBSD.org>
>     CommitDate: 2022-04-09 05:05:25 +0000
>
>         nvme: new define for size of host memory buffer sizes
>
>         The nvme spec defines the various fields that specify sizes for
> host
>         memory buffers in terms of 4096 chunks. So, rather than use a bar=
e
> 4096
>         here, use NVME_HMB_UNITS. This is explicitly not the host page
> size of
>         4096, nor the default memory page size (mps) of the NVMe drive,
> but its
>         own thing and needs its own define.
>
> Hi Warner,
>
> Are you sure about that?
>

NVMe-1.4, Figure 297: Host Memory Buffer =E2=80=93 Host Memory Buffer Descr=
iptor
> Entry
>
> | Buffer Size (BSIZE): Indicates the number of contiguous
> | memory page size (CC.MPS) units for this descriptor.
> |
> | Buffer Address (BADD): Indicates the host memory address for
> | this descriptor aligned to the memory page size (CC.MPS).
> | The lower bits (n:0) of this field indicate the offset
> | within the memory page is 0h (e.g., if the memory page size
> | is 4 KiB, then bits 11:00 shall be 0h; if the memory page
> | size is 8 KiB, then bits 12:00 shall be 0h).
>
> They both reference mps, not 4096 bytes.
>

So, some I'm 100% sure about. There's one that was previously incorrectly
hard wired to
4k. More below.

>From Table 275 of Rev 2.0:

Host Memory Buffer Preferred Size (HMPRE): This field indicates
the preferred size that the host is requested to allocate for the
Host Memory Buffer feature in 4 KiB units. This value shall be
greater than or equal to the Host Memory Buffer Minimum Size.
If this field is non-zero, then the Host Memory Buffer feature is
supported. If this field is cleared to 0h, then the Host Memory
Buffer feature is not supported.

Host Memory Buffer Minimum Size (HMMIN): This field indicates
the minimum size that the host is requested to allocate for the
Host Memory Buffer feature in 4 KiB units. If this field is cleared
to 0h, then the host is requested to allocate any amount of host
memory possible up to the HMPRE value.

Host Memory Buffer Minimum Descriptor Entry Size (HMMINDS):
This field indicates the minimum usable size of a Host Memory
Buffer Descriptor Entry in 4 KiB units. If this field is cleared to 0h,
then the controller does not indicate any limitations on the Host
Memory Buffer Descriptor Entry size.

These are the hmmin, hmminds and hmpre fields of 'cdata' in the
driver.

    diff --git a/sys/dev/nvme/nvme_ctrlr.c b/sys/dev/nvme/nvme_ctrlr.c
>     index 95a2b5c4285d..6996b3151b0d 100644
>     --- a/sys/dev/nvme/nvme_ctrlr.c
>     +++ b/sys/dev/nvme/nvme_ctrlr.c
>     @@ -936,11 +936,11 @@ nvme_ctrlr_hmb_alloc(struct nvme_controller
> *ctrlr)
>         max =3D (uint64_t)physmem * PAGE_SIZE / 20;
>         TUNABLE_UINT64_FETCH("hw.nvme.hmb_max", &max);
>

max is a percent of available memory...


>     -   min =3D (long long unsigned)ctrlr->cdata.hmmin * 4096;
>     +   min =3D (long long unsigned)ctrlr->cdata.hmmin * NVME_HMB_UNITS;
>         if (max =3D=3D 0 || max < min)
>                 return;
>     -   pref =3D MIN((long long unsigned)ctrlr->cdata.hmpre * 4096, max);
>     -   minc =3D MAX(ctrlr->cdata.hmminds * 4096, PAGE_SIZE);
>     +   pref =3D MIN((long long unsigned)ctrlr->cdata.hmpre *
> NVME_HMB_UNITS, max);
>     +   minc =3D MAX(ctrlr->cdata.hmminds * NVME_HMB_UNITS, PAGE_SIZE);
>

So for all of these, we're good. They are in 4kiB chunks.


>         if (min > 0 && ctrlr->cdata.hmmaxd > 0)
>                 minc =3D MAX(minc, min / ctrlr->cdata.hmmaxd);
>         ctrlr->hmb_chunk =3D pref;
>     @@ -1023,7 +1023,7 @@ again:
>         for (i =3D 0; i < ctrlr->hmb_nchunks; i++) {
>                 ctrlr->hmb_desc_vaddr[i].addr =3D
>                     htole64(ctrlr->hmb_chunks[i].hmbc_paddr);
>     -           ctrlr->hmb_desc_vaddr[i].size =3D htole32(ctrlr->hmb_chun=
k /
> 4096);
>     +           ctrlr->hmb_desc_vaddr[i].size =3D htole32(ctrlr->hmb_chun=
k /
> NVME_HMB_UNITS);
>

This one, you are correct is wrong. I'll fix it when I bring in the changes
to fully support
MPS !=3D 0. For MPS =3D=3D 0, which are the only drives this driver support=
s
correctly, this
is a nop. It was wrong before, though. The text you quoted is correct about
this. There
are a couple of PAGE_SIZEs tinbetween these two chunks hat should be
ctrlr->min_page_size instead (since that's the page size we're really
using, not the
host's). But PAGE_SIZE !=3D 4k doesn't currently work with the nvme driver
due to
confusion like this.

We also currently set the MPS field in the CC incorrectly when it isn't 0.

Good catch. I'll update my pending changes.

Warner


>         }
>         bus_dmamap_sync(ctrlr->hmb_desc_tag, ctrlr->hmb_desc_map,
>             BUS_DMASYNC_PREWRITE);
>
>
>

--00000000000033e8e205dc40e436
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div dir=3D"ltr"><br></div><br><div class=3D"gmail_quote">=
<div dir=3D"ltr" class=3D"gmail_attr">On Sat, Apr 9, 2022 at 4:50 PM Ravi P=
okala &lt;<a href=3D"mailto:rpokala@freebsd.org">rpokala@freebsd.org</a>&gt=
; wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px=
 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">-----Or=
iginal Message-----<br>
From: &lt;<a href=3D"mailto:owner-src-committers@freebsd.org" target=3D"_bl=
ank">owner-src-committers@freebsd.org</a>&gt; on behalf of Warner Losh &lt;=
imp@FreeBSD.org&gt;<br>
Date: 2022-04-08, Friday at 22:06<br>
To: &lt;src-committers@FreeBSD.org&gt;, &lt;dev-commits-src-all@FreeBSD.org=
&gt;, &lt;dev-commits-src-main@FreeBSD.org&gt;<br>
Subject: git: 214df80a9cb3 - main - nvme: new define for size of host memor=
y buffer sizes<br>
<br>
=C2=A0 =C2=A0 The branch main has been updated by imp:<br>
<br>
=C2=A0 =C2=A0 URL: <a href=3D"https://cgit.FreeBSD.org/src/commit/?id=3D214=
df80a9cb3e95a140b13af7d19deec2bbfae76" rel=3D"noreferrer" target=3D"_blank"=
>https://cgit.FreeBSD.org/src/commit/?id=3D214df80a9cb3e95a140b13af7d19deec=
2bbfae76</a><br>
<br>
=C2=A0 =C2=A0 commit 214df80a9cb3e95a140b13af7d19deec2bbfae76<br>
=C2=A0 =C2=A0 Author:=C2=A0 =C2=A0 =C2=A0Warner Losh &lt;imp@FreeBSD.org&gt=
;<br>
=C2=A0 =C2=A0 AuthorDate: 2022-04-09 05:01:06 +0000<br>
=C2=A0 =C2=A0 Commit:=C2=A0 =C2=A0 =C2=A0Warner Losh &lt;imp@FreeBSD.org&gt=
;<br>
=C2=A0 =C2=A0 CommitDate: 2022-04-09 05:05:25 +0000<br>
<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 nvme: new define for size of host memory buffer=
 sizes<br>
<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 The nvme spec defines the various fields that s=
pecify sizes for host<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 memory buffers in terms of 4096 chunks. So, rat=
her than use a bare 4096<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 here, use NVME_HMB_UNITS. This is explicitly no=
t the host page size of<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 4096, nor the default memory page size (mps) of=
 the NVMe drive, but its<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 own thing and needs its own define.<br>
<br>
Hi Warner,<br>
<br>
Are you sure about that?<br></blockquote><div><br></div><blockquote class=
=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rg=
b(204,204,204);padding-left:1ex">NVMe-1.4, Figure 297: Host Memory Buffer =
=E2=80=93 Host Memory Buffer Descriptor Entry<br>
<br>
| Buffer Size (BSIZE): Indicates the number of contiguous<br>
| memory page size (CC.MPS) units for this descriptor.<br>
| <br>
| Buffer Address (BADD): Indicates the host memory address for<br>
| this descriptor aligned to the memory page size (CC.MPS). <br>
| The lower bits (n:0) of this field indicate the offset<br>
| within the memory page is 0h (e.g., if the memory page size<br>
| is 4 KiB, then bits 11:00 shall be 0h; if the memory page<br>
| size is 8 KiB, then bits 12:00 shall be 0h).<br>
<br>
They both reference mps, not 4096 bytes.<br></blockquote><div><br></div><di=
v>So, some I&#39;m 100% sure about. There&#39;s one that was previously inc=
orrectly hard wired to</div><div>4k. More below.</div><div><br></div><div>F=
rom Table 275 of Rev 2.0:</div><div><br></div><div>Host Memory Buffer Prefe=
rred Size (HMPRE): This field indicates</div><div>the preferred size that t=
he host is requested to allocate for the</div><div>Host Memory Buffer featu=
re in 4 KiB units. This value shall be</div><div>greater than or equal to t=
he Host Memory Buffer Minimum Size.</div><div>If this field is non-zero, th=
en the Host Memory Buffer feature is</div><div>supported. If this field is =
cleared to 0h, then the Host Memory</div><div>Buffer feature is not support=
ed.<br></div><div><br></div><div>Host Memory Buffer Minimum Size (HMMIN): T=
his field indicates</div><div>the minimum size that the host is requested t=
o allocate for the</div><div>Host Memory Buffer feature in 4 KiB units. If =
this field is cleared</div><div>to 0h, then the host is requested to alloca=
te any amount of host</div><div>memory possible up to the HMPRE value.<br><=
/div><div><br></div><div>Host Memory Buffer Minimum Descriptor Entry Size (=
HMMINDS):</div><div>This field indicates the minimum usable size of a Host =
Memory</div><div>Buffer Descriptor Entry in 4 KiB units. If this field is c=
leared to 0h,</div><div>then the controller does not indicate any limitatio=
ns on the Host</div><div>Memory Buffer Descriptor Entry size.<br></div><div=
><br></div><div>These are the hmmin, hmminds and hmpre fields of &#39;cdata=
&#39; in the</div><div>driver.</div><div><br></div><blockquote class=3D"gma=
il_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,2=
04,204);padding-left:1ex">=C2=A0 =C2=A0 diff --git a/sys/dev/nvme/nvme_ctrl=
r.c b/sys/dev/nvme/nvme_ctrlr.c<br>
=C2=A0 =C2=A0 index 95a2b5c4285d..6996b3151b0d 100644<br>
=C2=A0 =C2=A0 --- a/sys/dev/nvme/nvme_ctrlr.c<br>
=C2=A0 =C2=A0 +++ b/sys/dev/nvme/nvme_ctrlr.c<br>
=C2=A0 =C2=A0 @@ -936,11 +936,11 @@ nvme_ctrlr_hmb_alloc(struct nvme_contro=
ller *ctrlr)<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max =3D (uint64_t)physmem * PAGE_SIZE / 20;<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 TUNABLE_UINT64_FETCH(&quot;hw.nvme.hmb_max&quot=
;, &amp;max);<br></blockquote><div><br></div><div>max is a percent of avail=
able memory...</div><div>=C2=A0</div><blockquote class=3D"gmail_quote" styl=
e=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);paddin=
g-left:1ex">
=C2=A0 =C2=A0 -=C2=A0 =C2=A0min =3D (long long unsigned)ctrlr-&gt;cdata.hmm=
in * 4096;<br>
=C2=A0 =C2=A0 +=C2=A0 =C2=A0min =3D (long long unsigned)ctrlr-&gt;cdata.hmm=
in * NVME_HMB_UNITS;<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (max =3D=3D 0 || max &lt; min)<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 return;<br>
=C2=A0 =C2=A0 -=C2=A0 =C2=A0pref =3D MIN((long long unsigned)ctrlr-&gt;cdat=
a.hmpre * 4096, max);<br>
=C2=A0 =C2=A0 -=C2=A0 =C2=A0minc =3D MAX(ctrlr-&gt;cdata.hmminds * 4096, PA=
GE_SIZE);<br>
=C2=A0 =C2=A0 +=C2=A0 =C2=A0pref =3D MIN((long long unsigned)ctrlr-&gt;cdat=
a.hmpre * NVME_HMB_UNITS, max);<br>
=C2=A0 =C2=A0 +=C2=A0 =C2=A0minc =3D MAX(ctrlr-&gt;cdata.hmminds * NVME_HMB=
_UNITS, PAGE_SIZE);<br></blockquote><div><br></div><div>So for all of these=
, we&#39;re good. They are in 4kiB chunks.</div><div>=C2=A0</div><blockquot=
e class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px s=
olid rgb(204,204,204);padding-left:1ex">
=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (min &gt; 0 &amp;&amp; ctrlr-&gt;cdata.hmmax=
d &gt; 0)<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 minc =3D MAX(minc, =
min / ctrlr-&gt;cdata.hmmaxd);<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 ctrlr-&gt;hmb_chunk =3D pref;<br>
=C2=A0 =C2=A0 @@ -1023,7 +1023,7 @@ again:<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 for (i =3D 0; i &lt; ctrlr-&gt;hmb_nchunks; i++=
) {<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ctrlr-&gt;hmb_desc_=
vaddr[i].addr =3D<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 htole=
64(ctrlr-&gt;hmb_chunks[i].hmbc_paddr);<br>
=C2=A0 =C2=A0 -=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0ctrlr-&gt;hmb_desc_=
vaddr[i].size =3D htole32(ctrlr-&gt;hmb_chunk / 4096);<br>
=C2=A0 =C2=A0 +=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0ctrlr-&gt;hmb_desc_=
vaddr[i].size =3D htole32(ctrlr-&gt;hmb_chunk / NVME_HMB_UNITS);<br></block=
quote><div><br></div><div>This one, you are correct is wrong. I&#39;ll fix =
it when I bring in the changes to fully support</div><div>MPS !=3D 0. For M=
PS =3D=3D 0, which are the only drives this driver supports correctly, this=
</div><div>is a nop. It was wrong before, though. The text you quoted=C2=A0=
is correct about this. There</div><div>are a couple of PAGE_SIZEs tinbetwee=
n=C2=A0these two chunks hat should be</div><div>ctrlr-&gt;min_page_size ins=
tead (since that&#39;s the page size we&#39;re really using, not the</div><=
div>host&#39;s). But PAGE_SIZE !=3D 4k doesn&#39;t currently work with the =
nvme driver due to</div><div>confusion like this.</div><div><br></div><div>=
We also currently set the MPS field in the CC incorrectly when it isn&#39;t=
 0.</div><div><br></div><div>Good catch. I&#39;ll update my pending changes=
.</div><div><br></div><div>Warner</div><div>=C2=A0</div><blockquote class=
=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1px solid rg=
b(204,204,204);padding-left:1ex">
=C2=A0 =C2=A0 =C2=A0 =C2=A0 }<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 bus_dmamap_sync(ctrlr-&gt;hmb_desc_tag, ctrlr-&=
gt;hmb_desc_map,<br>
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 BUS_DMASYNC_PREWRITE);<br>
<br>
<br>
</blockquote></div></div>

--00000000000033e8e205dc40e436--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfq9t=3MUA03H3ANE_VU=BpWJqZgC8hgZZh0Q8FtfPm0Lw>