Date: Wed, 25 May 2022 09:43:11 -0600
From: Warner Losh <imp@bsdimp.com>
To: matti k <mattik@gwsit.com.au>
Cc: Alexander Motin <mav@freebsd.org>, Matteo Riondato <matteo@freebsd.org>, FreeBSD Current <freebsd-current@freebsd.org>, Jim Harris <jimharris@freebsd.org>
Subject: Re: nvme INVALID_FIELD in dmesg.boot
Message-ID: <CANCZdfoJbHxUfgejeSPCSw57VPnA924vvhDYuw3Ta8BQAeCHYA@mail.gmail.com>
In-Reply-To: <CANCZdfrYP-Wz7a-+_WEKbT=Yb=mrk0YYifDkzekV6H2q865sDg@mail.gmail.com>
References: <20220525122529.t2kwfg2q65dfiyyt@host-ubertino-mac-88e9fe7361f5.eduroam.ssid.10net.amherst.edu> <d8462935-2874-2e5c-a7aa-d5352bd0a3c2@FreeBSD.org> <20220526001715.4ffee96a@ws1.wobblyboot.net> <CANCZdfrYP-Wz7a-+_WEKbT=Yb=mrk0YYifDkzekV6H2q865sDg@mail.gmail.com>
Here's a patch that might fix it:
diff --git a/sys/dev/nvme/nvme_ctrlr.c b/sys/dev/nvme/nvme_ctrlr.c
index 2c5d521ecaa1..72c511de3be8 100644
--- a/sys/dev/nvme/nvme_ctrlr.c
+++ b/sys/dev/nvme/nvme_ctrlr.c
@@ -854,8 +854,9 @@ nvme_ctrlr_configure_aer(struct nvme_controller *ctrlr)
 	    NVME_CRIT_WARN_ST_READ_ONLY |
 	    NVME_CRIT_WARN_ST_VOLATILE_MEMORY_BACKUP;
 	if (ctrlr->cdata.ver >= NVME_REV(1, 2))
-		ctrlr->async_event_config |= NVME_ASYNC_EVENT_NS_ATTRIBUTE |
-		    NVME_ASYNC_EVENT_FW_ACTIVATE;
+		ctrlr->async_event_config |=
+		    ctrlr->cdata.oaes & (NVME_ASYNC_EVENT_NS_ATTRIBUTE |
+			NVME_ASYNC_EVENT_FW_ACTIVATE);
 
 	status.done = 0;
 	nvme_ctrlr_cmd_get_feature(ctrlr, NVME_FEAT_TEMPERATURE_THRESHOLD,
Warner
On Wed, May 25, 2022 at 9:29 AM Warner Losh <imp@bsdimp.com> wrote:
>
>
> On Wed, May 25, 2022 at 8:18 AM matti k <mattik@gwsit.com.au> wrote:
>
>> On Wed, 25 May 2022 09:58:54 -0400
>> Alexander Motin <mav@FreeBSD.org> wrote:
>>
>> > On 25.05.2022 08:25, Matteo Riondato wrote:
>> > > My dmesg.boot contains the following entries containing
>> > > "INVALID_FIELD" about nvme (I use nda(4) for my nvme disks, with
>> > > hw.nvme.use_nvd=0 in loader.conf):
>> > >
>> > > trismegistus ~ % grep -e 'nvme[0-9]\?' /var/run/dmesg.boot
>> > > nvme0: <Intel DC PC4500> mem 0xb8610000-0xb8613fff irq 40 at device
>> > > 0.0 numa-domain 0 on pci7
>> > > nvme1: <Intel DC PC4500> mem 0xb8510000-0xb8513fff irq 47 at device
>> > > 0.0 numa-domain 0 on pci8
>> > > nvme2: <Intel DC PC4500> mem 0xc5e10000-0xc5e13fff irq 48 at device
>> > > 0.0 numa-domain 0 on pci10
>> > > nvme3: <Intel DC PC4500> mem 0xc5d10000-0xc5d13fff irq 55 at device
>> > > 0.0 numa-domain 0 on pci11
>> > > nvme0: SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:0000000b
>> > > cdw11:0000031f nvme0: INVALID_FIELD (00/02) sqid:0 cid:15 cdw0:0
>> > > nvme1: SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:0000000b
>> > > cdw11:0000031f nvme1: INVALID_FIELD (00/02) sqid:0 cid:15 cdw0:0
>> > > nvme2: SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:0000000b
>> > > cdw11:0000031f nvme2: INVALID_FIELD (00/02) sqid:0 cid:15 cdw0:0
>> > > nvme3: SET FEATURES (09) sqid:0 cid:15 nsid:0 cdw10:0000000b
>> > > cdw11:0000031f nvme3: INVALID_FIELD (00/02) sqid:0 cid:15 cdw0:0
>> > > nda0 at nvme0 bus 0 scbus16 target 0 lun 1
>> > > nda0: nvme version 1.2 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
>> > > nda1 at nvme1 bus 0 scbus17 target 0 lun 1
>> > > nda1: nvme version 1.2 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
>> > > nda2 at nvme2 bus 0 scbus18 target 0 lun 1
>> > > nda2: nvme version 1.2 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
>> > > nda3 at nvme3 bus 0 scbus19 target 0 lun 1
>> > > nda3: nvme version 1.2 x4 (max x4) lanes PCIe Gen3 (max Gen3) link
>> > >
>> > > The disks seem to work fine, from what I can tell.
>> > >
>> > > Are the "INVALID_FIELD" messages harmless, or can they be avoided
>> > > with some tuning, or maybe with some patch?
>> >
>> > Those messages mean that the driver tried to enable certain types of
>> > asynchronous events, but the hardware probably does not support some
>> > of them. If you wish to experiment, we could try masking some of the
>> > bits in the nvme_ctrlr_configure_aer() function to find out which one
>> > exactly, but for discontinued drives 4-5 years old it might not make
>> > much sense. It should not be critical unless you overheat them, or
>> > they otherwise fail and wish to report it.
>> >
>>
>> I am intrigued as to how you gurus know this; is it because you know
>> the code well enough?
>>
>
> SET FEATURES (opcode 9) feature 0xb is indeed async event configuration.
> 0x31f is:
> SMART warning for available spares (0x1)
> SMART warning for temperature (0x2)
> SMART warning for device reliability (0x4)
> SMART warning for being read only (0x8)
> SMART warning for volatile memory backup (0x10)
> Namespace attribute change events (0x100)
> Firmware activation events (0x200)
>
> I wonder which one of those it doesn't like. My reading of the standard
> suggests that those should always be supported for a 1.2 and later
> drive... though maybe with the possible exception of the volatile
> memory backup, so let me do some digging here...
>
> We can get the last two items from the OAES field of the controller
> identification data. This is bytes 95:92, which if I'm counting right
> is the last word on the 040: line of nvmecontrol identify -x nvmeX
> output:
>
> 040: 4e474e4b 30303150 000cca07 00230000 00010200 005b8d80 0030d400 00000100
>                                                                     ^^^^^^^^
>
> It looks like we don't currently test these bits before we add the last
> two events (we enable them unconditionally for >= 1.2; maybe we should
> check these bits for >= 1.2 instead).
>
> Would you be able to test a fix for this?
>
> Warner
>
