Date:      Thu, 12 Oct 2023 21:45:32 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        Pete Wright <pete@nomadlogic.org>
Cc:        FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: nvme timeout issues with hardware and bhyve vm's
Message-ID:  <CANCZdfrQTd3F-j81HsamUCJG4DyUk_-yPOtbZY4Q926_ihatsQ@mail.gmail.com>
In-Reply-To: <90d3e532-8ea7-4eea-8e31-8c363285a156@nomadlogic.org>
References:  <90d3e532-8ea7-4eea-8e31-8c363285a156@nomadlogic.org>

--0000000000002086d3060790e45c
Content-Type: text/plain; charset="UTF-8"

What version is that kernel?

Warner

On Thu, Oct 12, 2023, 9:41 PM Pete Wright <pete@nomadlogic.org> wrote:

> hey there - i was curious if anyone has had issues with nvme devices
> recently.  i'm chasing down similar issues on my workstation which has a
> physical NVMe zroot, and on a bhyve VM which has a large pool exposed as
> an NVMe device (and is backed by a zvol).
>
> on the most recent bhyve issue the VM reported this:
>
> Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_START 13737432416007567 vs
> 13737432371683671
> Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_START 13737432718499597 vs
> 13737432371683671
> Oct 13 02:52:52 emby kernel: nvme1: timeout with nothing complete,
> resetting
> Oct 13 02:52:52 emby kernel: nvme1: Resetting controller due to a timeout.
> Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_WAITING
> Oct 13 02:52:52 emby kernel: nvme1: resetting controller
> Oct 13 02:52:53 emby kernel: nvme1: waiting
> Oct 13 02:53:23 emby syslogd: last message repeated 114 times
> Oct 13 02:53:23 emby kernel: nvme1: controller ready did not become 1
> within 30500 ms
> Oct 13 02:53:23 emby kernel: nvme1: failing outstanding i/o
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:1 cid:119 nsid:1
> lba:4968850592 len:256
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:1 sqid:1 cid:119 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: failing outstanding i/o
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:6 cid:0 nsid:1
> lba:5241952432 len:32
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:123 nsid:1
> lba:4968850336 len:256
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:1 sqid:3 cid:123 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:0 nsid:1
> lba:5242495888 len:256
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:0 sqid:3 cid:0 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:528 len:16
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:5 cid:0 nsid:1
> lba:4934226784 len:96
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:0 sqid:3 cid:0 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1
> lba:6442449936 len:16
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:0 sqid:3 cid:0 cdw0:0
> Oct 13 02:53:25 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1
> lba:6442450448 len:16
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:0 sqid:3 cid:0 cdw0:0
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:0 sqid:5 cid:0 cdw0:0
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:0 sqid:6 cid:0 cdw0:0
> Oct 13 02:53:25 emby kernel: nvd1: detached
>
>
>
> I had similar issues on my workstation as well.  Scrubbing the NVMe
> device on my real-hardware workstation hasn't turned up any issues, but
> the system has locked up a handful of times.
>
> Just curious if others have seen the same, or if someone could point me
> in the right direction...
>
> thanks!
> -pete
>
> --
> Pete Wright
> pete@nomadlogic.org
>
>
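
For anyone triaging a log like the one quoted above, here is a minimal sketch (not part of the original exchange) that rejoins the lines the mail client wrapped and counts the "ABORTED - BY REQUEST" completions per submission queue. The helper names unwrap and aborts_per_queue are hypothetical, and the sample is a short excerpt of the quoted log; treat it as an illustration under those assumptions rather than a diagnostic tool.

```python
import re
from collections import Counter

# A syslog line starts with "Mon DD HH:MM:SS "; anything else is treated as
# the wrapped tail of the previous line (assumption: the mail client wrapped
# the log, and no continuation happens to begin with a timestamp).
STAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")
ABORT = re.compile(r"ABORTED - BY REQUEST .*?sqid:(\d+) cid:(\d+)")


def unwrap(lines):
    """Strip '> ' quote prefixes and rejoin hard-wrapped log lines."""
    out = []
    for raw in lines:
        line = re.sub(r"^>\s?", "", raw.rstrip("\n")).strip()
        if not line:
            continue  # skip empty quote lines
        if STAMP.match(line) or not out:
            out.append(line)
        else:
            out[-1] += " " + line
    return out


def aborts_per_queue(entries):
    """Count aborted completions per submission queue id (sqid)."""
    counts = Counter()
    for entry in entries:
        m = ABORT.search(entry)
        if m:
            counts[int(m.group(1))] += 1
    return counts


# Excerpt copied from the quoted log, still wrapped and quote-prefixed.
sample = """\
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:1 cid:119 nsid:1
> lba:4968850592 len:256
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:1 sqid:1 cid:119 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:1 sqid:3 cid:123 cdw0:0
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0
> m:0 dnr:0 sqid:3 cid:0 cdw0:0
""".splitlines()

entries = unwrap(sample)
counts = aborts_per_queue(entries)
print(len(entries), dict(counts))
```

Run against the full log, the per-queue tally gives a quick view of which queues had I/O outstanding when the controller reset failed; it says nothing about the cause of the timeout itself.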

--0000000000002086d3060790e45c--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfrQTd3F-j81HsamUCJG4DyUk_-yPOtbZY4Q926_ihatsQ>