Date: Thu, 12 Oct 2023 21:45:32 -0600
From: Warner Losh <imp@bsdimp.com>
To: Pete Wright <pete@nomadlogic.org>
Cc: FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: nvme timeout issues with hardware and bhyve vm's
Message-ID: <CANCZdfrQTd3F-j81HsamUCJG4DyUk_-yPOtbZY4Q926_ihatsQ@mail.gmail.com>
In-Reply-To: <90d3e532-8ea7-4eea-8e31-8c363285a156@nomadlogic.org>
References: <90d3e532-8ea7-4eea-8e31-8c363285a156@nomadlogic.org>

What version is that kernel?

Warner

On Thu, Oct 12, 2023, 9:41 PM Pete Wright <pete@nomadlogic.org> wrote:

> hey there - i was curious if anyone has had issues with nvme devices
> recently. i'm chasing down similar issues on my workstation which has a
> physical NVMe zroot, and on a bhyve VM which has a large pool exposed as
> a NVMe device (and is backed by a zvol).
>
> on the most recent bhyve issue the VM reported this:
>
> Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_START 13737432416007567 vs 13737432371683671
> Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_START 13737432718499597 vs 13737432371683671
> Oct 13 02:52:52 emby kernel: nvme1: timeout with nothing complete, resetting
> Oct 13 02:52:52 emby kernel: nvme1: Resetting controller due to a timeout.
> Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_WAITING
> Oct 13 02:52:52 emby kernel: nvme1: resetting controller
> Oct 13 02:52:53 emby kernel: nvme1: waiting
> Oct 13 02:53:23 emby syslogd: last message repeated 114 times
> Oct 13 02:53:23 emby kernel: nvme1: controller ready did not become 1 within 30500 ms
> Oct 13 02:53:23 emby kernel: nvme1: failing outstanding i/o
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:1 cid:119 nsid:1 lba:4968850592 len:256
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:1 sqid:1 cid:119 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: failing outstanding i/o
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:6 cid:0 nsid:1 lba:5241952432 len:32
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:123 nsid:1 lba:4968850336 len:256
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:1 sqid:3 cid:123 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:0 nsid:1 lba:5242495888 len:256
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:528 len:16
> Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:5 cid:0 nsid:1 lba:4934226784 len:96
> Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
> Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:6442449936 len:16
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
> Oct 13 02:53:25 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:6442450448 len:16
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:5 cid:0 cdw0:0
> Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:6 cid:0 cdw0:0
> Oct 13 02:53:25 emby kernel: nvd1: detached
>
> I had similar issues on my workstation as well. Scrubbing the NVMe
> device on my real-hardware workstation hasn't turned up any issues, but
> the system has locked up a handful of times.
>
> Just curious if others have seen the same, or if someone could point me
> in the right direction...
>
> thanks!
> -pete
>
> --
> Pete Wright
> pete@nomadlogic.org

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfrQTd3F-j81HsamUCJG4DyUk_-yPOtbZY4Q926_ihatsQ>