From: Warner Losh
Date: Thu, 12 Oct 2023 21:45:32 -0600
Subject: Re: nvme timeout issues with hardware and bhyve vm's
To: Pete Wright
Cc: FreeBSD Current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

What version is that kernel?

Warner

On Thu, Oct 12, 2023, 9:41 PM Pete Wright <pete@nomadlogic.org> wrote:
hey there - i was curious if anyone has had issues with nvme devices
recently. i'm chasing down similar issues on my workstation which has a
physical NVMe zroot, and on a bhyve VM which has a large pool exposed as a NVMe device (and is backed by a zvol).

on the most recent bhyve issue the VM reported this:

Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_START 13737432416007567 vs 13737432371683671
Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_START 13737432718499597 vs 13737432371683671
Oct 13 02:52:52 emby kernel: nvme1: timeout with nothing complete, resetting
Oct 13 02:52:52 emby kernel: nvme1: Resetting controller due to a timeout.
Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_WAITING
Oct 13 02:52:52 emby kernel: nvme1: resetting controller
Oct 13 02:52:53 emby kernel: nvme1: waiting
Oct 13 02:53:23 emby syslogd: last message repeated 114 times
Oct 13 02:53:23 emby kernel: nvme1: controller ready did not become 1 within 30500 ms
Oct 13 02:53:23 emby kernel: nvme1: failing outstanding i/o
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:1 cid:119 nsid:1 lba:4968850592 len:256
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:1 sqid:1 cid:119 cdw0:0
Oct 13 02:53:23 emby kernel: nvme1: failing outstanding i/o
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:6 cid:0 nsid:1 lba:5241952432 len:32
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:123 nsid:1 lba:4968850336 len:256
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:1 sqid:3 cid:123 cdw0:0
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:0 nsid:1 lba:5242495888 len:256
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:528 len:16
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:5 cid:0 nsid:1 lba:4934226784 len:96
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:6442449936 len:16
Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
Oct 13 02:53:25 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:6442450448 len:16
Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:5 cid:0 cdw0:0
Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:6 cid:0 cdw0:0
Oct 13 02:53:25 emby kernel: nvd1: detached
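
For anyone comparing logs like this one, a small script can tally how many commands were aborted on each submission queue. This is only a rough sketch: the regular expression is inferred from the lines in this message and is not a documented nvme(4) log format, `count_aborts` is a hypothetical helper name, and it assumes each log entry sits on a single line.

```python
import re
from collections import Counter

# Pattern inferred from the log lines in this thread (an assumption,
# not a documented format): an abort message followed by "sqid:<n>".
ABORT_RE = re.compile(r"ABORTED - BY REQUEST \(\w+/\w+\).*?sqid:(\d+)")


def count_aborts(lines):
    """Return a Counter mapping submission queue id to aborted-command count."""
    counts = Counter()
    for line in lines:
        match = ABORT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Three entries copied from the log above, each on one line.
sample = [
    "Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:1 sqid:1 cid:119 cdw0:0",
    "Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:123 nsid:1 lba:4968850336 len:256",
    "Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:1 sqid:3 cid:123 cdw0:0",
]

for sqid, n in sorted(count_aborts(sample).items()):
    print(f"sqid {sqid}: {n} aborted")
```

Feeding it the contents of /var/log/messages instead of `sample` should give a per-queue count, which may help show whether the aborts cluster on one queue or are spread across all of them.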



I had similar issues on my workstation as well. Scrubbing the NVMe device on my real-hardware workstation hasn't turned up any issues, but
the system has locked up a handful of times.

Just curious if others have seen the same, or if someone could point me in the right direction...

thanks!
-pete

--
Pete Wright
pete@nomadlogic.org
