Date: Thu, 12 Oct 2023 20:40:57 -0700
Message-ID: <90d3e532-8ea7-4eea-8e31-8c363285a156@nomadlogic.org>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
To: freebsd-current@freebsd.org
From: Pete Wright
Subject: nvme timeout issues with hardware and bhyve vm's

hey there - i was curious if anyone has had issues with nvme devices recently. i'm chasing down similar issues on my workstation which has a physical NVMe zroot, and on a bhyve VM which has a large pool exposed as an NVMe device (and is backed by a zvol).

on the most recent bhyve issue the VM reported this:

Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_START 13737432416007567 vs 13737432371683671
Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_START 13737432718499597 vs 13737432371683671
Oct 13 02:52:52 emby kernel: nvme1: timeout with nothing complete, resetting
Oct 13 02:52:52 emby kernel: nvme1: Resetting controller due to a timeout.
Oct 13 02:52:52 emby kernel: nvme1: RECOVERY_WAITING
Oct 13 02:52:52 emby kernel: nvme1: resetting controller
Oct 13 02:52:53 emby kernel: nvme1: waiting
Oct 13 02:53:23 emby syslogd: last message repeated 114 times
Oct 13 02:53:23 emby kernel: nvme1: controller ready did not become 1 within 30500 ms
Oct 13 02:53:23 emby kernel: nvme1: failing outstanding i/o
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:1 cid:119 nsid:1 lba:4968850592 len:256
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:1 sqid:1 cid:119 cdw0:0
Oct 13 02:53:23 emby kernel: nvme1: failing outstanding i/o
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:6 cid:0 nsid:1 lba:5241952432 len:32
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:123 nsid:1 lba:4968850336 len:256
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:1 sqid:3 cid:123 cdw0:0
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:0 nsid:1 lba:5242495888 len:256
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:528 len:16
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:5 cid:0 nsid:1 lba:4934226784 len:96
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:6442449936 len:16
Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
Oct 13 02:53:25 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:6442450448 len:16
Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:3 cid:0 cdw0:0
Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:5 cid:0 cdw0:0
Oct 13 02:53:25 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:6 cid:0 cdw0:0
Oct 13 02:53:25 emby kernel: nvd1: detached

I had similar issues on my workstation as well.
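
For anyone comparing notes, here is a rough Python sketch for tallying the failed commands in a log like the one above. It assumes the kernel message layout shown in this report (`READ`/`WRITE sqid:... cid:...` lines and `ABORTED - BY REQUEST (xx/yy) ... dnr:N sqid:N ...` lines); the function name `summarize` and the `SAMPLE` excerpt are illustrative only, not part of any FreeBSD tool. Because the kernel output is interleaved, it only counts lines and does not try to pair each command with its abort.

```python
import re
from collections import Counter

# Patterns assume the message layout seen in the log above; other
# driver versions may print these lines differently.
CMD_RE = re.compile(
    r"(?P<dev>nvme\d+): (?P<op>READ|WRITE) sqid:(?P<sqid>\d+) cid:\d+ "
    r"nsid:\d+ lba:\d+ len:\d+"
)
ABORT_RE = re.compile(
    r"(?P<dev>nvme\d+): ABORTED - BY REQUEST \([0-9a-fA-F]{2}/[0-9a-fA-F]{2}\) "
    r".*?dnr:(?P<dnr>\d) sqid:(?P<sqid>\d+) cid:\d+"
)


def summarize(lines):
    """Count failed commands per (device, opcode) and aborts per (device, queue)."""
    commands = Counter()
    aborts = Counter()
    for line in lines:
        m = CMD_RE.search(line)
        if m:
            commands[(m.group("dev"), m.group("op"))] += 1
            continue
        m = ABORT_RE.search(line)
        if m:
            aborts[(m.group("dev"), int(m.group("sqid")))] += 1
    return commands, aborts


# A short excerpt copied from the log above, used only as sample input.
SAMPLE = """\
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:1 cid:119 nsid:1 lba:4968850592 len:256
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:1 sqid:1 cid:119 cdw0:0
Oct 13 02:53:23 emby kernel: nvme1: WRITE sqid:3 cid:123 nsid:1 lba:4968850336 len:256
Oct 13 02:53:23 emby kernel: nvme1: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:1 sqid:3 cid:123 cdw0:0
Oct 13 02:53:23 emby kernel: nvme1: READ sqid:3 cid:0 nsid:1 lba:528 len:16
"""

if __name__ == "__main__":
    commands, aborts = summarize(SAMPLE.splitlines())
    for (dev, op), count in sorted(commands.items()):
        print(f"{dev} {op}: {count}")
    for (dev, sqid), count in sorted(aborts.items()):
        print(f"{dev} sqid {sqid}: {count} aborted")
```

`summarize` accepts any iterable of lines, so an open file handle for the system log should work as input in place of `SAMPLE.splitlines()`.
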
Scrubbing the NVMe device on my real-hardware workstation hasn't turned up any issues, but the system has locked up a handful of times. Just curious if others have seen the same, or if someone could point me in the right direction...

thanks!
-pete

--
Pete Wright
pete@nomadlogic.org