From: bugzilla-noreply@freebsd.org
To: fs@FreeBSD.org
Subject: [Bug 264141] nvme(4): Heavy load to SSD wedges 13.1 system: Controller in fatal status, resetting ... Resetting controller due to a timeout and possible hot unplug.
Date: Sun, 22 May 2022 05:27:29 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 13.1-RELEASE
X-Bugzilla-Keywords: needs-qa, regression
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: imp@FreeBSD.org
X-Bugzilla-Status: Open
X-Bugzilla-Assigned-To: fs@FreeBSD.org
X-Bugzilla-Flags: maintainer-feedback?

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264141

--- Comment #7 from Warner Losh ---
nda is an alternative to nvd that uses CAM. Unless you need really high IOPS, nda is generally better than nvd. In loader.conf, add 'hw.nvme.use_nvd=0' and reboot. We provide a compatible /dev/nvd* that points to /dev/nda*, so almost all uses of /dev/nvd* should work. With ZFS, chances are you won't notice the switch at all.

I wrote this code, but had trouble driving the NVMe drives I have access to off the cliff to test all pathological behaviors. This is one I tested in simulation.

However, looking at the code, I fear that this workaround likely won't help you. The message appears when we fail the controller, and that seems to happen when the reset fails (which we should report directly, but apparently don't).

Do you have issues with the machines running too hot, or with poor airflow over the NVMe cards so that they overheat? In general, FreeBSD (or any OS) shouldn't be able to schedule so much I/O that the card's SoC controller fails... at least not in a repeatable way across multiple drive types. The 'possible hot unplug' means we read all 'f's before trying to do a reset. If the card isn't there at all, we'll time out and fail the controller (which may be what's really going on). That suggests power and/or cabling issues, if it isn't somehow thermal. It would be good to rule these out if you can.
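The timeout path described in the comment above can be modeled roughly as follows. This is a hypothetical, simplified C sketch, not code from nvme(4): the names classify_timeout, should_fail_controller and the timeout_action enum are invented for illustration. Only the CSTS bit positions (RDY = bit 0, CFS = bit 1) come from the NVMe base specification, and the sketch assumes that a register read from a card that has dropped off the bus returns all ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* NVMe CSTS (Controller Status) register bits, per the NVMe base spec. */
#define CSTS_RDY (1u << 0) /* controller ready */
#define CSTS_CFS (1u << 1) /* controller fatal status */

/* Hypothetical outcomes of a command timeout; not the nvme(4) driver API. */
enum timeout_action {
    ACTION_WAIT,                 /* controller looks healthy: keep waiting */
    ACTION_RESET_FATAL,          /* CFS set: "Controller in fatal status" */
    ACTION_RESET_POSSIBLE_UNPLUG /* all 'f's: "possible hot unplug" */
};

/* Assumption: a read from a card that is no longer responding yields all ones. */
static bool reads_all_fs(uint32_t csts)
{
    return csts == UINT32_MAX;
}

/* Decide what to do when a command times out, given the CSTS value read. */
enum timeout_action classify_timeout(uint32_t csts)
{
    /* Checked first: an all-ones value also has the CFS bit set. */
    if (reads_all_fs(csts))
        return ACTION_RESET_POSSIBLE_UNPLUG;
    if (csts & CSTS_CFS)
        return ACTION_RESET_FATAL;
    return ACTION_WAIT;
}

/* After a reset attempt: give up on the card if it never comes back ready. */
bool should_fail_controller(bool reset_succeeded, uint32_t csts_after_reset)
{
    return !reset_succeeded || reads_all_fs(csts_after_reset) ||
           !(csts_after_reset & CSTS_RDY);
}
```

In this model a card that reads all 'f's gets a reset attempt, and if it still reads all 'f's (or never sets RDY) afterwards it is failed, which is the "card isn't there at all" path the comment points to as a sign of power, cabling, or thermal trouble.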