From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 211713] NVME controller failure: resetting (Samsung SM961 SSD Drives)
Date: Fri, 16 Mar 2018 05:28:46 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.3-RELEASE
X-Bugzilla-Keywords: needs-qa
X-Bugzilla-Severity: Affects Many People
X-Bugzilla-Who: imp@FreeBSD.org
X-Bugzilla-Status: Open
X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org
X-Bugzilla-Flags: mfc-stable10? mfc-stable11?

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713

--- Comment #51 from Warner Losh ---
You might try hw.nvme.enable_aborts=1 in loader.conf. This enables aborting the command on a timeout when no fatal error is indicated. This might help.

Also, r331046 has a workaround suggested by Jim Harris. If no fatal error is signaled, we'll poll the completion queue. If that finds the completion, we move on (with a loud printf; this will likely have a performance cost, but at least we'll see it). If not, and no fatal error is signaled and aborts are enabled, we'll abort the command. Otherwise we'll reset the card (the current behavior).

I could never recreate this problem, despite buying the exact card (I think) that others have reported as bad. So, if you can reproduce this problem, please try r331046 or later and let me know whether it helps.
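The timeout handling order described for r331046 can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not the nvme(4) driver code: the enum, the function name decide_timeout_action, and its boolean inputs are hypothetical, and "fatal error signaled" is assumed to correspond to something like the Controller Fatal Status (CFS) bit in the NVMe CSTS register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical outcomes of a command timeout; not the driver's real types. */
enum timeout_action {
	TIMEOUT_COMPLETED_BY_POLL,	/* completion already posted; move on */
	TIMEOUT_ABORT_COMMAND,		/* issue an NVMe Abort for the command */
	TIMEOUT_RESET_CONTROLLER	/* fall back to resetting the card */
};

/*
 * Hypothetical helper: decide what to do when a command times out,
 * following the order described in the comment.
 *
 * fatal_status:     the controller signaled a fatal error (assumed to
 *                   mean something like CSTS.CFS being set).
 * completion_found: whether polling the completion queue finds the
 *                   timed-out command; only consulted with no fatal error.
 * aborts_enabled:   models the hw.nvme.enable_aborts loader tunable.
 */
enum timeout_action
decide_timeout_action(bool fatal_status, bool completion_found,
    bool aborts_enabled)
{
	if (!fatal_status) {
		/* Step 1: poll the completion queue. */
		if (completion_found) {
			/* The "loud printf", so the workaround firing is visible. */
			printf("nvme: timed-out command found complete by polling\n");
			return TIMEOUT_COMPLETED_BY_POLL;
		}
		/* Step 2: nothing found, no fatal error, aborts enabled. */
		if (aborts_enabled)
			return TIMEOUT_ABORT_COMMAND;
	}
	/* Step 3: the previous behavior, reset the controller. */
	return TIMEOUT_RESET_CONTROLLER;
}
```

With this ordering, a controller that reports a fatal error is always reset, and an abort is only attempted when polling comes up empty and the tunable is on.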