From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 211713] NVME controller failure: resetting (Samsung SM961 SSD Drives)
Date: Sat, 31 Mar 2018 07:04:15 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.3-RELEASE
X-Bugzilla-Keywords: needs-qa
X-Bugzilla-Severity: Affects Many People
X-Bugzilla-Who: freebsd-ssa@mailden.net
X-Bugzilla-Status: Open
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org
X-Bugzilla-Flags: mfc-stable10? mfc-stable11?
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713

--- Comment #58 from stan ---
Following up on my comment #57, here is more debug info from another setup with the same hardware:

I am able to boot TrueOS-Desktop-201803131015 with `hw.nvme.per_cpu_io_queues="0"` set in /boot/loader.conf.
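
For reference, a minimal sketch of the corresponding loader.conf entry, assuming the tunable meant here is `hw.nvme.per_cpu_io_queues` as documented in nvme(4). With it set to 0, the driver uses a single I/O queue pair shared by all CPUs instead of one per CPU:

```
# /boot/loader.conf
# Assumed spelling of the tunable (per nvme(4)); forces one I/O queue pair shared by all CPUs.
hw.nvme.per_cpu_io_queues="0"
```

The setting is read at boot, so it takes effect only after a reboot.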
Everything works well, BUT a fatal error occurs when trying to resume from S3 suspend mode:

I see kernel messages ending with:
```
(…) kernel: WARN_ON(…stripped…) CSR SSP Base Not fine
(…) kernel: CSR HTP Not fine
(…) kernel: WARN_ON(…stripped…) Clearing unexpected auxiliary request for power well 2
```

then:
```
nvme0: resetting controller
nvme0: controller ready did not become 0 within 30000 ms
nvme0: failing queued i/o
nvme0: READ sqid:1 cid:0 nsid: 1 lba:324015968 len:20
nvme0: ABORTED - BY REQUEST (00/07) sqid:1 cid:0 cdw0:0
```

and similar errors repeated a dozen times,
then the fatal one:
```
nvd0: lost device - 0 outstanding
nvd0: removing device entry
nvme0: WRITE sqid:1 cid:0 nsid:1 lba:4416948 len:48
nvme0: ABORTED - BY REQUEST (00/07) sqid:1 cid:0 cdw0:0

Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address   = 0x8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80a3b141
stack pointer           = 0x28:0xfffffe0000545820
frame pointer           = 0x28:0xfffffe0000545860
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (nvme taskq)
[ thread pid 0 tid 100077 ]
stopped at      g_disk_done+0xc1:       movq    0x8(%rax),%rdi
db>
```