From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 211713] NVME controller failure: resetting (Samsung SM961 SSD Drives)
Date: Tue, 06 Nov 2018 22:25:44 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 10.3-RELEASE
X-Bugzilla-Keywords: needs-qa
X-Bugzilla-Severity: Affects Many People
X-Bugzilla-Who: mentalbarcode@fastest.cc
X-Bugzilla-Status: Open
X-Bugzilla-Assigned-To: bugs@FreeBSD.org
X-Bugzilla-Flags: mfc-stable10? mfc-stable11?

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713

--- Comment #68 from David ---
I'm testing FreeBSD 12.0-BETA3 r340039 GENERIC, and I have a PM961 PCIe NVMe
m.2 1TB drive that came with my Lenovo ThinkPad P50.

P/N: MZSLW1T0HMLH-000L1
Produced Oct 2016

That drive is recognized by FreeBSD 12, but is not usable whatsoever (can't
read/write to it). I've used this drive with Debian testing since 2016 without
trouble on my ThinkPad P50.

I installed FreeBSD 12 on an internal 2TB HDD in the ThinkPad in order to test
FreeBSD, but the PM961 continued to cause boot delays: I would see "nvme0:
Missing interrupt" messages until the system finally gave up and continued with
the boot process.

I attempted to install FreeBSD 11 on the 2TB HDD, but the install failed when it
had trouble recognizing the nvme drive.

Initially I thought the missing interrupt problem with FreeBSD was caused by
the LUKS encryption on the nvme drive, because I had not yet reformatted that
drive since I was dual booting. So I purchased another Samsung NVMe SSD 960 PRO
m.2 1TB drive, P/N: MZVKP1T0HMJP, and that drive works with FreeBSD 12.
The new nvme was installed in the ThinkPad along with the original nvme and the
HDD. The 2TB HDD and the new 1TB nvme drive are dedicated to FreeBSD using ZFS.
I attempted to create a ZFS mirror using the two nvme drives, and FreeBSD
successfully wrote to the original nvme drive (I know because it overwrote my
Linux partitions), but the overall `zfs_create_diskpart` process failed and I
had to start over using only the new nvme drive, which worked. I eventually
removed the original nvme drive from my laptop because of the constant "missing
interrupt" delays.

However, after removing the original nvme drive and while installing a virtual
machine in VirtualBox on my new nvme, my laptop went into (what seemed to be)
ACPI S3 suspended mode, and after I woke the machine the laptop rebooted
itself. Thinking the problem was VirtualBox, I removed that software and set up
Bhyve instead. During a virtual machine install in Bhyve, the laptop went into
an S3-style suspended mode again, and this time when I woke the machine I
noticed the nvme0 resetting controller, write, read, and aborted-by-request
messages in `dmesg` (output attached above).

For the most part, the new nvme device seems stable with FreeBSD 12. I haven't
tested it with FreeBSD 11. I don't know if KDE's baloo service crashing and
creating a 256GB core dump every single time I log in is part of the problem
using this drive. Today I disabled Baloo file indexing and installed another
virtual machine using Bhyve, and the system hasn't reported any problems with
the nvme. I also used `dd` to create some 10GB and 100GB files using input from
/dev/urandom, and that hasn't caused any issues so far.

Lastly, cold boots on the new nvme (without the old nvme installed in the
laptop) are normal.
However, reboots can take literally 2 minutes to complete. This includes an
extended delay on the BIOS screen before reaching the GELI password prompt, and
a delay after loading the kernel before moving on to the ---<<BOOT>>--- screen,
and the entire boot process is sluggish until finally reaching the login
prompt. I've never experienced this with Debian testing, and I suspect the
FreeBSD nvme driver is leaving the system in a weird state.

IIRC setting hw.nvme.enable_aborts=1 while the original nvme drive is still in
the laptop causes a kernel panic while booting. I haven't tried setting
hw.nvme.per_cpu_io_queues=0 since the system is usable and not completely
unstable.

Hardware details:

# nvmecontrol devlist
 nvme0: SAMSUNG MZSLW1T0HMLH-000L1
    nvme0ns1 (976762MB)
 nvme1: Samsung SSD 960 PRO 1TB
    nvme1ns1 (976762MB)

# pciconf -lbace nvme0
nvme0@pci0:2:0:0: class=0x010802 card=0xa801144d chip=0xa804144d rev=0x00 hdr=0x00
    bar   [10] = type Memory, range 64, base 0xd4400000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 32 messages, 64 bit
    cap 10[70] = PCI-Express 2 endpoint max data 256(256) FLR RO NS
                 link x4(x4) speed 8.0(8.0) ASPM L1(L1)
    cap 11[b0] = MSI-X supports 33 messages, enabled
                 Table in map 0x10[0x3000], PBA in map 0x10[0x2000]
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[148] = Serial 1 0000000000000000
    ecap 0004[158] = Power Budgeting 1
    ecap 0019[168] = PCIe Sec 1 lane errors 0
    ecap 0018[188] = LTR 1
    ecap 001e[190] = unknown 1
  PCI-e errors = Correctable Error Detected
                 Unsupported Request Detected
     Corrected = Advisory Non-Fatal Error

# pciconf -lbace nvme1
nvme1@pci0:62:0:0: class=0x010802 card=0xa801144d chip=0xa804144d rev=0x00 hdr=0x00
    bar   [10] = type Memory, range 64, base 0xd4200000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 32 messages, 64 bit
    cap 10[70] = PCI-Express 2 endpoint max data 256(256) FLR RO NS
                 link x4(x4) speed 8.0(8.0) ASPM L1(L1)
    cap 11[b0] = MSI-X supports 8 messages, enabled
                 Table in map 0x10[0x3000], PBA in map 0x10[0x2000]
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[148] = Serial 1 0000000000000000
    ecap 0004[158] = Power Budgeting 1
    ecap 0019[168] = PCIe Sec 1 lane errors 0
    ecap 0018[188] = LTR 1
    ecap 001e[190] = unknown 1
  PCI-e errors = Correctable Error Detected
                 Unsupported Request Detected
     Corrected = Advisory Non-Fatal Error

# diskinfo -t /dev/nvme0ns1
/dev/nvme0ns1
        512             # sectorsize
        1024209543168   # mediasize in bytes (954G)
        2000409264      # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        No              # TRIM/UNMAP support
        Unknown         # Rotation rate in RPM

Seek times:
        Full stroke:^C

Nov  6 18:58:09 fenixbsd kernel: nvme0: Missing interrupt
Nov  6 18:58:39 fenixbsd syslogd: last message repeated 1 times

# diskinfo -t /dev/nvme1ns1
/dev/nvme1ns1
        512             # sectorsize
        1024209543168   # mediasize in bytes (954G)
        2000409264      # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        No              # TRIM/UNMAP support
        Unknown         # Rotation rate in RPM

Seek times:
        Full stroke:      250 iter in   0.011499 sec =    0.046 msec
        Half stroke:      250 iter in   0.010018 sec =    0.040 msec
        Quarter stroke:   500 iter in   0.015302 sec =    0.031 msec
        Short forward:    400 iter in   0.013087 sec =    0.033 msec
        Short backward:   400 iter in   0.012144 sec =    0.030 msec
        Seq outer:       2048 iter in   0.041548 sec =    0.020 msec
        Seq inner:       2048 iter in   0.042294 sec =    0.021 msec

Transfer rates:
        outside:       102400 kbytes in   0.066412 sec =  1541890 kbytes/sec
        middle:        102400 kbytes in   0.064908 sec =  1577618 kbytes/sec
        inside:        102400 kbytes in   0.064534 sec =  1586760 kbytes/sec
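
For anyone who wants to repeat the tunable tests mentioned in this comment, a
minimal sketch of the corresponding boot-time settings follows. Only the two
tunable names and their values come from the comment; putting them in
/boot/loader.conf is an assumption, since the comment does not say how they
were set. The enable_aborts line is left commented out because it was reported
to panic at boot while the original PM961 was installed.

```
# /boot/loader.conf (sketch; the file location is an assumption, not from the report)

# Use one I/O queue pair shared by all CPUs instead of per-CPU queues.
# Not yet tried in this report.
hw.nvme.per_cpu_io_queues="0"

# Reported to cause a kernel panic at boot with the PM961 installed,
# so it stays disabled here.
#hw.nvme.enable_aborts="1"
```

After a reboot, `kenv hw.nvme.per_cpu_io_queues` should print the value if the
loader picked it up.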