Date: Tue, 06 Nov 2018 22:25:44 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 211713] NVME controller failure: resetting (Samsung SM961 SSD Drives)
Message-ID: <bug-211713-227-kfim229tW1@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-211713-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713

--- Comment #68 from David <mentalbarcode@fastest.cc> ---
I'm testing FreeBSD 12.0-BETA3 r340039 GENERIC, and I have a PM961 PCIe NVMe M.2 1TB drive that came with my Lenovo ThinkPad P50.

P/N: MZSLW1T0HMLH-000L1
Produced Oct 2016

That drive is recognized by FreeBSD 12, but it is not usable at all (I can't read from or write to it). I've used this drive with Debian testing since 2016 without trouble on my ThinkPad P50.

I installed FreeBSD 12 on an internal 2TB HDD in the ThinkPad in order to test FreeBSD, but the PM961 continued to cause boot delays: I would see "nvme0: Missing interrupt" messages until the system finally gave up and continued with the boot process. I attempted to install FreeBSD 11 on the 2TB HDD, but the install failed when it had trouble recognizing the nvme drive.

Initially I thought the missing interrupt problem with FreeBSD was caused by the LUKS encryption on the nvme drive, because I had not formatted that drive yet (I was dual booting). So I purchased another drive, a Samsung NVMe SSD 960 PRO M.2 1TB (P/N: MZVKP1T0HMJP), and that drive works with FreeBSD 12. The new nvme was installed in the ThinkPad alongside the original nvme and the HDD. The 2TB HDD and the new 1TB nvme drive are dedicated to FreeBSD using ZFS.

I attempted to create a ZFS mirror using the two nvme drives. FreeBSD successfully wrote to the original nvme drive (it overwrote my Linux partitions), but the overall `zfs_create_diskpart` process failed, and I had to start over using only the new nvme drive, which worked.

I eventually removed the original nvme drive from my laptop because of the constant "missing interrupt" delays. However, after removing it, and while installing a virtual machine in VirtualBox on the new nvme, my laptop went into what seemed to be ACPI S3 suspend, and after I woke the machine the laptop rebooted itself.
Thinking the problem was VirtualBox, I removed that software and set up Bhyve instead. During a virtual machine install in Bhyve, the laptop went into an S3-style suspend again, and this time when I woke the machine I noticed the nvme0 resetting controller, write, read, and aborted-by-request messages in `dmesg` (output attached above).

For the most part, the new nvme device seems stable with FreeBSD 12. I haven't tested it with FreeBSD 11. I don't know whether KDE's Baloo service, which crashes and creates a 256GB core dump every single time I log in, is part of the problem with this drive. Today I disabled Baloo file indexing and installed another virtual machine using Bhyve, and the system hasn't reported any problems with the nvme. I also used `dd` to create some 10GB and 100GB files using input from /dev/urandom, and that hasn't caused any issues so far.

Lastly, cold boots on the new nvme (without the old nvme installed in the laptop) are normal. However, reboots can take literally 2 minutes to complete. This includes an extended delay on the BIOS screen before reaching the GELI password prompt, a delay after loading the kernel before moving on to the ---<<BOOT>>--- screen, and general sluggishness for the rest of the boot until the login prompt finally appears. I've never experienced this with Debian testing, and I suspect the FreeBSD nvme driver is leaving the system in a weird state.

IIRC, setting hw.nvme.enable_aborts=1 while the original nvme drive is still in the laptop causes a kernel panic while booting. I haven't tried setting hw.nvme.per_cpu_io_queues=0, since the system is usable and not completely unstable.
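For anyone trying the untested workaround mentioned above: both hw.nvme.* knobs are boot-time loader tunables read by the nvme(4) driver at attach, so they would normally go in /boot/loader.conf. The fragment below is only a sketch, assuming the stock FreeBSD 12 nvme(4) driver; it has not been tried on this machine, and there is no evidence yet that it helps with the "Missing interrupt" symptom. enable_aborts is left commented out because it reportedly panics at boot with the PM961 installed.

```conf
# /boot/loader.conf (sketch, untested on this hardware)
# Use a single shared I/O queue pair instead of one queue pair per CPU.
hw.nvme.per_cpu_io_queues="0"

# Reported above to cause a kernel panic at boot with the PM961 installed;
# left disabled on purpose.
#hw.nvme.enable_aborts="1"
```

Settings in loader.conf take effect on the next boot. To try a tunable for a single boot only, it can instead be entered at the loader prompt (for example `set hw.nvme.per_cpu_io_queues=0`, then `boot`).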
Hardware details:

# nvmecontrol devlist
 nvme0: SAMSUNG MZSLW1T0HMLH-000L1
    nvme0ns1 (976762MB)
 nvme1: Samsung SSD 960 PRO 1TB
    nvme1ns1 (976762MB)

# pciconf -lbace nvme0
nvme0@pci0:2:0:0:  class=0x010802 card=0xa801144d chip=0xa804144d rev=0x00 hdr=0x00
    bar   [10] = type Memory, range 64, base 0xd4400000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 32 messages, 64 bit
    cap 10[70] = PCI-Express 2 endpoint max data 256(256) FLR RO NS
                 link x4(x4) speed 8.0(8.0) ASPM L1(L1)
    cap 11[b0] = MSI-X supports 33 messages, enabled
                 Table in map 0x10[0x3000], PBA in map 0x10[0x2000]
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[148] = Serial 1 0000000000000000
    ecap 0004[158] = Power Budgeting 1
    ecap 0019[168] = PCIe Sec 1 lane errors 0
    ecap 0018[188] = LTR 1
    ecap 001e[190] = unknown 1
  PCI-e errors = Correctable Error Detected
                 Unsupported Request Detected
     Corrected = Advisory Non-Fatal Error

# pciconf -lbace nvme1
nvme1@pci0:62:0:0:  class=0x010802 card=0xa801144d chip=0xa804144d rev=0x00 hdr=0x00
    bar   [10] = type Memory, range 64, base 0xd4200000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 32 messages, 64 bit
    cap 10[70] = PCI-Express 2 endpoint max data 256(256) FLR RO NS
                 link x4(x4) speed 8.0(8.0) ASPM L1(L1)
    cap 11[b0] = MSI-X supports 8 messages, enabled
                 Table in map 0x10[0x3000], PBA in map 0x10[0x2000]
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[148] = Serial 1 0000000000000000
    ecap 0004[158] = Power Budgeting 1
    ecap 0019[168] = PCIe Sec 1 lane errors 0
    ecap 0018[188] = LTR 1
    ecap 001e[190] = unknown 1
  PCI-e errors = Correctable Error Detected
                 Unsupported Request Detected
     Corrected = Advisory Non-Fatal Error

# diskinfo -t /dev/nvme0ns1
/dev/nvme0ns1
        512             # sectorsize
        1024209543168   # mediasize in bytes (954G)
        2000409264      # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        No              # TRIM/UNMAP support
        Unknown         # Rotation rate in RPM

Seek times:
        Full stroke:^C
Nov  6 18:58:09 fenixbsd kernel: nvme0: Missing interrupt
Nov  6 18:58:39 fenixbsd syslogd: last message repeated 1 times

# diskinfo -t /dev/nvme1ns1
/dev/nvme1ns1
        512             # sectorsize
        1024209543168   # mediasize in bytes (954G)
        2000409264      # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        No              # TRIM/UNMAP support
        Unknown         # Rotation rate in RPM

Seek times:
        Full stroke:      250 iter in   0.011499 sec =    0.046 msec
        Half stroke:      250 iter in   0.010018 sec =    0.040 msec
        Quarter stroke:   500 iter in   0.015302 sec =    0.031 msec
        Short forward:    400 iter in   0.013087 sec =    0.033 msec
        Short backward:   400 iter in   0.012144 sec =    0.030 msec
        Seq outer:       2048 iter in   0.041548 sec =    0.020 msec
        Seq inner:       2048 iter in   0.042294 sec =    0.021 msec

Transfer rates:
        outside:       102400 kbytes in   0.066412 sec =  1541890 kbytes/sec
        middle:        102400 kbytes in   0.064908 sec =  1577618 kbytes/sec
        inside:        102400 kbytes in   0.064534 sec =  1586760 kbytes/sec

-- 
You are receiving this mail because:
You are the assignee for the bug.
