Date: Tue, 06 Nov 2018 22:25:44 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 211713] NVME controller failure: resetting (Samsung SM961 SSD Drives)
Message-ID: <bug-211713-227-kfim229tW1@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-211713-227@https.bugs.freebsd.org/bugzilla/>
References: <bug-211713-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713

--- Comment #68 from David <mentalbarcode@fastest.cc> ---
I'm testing FreeBSD 12.0-BETA3 r340039 GENERIC, and I have a PM961 PCIe NVMe M.2 1TB drive that came with my Lenovo ThinkPad P50.

P/N: MZSLW1T0HMLH-000L1
Produced Oct 2016

That drive is recognized by FreeBSD 12, but it is not usable at all (I can't read from or write to it). I've used this drive with Debian testing on my ThinkPad P50 since 2016 without trouble.

I installed FreeBSD 12 on an internal 2TB HDD in the ThinkPad in order to test FreeBSD, but the PM961 continued to cause boot delays: I would see "nvme0: Missing interrupt" messages until the system finally gave up and continued with the boot process.

I attempted to install FreeBSD 11 on the 2TB HDD, but the install failed when it had trouble recognizing the NVMe drive.

Initially I thought the missing interrupt problem with FreeBSD was caused by the LUKS encryption on the NVMe drive, because I had not yet reformatted that drive while I was dual booting. So I purchased another Samsung NVMe SSD, a 960 PRO M.2 1TB (P/N: MZVKP1T0HMJP), and that drive works with FreeBSD 12. The new NVMe drive was installed in the ThinkPad alongside the original NVMe drive and the HDD.

The 2TB HDD and the new 1TB NVMe drive are dedicated to FreeBSD using ZFS. I attempted to create a ZFS mirror using the two NVMe drives. FreeBSD successfully wrote to the original NVMe drive (it overwrote my Linux partitions), but the overall `zfs_create_diskpart` process failed, and I had to start over using only the new NVMe drive, which worked. I eventually removed the original NVMe drive from my laptop because of the constant "missing interrupt" delays.

However, after removing the original NVMe drive, and while installing a virtual machine in VirtualBox on the new NVMe drive, my laptop went into what seemed to be ACPI S3 suspend mode, and after I woke the machine the laptop rebooted itself.
Thinking the problem was VirtualBox, I removed that software and set up bhyve instead. During a virtual machine install in bhyve, the laptop went into an S3-style suspend mode again, and this time when I woke the machine I noticed nvme0 "resetting controller", write, read, and aborted-by-request messages in `dmesg` (output attached above).

For the most part, the new NVMe drive seems stable with FreeBSD 12. I haven't tested it with FreeBSD 11. I don't know whether KDE's Baloo service, which crashes and creates a 256GB core dump every single time I log in, is part of the problem with this drive. Today I disabled Baloo file indexing and installed another virtual machine using bhyve, and the system hasn't reported any problems with the NVMe drive. I also used `dd` to create some 10GB and 100GB files using input from /dev/urandom, and that hasn't caused any issues so far.

Lastly, cold boots on the new NVMe drive (without the old one installed in the laptop) are normal. However, reboots can take literally 2 minutes to complete. This includes an extended delay on the BIOS screen before reaching the GELI password prompt, a delay after loading the kernel before moving on to the ---<<BOOT>>--- screen, and a generally sluggish boot until the login prompt finally appears. I've never experienced this with Debian testing, and I suspect the FreeBSD nvme driver is leaving the hardware in a weird state.

IIRC, setting hw.nvme.enable_aborts=1 while the original NVMe drive is still in the laptop causes a kernel panic during boot. I haven't tried setting hw.nvme.per_cpu_io_queues=0, since the system is usable and not completely unstable.
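A scaled-down version of the `dd` write test described above might look like the sketch below. It is not the reporter's exact command: the default target path (./nvme-write-test.bin), the 16 MiB default size, and the dmesg patterns it greps for are assumptions chosen for illustration. To approximate the 10GB run, pass a path on the NVMe-backed filesystem and a size of 10240 (MiB).

```shell
#!/bin/sh
# Sketch of a dd write test using /dev/urandom, scaled down by default.
# Hypothetical defaults: TARGET and SIZE_MB are illustrative, not from the report.
set -eu

TARGET=${1:-./nvme-write-test.bin}   # file on the filesystem backed by the NVMe drive
SIZE_MB=${2:-16}                     # size in MiB, e.g. 10240 for a 10 GiB file

# Write SIZE_MB blocks of 1 MiB random data; bs is given in bytes so it
# works with both FreeBSD and GNU dd.
dd if=/dev/urandom of="$TARGET" bs=1048576 count="$SIZE_MB" 2>/dev/null
sync

# Check the file length; a short write could point to an I/O problem.
actual=$(wc -c < "$TARGET" | tr -d ' ')
expected=$((SIZE_MB * 1048576))
if [ "$actual" -ne "$expected" ]; then
    echo "short write: $actual of $expected bytes" >&2
    exit 1
fi
echo "wrote $actual bytes to $TARGET"

# Look for controller messages like the ones in this report.
# dmesg may require root on some systems; errors are silenced.
if dmesg 2>/dev/null | grep -E 'nvme[0-9]+: (resetting controller|Missing interrupt)'; then
    echo "nvme warnings present in dmesg"
else
    echo "no matching nvme messages in dmesg (or dmesg unreadable)"
fi
```

For example, `sh nvme-write-test.sh /mnt/nvme/test.bin 10240` (script name and mount point hypothetical) would write a 10 GiB file and then scan the kernel message buffer.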
Hardware details:

# nvmecontrol devlist
 nvme0: SAMSUNG MZSLW1T0HMLH-000L1
    nvme0ns1 (976762MB)
 nvme1: Samsung SSD 960 PRO 1TB
    nvme1ns1 (976762MB)

# pciconf -lbace nvme0
nvme0@pci0:2:0:0:  class=0x010802 card=0xa801144d chip=0xa804144d rev=0x00 hdr=0x00
    bar   [10] = type Memory, range 64, base 0xd4400000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 32 messages, 64 bit
    cap 10[70] = PCI-Express 2 endpoint max data 256(256) FLR RO NS
                 link x4(x4) speed 8.0(8.0) ASPM L1(L1)
    cap 11[b0] = MSI-X supports 33 messages, enabled
                 Table in map 0x10[0x3000], PBA in map 0x10[0x2000]
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[148] = Serial 1 0000000000000000
    ecap 0004[158] = Power Budgeting 1
    ecap 0019[168] = PCIe Sec 1 lane errors 0
    ecap 0018[188] = LTR 1
    ecap 001e[190] = unknown 1
  PCI-e errors = Correctable Error Detected
                 Unsupported Request Detected
     Corrected = Advisory Non-Fatal Error

# pciconf -lbace nvme1
nvme1@pci0:62:0:0:  class=0x010802 card=0xa801144d chip=0xa804144d rev=0x00 hdr=0x00
    bar   [10] = type Memory, range 64, base 0xd4200000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 32 messages, 64 bit
    cap 10[70] = PCI-Express 2 endpoint max data 256(256) FLR RO NS
                 link x4(x4) speed 8.0(8.0) ASPM L1(L1)
    cap 11[b0] = MSI-X supports 8 messages, enabled
                 Table in map 0x10[0x3000], PBA in map 0x10[0x2000]
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[148] = Serial 1 0000000000000000
    ecap 0004[158] = Power Budgeting 1
    ecap 0019[168] = PCIe Sec 1 lane errors 0
    ecap 0018[188] = LTR 1
    ecap 001e[190] = unknown 1
  PCI-e errors = Correctable Error Detected
                 Unsupported Request Detected
     Corrected = Advisory Non-Fatal Error

# diskinfo -t /dev/nvme0ns1
/dev/nvme0ns1
        512             # sectorsize
        1024209543168   # mediasize in bytes (954G)
        2000409264      # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        No              # TRIM/UNMAP support
        Unknown         # Rotation rate in RPM

Seek times:
        Full stroke:^C
Nov  6 18:58:09 fenixbsd kernel: nvme0: Missing interrupt
Nov  6 18:58:39 fenixbsd syslogd: last message repeated 1 times

# diskinfo -t /dev/nvme1ns1
/dev/nvme1ns1
        512             # sectorsize
        1024209543168   # mediasize in bytes (954G)
        2000409264      # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        No              # TRIM/UNMAP support
        Unknown         # Rotation rate in RPM

Seek times:
        Full stroke:      250 iter in   0.011499 sec =    0.046 msec
        Half stroke:      250 iter in   0.010018 sec =    0.040 msec
        Quarter stroke:   500 iter in   0.015302 sec =    0.031 msec
        Short forward:    400 iter in   0.013087 sec =    0.033 msec
        Short backward:   400 iter in   0.012144 sec =    0.030 msec
        Seq outer:       2048 iter in   0.041548 sec =    0.020 msec
        Seq inner:       2048 iter in   0.042294 sec =    0.021 msec

Transfer rates:
        outside:       102400 kbytes in   0.066412 sec =  1541890 kbytes/sec
        middle:        102400 kbytes in   0.064908 sec =  1577618 kbytes/sec
        inside:        102400 kbytes in   0.064534 sec =  1586760 kbytes/sec