Date:      Tue, 06 Nov 2018 22:25:44 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 211713] NVME controller failure: resetting (Samsung SM961 SSD Drives)
Message-ID:  <bug-211713-227-kfim229tW1@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-211713-227@https.bugs.freebsd.org/bugzilla/>
References:  <bug-211713-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713

--- Comment #68 from David <mentalbarcode@fastest.cc> ---
I'm testing FreeBSD 12.0-BETA3 r340039 GENERIC, and I have a PM961 PCIe NVMe
M.2 1TB drive that came with my Lenovo ThinkPad P50.
P/N: MZSLW1T0HMLH-000L1 Produced Oct 2016

That drive is recognized by FreeBSD 12, but is not usable whatsoever (can't
read/write to it).  I've used this drive with Debian testing since 2016 without
trouble on my ThinkPad P50.

I installed FreeBSD 12 on an internal 2TB HDD in the ThinkPad in order to test
FreeBSD, but the PM961 continued to cause boot delays -- I would see "nvme0:
Missing interrupt" messages until the system finally gave up and continued with
the boot process.

I attempted to install FreeBSD 11 on the 2TB HDD but the install failed when it
had trouble recognizing the nvme drive.

Initially I thought the missing interrupt problem with FreeBSD was caused by
the LUKS encryption on the nvme drive, because I had not formatted that drive
yet since I was dual booting. So I purchased another drive, a Samsung NVMe SSD
960 PRO M.2 1TB (P/N: MZVKP1T0HMJP), and that drive works with FreeBSD 12. The
new nvme was installed in the ThinkPad alongside the original nvme and the HDD.
The 2TB HDD and the new 1TB nvme drive are dedicated to FreeBSD using ZFS.  I
attempted to create a ZFS mirror using the two nvme drives, and FreeBSD
successfully wrote to the original nvme drive (it overwrote my Linux
partitions), but the overall `zfs_create_diskpart` process failed and I had to
start over using only the new nvme drive, which worked.  I eventually removed
the original nvme drive from my laptop because of the constant "missing
interrupt" delays.

However, after removing the original nvme drive and while installing a virtual
machine in VirtualBox on my new nvme, my laptop went into what seemed to be
ACPI S3 suspend mode, and after I woke the machine the laptop rebooted
itself.  Thinking the problem was VirtualBox, I removed that software and set up
Bhyve instead.  During a virtual machine install in Bhyve, the laptop went into
an S3-style suspended mode again, and this time when I woke the machine I
noticed the nvme0 resetting controller, write, read, and aborted-by-request
messages in `dmesg` (output attached above).

For the most part, the new nvme device seems stable with FreeBSD 12.  I haven't
tested it with FreeBSD 11.  I don't know if KDE's baloo service crashing and
creating a 256GB core dump every single time I log in is part of the problem
using this drive.  Today I disabled Baloo file indexing and installed another
virtual machine using Bhyve and the system hasn't reported any problems with
the nvme.  I also used `dd` to create some 10GB and 100GB files using input
from /dev/urandom, and that hasn't caused any issues so far.
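
For anyone who wants to repeat a write test like the `dd` one described above,
here is a minimal sketch. The helper name `write_random`, the file path, and
the block size are illustrative assumptions; the exact `dd` invocation used for
this report is not recorded here.

```shell
#!/bin/sh
# Hypothetical helper: write SIZE_MB mebibytes of random data to FILE,
# then print the resulting file size in bytes.
# usage: write_random FILE SIZE_MB
write_random() {
    # bs is given in plain bytes so the same line works with BSD and GNU dd.
    dd if=/dev/urandom of="$1" bs=1048576 count="$2" 2>/dev/null
    wc -c < "$1" | tr -d ' '
}

# Roughly the sizes mentioned in the report (path is a placeholder):
#   write_random /path/on/nvme/test10g.bin 10240     # about 10GB
#   write_random /path/on/nvme/test100g.bin 102400   # about 100GB
```

Watching `dmesg` for nvme messages while such a write runs is one way to see
whether the controller resets under sustained load.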

Lastly, cold boots on the new nvme (without the old nvme installed in the
laptop) are normal. However, reboots can take literally 2 minutes to complete.
This includes an extended delay on the BIOS screen before reaching the GELI
password prompt, and a delay after loading the kernel before moving on to the
---<<BOOT>>--- screen, and the entire boot process is sluggish until finally
reaching the login prompt.  I've never experienced this with Debian testing and
I suspect the FreeBSD nvme driver is leaving the system in a weird state.

IIRC setting hw.nvme.enable_aborts=1 while the original nvme drive is still in
the laptop causes a kernel panic while booting.  I haven't tried setting
hw.nvme.per_cpu_io_queues=0 since the system is usable and not completely
unstable.
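
For reference, a sketch of how these two tunables would look in
/boot/loader.conf, assuming both are set as boot-time loader tunables. Both
lines are left commented out here: the first reportedly panicked at boot with
the PM961 installed, and the second is untested on this machine.

```shell
# /boot/loader.conf fragment (sketch only, not a recommended configuration)

# Reportedly caused a kernel panic at boot while the PM961 was installed:
#hw.nvme.enable_aborts=1

# Not tried in this report; would disable per-CPU I/O queue pairs:
#hw.nvme.per_cpu_io_queues=0
```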

Hardware details:

# nvmecontrol devlist
 nvme0: SAMSUNG MZSLW1T0HMLH-000L1
    nvme0ns1 (976762MB)
 nvme1: Samsung SSD 960 PRO 1TB
    nvme1ns1 (976762MB)

# pciconf -lbace nvme0
nvme0@pci0:2:0:0:       class=0x010802 card=0xa801144d chip=0xa804144d rev=0x00 hdr=0x00
    bar   [10] = type Memory, range 64, base 0xd4400000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 32 messages, 64 bit
    cap 10[70] = PCI-Express 2 endpoint max data 256(256) FLR RO NS
                 link x4(x4) speed 8.0(8.0) ASPM L1(L1)
    cap 11[b0] = MSI-X supports 33 messages, enabled
                 Table in map 0x10[0x3000], PBA in map 0x10[0x2000]
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[148] = Serial 1 0000000000000000
    ecap 0004[158] = Power Budgeting 1
    ecap 0019[168] = PCIe Sec 1 lane errors 0
    ecap 0018[188] = LTR 1
    ecap 001e[190] = unknown 1
  PCI-e errors = Correctable Error Detected
                 Unsupported Request Detected
     Corrected = Advisory Non-Fatal Error

# pciconf -lbace nvme1
nvme1@pci0:62:0:0:      class=0x010802 card=0xa801144d chip=0xa804144d rev=0x00 hdr=0x00
    bar   [10] = type Memory, range 64, base 0xd4200000, size 16384, enabled
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 32 messages, 64 bit
    cap 10[70] = PCI-Express 2 endpoint max data 256(256) FLR RO NS
                 link x4(x4) speed 8.0(8.0) ASPM L1(L1)
    cap 11[b0] = MSI-X supports 8 messages, enabled
                 Table in map 0x10[0x3000], PBA in map 0x10[0x2000]
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[148] = Serial 1 0000000000000000
    ecap 0004[158] = Power Budgeting 1
    ecap 0019[168] = PCIe Sec 1 lane errors 0
    ecap 0018[188] = LTR 1
    ecap 001e[190] = unknown 1
  PCI-e errors = Correctable Error Detected
                 Unsupported Request Detected
     Corrected = Advisory Non-Fatal Error

# diskinfo -t /dev/nvme0ns1
/dev/nvme0ns1
        512             # sectorsize
        1024209543168   # mediasize in bytes (954G)
        2000409264      # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        No              # TRIM/UNMAP support
        Unknown         # Rotation rate in RPM

Seek times:
        Full stroke:^C
Nov  6 18:58:09 fenixbsd kernel: nvme0: Missing interrupt
Nov  6 18:58:39 fenixbsd syslogd: last message repeated 1 times

# diskinfo -t /dev/nvme1ns1
/dev/nvme1ns1
        512             # sectorsize
        1024209543168   # mediasize in bytes (954G)
        2000409264      # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        No              # TRIM/UNMAP support
        Unknown         # Rotation rate in RPM

Seek times:
        Full stroke:      250 iter in   0.011499 sec =    0.046 msec
        Half stroke:      250 iter in   0.010018 sec =    0.040 msec
        Quarter stroke:   500 iter in   0.015302 sec =    0.031 msec
        Short forward:    400 iter in   0.013087 sec =    0.033 msec
        Short backward:   400 iter in   0.012144 sec =    0.030 msec
        Seq outer:       2048 iter in   0.041548 sec =    0.020 msec
        Seq inner:       2048 iter in   0.042294 sec =    0.021 msec

Transfer rates:
        outside:       102400 kbytes in   0.066412 sec =  1541890 kbytes/sec
        middle:        102400 kbytes in   0.064908 sec =  1577618 kbytes/sec
        inside:        102400 kbytes in   0.064534 sec =  1586760 kbytes/sec

-- 
You are receiving this mail because:
You are the assignee for the bug.