Date:      Tue, 6 Oct 2015 11:03:43 -0700
From:      Jim Harris <jimharris@freebsd.org>
To:        Steven Hartland <killing@multiplay.co.uk>, Sean Kelly <smkelly@smkelly.org>
Cc:        FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject:   Re: Dell NVMe issues
Message-ID:  <CAJP=Hc9-oQnk2r48OBXVCQbMDn0URDMDb80a0i0XvUDPuuLkrA@mail.gmail.com>
In-Reply-To: <5613FA02.2080205@multiplay.co.uk>
References:  <BC5F191D-FEB2-4ADC-9D6B-240C80B2301C@smkelly.org> <5613FA02.2080205@multiplay.co.uk>

On Tue, Oct 6, 2015 at 9:42 AM, Steven Hartland <killing@multiplay.co.uk>
wrote:

> Also, it looks like nvme exposes a timeout_period sysctl; you could try
> increasing that, as it could be too small for a full-disk TRIM.
>

> Under CAM SCSI da support we have a delete_max which limits the max single
> request size for a delete. It may be we need something similar for nvme as
> well to prevent this, as it should still be chunking the deletes to ensure
> this sort of thing doesn't happen.
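
For comparison, the da(4) knob Steven mentions is per-device; if memory
serves, it shows up as kern.cam.da.X.delete_max, e.g.:

        sysctl kern.cam.da.0.delete_max         # maximum BIO_DELETE size, in bytes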


See attached.  Sean - can you try this patch with TRIM re-enabled in ZFS?

I would be curious whether TRIM passes without this patch if you simply
increase the timeout_period as suggested.
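
For reference, bumping the timeout would look something like this (assuming
the per-controller dev.nvme.X.timeout_period OID; the value is in seconds):

        sysctl dev.nvme.0.timeout_period        # show the current timeout
        sysctl dev.nvme.0.timeout_period=120    # e.g. raise it to 120s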

-Jim

>
>
> On 06/10/2015 16:18, Sean Kelly wrote:
>
>> Back in May, I posted about issues I was having with a Dell PE R630 with
>> 4x800GB NVMe SSDs. I would get kernel panics due to the inability to assign
>> all the interrupts because of
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321. Jim Harris
>> helped fix this issue, so I bought several more of these servers, including
>> ones with 4x1.6TB drives…
>>
>> While the new servers with 4x800GB drives still work, the ones with
>> 4x1.6TB drives do not. When I run
>>         zpool create tank mirror nvd0 nvd1 mirror nvd2 nvd3
>> the command never returns, and the kernel logs:
>>         nvme0: resetting controller
>>         nvme0: controller ready did not become 0 within 2000 ms
>>
>> I've tried several different things to understand where the actual
>> problem is.
>> WORKS: dd if=/dev/nvd0 of=/dev/null bs=1m
>> WORKS: dd if=/dev/zero of=/dev/nvd0 bs=1m
>> WORKS: newfs /dev/nvd0
>> FAILS: zpool create tank mirror nvd[01]
>> FAILS: gpart add -t freebsd-zfs nvd[01] && zpool create tank mirror
>> nvd[01]p1
>> FAILS: gpart add -t freebsd-zfs -s 1400g nvd[01] && zpool create tank
>> nvd[01]p1
>> WORKS: gpart add -t freebsd-zfs -s 800g nvd[01] && zpool create tank
>> nvd[01]p1
>>
>> NOTE: The above commands are more about getting the point across, not
>> validity. I wiped the disk clean between gpart attempts and used GPT.
>>
>> So it seems like zpool works if I don't cross past ~800GB. But other
>> things like dd and newfs work.
>>
>> When I get the kernel messages about the controller resetting and then
>> not responding, the NVMe subsystem hangs entirely. Since my boot disks are
>> not NVMe, the system continues to work but no more NVMe stuff can be done.
>> Further, attempting to reboot hangs and I have to do a power cycle.
>>
>> Any thoughts on what the deal may be here?
>>
>> 10.2-RELEASE-p5
>>
>> nvme0@pci0:132:0:0:     class=0x010802 card=0x1f971028 chip=0xa820144d
>> rev=0x03 hdr=0x00
>>      vendor     = 'Samsung Electronics Co Ltd'
>>      class      = mass storage
>>      subclass   = NVM
>>

[Attachment: nvd.patch]

diff --git a/sys/dev/nvd/nvd.c b/sys/dev/nvd/nvd.c
index d752832..3015e39 100644
--- a/sys/dev/nvd/nvd.c
+++ b/sys/dev/nvd/nvd.c
@@ -32,6 +32,7 @@ __FBSDID("$FreeBSD: releng/10.2/sys/dev/nvd/nvd.c 285919 2015-07-27 17:50:05Z ji
 #include <sys/kernel.h>
 #include <sys/malloc.h>
 #include <sys/module.h>
+#include <sys/sysctl.h>
 #include <sys/systm.h>
 #include <sys/taskqueue.h>
 
@@ -85,6 +86,11 @@ struct nvd_controller {
 static TAILQ_HEAD(, nvd_controller)	ctrlr_head;
 static TAILQ_HEAD(disk_list, nvd_disk)	disk_head;
 
+static SYSCTL_NODE(_hw, OID_AUTO, nvd, CTLFLAG_RD, 0, "nvd driver parameters");
+static uint64_t nvd_delete_max = (4ULL * 1024 * 1024 * 1024);  /* 4GB */
+SYSCTL_UQUAD(_hw_nvd, OID_AUTO, delete_max, CTLFLAG_RWTUN, &nvd_delete_max, 0,
+	     "nvd maximum BIO_DELETE size");
+
 static int nvd_modevent(module_t mod, int type, void *arg)
 {
 	int error = 0;
@@ -279,6 +285,8 @@ nvd_new_disk(struct nvme_namespace *ns, void *ctrlr_arg)
 	disk->d_sectorsize = nvme_ns_get_sector_size(ns);
 	disk->d_mediasize = (off_t)nvme_ns_get_size(ns);
 	disk->d_delmaxsize = (off_t)nvme_ns_get_size(ns);
+	if (disk->d_delmaxsize > nvd_delete_max)
+		disk->d_delmaxsize = nvd_delete_max;
 
 	if (TAILQ_EMPTY(&disk_head))
 		disk->d_unit = 0;
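
The patch caps the disk's d_delmaxsize at 4GB by default, so the upper
layers should split a full-disk TRIM into deletes small enough to complete
within the timeout. Since the sysctl is declared CTLFLAG_RWTUN, it should be
settable both as a loader tunable and at runtime; a sketch of how you would
tune it (value in bytes):

        sysctl hw.nvd.delete_max                # show the current cap
        sysctl hw.nvd.delete_max=1073741824     # e.g. lower the cap to 1GB

or, in /boot/loader.conf:

        hw.nvd.delete_max="1073741824"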


