Date:      Tue, 6 Oct 2015 11:03:43 -0700
From:      Jim Harris <jimharris@freebsd.org>
To:        Steven Hartland <killing@multiplay.co.uk>, Sean Kelly <smkelly@smkelly.org>
Cc:        FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject:   Re: Dell NVMe issues
Message-ID:  <CAJP=Hc9-oQnk2r48OBXVCQbMDn0URDMDb80a0i0XvUDPuuLkrA@mail.gmail.com>
In-Reply-To: <5613FA02.2080205@multiplay.co.uk>
References:  <BC5F191D-FEB2-4ADC-9D6B-240C80B2301C@smkelly.org> <5613FA02.2080205@multiplay.co.uk>


[-- Attachment #1 --]
On Tue, Oct 6, 2015 at 9:42 AM, Steven Hartland <killing@multiplay.co.uk>
wrote:

> Also, it looks like nvme exposes a timeout_period sysctl; you could try
> increasing that, as it could be too small for a full-disk TRIM.
>

> Under CAM SCSI da support we have a delete_max which limits the maximum
> size of a single delete request; we may need something similar for nvme as
> well to prevent this, since it should still be chunking the deletes to
> ensure this sort of thing doesn't happen.


See attached.  Sean - can you try this patch with TRIM re-enabled in ZFS?

I would also be curious whether TRIM passes without this patch if you simply
increase the timeout_period as suggested.
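
For reference, that would look something like the following (I'm writing the
names from memory, so treat hw.nvme.timeout_period and the per-controller
dev.nvme.0.timeout_period sysctl as assumptions; on 10.2 the value may only be
settable as a boot-time tunable rather than at runtime):

        # check the current I/O timeout (in seconds) on the first controller
        sysctl dev.nvme.0.timeout_period
        # raise the default for all controllers at the next boot, then reboot
        echo 'hw.nvme.timeout_period=120' >> /boot/loader.conf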

-Jim




>
>
> On 06/10/2015 16:18, Sean Kelly wrote:
>
>> Back in May, I posted about issues I was having with a Dell PE R630 with
>> 4x800GB NVMe SSDs. I would get kernel panics due to the inability to assign
>> all the interrupts because of
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321. Jim Harris
>> helped fix this issue, so I bought several more of these servers, including
>> ones with 4x1.6TB drives…
>>
>> While the new servers with 4x800GB drives still work, the ones with
>> 4x1.6TB drives do not. When I do a
>>         zpool create tank mirror nvd0 nvd1 mirror nvd2 nvd3
>> the command never returns and the kernel logs:
>>         nvme0: resetting controller
>>         nvme0: controller ready did not become 0 within 2000 ms
>>
>> I’ve tried several different things trying to understand where the actual
>> problem is.
>> WORKS: dd if=/dev/nvd0 of=/dev/null bs=1m
>> WORKS: dd if=/dev/zero of=/dev/nvd0 bs=1m
>> WORKS: newfs /dev/nvd0
>> FAILS: zpool create tank mirror nvd[01]
>> FAILS: gpart add -t freebsd-zfs nvd[01] && zpool create tank mirror
>> nvd[01]p1
>> FAILS: gpart add -t freebsd-zfs -s 1400g nvd[01] && zpool create tank
>> nvd[01]p1
>> WORKS: gpart add -t freebsd-zfs -s 800g nvd[01] && zpool create tank
>> nvd[01]p1
>>
>> NOTE: The above commands are more about getting the point across, not
>> validity. I wiped the disk clean between gpart attempts and used GPT.
>>
>> So it seems like zpool only works if I don’t cross past ~800GB, while other
>> things like dd and newfs work on the full disk.
>>
>> When I get the kernel messages about the controller resetting and then
>> not responding, the NVMe subsystem hangs entirely. Since my boot disks are
>> not NVMe, the system continues to work but no more NVMe stuff can be done.
>> Further, attempting to reboot hangs and I have to do a power cycle.
>>
>> Any thoughts on what the deal may be here?
>>
>> 10.2-RELEASE-p5
>>
>> nvme0@pci0:132:0:0:     class=0x010802 card=0x1f971028 chip=0xa820144d
>> rev=0x03 hdr=0x00
>>      vendor     = 'Samsung Electronics Co Ltd'
>>      class      = mass storage
>>      subclass   = NVM
>>
>>

[-- Attachment #2 --]
diff --git a/sys/dev/nvd/nvd.c b/sys/dev/nvd/nvd.c
index d752832..3015e39 100644
--- a/sys/dev/nvd/nvd.c
+++ b/sys/dev/nvd/nvd.c
@@ -32,6 +32,7 @@ __FBSDID("$FreeBSD: releng/10.2/sys/dev/nvd/nvd.c 285919 2015-07-27 17:50:05Z ji
 #include <sys/kernel.h>
 #include <sys/malloc.h>
 #include <sys/module.h>
+#include <sys/sysctl.h>
 #include <sys/systm.h>
 #include <sys/taskqueue.h>
 
@@ -85,6 +86,11 @@ struct nvd_controller {
 static TAILQ_HEAD(, nvd_controller)	ctrlr_head;
 static TAILQ_HEAD(disk_list, nvd_disk)	disk_head;
 
+static SYSCTL_NODE(_hw, OID_AUTO, nvd, CTLFLAG_RD, 0, "nvd driver parameters");
+static uint64_t nvd_delete_max = (4ULL * 1024 * 1024 * 1024);  /* 4GB */
+SYSCTL_UQUAD(_hw_nvd, OID_AUTO, delete_max, CTLFLAG_RWTUN, &nvd_delete_max, 0,
+	     "nvd maximum BIO_DELETE size");
+
 static int nvd_modevent(module_t mod, int type, void *arg)
 {
 	int error = 0;
@@ -279,6 +285,8 @@ nvd_new_disk(struct nvme_namespace *ns, void *ctrlr_arg)
 	disk->d_sectorsize = nvme_ns_get_sector_size(ns);
 	disk->d_mediasize = (off_t)nvme_ns_get_size(ns);
 	disk->d_delmaxsize = (off_t)nvme_ns_get_size(ns);
+	if (disk->d_delmaxsize > nvd_delete_max)
+		disk->d_delmaxsize = nvd_delete_max;
 
 	if (TAILQ_EMPTY(&disk_head))
 		disk->d_unit = 0;
