Date: Tue, 17 May 2016 11:27:05 +0200
From: Borja Marcos <borjam@sarenet.es>
To: Steven Hartland <killing@multiplay.co.uk>
Cc: freebsd-stable@freebsd.org
Subject: Re: ZFS and NVMe, trim caused stalling
Message-ID: <87668F1E-D165-4195-9DB0-4764038FC075@sarenet.es>
In-Reply-To: <20a155fd-8695-ca42-6a72-32cb78864a22@multiplay.co.uk>
References: <5E710EA5-C9B0-4521-85F1-3FE87555B0AF@bsdimp.com> <BD7424F9-2968-410D-8146-27496054BCFA@sarenet.es> <20a155fd-8695-ca42-6a72-32cb78864a22@multiplay.co.uk>
> On 17 May 2016, at 11:09, Steven Hartland <killing@multiplay.co.uk> wrote:
> 
>> I understand that, but I don't think it's good that ZFS depends blindly on a driver feature
>> such as that. Of course, it's great to exploit it.
>> 
>> I have also noticed that ZFS has a good throttling mechanism for write operations. A similar
>> mechanism should throttle trim requests so that trim requests don't clog the whole system.
> It already does.

I see that there's a limit to the number of active TRIM requests, but not an explicit delay such
as is applied to write requests. So, even with a single maximum active TRIM request, it seems
that TRIM wins.

>> 
>>> I'd be extremely hesitant to toss away TRIMs. They are actually quite important for
>>> the FTL in the drive's firmware to properly manage the NAND wear. More free space always
>>> reduces write amplification. It tends to go as 1 / freespace, so simply dropping them on
>>> the floor should be done with great reluctance.
>> I understand. I was wondering about choosing the lesser of two evils. A 15 minute I/O stall
>> (I deleted 2 TB of data, that's a lot, but not so unrealistic) or setting trims aside
>> during the peak activity.
>> 
>> I see that I was wrong on that, as a throttling mechanism would probably be more than enough,
>> unless the system is close to running out of space.
>> 
>> I've filed a bug report anyway. And copying to -stable.
>> 
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209571
>> 
> TBH it sounds like you may have badly behaved HW; we've used ZFS + TRIM for years on large
> production boxes, and while we've seen slowdowns we haven't experienced the total lockups
> you're describing.

I have been using ZFS+TRIM on SATA SSD disks for a very long time. Actually, a single SSD I
tried at home can TRIM at around 2 GB/s.

Warner Losh told me that the nvd driver is not currently coalescing the TRIMs, which is a
disadvantage compared to the ada driver, which does.

> The graphs on your ticket seem to indicate a peak throughput of 250MB/s, which is extremely
> slow for standard SSDs, let alone NVMe ones, and when you add in the fact that you have 10 of
> them it seems like something is VERY wrong.

The pool is a raidz2 vdev with 10 P3500 NVMe disks. That graph is the throughput of just one of
the disks (the other 9 graphs are identical). Bonnie++ reports around 1.7 Gigabytes/s writing
"intelligently", 1 GB/s "rewriting" and almost 2 GB/s "reading intelligently", which, as far as
I know, is more or less reasonable.

The really slow part is the TRIM requests. When destroying the files (it's four concurrent
bonnie++ tasks writing a total of 2 Terabytes), the pool stalls for around 15 minutes.

> I just did a quick test on our DB box here, creating and then deleting a 2G file as you
> describe, and I couldn't even spot the delete in the general noise, it was so quick to
> process, and that's a 6 disk machine with P3700s.

Totalling 2 TB? In my case it was FOUR files, 512 GB each.

I'm really puzzled,


Borja.
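
PS: For anyone who wants to look at what TRIM throttling is (or isn't) in place, these are the
knobs I have been poking at. The OID names are what I recall from the stable/10 TRIM code and
may differ elsewhere, so treat them as assumptions and compare against "sysctl -a" on your own
box. A minimal sketch in C using sysctlbyname(3):

/*
 * Sketch only: print a few ZFS TRIM tunables via sysctlbyname(3).
 * The OID names are assumptions based on the stable/10 TRIM code;
 * verify them against your kernel before relying on this.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
	const char *knobs[] = {
		"vfs.zfs.trim.enabled",		/* global TRIM on/off */
		"vfs.zfs.trim.txg_delay",	/* txgs a TRIM is held before being issued */
		"vfs.zfs.vdev.trim_max_active",	/* concurrent TRIMs allowed per vdev */
	};
	for (size_t i = 0; i < sizeof(knobs) / sizeof(knobs[0]); i++) {
		int val = 0;			/* these OIDs are int-sized on my systems */
		size_t len = sizeof(val);

		if (sysctlbyname(knobs[i], &val, &len, NULL, 0) == 0)
			printf("%-32s %d\n", knobs[i], val);
		else
			printf("%-32s (not found; name or size may differ here)\n", knobs[i]);
	}
	return (0);
}

Even with trim_max_active set to something small, there is still no equivalent of the write
throttle's per-operation delay, which is the point I was trying to make above.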
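
PPS: To make clear what I was referring to with "coalescing the TRIMs": the idea is simply that
many small adjacent delete ranges can be merged into a few large ones before they are handed to
the device. This is only a toy illustration of the concept, not the actual ada(4) or nvd(4) code:

/*
 * Illustration only: sort delete ranges by LBA and merge ranges that
 * touch or overlap, so many small TRIMs become a few large ones.
 */
#include <stdio.h>
#include <stdlib.h>

struct range {
	unsigned long long lba;		/* starting block */
	unsigned long long nblocks;	/* length in blocks */
};

static int
cmp_lba(const void *a, const void *b)
{
	const struct range *ra = a;
	const struct range *rb = b;

	if (ra->lba < rb->lba)
		return (-1);
	if (ra->lba > rb->lba)
		return (1);
	return (0);
}

/* Merge adjacent/overlapping ranges in place; returns the new count. */
static size_t
coalesce(struct range *r, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return (0);
	qsort(r, n, sizeof(*r), cmp_lba);
	for (size_t i = 1; i < n; i++) {
		if (r[i].lba <= r[out].lba + r[out].nblocks) {
			/* Adjacent or overlapping: extend the current range. */
			unsigned long long end = r[i].lba + r[i].nblocks;

			if (end > r[out].lba + r[out].nblocks)
				r[out].nblocks = end - r[out].lba;
		} else {
			r[++out] = r[i];
		}
	}
	return (out + 1);
}

int
main(void)
{
	struct range r[] = {
		{ 0, 128 }, { 128, 128 }, { 512, 64 }, { 256, 256 },
	};
	size_t n = coalesce(r, sizeof(r) / sizeof(r[0]));

	for (size_t i = 0; i < n; i++)
		printf("TRIM lba=%llu nblocks=%llu\n", r[i].lba, r[i].nblocks);
	return (0);
}

With that sample input the four small deletes collapse into a single TRIM covering blocks 0-575,
which is the kind of reduction in per-command overhead that coalescing buys you.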