Subject: Re: ZFS and NVMe, trim caused stalling
From: Borja Marcos <borjam@sarenet.es>
Date: Tue, 17 May 2016 11:27:05 +0200
To: Steven Hartland
Cc: freebsd-stable@freebsd.org
In-Reply-To: <20a155fd-8695-ca42-6a72-32cb78864a22@multiplay.co.uk>
References: <5E710EA5-C9B0-4521-85F1-3FE87555B0AF@bsdimp.com> <20a155fd-8695-ca42-6a72-32cb78864a22@multiplay.co.uk>
Message-Id: <87668F1E-D165-4195-9DB0-4764038FC075@sarenet.es>

> On 17 May 2016, at 11:09, Steven Hartland wrote:
>
>> I understand that, but I don't think it's good that ZFS depends blindly on a driver feature such
>> as that. Of course, it's great to exploit it.
>>
>> I have also noticed that ZFS has a good throttling mechanism for write operations. A similar
>> mechanism should throttle trim requests so that trim requests don't clog the whole system.
>
> It already does.

I see that there's a limit to the number of active TRIM requests, but not an explicit delay like the
one applied to write requests. So, even with the maximum number of active TRIM requests set to one,
it seems that TRIM wins.

>>> I'd be extremely hesitant to toss away TRIMs. They are actually quite important for
>>> the FTL in the drive's firmware to properly manage the NAND wear. More free space always
>>> reduces write amplification. It tends to go as 1 / freespace, so simply dropping them on
>>> the floor should be done with great reluctance.
>>
>> I understand. I was wondering about choosing the lesser of two evils: a 15 minute
>> I/O stall (I deleted 2 TB of data, that's a lot, but not so unrealistic) or setting trims aside
>> during the peak activity.
>>
>> I see that I was wrong on that, as a throttling mechanism would probably be more than enough,
>> unless the system is close to running out of space.
>>
>> I've filed a bug report anyway. And copying to -stable.
>>
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209571
>>
> TBH it sounds like you may have badly behaved HW. We've used ZFS + TRIM for years on large
> production boxes, and while we've seen slowdowns we haven't experienced the total lockups
> you're describing.

I have been using ZFS+TRIM on SATA SSD disks for a very long time.
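(For what it's worth, this is what I've been checking on the NVMe box while chasing this. I'm
assuming the stock sysctl names here, so treat them as approximate; they may differ between
releases:

  # list the global ZFS TRIM settings
  sysctl vfs.zfs.trim

  # list the per-vdev I/O scheduler limits; the trim entries include the
  # cap on concurrently active TRIM requests
  sysctl vfs.zfs.vdev | grep trim

Dropping vfs.zfs.vdev.trim_max_active to 1 is the "single maximum active TRIM request" case I
mentioned above, and even then TRIM wins. I couldn't find anything equivalent to the delay the
write throttle applies.)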
Actually, a single SSD I tried at home can TRIM at around 2 GB/s.

Warner Losh told me that the nvd driver is not currently coalescing the TRIMs, which is a
disadvantage compared to the ada driver, which does.

> The graphs on your ticket seem to indicate peak throughput of 250 MB/s, which is extremely slow
> for standard SSDs, let alone NVMe ones, and when you add in the fact you have 10 of them it
> seems like something is VERY wrong.

The pool is a raidz2 vdev with 10 P3500 NVMe disks. That graph is the throughput of just one of
the disks (the other 9 graphs are identical). Bonnie++ reports around 1.7 GB/s writing
"intelligently", 1 GB/s "rewriting" and almost 2 GB/s "reading intelligently", which, as far as
I know, is more or less reasonable.

The really slow part is the TRIM requests. When destroying the files (it's four concurrent
bonnie++ tasks writing a total of 2 terabytes), the resulting TRIMs stall the whole pool for
minutes.

> I just did a quick test on our DB box here creating and then deleting a 2G file as you
> describe, and I couldn't even spot the delete in the general noise, it was so quick to
> process, and that's a 6 disk machine with P3700's.

Totalling 2 TB? In my case it was FOUR files, 512 GB each.

I'm really puzzled,

Borja.
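P.S. In case you want to compare like with like, this is roughly how I'm driving the test. The
pool/dataset paths are just placeholders and the bonnie++ options are from memory, so take it as
a sketch rather than the exact command lines:

  # four concurrent bonnie++ runs, 512 GB of data each (2 TB in total);
  # bonnie++ deletes its test files when it finishes, which is when the
  # TRIM storm hits
  for i in 1 2 3 4; do
          bonnie++ -d /tank/bonnie$i -s 512g -u root &
  done
  wait

  # watch the disks while it runs; -d adds a column for delete
  # (BIO_DELETE/TRIM) requests next to the reads and writes
  gstat -d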