From: Warner Losh
Date: Thu, 2 May 2024 08:34:22 -0600
Subject: Re: how to tell if TRIM is working
To: mike tancsa
Cc: Matthew Grooms, stable@freebsd.org
List-Id: Production branch of FreeBSD source code
List-Archive: https://lists.freebsd.org/archives/freebsd-stable


On Thu, May 2, 2024 at 8:19 AM mike tancsa <mike@sentex.net> wrote:

> On 5/2/2024 10:16 AM, Warner Losh wrote:
>
> When trims are fast, you want to send them to the drive as soon as you
> know the blocks are freed. UFS always does this (if trim is enabled at all).
> ZFS has a lot of knobs to control when / how / if this is done.
>
> vfs.zfs.vdev.trim_min_active: 1
> vfs.zfs.vdev.trim_max_active: 2
> vfs.zfs.trim.queue_limit: 10
> vfs.zfs.trim.txg_batch: 32
> vfs.zfs.trim.metaslab_skip: 0
> vfs.zfs.trim.extent_bytes_min: 32768
> vfs.zfs.trim.extent_bytes_max: 134217728
> vfs.zfs.l2arc.trim_ahead: 0
>
> I've not tried to tune these in the past, but you can see how they affect things.
>
> Thanks Warner, I will try and play around with these values to see if they
> impact things. BTW, do you know what / why things would be "skipped"
> during trim events ?
>
> kstat.zfs.zrootoffs.misc.iostats.trim_bytes_failed: 0
> kstat.zfs.zrootoffs.misc.iostats.trim_extents_failed: 0
> kstat.zfs.zrootoffs.misc.iostats.trim_bytes_skipped: 5968330752
> kstat.zfs.zrootoffs.misc.iostats.trim_extents_skipped: 503986
> kstat.zfs.zrootoffs.misc.iostats.trim_bytes_written: 181593186304
> kstat.zfs.zrootoffs.misc.iostats.trim_extents_written: 303115


A quick look at the code suggests that an extent is skipped when it is smaller than the extent_bytes_min parameter (vfs.zfs.trim.extent_bytes_min in the list above).
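As a rough sanity check of that reading (a sketch, not something taken from the ZFS code), the counters Mike posted can be turned into average extent sizes. The helper names `parse_kstat` and `average_extent` are made up for this example; the numbers are copied from the kstat output quoted above, and 32768 is the vfs.zfs.trim.extent_bytes_min value from the sysctl list.

```python
# Counter values copied from the kstat output quoted earlier in this message.
KSTAT_OUTPUT = """\
kstat.zfs.zrootoffs.misc.iostats.trim_bytes_failed: 0
kstat.zfs.zrootoffs.misc.iostats.trim_extents_failed: 0
kstat.zfs.zrootoffs.misc.iostats.trim_bytes_skipped: 5968330752
kstat.zfs.zrootoffs.misc.iostats.trim_extents_skipped: 503986
kstat.zfs.zrootoffs.misc.iostats.trim_bytes_written: 181593186304
kstat.zfs.zrootoffs.misc.iostats.trim_extents_written: 303115
"""

# vfs.zfs.trim.extent_bytes_min as shown in the quoted sysctl list.
EXTENT_BYTES_MIN = 32768


def parse_kstat(text):
    """Map the last component of each kstat name to its integer value."""
    counters = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        if value.strip():
            counters[name.rsplit(".", 1)[-1]] = int(value)
    return counters


def average_extent(counters, kind):
    """Average bytes per extent, where kind is 'skipped' or 'written'."""
    extents = counters[f"trim_extents_{kind}"]
    return counters[f"trim_bytes_{kind}"] / extents if extents else 0.0


counters = parse_kstat(KSTAT_OUTPUT)
avg_skipped = average_extent(counters, "skipped")
avg_written = average_extent(counters, "written")

print(f"average skipped extent: {avg_skipped:.0f} bytes")
print(f"average written extent: {avg_written:.0f} bytes")
print("skipped average below extent_bytes_min:", avg_skipped < EXTENT_BYTES_MIN)
```

For these numbers the average skipped extent is a little under 12,000 bytes, well below the 32768-byte minimum, while the average written extent is several hundred kilobytes. An average cannot prove that every skipped extent was under the limit, but it is at least consistent with a size-based skip.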

The minimum seems to be a trade-off between sending too many trims to the drive and making sure that the trims you do send are maximally effective. By specifying a smaller size, you'll be freeing up more holes in the underlying NAND blocks. In some drives, this triggers more data copying (and more write amplification), so you want to set it a bit higher for those. In other drives, it improves the efficiency of the GC algorithm, allowing each underlying block groomed to recover more space for future writes.

In the past, I've found that ZFS' defaults are decent for 2018ish-level SATA SSDs, but a bit too trim-averse for newer NVMe drives, even the cheap consumer ones. That's just a coarse generalization from my buildworld workload, though. Other workloads will have other data patterns, YMMV, so you need to measure it.
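One way to do that measuring, sketched here under stated assumptions: sample the trim kstat counters before and after changing a knob and compare only the deltas. This assumes the counters are cumulative. The `skip_ratio` helper and every sample value below are invented for illustration; only the counter names come from the kstat output earlier in the thread.

```python
def skip_ratio(before, after):
    """Fraction of trim extents skipped between two counter snapshots.

    Both arguments are dicts holding the trim_extents_skipped and
    trim_extents_written counters, which are assumed to be cumulative.
    """
    skipped = after["trim_extents_skipped"] - before["trim_extents_skipped"]
    written = after["trim_extents_written"] - before["trim_extents_written"]
    total = skipped + written
    return skipped / total if total else 0.0


# Hypothetical snapshots, NOT measurements from any real system.
baseline_start = {"trim_extents_skipped": 1000, "trim_extents_written": 1000}
baseline_end = {"trim_extents_skipped": 1600, "trim_extents_written": 1400}
tuned_start = {"trim_extents_skipped": 1600, "trim_extents_written": 1400}
tuned_end = {"trim_extents_skipped": 1700, "trim_extents_written": 2300}

baseline = skip_ratio(baseline_start, baseline_end)
tuned = skip_ratio(tuned_start, tuned_end)

print(f"skip ratio before tuning: {baseline:.2f}")
print(f"skip ratio after tuning:  {tuned:.2f}")
```

A lower skip ratio only says that more trims reached the drive. Whether that helps depends on the drive, so latency or throughput of the real workload is still the number that matters.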

Another way to get statistics, one that I've not been able to measure a slowdown from, is to enable the CAM_IOSCHED_DYNAMIC kernel option. Then you get a lot more statistics about the I/Os in the system, including latency measurements. In theory, that also allows one to traffic-shape the trims to the drive, but I've had only limited success with that and haven't had the time to make it a lot better.

Warner
