Date: Wed, 22 Oct 2025 11:42:11 -0400
From: Mark Johnston
To: Rick Macklem
Cc: FreeBSD CURRENT, Garrett Wollman, Peter Eriksson, Alexander Motin
Subject: Re: RFC: How ZFS handles arc memory use

On Wed, Oct 22, 2025 at 07:34:39AM -0700, Rick Macklem wrote:
> Hi,
>
> A couple of people have reported problems with NFS servers,
> where essentially all of the system's memory gets exhausted.
> They see the problem on 14.n FreeBSD servers (which use the
> newer ZFS code) but not on 13.n servers.
>
> I am trying to learn how ZFS handles arc memory use to try
> and figure out what can be done about this problem.
>
> I know nothing about ZFS internals or UMA(9) internals,
> so I could be way off, but here is what I think is happening.
> (Please correct me on this.)
>
> The L1ARC uses uma_zalloc_arg()/uma_zfree_arg() to allocate
> the arc memory.  The zones are created using uma_zcreate(),
> so they are regular zones.  This means the pages come from a
> slab in a keg, and those pages are wired.
>
> The only time ZFS itself reduces the size of the slab/keg is
> when it calls uma_zone_reclaim(.., UMA_RECLAIM_DRAIN); that
> call is made by arc_reap_cb(), triggered by arc_reap_cb_check().
>
> arc_reap_cb_check() uses arc_available_memory() and triggers
> arc_reap_cb() when arc_available_memory() returns a negative
> value.
>
> arc_available_memory() returns a negative value when
> zfs_arc_free_target (vfs.zfs.arc.free_target) is greater than freemem.
> (By default, zfs_arc_free_target is set to vm_cnt.v_free_target.)
>
> Does all of the above sound about right?

It's been a while since I've looked, but that sounds roughly correct.
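To make that concrete, here is a rough sketch of the path you describe,
written from memory.  The real code lives in the OpenZFS FreeBSD glue
(module/os/freebsd/zfs/abd_os.c and arc_os.c) and is more involved; the
"sketch_" names below are invented for illustration.

/*
 * Grossly simplified sketch of the ARC/UMA interaction described above.
 * The "sketch_" names are made up; only the UMA KPIs are real.
 */
#include <sys/param.h>
#include <sys/malloc.h>
#include <vm/uma.h>

static uma_zone_t sketch_abd_chunk_zone;

static void
sketch_abd_init(void)
{
	/*
	 * A regular zone: items are PAGE_SIZE chunks backed by wired
	 * slab pages handed out by the zone's keg.
	 */
	sketch_abd_chunk_zone = uma_zcreate("sketch abd chunk", PAGE_SIZE,
	    NULL, NULL, NULL, NULL, 0, 0);
}

static void *
sketch_abd_alloc_chunk(void)
{
	/*
	 * The fast path pops an item from this CPU's bucket cache; a
	 * miss falls back to the zone's full-bucket list and then to
	 * the keg, which may wire a fresh slab.
	 */
	return (uma_zalloc(sketch_abd_chunk_zone, M_NOWAIT));
}

static void
sketch_arc_reap(void)
{
	/*
	 * Called (via arc_reap_cb()) once arc_available_memory() goes
	 * negative, i.e., freemem drops below zfs_arc_free_target.
	 * UMA_RECLAIM_DRAIN returns the zone's cached items and free
	 * slabs to the VM system but leaves the per-CPU buckets alone;
	 * UMA_RECLAIM_DRAIN_CPU would flush those as well.
	 */
	uma_zone_reclaim(sketch_abd_chunk_zone, UMA_RECLAIM_DRAIN);
}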
Note that the vm_lowmem eventhandler is invoked when fewer than
v_free_target pages are available, and this should pressure ZFS into
shrinking the ARC.

> This leads me to...
> - zfs_arc_free_target (vfs.zfs.arc.free_target) needs to be larger
> or
> - Most of the wired pages in the slab are per-cpu,
>   so uma_zone_reclaim() needs to use UMA_RECLAIM_DRAIN_CPU
>   on some systems.  (Not the small test systems I have, where I
>   cannot reproduce the problem.)

The number of wired pages belonging to per-CPU caches should be fairly
small: each CPU's cache is bounded to roughly 2 * bucket_size items, or
2 * bucket_size * ncpu items across the whole system.  For instance, the
ZFS ABD chunk zone on my build system has
$(sysctl -n vm.uma.abd_chunk.bucket_size) == 220 items per bucket.  Each
item is a page, so that gives an upper bound of 220 * 2 * 32 pages in the
ABD zone's per-CPU caches.  That's about 56MB, which is not a huge amount
on this system with 128GB of RAM.

> or
> - uma_zone_reclaim() needs to be called under other
>   circumstances.
> or
> - ???
>
> How can you tell if a keg/slab is per-cpu?
> (For my simple test system, I only see "UMA Slabs 0:" and
> "UMA Slabs 1:".  It looks like UMA Slabs 0: is being used for
> ZFS arc allocation for this simple test system.)

A slab is the backend allocation unit for (most) UMA zones.  A keg is a
structure which manages slabs.  When the frontend needs to allocate a
new item, it asks the keg for one; the keg then either returns an item
from an existing slab, or allocates a new slab from the VM system.

The frontend is a "zone"; it employs per-CPU caching to make the
allocation and free paths cheap and scalable, i.e., in the common case
there is no need to acquire any locks.  The zone maintains several
"buckets" of free items per CPU.  When an allocation misses in the
per-CPU cache, a per-zone linked list of full buckets is consulted.  If
that list is empty, we go to the keg and ask it for more items.  (A very
rough sketch of this path is at the end of this message.)

When a keg allocates a slab, it must also allocate a structure which
tracks the state of each item within the subdivided slab.  These are the
"UMA Slabs" zones you referred to.  For some types of items, the slab
header can be stored within the slab itself, so no explicit allocation
is required.  For other cases (including the ZFS ABD buffers which are
used to populate the ARC), a separate allocation from these zones is
required.

> Hopefully folk who understand ZFS arc allocation or UMA
> can jump in and help out, rick
>
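For reference, here is the very rough sketch of the zone -> keg -> slab
allocation path I mentioned above.  The helper names are invented for
illustration; the real implementation is in sys/vm/uma_core.c and
handles many more cases (NUMA domains, bucket resizing, zone limits,
and so on).

/*
 * Pseudocode for the UMA allocation path; the sketch_* helpers do not
 * exist, they just stand in for the corresponding uma_core.c logic.
 */
#include <vm/uma.h>

void *
sketch_uma_zalloc(uma_zone_t zone, int flags)
{
	void *item;

	/*
	 * 1. Per-CPU cache: pop an item from this CPU's current bucket.
	 * No locks are taken in this common case.
	 */
	item = sketch_cache_bucket_pop(zone);
	if (item != NULL)
		return (item);

	/*
	 * 2. Zone: swap in a full bucket from the zone's list of full
	 * buckets, if one is available, and retry the fast path.
	 */
	if (sketch_cache_fetch_full_bucket(zone))
		return (sketch_cache_bucket_pop(zone));

	/*
	 * 3. Keg: take an item from an existing slab, or wire fresh
	 * pages from the VM system to build a new slab.  The slab
	 * header (free-item bitmap, etc.) either lives inside the slab
	 * or, as for the ABD chunk zone, is allocated separately from
	 * one of the "UMA Slabs" zones.
	 */
	return (sketch_keg_fetch_item(zone, flags));
}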