Date: Mon, 16 Mar 2026 15:08:44 -0600
From: Alan Somers <asomers@freebsd.org>
To: Garrett Wollman <wollman@bimajority.org>
Cc: freebsd-stable@freebsd.org
Subject: Re: ZFS deadlocks/memory accounting issues
Message-ID: <CAOtMX2gpNQH9hpCnRP+m5kBJDMf5O3MSHgzPVjBKEObyL8bjdw@mail.gmail.com>
In-Reply-To: <27064.27391.224476.910636@hergotha.csail.mit.edu>
On Mon, Mar 16, 2026 at 2:41 PM Garrett Wollman <wollman@bimajority.org> wrote:
>
> Since we upgraded to 14.3 last summer, we have been experiencing
> numerous memory accounting issues on our NFS servers. These manifest
> as a server *desperate* to free up memory despite having multiple
> gigabytes of physical RAM available. (Some of these machines have 1
> TiB of RAM, with more than 64 GiB free, and were swapping and invoking
> the OOM-killer.)
>
> I had a server deadlock just now after only three days of uptime with
> 32 GiB of free memory. Prior to the crash, about 70 GiB (of 128) was
> used by the ARC, of which some 60 GiB was accounted for as
> "evictable", and the load was pretty modest.
>
> In DDB on the console, I noted:
>
> pid ppid pgrp uid state wmesg wchan cmd
> 60673 60672 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60672 1 3008 0 S wait 0xfffffe031ee41560 nrpe
> 60670 1186 60670 0 Ds db->db_ 0xfffff8173309f1e8 sshd-session
> 60669 1202 1202 0 D voffloc 0xfffff8024db4966a perl
> 60668 60667 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60667 1 3008 0 S wait 0xfffffe031ee41000 nrpe
> 60665 60664 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60664 1 3008 0 S wait 0xfffffe031723a5c0 nrpe
> 60662 60661 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60661 1 3008 0 S wait 0xfffffe03172395a0 nrpe
> 60659 60658 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60658 1 3008 0 S wait 0xfffffe0317239040 nrpe
> 60656 60655 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60655 1 3008 0 S wait 0xfffffe0317238ae0 nrpe
> 60653 60652 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60652 1 3008 0 S wait 0xfffffe0317238580 nrpe
> 60650 60649 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60649 1 3008 0 S wait 0xfffffe0317238020 nrpe
> 60647 60646 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60646 1 3008 0 S wait 0xfffffe0317237ac0 nrpe
> 60644 60643 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60643 1 3008 0 S wait 0xfffffe0317237000 nrpe
> 60641 60640 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60640 1 3008 0 S wait 0xfffffe00d3cfa040 nrpe
> 60638 1202 1202 0 D voffloc 0xfffff8024db4966a perl
> 60637 1186 60637 0 Ds db->db_ 0xfffff8173309f1e8 sshd-session
> 60636 60635 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60635 1 3008 0 S wait 0xfffffe00d3cf9ae0 nrpe
> 60633 60632 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60632 1 3008 0 S wait 0xfffffe00d3cf9580 nrpe
> 60630 60629 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60629 1 3008 0 S wait 0xfffffe00d3cf9020 nrpe
> 60627 60626 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60626 1 3008 0 S wait 0xfffffe00d3cf8560 nrpe
> 60624 60623 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60623 1 3008 0 S wait 0xfffffe00d3cf8000 nrpe
> 60621 60620 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60620 1 3008 0 S wait 0xfffffe0317188060 nrpe
> 60618 60617 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60617 1 3008 0 S wait 0xfffffe0317187b00 nrpe
> 60615 60614 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60614 1 3008 0 S wait 0xfffffe03171875a0 nrpe
> 60612 60611 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60611 1 3008 0 S wait 0xfffffe0317186ae0 nrpe
> 60609 60608 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60608 1 3008 0 S wait 0xfffffe0317186580 nrpe
> 60606 1186 60606 0 Ds db->db_ 0xfffff8173309f1e8 sshd-session
> 60605 60604 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60604 1 3008 0 S wait 0xfffffe0317186020 nrpe
> 60602 60601 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60601 1 3008 0 S wait 0xfffffe0317185ac0 nrpe
> 60599 1202 1202 0 D voffloc 0xfffff8024db4966a perl
> 60598 60597 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60597 1 3008 0 S wait 0xfffffe0317185560 nrpe
> 60595 60594 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60594 1 3008 0 S wait 0xfffffe0317185000 nrpe
> 60592 60591 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60591 1 3008 0 S wait 0xfffffe031724c5c0 nrpe
> 60589 60588 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60588 1 3008 0 S wait 0xfffffe031724c060 nrpe
> 60586 60585 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60585 1 3008 0 S wait 0xfffffe031724b5a0 nrpe
> 60583 60582 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60582 1 3008 0 S wait 0xfffffe031724a580 nrpe
> 60580 60579 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60579 1 3008 0 S wait 0xfffffe031724a020 nrpe
> 60577 1186 60577 0 Ds aw.aew_ 0xfffffe0326e5a608 sshd-session
> 60576 60575 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60575 1 3008 0 S wait 0xfffffe0317249560 nrpe
> 60573 1202 1202 0 D aw.aew_ 0xfffffe0326df6478 perl
> 60572 60571 3008 0 D db->db_ 0xfffff8058173af68 nrpe
> 60571 1 3008 0 S wait 0xfffffe0317249000 nrpe
> 5015 5010 5015 6263 Ss+ ttyin 0xfffff810aa50a8b0 zsh
> 5010 5006 5006 6263 S select 0xfffff8024ca966c0 sshd-session
> 5006 1186 5006 0 Ss select 0xfffff8024ca984c0 sshd-session
> 3008 1 3008 0 Ss select 0xfffff80209dc98c0 nrpe
> 2910 1 2910 0 Ds+ aw.aew_ 0xfffffe03274d66e8 getty
>
> This getty is the one running on the console tty, which was stuck.
> Note the wait channel is "aw.aew_cv", which is part of the logic for
> evicting buffers from the ARC. Other threads are waiting for a
> dbuf (ZFS disk buffer) object mutex.
>
> I'm currently planning on taking us to 14.4 later this spring, but it
> would be nice to know if anyone else has seen this bug or has a fix.
> I've tried dropping kern.maxvnodes and increasing
> vfs.zfs.arc_free_target, with no change in symptoms.
>
> This particular server is due to be replaced but the new disk array
> (which was ordered in January) won't ship until late April per the
> vendor.
>
> -GAWollman

I once saw a similar bug. In my case I had a process that mmap()ed
some very large files on fusefs, consuming lots of inactive pages.
When the system came under memory pressure, it asked the ARC to evict
first, so the ARC would end up shrinking down to arc_min every time.
In my case, the solution was to set vfs.fusefs.data_cache_mode=0.
I suspect that similar bugs could be possible with UFS or tmpfs, if
they have giant files that are mmap()ed.

A less effective workaround was to set vfs.zfs.arc.min to some
reasonable value. That can prevent the ARC from shrinking too far.
You could try that.

Another thing you could try is to run "vmstat -o" when the system is
in the problematic state. That will show you which VM objects are
using the most inactive pages.

Hope this helps,
-Alan
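
For concreteness, those two knobs would be set something like this;
the 16 GiB arc.min value is purely illustrative, not a recommendation
for this particular machine:

    # stop fusefs from caching file data (only matters if fusefs is mounted)
    sysctl vfs.fusefs.data_cache_mode=0

    # put a floor under the ARC so eviction cannot drive it to nothing;
    # 17179869184 bytes = 16 GiB, illustrative only
    sysctl vfs.zfs.arc.min=17179869184

    # confirm the floor took effect
    sysctl kstat.zfs.misc.arcstats.c_min

Both settings can also go in /etc/sysctl.conf so they survive a reboot.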
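To pick the big inactive-page consumers out of the "vmstat -o"
listing, something like the following works, assuming INACT is the
third column as it is on the systems I have checked:

    # VM objects sorted by inactive pages, largest first (counts are in pages)
    vmstat -o | sort -rnk 3 | head -20

The PATH field at the right of each line should point at the file
backing any oversized object.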
