Date:      Mon, 16 Mar 2026 16:41:35 -0400
From:      Garrett Wollman <wollman@bimajority.org>
To:        freebsd-stable@freebsd.org
Subject:   ZFS deadlocks/memory accounting issues
Message-ID:  <27064.27391.224476.910636@hergotha.csail.mit.edu>

Since we upgraded to 14.3 last summer, we have been experiencing
numerous memory accounting issues on our NFS servers.  These manifest
as a server *desperate* to free up memory despite having multiple
gigabytes of physical RAM available.  (Some of these machines have 1
TiB of RAM, with more than 64 GiB free, and were swapping and invoking
the OOM-killer.)

I had a server deadlock just now after only three days of uptime with
32 GiB of free memory.  Prior to the crash, about 70 GiB (of 128) was
used by the ARC, of which some 60 GiB was accounted for as
"evictable", and the load was pretty modest.

In DDB on the console, I noted:

  pid  ppid  pgrp   uid  state   wmesg   wchan               cmd
60673 60672  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60672     1  3008     0  S       wait    0xfffffe031ee41560  nrpe
60670  1186 60670     0  Ds      db->db_ 0xfffff8173309f1e8  sshd-session
60669  1202  1202     0  D       voffloc 0xfffff8024db4966a  perl
60668 60667  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60667     1  3008     0  S       wait    0xfffffe031ee41000  nrpe
60665 60664  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60664     1  3008     0  S       wait    0xfffffe031723a5c0  nrpe
60662 60661  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60661     1  3008     0  S       wait    0xfffffe03172395a0  nrpe
60659 60658  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60658     1  3008     0  S       wait    0xfffffe0317239040  nrpe
60656 60655  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60655     1  3008     0  S       wait    0xfffffe0317238ae0  nrpe
60653 60652  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60652     1  3008     0  S       wait    0xfffffe0317238580  nrpe
60650 60649  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60649     1  3008     0  S       wait    0xfffffe0317238020  nrpe
60647 60646  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60646     1  3008     0  S       wait    0xfffffe0317237ac0  nrpe
60644 60643  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60643     1  3008     0  S       wait    0xfffffe0317237000  nrpe
60641 60640  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60640     1  3008     0  S       wait    0xfffffe00d3cfa040  nrpe
60638  1202  1202     0  D       voffloc 0xfffff8024db4966a  perl
60637  1186 60637     0  Ds      db->db_ 0xfffff8173309f1e8  sshd-session
60636 60635  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60635     1  3008     0  S       wait    0xfffffe00d3cf9ae0  nrpe
60633 60632  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60632     1  3008     0  S       wait    0xfffffe00d3cf9580  nrpe
60630 60629  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60629     1  3008     0  S       wait    0xfffffe00d3cf9020  nrpe
60627 60626  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60626     1  3008     0  S       wait    0xfffffe00d3cf8560  nrpe
60624 60623  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60623     1  3008     0  S       wait    0xfffffe00d3cf8000  nrpe
60621 60620  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60620     1  3008     0  S       wait    0xfffffe0317188060  nrpe
60618 60617  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60617     1  3008     0  S       wait    0xfffffe0317187b00  nrpe
60615 60614  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60614     1  3008     0  S       wait    0xfffffe03171875a0  nrpe
60612 60611  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60611     1  3008     0  S       wait    0xfffffe0317186ae0  nrpe
60609 60608  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60608     1  3008     0  S       wait    0xfffffe0317186580  nrpe
60606  1186 60606     0  Ds      db->db_ 0xfffff8173309f1e8  sshd-session
60605 60604  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60604     1  3008     0  S       wait    0xfffffe0317186020  nrpe
60602 60601  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60601     1  3008     0  S       wait    0xfffffe0317185ac0  nrpe
60599  1202  1202     0  D       voffloc 0xfffff8024db4966a  perl
60598 60597  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60597     1  3008     0  S       wait    0xfffffe0317185560  nrpe
60595 60594  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60594     1  3008     0  S       wait    0xfffffe0317185000  nrpe
60592 60591  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60591     1  3008     0  S       wait    0xfffffe031724c5c0  nrpe
60589 60588  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60588     1  3008     0  S       wait    0xfffffe031724c060  nrpe
60586 60585  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60585     1  3008     0  S       wait    0xfffffe031724b5a0  nrpe
60583 60582  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60582     1  3008     0  S       wait    0xfffffe031724a580  nrpe
60580 60579  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60579     1  3008     0  S       wait    0xfffffe031724a020  nrpe
60577  1186 60577     0  Ds      aw.aew_ 0xfffffe0326e5a608  sshd-session
60576 60575  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60575     1  3008     0  S       wait    0xfffffe0317249560  nrpe
60573  1202  1202     0  D       aw.aew_ 0xfffffe0326df6478  perl
60572 60571  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60571     1  3008     0  S       wait    0xfffffe0317249000  nrpe
 5015  5010  5015  6263  Ss+     ttyin   0xfffff810aa50a8b0  zsh
 5010  5006  5006  6263  S       select  0xfffff8024ca966c0  sshd-session
 5006  1186  5006     0  Ss      select  0xfffff8024ca984c0  sshd-session
 3008     1  3008     0  Ss      select  0xfffff80209dc98c0  nrpe
 2910     1  2910     0  Ds+     aw.aew_ 0xfffffe03274d66e8  getty

This getty is the one running on the console tty, which was stuck.
Note that its wait channel (truncated to "aw.aew_" in the listing) is
"aw.aew_cv", which is part of the logic for evicting buffers from the
ARC.  Most of the other blocked threads are waiting for a dbuf (ZFS
disk buffer) object mutex.
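
For anyone comparing notes against their own hang, a listing like the
one above can be tallied by wait channel to see how many threads are
piled up on the same object.  What follows is only a rough sketch: it
assumes the eight-column layout shown in this message, the function
name summarize_wait_channels is made up for illustration, and SAMPLE
is a handful of rows copied from the listing rather than the full
output.

```python
from collections import Counter


def summarize_wait_channels(ps_text):
    """Count uninterruptibly sleeping processes per (wmesg, wchan, cmd).

    Assumes the column layout from the listing in this message:
    pid ppid pgrp uid state wmesg wchan cmd
    Rows with fewer than eight fields (e.g. an empty wmesg) are skipped.
    """
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split()
        if len(fields) < 8 or not fields[0].isdigit():
            continue  # header, blank line, or a row we cannot parse
        state, wmesg, wchan, cmd = fields[4], fields[5], fields[6], fields[7]
        if state.startswith("D"):  # "D", "Ds", "Ds+": disk wait
            counts[(wmesg, wchan, cmd)] += 1
    return counts


# A few rows copied from the listing above, not the complete output.
SAMPLE = """\
  pid  ppid  pgrp   uid  state   wmesg   wchan               cmd
60673 60672  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
60672     1  3008     0  S       wait    0xfffffe031ee41560  nrpe
60670  1186 60670     0  Ds      db->db_ 0xfffff8173309f1e8  sshd-session
60669  1202  1202     0  D       voffloc 0xfffff8024db4966a  perl
60668 60667  3008     0  D       db->db_ 0xfffff8058173af68  nrpe
 2910     1  2910     0  Ds+     aw.aew_ 0xfffffe03274d66e8  getty
"""

if __name__ == "__main__":
    for (wmesg, wchan, cmd), n in summarize_wait_channels(SAMPLE).most_common():
        print(f"{n:3d}  {wmesg:<8} {wchan}  {cmd}")
```

Grouping on the wchan address as well as the wmesg matters here,
because the nrpe and sshd-session processes show the same truncated
wmesg but are blocked on different addresses.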

I'm currently planning on taking us to 14.4 later this spring, but it
would be nice to know if anyone else has seen this bug or has a fix.
I've tried dropping kern.maxvnodes and increasing
vfs.zfs.arc_free_target, with no change in symptoms.

This particular server is due to be replaced but the new disk array
(which was ordered in January) won't ship until late April per the
vendor.

-GAWollman


