Date:      Thu, 04 Nov 2021 20:34:27 +0000
From:      bugzilla-noreply@freebsd.org
To:        virtualization@FreeBSD.org
Subject:   [Bug 259651] bhyve process uses all memory/swap
Message-ID:  <bug-259651-27103@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259651

            Bug ID: 259651
           Summary: bhyve process uses all memory/swap
           Product: Base System
           Version: 12.2-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bhyve
          Assignee: virtualization@FreeBSD.org
          Reporter: reg@FreeBSD.org

I've had a Windows Server running in bhyve on FreeNAS for a few years now.
It uses DFS-R to sync a few Windows file systems to my remote backup
location.  The VM has several zvol-backed AHCI devices and a virtio network
adapter.  It has been running (mostly) stably for a long time with adequate
performance (as in, it can mostly saturate the 1Gb link it's on and can get
disk speeds in the VM which are as fast as I expect from the low-power
backing store).  Recently I made a few changes to the machine and the host,
some of which are hard to reverse, and the VM has started to consume all
available RAM, then all the swap, and eventually it gets killed by the OOM
handler...  A few crashes corrupted the DFS-R databases, so now the machine
wants to do a huge amount of IO (both network and disk) to resync (but
that's my problem).

There are other reports online of RAM exhaustion from bhyve, but I couldn't
find an open bug, so I'm filing one.  My problem seemed to start on updating
to TrueNAS-12.0-U5.1, but I also did some other reconfiguration around this
time, and judging from the other reports, this might be a long-standing
issue.

The other change I made was to mess with the CPU/RAM allocation to this VM,
and I accidentally treated the total number of cores as the per-CPU core
count, so I allocated far more cores than my CPU has threads (2xCPUs,
2xcores, 2xthreads = 8 vCPUs, with 8GB RAM)...  Needless to say, the VM
quickly swamped the host.  However, this also caused the memory use to grow.
I've now scaled the CPUs back to (1xCPU, 1xcore, 2xthreads, 6GB RAM) and the
memory use is now staying stable - although it's currently rebuilding some
DFS-R database, so it's not maxing out the VM CPUs.
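
For reference, a quick sanity check of the overcommit can be scripted on the
host.  This is only a rough sketch, assuming the psutil Python package is
available there; the vCPU count of 8 is just the 2x2x2 sockets/cores/threads
setting described above:

#!/usr/bin/env python3
# Rough sketch: compare the guest's configured vCPU count against the host's
# hardware threads to spot CPU overcommit.  Assumes psutil is installed on
# the host; the vCPU figure is the 2 sockets x 2 cores x 2 threads given to
# the VM above.
import psutil

VM_VCPUS = 2 * 2 * 2                             # sockets x cores x threads
host_threads = psutil.cpu_count(logical=True)    # hardware threads on the host
host_cores = psutil.cpu_count(logical=False)     # physical cores on the host

print(f"guest vCPUs: {VM_VCPUS}, host threads: {host_threads}, host cores: {host_cores}")
if host_threads is not None and VM_VCPUS > host_threads:
    print("guest vCPUs exceed host hardware threads - the guest can swamp the host CPU")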

The behavior I observe is that the memory use stays stable as long as the
host CPU use is reasonable.  As soon as the host starts to max out its real
cores (it's a 2xcore, 2xthread CPU) and the bhyve VM is doing a lot of IO,
the memory use grows rapidly.  When the bhyve process is stopped (by
shutting down the VM, if you can get in quickly enough), it takes a very
long time to exit and sits in a 'tx->tx' state.  It looks like it's trying
to flush buffers, although the zpool seems to show only reads while the
process is exiting.  My guess as to the bug is that bhyve has a huge amount
of outstanding IO, but I'm not sure how to monitor that.  When the host CPU
is really busy these IO buffers are not being freed properly, and eventually
leak.
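
In case it helps with reproducing this, the kind of monitoring I can run
while the leak happens is roughly the following.  It's only a sketch,
assuming psutil is available on the TrueNAS host; matching the process name
against 'bhyve' is just my guess at how to find the right PID:

#!/usr/bin/env python3
# Rough sketch: periodically log the bhyve process RSS plus host swap use to
# capture the growth curve while the guest is doing heavy IO.  Assumes
# psutil is installed; matching on the name 'bhyve' is an assumption about
# how the VM process shows up in the process list.
import time
import psutil

def bhyve_procs():
    # Find candidate bhyve processes by name.
    return [p for p in psutil.process_iter(['pid', 'name', 'memory_info'])
            if p.info['name'] and 'bhyve' in p.info['name']]

while True:
    swap = psutil.swap_memory()
    for p in bhyve_procs():
        rss_mb = p.info['memory_info'].rss / (1024 * 1024)
        print(f"{time.strftime('%H:%M:%S')} pid={p.info['pid']} "
              f"rss={rss_mb:.0f}MB swap_used={swap.used / (1024 * 1024):.0f}MB")
    time.sleep(60)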

Around the same time as making these changes, I also turned on dedup on one
of the zvols (the backups on that disk are rewritten every day, even though
they're the same, so I was getting a lot of snapshot growth).  I've turned
that off, but it didn't seem to change the behavior.  I also added the ZIL
and L2ARC devices to the pool around this time.  I've not tried removing
them.

The host and the VM have been set up and working for a long time, so I'm
going to ignore suggestions to get a bigger box or tune my ARC values...
But I'm happy to debug it - I've been able to reproduce this relatively
reliably with different CPU settings, although it does rely on Windows
cooperating.  I can't mess with it too much since I do need to keep the
other backups, which go directly via TrueNAS to the other pools, running
;-).

TrueNAS Server:
ThinkServer TS140, Intel(R) Core(TM) i3-4130 CPU @ 3.40GHz, 20GB RAM.
zpool: 2 striped mirrored 3TB TOSHIBA HDWD130 drives, with mirrored 12GB ZIL
and 19GB L2ARC on SATA SSDs.
L2ARC on SATA SSDs.
TrueNAS-12.0-U6 (FreeBSD 12.2-RELEASE-p10), 25GB of swap.

Windows Server VM:
2xCPU, 1xcore, 1xthread, 6GB RAM (original, see other comments).
4xAHCI zvol with 64K cluster, one of which had dedup on for a period as an
experiment, 512B blocks. (VM BSODs immediately if I try using virtio-blk).
1xVirtIO NIC (em0), with 0.1.208 virtio-win drivers.
Windows Server 2019, fully patched.

-- 
You are receiving this mail because:
You are the assignee for the bug.


