Date:      Fri, 22 Apr 2022 18:46:56 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Pete Wright <pete@nomadlogic.org>
Cc:        freebsd-current <freebsd-current@freebsd.org>
Subject:   Re: Chasing OOM Issues - good sysctl metrics to use?
Message-ID:  <94B2E2FD-2371-4FEA-8E01-F37103F63CC0@yahoo.com>
In-Reply-To: <a5b2e248-3298-80e3-4bb6-742c8431f064@nomadlogic.org>
References:  <83A713B9-A973-4C97-ACD6-830DF6A50B76.ref@yahoo.com> <83A713B9-A973-4C97-ACD6-830DF6A50B76@yahoo.com> <a5b2e248-3298-80e3-4bb6-742c8431f064@nomadlogic.org>

On 2022-Apr-22, at 16:42, Pete Wright <pete@nomadlogic.org> wrote:

> On 4/21/22 21:18, Mark Millard wrote:
>>
>> Messages in the console output would be appropriate
>> to report. Messages might also be available via
>> the following at appropriate times:
>
> that is what is frustrating.  i will get notification that the processes are killed:
> Apr 22 09:55:15 topanga kernel: pid 76242 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
> Apr 22 09:55:19 topanga kernel: pid 76288 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
> Apr 22 09:55:20 topanga kernel: pid 76259 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
> Apr 22 09:55:22 topanga kernel: pid 76252 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
> Apr 22 09:55:23 topanga kernel: pid 76267 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
> Apr 22 09:55:24 topanga kernel: pid 76234 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
> Apr 22 09:55:26 topanga kernel: pid 76275 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
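For later correlation, the per-process totals in a burst like that can be
pulled out of the log with a short awk pass. A sketch (the heredoc inlines
sample lines like the ones above purely for illustration; on the live system
you would feed it `grep 'failed to reclaim memory' /var/log/messages` instead):

```shell
# Tally "was killed: failed to reclaim memory" events per program name.
# Field 8 of these kernel log lines is the parenthesized process name.
cat <<'EOF' | awk '/was killed: failed to reclaim memory/ {
        gsub(/[(),]/, "", $8); n[$8]++
    }
    END { for (p in n) print p, n[p] }' | sort
Apr 22 09:55:15 topanga kernel: pid 76242 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
Apr 22 09:55:19 topanga kernel: pid 76288 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
Apr 22 09:55:20 topanga kernel: pid 76259 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
EOF
```

On the sample input this prints "chrome 2" and "firefox 1", one per line.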

Those messages are not reporting being out of swap
as such. They are reporting sustained low free RAM
despite a number of less drastic attempts to gain
back free RAM (to above some threshold).

FreeBSD does not swap out the kernel stacks of
processes that stay in a runnable state: it just
continues to page. Thus a single large process
with a huge working set of active pages can lead
to OOM kills in a context where no other set of
processes would be enough to gain back the
required free RAM. Such contexts are not really
a swap issue.

Based on there being only 1 "killed:" reason,
I have a suggestion that should allow delaying
such kills for a long time. That in turn may
help with investigating without actually
suffering the kills during the activity: more
time with low free RAM to observe.

Increase:

# sysctl -d vm.pageout_oom_seq
vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM

The default value was 12, last I checked.

My /boot/loader.conf contains the following relative to
that and another type of kill context (just comments
currently for that other type):

#
# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes:
vm.pageout_oom_seq=120
#
# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
#vm.pfault_oom_attempts=-1
#
# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes (showing defaults at the time):
#vm.pfault_oom_attempts=3
#vm.pfault_oom_wait=10
# (The product of the two is the total wait, but
# there are other potential tradeoffs in the
# factors multiplied, even for nearly the same
# total.)

There is no value of vm.pageout_oom_seq that
disables the mechanism. But you can set large
values, like I did --or even larger-- to
wait for more attempts to free some RAM before
the kills. Some notes about that follow.

The 120 I use allows even low end arm Small
Board Computers to manage buildworld buildkernel
without such kills. Once buildworld buildkernel
completes, the low-free-RAM status no longer
holds, so the OOM attempts stop --and the count
goes back to 0.

But those are large yet finite activities. If
you want to leave something running for days,
weeks, months, or whatever that produces the
sustained low-free-RAM condition, the problem
will eventually happen. Ultimately one may have
to exit and restart such processes once in a
while, exiting enough of them to give a little
time with sufficient free RAM.
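To catch the run-up while AFK, one can log free-RAM samples over time. A
minimal sketch, assuming FreeBSD's `sysctl -n vm.stats.vm.v_free_count` as
the reader (it reports free pages, so multiply by `hw.pagesize`, typically
4096, for bytes); `sample_free` and its parameters are names of my own
invention, and the reader command is passed in so the loop itself is portable:

```shell
# sample_free CMD INTERVAL COUNT -- print COUNT timestamped readings of
# CMD's output, INTERVAL seconds apart. On FreeBSD, CMD would be
# "sysctl -n vm.stats.vm.v_free_count" (free pages, not bytes).
sample_free() {
    cmd=$1; interval=${2:-60}; count=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        [ "$i" -gt 0 ] && sleep "$interval"
        printf '%s %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$($cmd)"
        i=$((i + 1))
    done
}

# Stub reader so the loop can be exercised anywhere:
sample_free "echo 250000" 0 2
```

Redirected to a file with a 60-second interval and a large count, this gives
a timeline to line up against the kill timestamps (250000 pages at 4096
bytes/page would be roughly 976 MiB free).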


> the system in this case had killed both firefox and chrome while i was afk.  i logged back in and started them up to do more work, then the next logline is from this morning when i had to force power off/on the system as the keyboard and network were both unresponsive:
>
> Apr 22 09:58:20 topanga syslogd: kernel boot file is /boot/kernel/kernel
>
>> Do you have any swap partitions set up and in use? The
>> details could be relevant. Do you have swap set up
>> some other way than via swap partition use? No swap?
> yes i have 2GB of swap that resides on an nvme device.

I assume a partition style. Otherwise there are
other issues involved --issues likely best
avoided by switching to partition style.

>> ZFS (so with ARC)? UFS? Both?
>
> i am using ZFS and am setting my vfs.zfs.arc.max to 10G.  i have also experienced this crash with that set to the default unlimited value as well.

I use ZFS on systems with at least 8 GiBytes of RAM,
but I've never tuned ZFS. So I'm not much help for
that side of things.

For systems with under 8 GiBytes of RAM, I use UFS
unless doing an odd experiment.

>> The first block of lines from a top display could be
>> relevant, particularly when it is clearly progressing
>> towards having the problem. (After the problem is too
>> late.) (I just picked top as a way to get a bunch of
>> the information all together automatically.)
>
> since the initial OOM events happen when i am AFK it is difficult to get relevant stats out of top.

If you use vm.pageout_oom_seq=120 (or more) and check once
in a while, I expect you would end up seeing the activity
in top without suffering a kill in short order. Once noticed,
you could start your investigation, including via top if
desired.

> this is why i've started collecting more detailed metrics in prometheus.  my hope is i'll be able to do a better job observing how my system is behaving over time, in the run up to the OOM event as well as right before and after.  there are heaps of metrics collected though so hoping someone can point me in the right direction :)

I'm hoping that vm.pageout_oom_seq=120 (or more) makes it
so you do not have to identify everything up front
and can explore more easily.


Note that vm.pageout_oom_seq is both a loader tunable
and a writeable runtime tunable:

# sysctl -T vm.pageout_oom_seq
vm.pageout_oom_seq: 120
# sysctl -W vm.pageout_oom_seq
vm.pageout_oom_seq: 120

So you can use it to extend the time when the
machine is already running.
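A `sysctl vm.pageout_oom_seq=120` at a root shell takes effect immediately.
To have the runtime setting survive reboots without relying on the loader
stage, /etc/sysctl.conf (applied early in boot by rc(8)) is another place to
put it; the fragment below just mirrors the loader.conf value used above:

```
# /etc/sysctl.conf -- same value as the loader.conf setting above:
vm.pageout_oom_seq=120
```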


===
Mark Millard
marklmi at yahoo.com



