Date:      Tue, 23 Feb 2021 13:23:56 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        asomers@freebsd.org, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: The out-of-swap killer makes poor choices
Message-ID:  <93DA798A-1109-48B0-AD5E-063B5A182BFB@yahoo.com>
References:  <93DA798A-1109-48B0-AD5E-063B5A182BFB.ref@yahoo.com>

Alan Somers asomers at freebsd.org wrote on
Tue Feb 23 20:50:02 UTC 2021 :

. . .
> * The "out of swap space" kernel message doesn't specify whether the
> process was killed because of insufficient swap or RAM (the shortage
> variable)
. . .

I'm only dealing with the "why" notifications part
of the email.

I'm not sure your notes give complete coverage,
although I cannot claim to fully understand the
implications of what follows. So, just some things
to consider in that area . . .

When I looked at the code I found 4 conditions that
lead to the same "out of swap space" message without
any "swap_pager_getswapspace(...): failed" needing
to be involved:

Sustained low free RAM (via stays-runnable processes).
A sufficiently delayed pageout.
The swap blk uma zone was exhausted.
The swap pctrie uma zone was exhausted.

(I run a modified kernel that reports messages about
which of the 4 initiated the OOM. I depend on the
"swap_pager_getswapspace(...): failed" notices to
detect actual out of swap/paging space conditions.)

The first 2 of the 4 above have some tunables:

#
# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes. The
# delay is a count of kernel-attempts to gain
# free RAM (so not time units).
vm.pageout_oom_seq=120

(The default is 12 as far as I know. I systematically
use the above value.)

NOTE: stable/12 -r351776 got the support for the following:
(I've not checked the match to releases.)

#
# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
vm.pfault_oom_attempts=-1

(I systematically use the above value, but only
where I strongly expect not to actually run out
of swap space.)

That last has an alternative form for when running
out of swap is a concern, as I understand what I've
been told/read (replace the ???'s with positive
integers):

#
# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes:
#vm.pfault_oom_attempts= ???
#vm.pfault_oom_wait= ???
# (The multiplication of the two values is the
# total but there are other potential tradeoffs
# in the factors multiplied for the same total.)
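
As a rough illustration of that multiplication, here is a
minimal Python sketch. It assumes the total is simply
attempts times wait, as the note above puts it, and it
treats a negative attempts value as "never trigger OOM
handling from the page fault path" (my reading of the -1
setting). The function names and the example numbers are
hypothetical, not anything taken from the kernel.

```python
def total_pfault_wait_seconds(attempts, wait_seconds):
    """Approximate total seconds of waiting before the page fault
    handler triggers OOM handling (attempts * wait).

    Assumption: a negative attempts value (e.g. -1) means the page
    fault path never triggers OOM handling, so None is returned.
    """
    if attempts < 0:
        return None
    return attempts * wait_seconds


def factor_pairs(total):
    """List every (attempts, wait_seconds) pair of positive integers
    whose product equals the given total."""
    return [(a, total // a) for a in range(1, total + 1) if total % a == 0]


if __name__ == "__main__":
    # Hypothetical example values, chosen only for illustration.
    print(total_pfault_wait_seconds(3, 10))
    print(total_pfault_wait_seconds(-1, 10))
    for attempts, wait in factor_pairs(30):
        print(f"vm.pfault_oom_attempts={attempts} vm.pfault_oom_wait={wait}")
```

The pairs all give the same total, but they are not
equivalent in practice: more attempts with a shorter wait
retries the page fault more often, while fewer attempts
with a longer wait sleeps longer between retries.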

For reference:

# sysctl -d vm.pfault_oom_wait
vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying the page fault handler

# sysctl -d vm.pfault_oom_attempts
vm.pfault_oom_attempts: Number of page allocation attempts in page fault handler before it triggers OOM handling

# sysctl -d vm.pageout_oom_seq
vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM


I'm definitely not a fan of the misleading
"out of swap space" notices. They greatly
misled me until Mark J. got involved and
did some basic fixing of my understanding
of the context, including pointing out
vm.pageout_oom_seq at the time.
(vm.pfault_oom_attempts and vm.pfault_oom_wait
are from a later discovery.)

It seems that the detailed reason for the OOM
helps drive the appropriate future configuration
choices for avoiding or managing things. In my
view the specific reason should be explicitly
reported. If nothing else, it provides context
to include in a question to the lists about what
is then appropriate to do given the occurrence
observed.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



