Date:      Tue, 23 Feb 2021 23:11:22 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Alan Somers <asomers@freebsd.org>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: The out-of-swap killer makes poor choices
Message-ID:  <YDVvenUpLMhGoLR4@kib.kiev.ua>
In-Reply-To: <CAOtMX2jYmrK7ftx62_NEfNCWS7O=giHKL1p9kXCqq1t5E1arxA@mail.gmail.com>
References:  <CAOtMX2jYmrK7ftx62_NEfNCWS7O=giHKL1p9kXCqq1t5E1arxA@mail.gmail.com>

On Tue, Feb 23, 2021 at 01:49:49PM -0700, Alan Somers wrote:
> To me it's always seemed like the out-of-swap killer kills the wrong
> process.  Oh, it does the right thing with a trivial while(1) {malloc()}
> test program, but not with real workloads.  To summarize the logic in
> vm_pageout_oom:
> 
> * Don't kill system, protected, or killed processes
> * Don't kill processes with a thread that isn't running or suspended
> * Kill whichever process is using the most swap or swap + ram, depending on
> the shortage variable.  On ties, kill the newest one.
> 
> This algorithm probably made sense in the days when computers had much more
> swap than RAM.  But now it leads to several problems:
> 
> * It's almost guaranteed to do the wrong thing when shortage ==
> VM_OOM_SWAPZ and there is little or no swap configured.  If no swap is
> configured, it will kill the newest running or suspended process.  If a
> little bit is configured, it will probably kill some idle process, like
> zfsd, that is swapped out because it doesn't run very often.
> 
> * Even if multiple GB of swap are configured, the OOM killer is still
> biased towards killing idle processes when shortage == VM_OOM_SWAPZ.  Most
> often, the process responsible for an out-of-memory condition is not idle,
> and is consuming large amounts of RAM.
> 
> * It ignores RLIMIT_RSS.  We consider that rlimit when deciding whether to
> move a process from RAM to swap.
> 
> * The "out of swap space" kernel message doesn't specify whether the
> process was killed because of insufficient swap or RAM (the shortage
> variable)
> 
> I propose the following changes:
> 
> * Incorporate shortage into the "out of swap space" message.
OK with me, though I am not sure users could take any action based on
the distinction.

> * When walking the process list, if any process exceeds its RLIMIT_RSS,
> choose it immediately, without bothering to compare it to older processes.
RSS was never supposed to be a limit on how many pages are resident.
It only provides a preference for paging out the process's pages more
aggressively.

Or, to put it differently, RSS is not supposed to be the working set size
in the VMS/NT sense.

> * Always consider the sum of a process's RAM + swap, regardless of the
> shortage variable.
> 
> Does this make sense?  Am I missing something about shortage ==
> VM_OOM_SWAPZ?  I don't understand why you would ever want to exclude
> processes' RAM usage.  That logic was added in revision
> 2025d69ba7a68a5af173007a8072c45ad797ea23, but I don't understand the
> rationale.

SWAPZ means that the swap zone is exhausted.  In this case, killing a
process that does not use swap would not free any space in the zone.
Instead, we should select the process with the largest swap use
(== metadata kept in the swap zone) to free something in the swap zone.

In other words, killing a process chosen by its RAM use might not be
enough, and could require more and more rounds of OOM, especially on a
machine with very little swap configured.
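
To illustrate how the shortage reason changes the choice, here is a
minimal userland sketch of the selection as summarized in this thread.
It is not the code in vm_pageout_oom: the names oom_shortage, OOM_MEM,
OOM_SWAPZ, proc_usage and pick_victim are hypothetical, and the tie
handling simply assumes the array is ordered newest first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's shortage reasons. */
enum oom_shortage { OOM_MEM, OOM_SWAPZ };

/* Hypothetical per-process summary, for illustration only. */
struct proc_usage {
	int pid;
	unsigned long swap_pages;	/* pages held in swap */
	unsigned long resident_pages;	/* pages resident in RAM */
	int eligible;			/* 0: system, protected or killed */
};

/*
 * Return the index of the chosen victim, or -1 if nothing is eligible.
 * procs[] is assumed to be ordered newest first, so the strict '>'
 * comparison leaves ties with the newer process.
 */
int
pick_victim(const struct proc_usage *procs, size_t n, enum oom_shortage why)
{
	unsigned long size, bigsize = 0;
	int victim = -1;

	assert(procs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!procs[i].eligible)
			continue;
		/* Swap use always counts toward the score. */
		size = procs[i].swap_pages;
		/* RAM counts only when the shortage is memory, not swap zone. */
		if (why == OOM_MEM)
			size += procs[i].resident_pages;
		if (victim == -1 || size > bigsize) {
			bigsize = size;
			victim = (int)i;
		}
	}
	return (victim);
}
```

With one active process holding 5000 resident pages and no swap, and one
idle process holding 200 swap pages, this sketch picks the idle process
for OOM_SWAPZ and the active one for OOM_MEM.  Only the first choice
releases anything in the swap zone.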


