Date:      Tue, 23 Feb 2021 14:20:21 -0700
From:      Alan Somers <asomers@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: The out-of-swap killer makes poor choices
Message-ID:  <CAOtMX2jeyuM_cEygW=vEjhMSqO1jM2UDs29xnYYkCZN2CLKFxA@mail.gmail.com>
In-Reply-To: <YDVvenUpLMhGoLR4@kib.kiev.ua>
References:  <CAOtMX2jYmrK7ftx62_NEfNCWS7O=giHKL1p9kXCqq1t5E1arxA@mail.gmail.com> <YDVvenUpLMhGoLR4@kib.kiev.ua>

On Tue, Feb 23, 2021 at 2:11 PM Konstantin Belousov <kostikbel@gmail.com>
wrote:

> On Tue, Feb 23, 2021 at 01:49:49PM -0700, Alan Somers wrote:
> > To me it's always seemed like the out-of-swap killer kills the wrong
> > process.  Oh, it does the right thing with a trivial while(1) {malloc()}
> > test program, but not with real workloads.  To summarize the logic in
> > vm_pageout_oom:
> >
> > * Don't kill system, protected, or killed processes
> > * Don't kill processes with a thread that isn't running or suspended
> > * Kill whichever process is using the most swap or swap + RAM, depending
> > on the shortage variable.  On ties, kill the newest one.
> >
> > This algorithm probably made sense in the days when computers had much
> > more swap than RAM.  But now it leads to several problems:
> >
> > * It's almost guaranteed to do the wrong thing when shortage ==
> > VM_OOM_SWAPZ and there is little or no swap configured.  If no swap is
> > configured, it will kill the newest running or suspended process.  If a
> > little bit is configured, it will probably kill some idle process, like
> > zfsd, that is swapped out because it doesn't run very often.
> >
> > * Even if multiple GB of swap are configured, the OOM killer is still
> > biased towards killing idle processes when shortage == VM_OOM_SWAPZ.
> > Most often, the process responsible for an out-of-memory condition is
> > not idle, and is consuming large amounts of RAM.
> >
> > * It ignores RLIMIT_RSS.  We consider that rlimit when deciding whether
> > to move a process from RAM to swap.
> >
> > * The "out of swap space" kernel message doesn't specify whether the
> > process was killed because of insufficient swap or RAM (the shortage
> > variable)
> >
> > I propose the following changes:
> >
> > * Incorporate shortage into the "out of swap space" message.
> OK with me, but I'm not sure users could take any action based on the
> distinction.
>
> > * When walking the process list, if any process exceeds its RLIMIT_RSS,
> > choose it immediately, without bothering to compare it to older
> > processes.
> RSS was never supposed to be a limit on how many pages are resident.
> It only provided some preference for more aggressive paging out of the
> process's pages.
>
> Or, to put it differently, RSS is not supposed to be the working set size
> in the VMS/NT sense.
>

Sure, but given that we must kill _something_, preferentially killing a
process that was specifically limited sounds better than killing a process
that wasn't, wouldn't you agree?
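
To make the proposal concrete, here is a rough userland sketch of the
selection order I have in mind.  It is not the code in vm_pageout_oom;
struct proc_info, pick_victim, and every field name are made up for
illustration, and all sizes are assumed to be in pages.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process snapshot; not a real kernel structure. */
struct proc_info {
	int		pid;
	uint64_t	start_time;	/* larger == newer */
	uint64_t	resident;	/* pages in RAM */
	uint64_t	swapped;	/* pages in swap */
	uint64_t	rss_limit;	/* RLIMIT_RSS in pages, 0 == unlimited */
	int		exempt;		/* system, protected, or already killed */
};

/*
 * Return the index of the chosen victim, or -1 if nothing is eligible.
 * A process over its RSS limit is chosen immediately.  Otherwise the
 * process with the largest RAM + swap wins, and the newest wins ties.
 */
int
pick_victim(const struct proc_info *p, size_t n)
{
	int best = -1;
	uint64_t best_size = 0;

	for (size_t i = 0; i < n; i++) {
		if (p[i].exempt)
			continue;
		/* Proposed shortcut: a limited process over its limit. */
		if (p[i].rss_limit != 0 && p[i].resident > p[i].rss_limit)
			return ((int)i);

		uint64_t size = p[i].resident + p[i].swapped;

		if (best == -1 || size > best_size ||
		    (size == best_size &&
		    p[i].start_time > p[best].start_time)) {
			best = (int)i;
			best_size = size;
		}
	}
	return (best);
}
```

The early return is the whole point: a process that the admin explicitly
limited and that is over that limit gets picked before any size comparison
happens.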


>
> > * Always consider the sum of a process's RAM + swap, regardless of the
> > shortage variable.
> >
> > Does this make sense?  Am I missing something about shortage ==
> > VM_OOM_SWAPZ?  I don't understand why you would ever want to exclude
> > processes' RAM usage.  That logic was added in revision
> > 2025d69ba7a68a5af173007a8072c45ad797ea23, but I don't understand the
> > rationale.
>
> SWAPZ means that the swap zone is exhausted.  In this case, killing a
> process that does not use swap would not free any space in the zone.
> Similarly, we should select the process with the largest swap use (==
> metadata kept in the swap zone) to free something in the swap zone.
>

But killing a process that does not use swap could reduce the need for more
swap by other processes.  How many cases are there where a process needs
more swap and won't settle for RAM instead?
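
Here is a toy illustration of the difference between the two metrics.  The
names (struct usage, largest, count_ram) and the numbers in the usage note
are invented, and sizes are in pages.  It only shows which process each
metric selects; it says nothing about how much swap zone metadata the kill
would actually free.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical process snapshot; sizes in pages. */
struct usage {
	uint64_t	start_time;	/* larger == newer */
	uint64_t	resident;	/* pages in RAM */
	uint64_t	swapped;	/* pages in swap */
};

/*
 * Return the index of the largest process by the chosen metric, or -1
 * if the list is empty.  count_ram == 0 compares swap only; nonzero
 * compares RAM + swap.  The newest process wins ties.
 */
int
largest(const struct usage *u, size_t n, int count_ram)
{
	int best = -1;
	uint64_t best_size = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t size = u[i].swapped + (count_ram ? u[i].resident : 0);

		if (best == -1 || size > best_size ||
		    (size == best_size &&
		    u[i].start_time > u[best].start_time)) {
			best = (int)i;
			best_size = size;
		}
	}
	return (best);
}
```

Given a memory hog with 500000 resident pages and nothing in swap, plus an
idle daemon with 300 pages swapped out and nothing resident, the swap-only
metric selects the daemon and the RAM + swap metric selects the hog.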


>
> In other words, such a kill might not be enough and could require more and
> more rounds of OOM, especially on a machine with very little swap
> configured.
>


