Date:      Tue, 23 Feb 2021 16:29:46 -0700
From:      Alan Somers <asomers@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: The out-of-swap killer makes poor choices
Message-ID:  <CAOtMX2i3Njo=KBP=99_G0%2BKuSa00CVgNvacmzhTaoZUYEhwPPA@mail.gmail.com>
In-Reply-To: <YDWDZYRHzNsRhLGG@kib.kiev.ua>
References:  <CAOtMX2jYmrK7ftx62_NEfNCWS7O=giHKL1p9kXCqq1t5E1arxA@mail.gmail.com> <YDVvenUpLMhGoLR4@kib.kiev.ua> <CAOtMX2jeyuM_cEygW=vEjhMSqO1jM2UDs29xnYYkCZN2CLKFxA@mail.gmail.com> <YDWDZYRHzNsRhLGG@kib.kiev.ua>

On Tue, Feb 23, 2021 at 3:36 PM Konstantin Belousov <kostikbel@gmail.com>
wrote:

> On Tue, Feb 23, 2021 at 02:20:21PM -0700, Alan Somers wrote:
> > On Tue, Feb 23, 2021 at 2:11 PM Konstantin Belousov <kostikbel@gmail.com>
> > wrote:
> >
> > > On Tue, Feb 23, 2021 at 01:49:49PM -0700, Alan Somers wrote:
> > > > To me it's always seemed like the out-of-swap killer kills the wrong
> > > > process.  Oh, it does the right thing with a trivial
> > > > while(1) {malloc()} test program, but not with real workloads.  To
> > > > summarize the logic in vm_pageout_oom:
> > > >
> > > > * Don't kill system, protected, or killed processes
> > > > * Don't kill processes with a thread that isn't running or suspended
> > > > * Kill whichever process is using the most swap or swap + ram,
> > > > depending on the shortage variable.  On ties, kill the newest one.
> > > >
> > > > This algorithm probably made sense in the days when computers had
> > > > much more swap than RAM.  But now it leads to several problems:
> > > >
> > > > * It's almost guaranteed to do the wrong thing when shortage ==
> > > > VM_OOM_SWAPZ and there is little or no swap configured.  If no swap
> > > > is configured, it will kill the newest running or suspended process.
> > > > If a little bit is configured, it will probably kill some idle
> > > > process, like zfsd, that is swapped out because it doesn't run very
> > > > often.
> > > >
> > > > * Even if multiple GB of swap are configured, the OOM killer is still
> > > > biased towards killing idle processes when shortage == VM_OOM_SWAPZ.
> > > > Most often, the process responsible for an out-of-memory condition is
> > > > not idle, and is consuming large amounts of RAM.
> > > >
> > > > * It ignores RLIMIT_RSS.  We consider that rlimit when deciding
> > > > whether to move a process from RAM to swap.
> > > >
> > > > * The "out of swap space" kernel message doesn't specify whether the
> > > > process was killed because of insufficient swap or RAM (the shortage
> > > > variable)
> > > >
> > > > I propose the following changes:
> > > >
> > > > * Incorporate shortage into the "out of swap space" message.
> > > ok with me, not sure if users could make any action based on discretion
> > >
> > > > * When walking the process list, if any process exceeds its
> > > > RLIMIT_RSS, choose it immediately, without bothering to compare it to
> > > > older processes.
> > > RSS was never supposed to be a limit on how many pages are resident.
> > > It only provided some preference for more aggressive paging out
> > > process' pages.
> > >
> > > Or put it differently, RSS is not supposed to be the working set size
> > > in VMS/NT sense.
> > >
> >
> > Sure, but given that we must kill _something_, preferentially killing a
> > process that was specifically limited sounds better than killing a
> > process that wasn't, won't you agree?
> Semantic of RLIMIT_RSS is not to limit, but to give preference for pageout.
> Changing it to the semantic of 'preference for OOM' would give the similar
> complaint.
>
> >
> >
> > >
> > > > * Always consider the sum of a process's RAM + swap, regardless of
> > > > the shortage variable.
> > > >
> > > > Does this make sense?  Am I missing something about shortage ==
> > > > VM_OOM_SWAPZ?  I don't understand why you would ever want to exclude
> > > > processes' RAM usage.  That logic was added in revision
> > > > 2025d69ba7a68a5af173007a8072c45ad797ea23, but I don't understand the
> > > > rationale.
> > >
> > > SWAPZ means that swap zone is exhausted.  In this case, killing a
> > > process that does not use swap, would not free any space in the zone.
> > > Similarly, we should select a process with largest swap (== metadata
> > > kept in swap zone) use to free something in swap zone.
> > >
> >
> > But killing a process that does not use swap could reduce the need for
> > more swap by other processes.  How many cases are there where a process
> > needs more SWAP and won't settle for RAM instead?
> Both choices are somewhat random.  The goal is to get more swap zone slack,
> and this is what the code tried to target.
>
> In fact, if OOM kills largest RAM+swap consumer, then with the small swap
> there is huge chance that swap is not freed, and then on the next nearby
> pageout attempt some more process would be killed, perhaps innocently.
>
> The purpose of OOM is not to smooth the operation of an over-committed
> system, but to have it survive (avoid a low-resource deadlock) to the state
> where it can be examined and possibly corrected.
>
> >
> >
> > >
> > > In other words, such kill could be not enough and really require more
> > > and more rounds of OOM, esp. on machine with very small swap configured.
>
>
Ok, I'll abandon this idea.
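
The selection logic summarized at the top of the quoted thread can be
sketched in C roughly as follows.  This is a minimal sketch based only on
that summary, not the kernel's vm_pageout_oom: struct proc_info, enum
shortage, and pick_victim are hypothetical names, and the array is assumed
to be ordered newest first so that ties fall to the newest process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's types or constants. */
enum shortage { SHORTAGE_MEM, SHORTAGE_SWAPZ };

struct proc_info {
	bool exempt;          /* system, protected, or already killed */
	bool all_threads_ok;  /* every thread is running or suspended */
	size_t swap_pages;    /* pages held in swap */
	size_t ram_pages;     /* resident pages */
};

/*
 * procs[] is assumed to be ordered newest first.  Returns the index of
 * the chosen victim, or -1 if no process is eligible.
 */
static int
pick_victim(const struct proc_info *procs, size_t n, enum shortage why)
{
	int victim = -1;
	size_t bigsize = 0;

	assert(procs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct proc_info *p = &procs[i];
		size_t size;

		/* Skip processes the summary says are never killed. */
		if (p->exempt || !p->all_threads_ok)
			continue;

		/* Swap only for a swap-zone shortage, else swap + RAM. */
		size = p->swap_pages;
		if (why != SHORTAGE_SWAPZ)
			size += p->ram_pages;

		/* Strict '>' keeps the earlier (newer) process on ties. */
		if (victim == -1 || size > bigsize) {
			victim = (int)i;
			bigsize = size;
		}
	}
	return victim;
}
```

Under these assumptions, with a swap-zone shortage a process holding a lot
of RAM but no swap loses to an idle, swapped-out daemon, which is the
behavior the original message complains about.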


