Date: Wed, 24 Feb 2021 00:36:21 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Alan Somers <asomers@freebsd.org>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: The out-of-swap killer makes poor choices
Message-ID: <YDWDZYRHzNsRhLGG@kib.kiev.ua>
In-Reply-To: <CAOtMX2jeyuM_cEygW=vEjhMSqO1jM2UDs29xnYYkCZN2CLKFxA@mail.gmail.com>
References: <CAOtMX2jYmrK7ftx62_NEfNCWS7O=giHKL1p9kXCqq1t5E1arxA@mail.gmail.com>
 <YDVvenUpLMhGoLR4@kib.kiev.ua>
 <CAOtMX2jeyuM_cEygW=vEjhMSqO1jM2UDs29xnYYkCZN2CLKFxA@mail.gmail.com>

On Tue, Feb 23, 2021 at 02:20:21PM -0700, Alan Somers wrote:
> On Tue, Feb 23, 2021 at 2:11 PM Konstantin Belousov <kostikbel@gmail.com>
> wrote:
>
> > On Tue, Feb 23, 2021 at 01:49:49PM -0700, Alan Somers wrote:
> > > To me it's always seemed like the out-of-swap killer kills the wrong
> > > process.  Oh, it does the right thing with a trivial while(1) {malloc()}
> > > test program, but not with real workloads.  To summarize the logic in
> > > vm_pageout_oom:
> > >
> > > * Don't kill system, protected, or killed processes
> > > * Don't kill processes with a thread that isn't running or suspended
> > > * Kill whichever process is using the most swap or swap + ram,
> > >   depending on the shortage variable.  On ties, kill the newest one.
> > >
> > > This algorithm probably made sense in the days when computers had much
> > > more swap than RAM.  But now it leads to several problems:
> > >
> > > * It's almost guaranteed to do the wrong thing when shortage ==
> > > VM_OOM_SWAPZ and there is little or no swap configured.  If no swap is
> > > configured, it will kill the newest running or suspended process.  If a
> > > little bit is configured, it will probably kill some idle process, like
> > > zfsd, that is swapped out because it doesn't run very often.
> > >
> > > * Even if multiple GB of swap are configured, the OOM killer is still
> > > biased towards killing idle processes when shortage == VM_OOM_SWAPZ.
> > > Most often, the process responsible for an out-of-memory condition is
> > > not idle, and is consuming large amounts of RAM.
> > >
> > > * It ignores RLIMIT_RSS.  We consider that rlimit when deciding whether
> > > to move a process from RAM to swap.
> > >
> > > * The "out of swap space" kernel message doesn't specify whether the
> > > process was killed because of insufficient swap or RAM (the shortage
> > > variable)
> > >
> > > I propose the following changes:
> > >
> > > * Incorporate shortage into the "out of swap space" message.
> > ok with me, not sure if users could make any action based on discretion
> >
> > > * When walking the process list, if any process exceeds its RLIMIT_RSS,
> > > choose it immediately, without bothering to compare it to older
> > > processes.
> > RSS was never supposed to be a limit on how many pages are resident.
> > It only provided some preference for more aggressive paging out process'
> > pages.
> >
> > Or put it differently, RSS is not supposed to be the working set size
> > in VMS/NT sense.
> >
>
> Sure, but given that we must kill _something_, preferentially killing a
> process that was specifically limited sounds better than killing a process
> that wasn't, won't you agree?

The semantic of RLIMIT_RSS is not to limit, but to give preference for
pageout.  Changing it to mean 'preference for OOM' would draw a similar
complaint.

>
> > >
> > >
> > > * Always consider the sum of a process's RAM + swap, regardless of the
> > > shortage variable.
> > >
> > > Does this make sense?  Am I missing something about shortage ==
> > > VM_OOM_SWAPZ?  I don't understand why you would ever want to exclude
> > > processes' RAM usage.  That logic was added in revision
> > > 2025d69ba7a68a5af173007a8072c45ad797ea23, but I don't understand the
> > > rationale.
> >
> > SWAPZ means that swap zone is exhausted.  In this case, killing a process
> > that does not use swap, would not free any space in the zone.  Similarly,
> > we should select a process with largest swap (== metadata kept in swap
> > zone) use to free something in swap zone.
> >
>
> But killing a process that does not use swap could reduce the need for more
> swap by other processes.  How many cases are there where a process needs
> more SWAP and won't settle for RAM instead?

Both choices are somewhat random.  The goal is to get more swap zone slack,
and this is what the code tried to target.

In fact, if OOM kills the largest RAM+swap consumer, then with a small swap
there is a huge chance that no swap is freed, and on the next nearby pageout
attempt yet another process would be killed, perhaps an innocent one.

The purpose of OOM is not to smooth the operation of an over-committed
system, but to have it survive (avoid a low-resource deadlock) to a state
where it can be examined and possibly corrected.

>
>
> >
> > In other words, such kill could be not enough and really require more and
> > more rounds of OOM, esp. on machine with very small swap configured.
> >
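
For readers following the thread, the selection logic under discussion can be sketched in user-space C.  This is a toy model, not the vm_pageout_oom code: struct proc_info, oom_score(), pick_victim() and the SHORTAGE_* names are all hypothetical, and the array is assumed to be ordered newest process first so that a strict comparison keeps the newest process on ties.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's shortage argument. */
enum shortage { SHORTAGE_MEM, SHORTAGE_SWAPZ };

/* Hypothetical per-process summary; not a kernel structure. */
struct proc_info {
	int	pid;
	int	killable;	/* 0 for system, protected, or already killed */
	long	resident_pages;	/* pages currently in RAM */
	long	swap_pages;	/* pages currently in swap */
};

/*
 * Score a candidate.  With a swap zone shortage only swap usage counts,
 * because killing a process that uses no swap frees nothing in the zone.
 * Otherwise RAM and swap are summed.
 */
static long
oom_score(const struct proc_info *p, enum shortage why)
{
	if (why == SHORTAGE_SWAPZ)
		return (p->swap_pages);
	return (p->swap_pages + p->resident_pages);
}

/*
 * Walk the list (assumed newest first) and return the killable process
 * with the highest score, or NULL if nothing is killable.  The strict
 * '>' means the first, i.e. newest, process wins a tie.
 */
const struct proc_info *
pick_victim(const struct proc_info *procs, size_t n, enum shortage why)
{
	const struct proc_info *best = NULL;
	long best_score = 0;

	for (size_t i = 0; i < n; i++) {
		const struct proc_info *p = &procs[i];
		long score;

		if (!p->killable)
			continue;
		score = oom_score(p, why);
		if (best == NULL || score > best_score) {
			best = p;
			best_score = score;
		}
	}
	return (best);
}
```

In this model, a large RAM consumer with no swap and a mostly idle daemon with a few hundred swapped pages reproduce both sides of the argument: SHORTAGE_SWAPZ selects the idle daemon (Alan's complaint), while always summing RAM and swap selects the RAM consumer, whose death frees nothing in the swap zone (the concern raised in the reply).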