Date: Tue, 23 Feb 2021 16:29:46 -0700
From: Alan Somers <asomers@freebsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: The out-of-swap killer makes poor choices
Message-ID: <CAOtMX2i3Njo=KBP=99_G0%2BKuSa00CVgNvacmzhTaoZUYEhwPPA@mail.gmail.com>
In-Reply-To: <YDWDZYRHzNsRhLGG@kib.kiev.ua>
References: <CAOtMX2jYmrK7ftx62_NEfNCWS7O=giHKL1p9kXCqq1t5E1arxA@mail.gmail.com>
            <YDVvenUpLMhGoLR4@kib.kiev.ua>
            <CAOtMX2jeyuM_cEygW=vEjhMSqO1jM2UDs29xnYYkCZN2CLKFxA@mail.gmail.com>
            <YDWDZYRHzNsRhLGG@kib.kiev.ua>
On Tue, Feb 23, 2021 at 3:36 PM Konstantin Belousov <kostikbel@gmail.com> wrote:

> On Tue, Feb 23, 2021 at 02:20:21PM -0700, Alan Somers wrote:
> > On Tue, Feb 23, 2021 at 2:11 PM Konstantin Belousov <kostikbel@gmail.com>
> > wrote:
> >
> > > On Tue, Feb 23, 2021 at 01:49:49PM -0700, Alan Somers wrote:
> > > > To me it's always seemed like the out-of-swap killer kills the wrong
> > > > process. Oh, it does the right thing with a trivial
> > > > while(1) {malloc()} test program, but not with real workloads. To
> > > > summarize the logic in vm_pageout_oom:
> > > >
> > > > * Don't kill system, protected, or killed processes
> > > > * Don't kill processes with a thread that isn't running or suspended
> > > > * Kill whichever process is using the most swap or swap + ram,
> > > >   depending on the shortage variable. On ties, kill the newest one.
> > > >
> > > > This algorithm probably made sense in the days when computers had much
> > > > more swap than RAM. But now it leads to several problems:
> > > >
> > > > * It's almost guaranteed to do the wrong thing when shortage ==
> > > >   VM_OOM_SWAPZ and there is little or no swap configured. If no swap is
> > > >   configured, it will kill the newest running or suspended process. If a
> > > >   little bit is configured, it will probably kill some idle process, like
> > > >   zfsd, that is swapped out because it doesn't run very often.
> > > >
> > > > * Even if multiple GB of swap are configured, the OOM killer is still
> > > >   biased towards killing idle processes when shortage == VM_OOM_SWAPZ.
> > > >   Most often, the process responsible for an out-of-memory condition is
> > > >   not idle, and is consuming large amounts of RAM.
> > > >
> > > > * It ignores RLIMIT_RSS. We consider that rlimit when deciding whether
> > > >   to move a process from RAM to swap.
> > > >
> > > > * The "out of swap space" kernel message doesn't specify whether the
> > > >   process was killed because of insufficient swap or RAM (the shortage
> > > >   variable)
> > > >
> > > > I propose the following changes:
> > > >
> > > > * Incorporate shortage into the "out of swap space" message.
> > > ok with me, not sure if users could make any action based on discretion
> > >
> > > > * When walking the process list, if any process exceeds its RLIMIT_RSS,
> > > >   choose it immediately, without bothering to compare it to older
> > > >   processes.
> > > RSS was never supposed to be a limit on how many pages are resident.
> > > It only provided some preference for more aggressive paging out process'
> > > pages.
> > >
> > > Or put it differently, RSS is not supposed to be the working set size
> > > in VMS/NT sense.
> > >
> >
> > Sure, but given that we must kill _something_, preferentially killing a
> > process that was specifically limited sounds better than killing a process
> > that wasn't, won't you agree?
> Semantic of RLIMIT_RSS is not to limit, but to give preference for pageout.
> Changing it to the semantic of 'preference for OOM' would give the similar
> complaint.
>
> >
> > > > * Always consider the sum of a process's RAM + swap, regardless of the
> > > >   shortage variable.
> > > >
> > > > Does this make sense? Am I missing something about shortage ==
> > > > VM_OOM_SWAPZ? I don't understand why you would ever want to exclude
> > > > processes' RAM usage. That logic was added in revision
> > > > 2025d69ba7a68a5af173007a8072c45ad797ea23, but I don't understand the
> > > > rationale.
> > >
> > > SWAPZ means that swap zone is exhausted. In this case, killing a process
> > > that does not use swap, would not free any space in the zone. Similarly,
> > > we should select a process with largest swap (== metadata kept in swap
> > > zone) use to free something in swap zone.
> > >
> >
> > But killing a process that does not use swap could reduce the need for more
> > swap by other processes. How many cases are there where a process needs
> > more SWAP and won't settle for RAM instead?
> Both choices are somewhat random. The goal is to get more swap zone slack,
> and this is what the code tried to target.
>
> In fact, if OOM kills largest RAM+swap consumer, then with the small swap
> there is huge chance that swap is not freed, and then on the next nearby
> pageout attempt some more process would be killed, perhaps innocently.
>
> OOM purpose is not to smoother operation of over-committed system, but
> to have it survive (avoid low resources deadlock) to the state where it
> can be examined and possibly corrected.
>
> >
> > >
> > > In other words, such kill could be not enough and really require more and
> > > more rounds of OOM, esp. on machine with very small swap configured.
>

Ok, I'll abandon this idea.
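
For anyone reading this thread later, the selection rules summarized at the top of the discussion can be modeled in a few lines of Python. This is only a sketch of the behavior as described in this thread, not the kernel code in vm_pageout_oom: the Proc fields, the pick_victim name, and the SHORTAGE_MEM constant are all hypothetical names chosen for illustration (only VM_OOM_SWAPZ is named in the thread), and page counts are made-up numbers.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Illustrative stand-ins for the kernel's "shortage" values. Only the
# swap-zone case (VM_OOM_SWAPZ) is named in the thread; SHORTAGE_MEM is
# an assumed label for "every other kind of shortage".
SHORTAGE_MEM = "mem"
SHORTAGE_SWAPZ = "swapz"


@dataclass
class Proc:
    """Hypothetical, simplified view of a process for this model."""
    name: str
    start_time: int           # larger value means a newer process
    ram_pages: int
    swap_pages: int
    exempt: bool = False      # system, protected, or already killed
    threads_ok: bool = True   # every thread is running or suspended


def pick_victim(procs: Iterable[Proc], shortage: str) -> Optional[Proc]:
    """Return the process the described algorithm would kill, or None."""
    best = None
    best_key = None
    for p in procs:
        # Rules 1 and 2 from the summary: skip ineligible processes.
        if p.exempt or not p.threads_ok:
            continue
        # Rule 3: count swap only for a swap-zone shortage,
        # swap + RAM otherwise.
        size = p.swap_pages
        if shortage != SHORTAGE_SWAPZ:
            size += p.ram_pages
        # Ties on size go to the newest process.
        key = (size, p.start_time)
        if best is None or key > best_key:
            best, best_key = p, key
    return best


procs = [
    Proc("zfsd", start_time=10, ram_pages=0, swap_pages=500),
    Proc("hog", start_time=20, ram_pages=1_000_000, swap_pages=0),
    Proc("sshd", start_time=30, ram_pages=2_000, swap_pages=0),
]
print(pick_victim(procs, SHORTAGE_SWAPZ).name)  # zfsd
print(pick_victim(procs, SHORTAGE_MEM).name)    # hog
```

With a swap-zone shortage the model picks the idle, swapped-out zfsd rather than the large RAM consumer, which is the complaint in the original message. It also shows the counterargument: killing "hog" in that case would release no swap at all, so the swap zone would stay exhausted and another round of OOM could follow.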
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAOtMX2i3Njo=KBP=99_G0%2BKuSa00CVgNvacmzhTaoZUYEhwPPA>