Date: Tue, 23 Feb 2021 13:23:56 -0800
From: Mark Millard <marklmi@yahoo.com>
To: asomers@freebsd.org, FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: The out-of-swap killer makes poor choices
Message-ID: <93DA798A-1109-48B0-AD5E-063B5A182BFB@yahoo.com>
References: <93DA798A-1109-48B0-AD5E-063B5A182BFB.ref@yahoo.com>

Alan Somers asomers at freebsd.org wrote on
Tue Feb 23 20:50:02 UTC 2021 :

. . .
> * The "out of swap space" kernel message doesn't specify whether the
> process was killed because of insufficient swap or RAM (the shortage
> variable)
. . .

I'm only dealing with the "why" notifications part of the Email.

I'm not sure your notes are complete for coverage, although
I cannot claim to fully understand the implications of the
below. So, just some things to consider in that area . . .

When I looked at the code I found 4 things that lead to the
same "out of swap space" messages, none of which seemed to
need a "swap_pager_getswapspace(...): failed" to be involved:

Sustained low free RAM (via stays-runnable processes).
A sufficiently delayed pageout.
The swap blk uma zone was exhausted.
The swap pctrie uma zone was exhausted.

(I run a modified kernel that reports messages about
which of the 4 initiated the OOM. I depend on the
"swap_pager_getswapspace(...): failed" notices to detect
actual out of swap/paging space conditions.)

The first 2 of the 4 above have some tunables:

#
# Delay when persistent low free RAM leads to
# Out Of Memory killing of processes. The
# delay is a count of kernel-attempts to gain
# free RAM (so not time units).
vm.pageout_oom_seq=120

(The default is 12 as far as I know. I systematically
use the above value.)

NOTE: stable/12 -r351776 got the support for the following:
(I've not checked the match to releases.)

#
# For plenty of swap/paging space (will not
# run out), avoid pageout delays leading to
# Out Of Memory killing of processes:
vm.pfault_oom_attempts=-1

(I systematically use the above value, but only where I
strongly expect that I will not actually run out of swap
space.)

That last has an alternative structure, needed when running out
of swap is a concern, as I understand what I've been told/read
(replace the ???'s with notation for positive integers):

#
# For possibly insufficient swap/paging space
# (might run out), increase the pageout delay
# that leads to Out Of Memory killing of
# processes:
#vm.pfault_oom_attempts= ???
#vm.pfault_oom_wait= ???
# (The multiplication of the two values is the
# total but there are other potential tradeoffs
# in the factors multiplied for the same total.)

For reference:

# sysctl -d vm.pfault_oom_wait
vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying the page fault handler

# sysctl -d vm.pfault_oom_attempts
vm.pfault_oom_attempts: Number of page allocation attempts in page fault handler before it triggers OOM handling

# sysctl -d vm.pageout_oom_seq
vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM

I'm definitely not a fan of the misleading "out of swap space"
notices. They greatly misled me until Mark J. got involved and
did some basic fixing of my understanding of the context,
including pointing out vm.pageout_oom_seq at the time.
(vm.pfault_oom_attempts and vm.pfault_oom_wait are from a later
discovery.)

It seems that the detailed reason for the OOM helps drive the
appropriate future configuration choices for avoiding or
managing things. In my view the specific reason should be
explicitly reported. If nothing else, it provides context to
include in a question to the lists about what is then
appropriate to do given the occurrence observed.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)
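
As a rough illustration of the "multiplication of the two values
is the total" note above, the arithmetic can be sketched in a few
lines of Python. The function names here are hypothetical (they
are not part of FreeBSD), the example values are arbitrary rather
than known defaults, and the product is only an approximation of
what the kernel actually does. It also assumes, as in the message,
that vm.pfault_oom_attempts=-1 means the page-fault path never
starts OOM handling.

```python
from typing import List, Optional, Tuple


def total_pfault_wait_seconds(attempts: int, wait_seconds: int) -> Optional[int]:
    """Approximate total delay before page-fault-driven OOM handling.

    attempts     -- candidate value for vm.pfault_oom_attempts
    wait_seconds -- candidate value for vm.pfault_oom_wait
    Returns None for attempts == -1 (assumed: no page-fault-driven OOM).
    """
    if attempts == -1:
        return None
    if attempts < 1 or wait_seconds < 0:
        raise ValueError("expected attempts >= 1 (or -1) and wait_seconds >= 0")
    return attempts * wait_seconds


def factor_pairs(total_seconds: int) -> List[Tuple[int, int]]:
    """List every (attempts, wait_seconds) pair whose product is total_seconds."""
    return [
        (attempts, total_seconds // attempts)
        for attempts in range(1, total_seconds + 1)
        if total_seconds % attempts == 0
    ]


if __name__ == "__main__":
    # Arbitrary example values, chosen only to show the arithmetic.
    print(total_pfault_wait_seconds(6, 10))
    # Different splits of the same total: few long waits vs. many short ones.
    for attempts, wait in factor_pairs(60):
        print(f"vm.pfault_oom_attempts={attempts} vm.pfault_oom_wait={wait}")
```

The second function only enumerates the ways to split one total;
which split is preferable is the tradeoff the message mentions,
and the sketch does not try to decide it.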