Date: Sat, 23 Apr 2022 10:26:18 -0700
From: Pete Wright <pete@nomadlogic.org>
To: Mark Millard <marklmi@yahoo.com>
Cc: freebsd-current <freebsd-current@freebsd.org>
Subject: Re: Chasing OOM Issues - good sysctl metrics to use?
Message-ID: <0fcb5a4a-5517-e57b-2b69-4f3b3b10589a@nomadlogic.org>
In-Reply-To: <94B2E2FD-2371-4FEA-8E01-F37103F63CC0@yahoo.com>
References: <83A713B9-A973-4C97-ACD6-830DF6A50B76.ref@yahoo.com>
 <83A713B9-A973-4C97-ACD6-830DF6A50B76@yahoo.com>
 <a5b2e248-3298-80e3-4bb6-742c8431f064@nomadlogic.org>
 <94B2E2FD-2371-4FEA-8E01-F37103F63CC0@yahoo.com>
On 4/22/22 18:46, Mark Millard wrote:
> On 2022-Apr-22, at 16:42, Pete Wright <pete@nomadlogic.org> wrote:
>
>> On 4/21/22 21:18, Mark Millard wrote:
>>> Messages in the console out would be appropriate
>>> to report. Messages might also be available via
>>> the following at appropriate times:
>>
>> that is what is frustrating.  i will get notification that the processes are killed:
>> Apr 22 09:55:15 topanga kernel: pid 76242 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
>> Apr 22 09:55:19 topanga kernel: pid 76288 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
>> Apr 22 09:55:20 topanga kernel: pid 76259 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
>> Apr 22 09:55:22 topanga kernel: pid 76252 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
>> Apr 22 09:55:23 topanga kernel: pid 76267 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
>> Apr 22 09:55:24 topanga kernel: pid 76234 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
>> Apr 22 09:55:26 topanga kernel: pid 76275 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
>
> Those messages are not reporting being out of swap
> as such. They are reporting sustained low free RAM
> despite a number of less drastic attempts to gain
> back free RAM (to above some threshold).
>
> FreeBSD does not swap out the kernel stacks for
> processes that stay in a runnable state: it just
> continues to page. Thus just one large process
> that has a huge working set of active pages can
> lead to OOM kills in a context where no other set
> of processes would be enough to gain the free
> RAM required. Such contexts are not really a
> swap issue.

Thank you for this clarification/explanation - that totally makes sense!
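for reference, one way to actually watch that sustained-low-free-RAM condition develop is to sample the vm.stats.vm counters in a loop. this is just a rough sketch - the counters are standard sysctls, but the 10 second interval and the choice of counters are only an assumption about what is useful here:

#!/bin/sh
# sample the free/inactive/laundry page counts (reported in pages,
# not bytes) plus swap usage every 10 seconds while reproducing
# the problem, so the slide toward the OOM threshold is visible
while true; do
    date
    sysctl vm.stats.vm.v_free_count \
        vm.stats.vm.v_inactive_count \
        vm.stats.vm.v_laundry_count
    swapinfo -k
    sleep 10
done

multiplying the page counts by hw.pagesize (4096 on amd64) converts them to bytes.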
> Based on there being only 1 "killed:" reason,
> I have a suggestion that should allow delaying
> such kills for a long time. That in turn may
> help with investigating without actually
> suffering the kills during the activity: more
> time with low free RAM to observe.

Great idea thank-you!  and thanks for the example settings and descriptions as well.

> But those are large but finite activities. If
> you want to leave something running for days,
> weeks, months, or whatever that produces the
> sustained low free RAM conditions, the problem
> will eventually happen. Ultimately one may have
> to exit and restart such processes once in a
> while, exiting enough of them to give a little
> time with sufficient free RAM.

perfect - since this is a workstation my run-time for these processes is probably a week: i update my system and pkgs over the weekend, then dog-food current during the work week.

>> yes i have 2GB of swap that resides on an nvme device.
>
> I assume a partition style. Otherwise there are other
> issues involved --that likely should be avoided by
> switching to partition style.

so i kinda lied - initially i had just a 2G swap, but i added a second 20G swap a while ago to have enough space to capture some cores while testing drm-kmod work.  based on this comment i am going to use only the 20G file-backed swap and see how that goes.  this is my current fstab entry for the file-backed swap:

md99	none	swap	sw,file=/root/swap1,late	0	0

>>> ZFS (so with ARC)? UFS? Both?
>>
>> i am using ZFS and am setting my vfs.zfs.arc.max to 10G.  i have also experienced this crash with that set to the default unlimited value as well.
>
> I use ZFS on systems with at least 8 GiBytes of RAM,
> but I've never tuned ZFS. So I'm not much help for
> that side of things.

since we started this thread I've gone ahead and removed the zfs.arc.max setting since it's cruft at this point.  i initially added it to test a configuration i deployed to a server hosting a bunch of VMs.

> I'm hoping that vm.pageout_oom_seq=120 (or more) makes it
> so you do not have to have identified everything up front
> and can explore easier.
>
> Note that vm.pageout_oom_seq is both a loader tunable
> and a writeable runtime tunable:
>
> # sysctl -T vm.pageout_oom_seq
> vm.pageout_oom_seq: 120
> amd64_ZFS amd64 1400053 1400053
> # sysctl -W vm.pageout_oom_seq
> vm.pageout_oom_seq: 120
>
> So you can use it to extend the time when the
> machine is already running.
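to make the larger value stick across reboots, something like the following should work - a sketch based on the above: since it is a writeable runtime tunable it can go in /etc/sysctl.conf, and since it is also a loader tunable /boot/loader.conf works too and takes effect earlier in boot:

# /etc/sysctl.conf - applied by rc(8) during boot; raises the number
# of consecutive pageout passes with low free RAM that are tolerated
# before the OOM killer starts killing processes (default is 12)
vm.pageout_oom_seq=120

# or, equivalently, in /boot/loader.conf:
vm.pageout_oom_seq="120"

and on a live system it can be raised immediately with:

# sysctl vm.pageout_oom_seq=120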
fantastic.  thanks again for taking your time and sharing your knowledge and experience with me Mark!  these types of journeys are why i run current on my daily driver - it really helps me better understand the OS so that i can be a better admin on the "real" servers i run for work.  it's also just fun to learn stuff too heh.

-p

--
Pete Wright
pete@nomadlogic.org
@nomadlogicLA

Want to link to this message? Use this URL:
<https://mail-archive.FreeBSD.org/cgi/mid.cgi?0fcb5a4a-5517-e57b-2b69-4f3b3b10589a>