Date: Fri, 22 Apr 2022 16:42:42 -0700
From: Pete Wright <pete@nomadlogic.org>
To: Mark Millard <marklmi@yahoo.com>, freebsd-current <freebsd-current@freebsd.org>
Subject: Re: Chasing OOM Issues - good sysctl metrics to use?
Message-ID: <a5b2e248-3298-80e3-4bb6-742c8431f064@nomadlogic.org>
In-Reply-To: <83A713B9-A973-4C97-ACD6-830DF6A50B76@yahoo.com>
References: <83A713B9-A973-4C97-ACD6-830DF6A50B76.ref@yahoo.com> <83A713B9-A973-4C97-ACD6-830DF6A50B76@yahoo.com>

On 4/21/22 21:18, Mark Millard wrote:
>
> Messages in the console out would be appropriate
> to report. Messages might also be available via
> the following at appropriate times:

that is what is frustrating. i will get notification that the processes are killed:

Apr 22 09:55:15 topanga kernel: pid 76242 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
Apr 22 09:55:19 topanga kernel: pid 76288 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
Apr 22 09:55:20 topanga kernel: pid 76259 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
Apr 22 09:55:22 topanga kernel: pid 76252 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
Apr 22 09:55:23 topanga kernel: pid 76267 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory
Apr 22 09:55:24 topanga kernel: pid 76234 (chrome), jid 0, uid 1001, was killed: failed to reclaim memory
Apr 22 09:55:26 topanga kernel: pid 76275 (firefox), jid 0, uid 1001, was killed: failed to reclaim memory

the system in this case had killed both firefox and chrome while i was afk. i logged back in and started them up to do more work, then the next logline is from this morning when i had to force power off/on the system as the keyboard and network were both unresponsive:

Apr 22 09:58:20 topanga syslogd: kernel boot file is /boot/kernel/kernel

> Do you have any swap partitions set up and in use? The
> details could be relevant. Do you have swap set up
> some other way than via swap partition use? No swap?

yes, i have 2GB of swap that resides on an nvme device.

> ZFS (so with ARC)? UFS? Both?

i am using ZFS and am setting my vfs.zfs.arc.max to 10G. i have also experienced this crash with that set to the default unlimited value.

>
> The first block of lines from a top display could be
> relevant, particularly when it is clearly progressing
> towards having the problem. (After the problem is too
> late.) (I just picked top as a way to get a bunch of
> the information all together automatically.)

since the initial OOM events happen when i am AFK it is difficult to get relevant stats out of top. this is why i've started collecting more detailed metrics in prometheus. my hope is i'll be able to do a better job observing how my system is behaving over time, in the run up to the OOM event as well as right before and after. there are heaps of metrics collected though, so i'm hoping someone can point me in the right direction :)

-pete

--
Pete Wright
pete@nomadlogic.org
@nomadlogicLA
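
One way to line the kill notices quoted above up against a metrics timeline is to pull them out of the log programmatically. The following is only a rough sketch, not anything taken from the thread: it assumes the message format is exactly as in the quoted lines, and the names `OOM_RE`, `parse_oom_kills`, `summarize` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Matches kernel kill notices in the format quoted in the message, e.g.
# "Apr 22 09:55:15 topanga kernel: pid 76242 (chrome), jid 0, uid 1001,
#  was killed: failed to reclaim memory"
# Assumption: other FreeBSD versions may word these lines differently.
OOM_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+kernel:\s+"
    r"pid (?P<pid>\d+) \((?P<comm>[^)]*)\), jid (?P<jid>\d+), "
    r"uid (?P<uid>\d+), was killed: (?P<reason>.+)$"
)


def parse_oom_kills(lines):
    """Return one dict per kill notice found in an iterable of log lines."""
    kills = []
    for line in lines:
        match = OOM_RE.match(line.strip())
        if match is None:
            continue  # not a kill notice, skip it
        record = match.groupdict()
        for key in ("pid", "jid", "uid"):
            record[key] = int(record[key])
        kills.append(record)
    return kills


def summarize(kills):
    """Count kills per command name."""
    return Counter(record["comm"] for record in kills)


# Three lines copied from the message; the last one is not a kill notice.
SAMPLE = [
    "Apr 22 09:55:15 topanga kernel: pid 76242 (chrome), jid 0, uid 1001, "
    "was killed: failed to reclaim memory",
    "Apr 22 09:55:20 topanga kernel: pid 76259 (firefox), jid 0, uid 1001, "
    "was killed: failed to reclaim memory",
    "Apr 22 09:58:20 topanga syslogd: kernel boot file is /boot/kernel/kernel",
]

if __name__ == "__main__":
    for record in parse_oom_kills(SAMPLE):
        print(record["stamp"], record["comm"], record["pid"], record["reason"])
```

The timestamps it extracts (which carry no year, as in the log itself) could then be used to pick the window of collected metrics worth looking at before each kill.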