Date: Fri, 29 Apr 2022 13:57:56 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Pete Wright <pete@nomadlogic.org>
Cc: freebsd-current <freebsd-current@freebsd.org>
Subject: Re: Chasing OOM Issues - good sysctl metrics to use?
Message-ID: <33B740AA-A431-49CB-9F27-50B8C49734A2@yahoo.com>
In-Reply-To: <446d5913-a8c2-7dd0-860b-792fa9fe7c5b@nomadlogic.org>
References: <83A713B9-A973-4C97-ACD6-830DF6A50B76.ref@yahoo.com> <83A713B9-A973-4C97-ACD6-830DF6A50B76@yahoo.com> <a5b2e248-3298-80e3-4bb6-742c8431f064@nomadlogic.org> <94B2E2FD-2371-4FEA-8E01-F37103F63CC0@yahoo.com> <0fcb5a4a-5517-e57b-2b69-4f3b3b10589a@nomadlogic.org> <DD98C932-A07F-4097-AE7F-D9CEF0BB6AEE@yahoo.com> <f43d7276-3718-df89-cbf0-5c1ef3d67e77@nomadlogic.org> <f00ccd1f-b6f6-bb00-f0a7-2f760c8953a0@nomadlogic.org> <464ED220-0DE4-4D2F-9DA2-AFD00D8D42B7@yahoo.com> <446d5913-a8c2-7dd0-860b-792fa9fe7c5b@nomadlogic.org>
On 2022-Apr-29, at 13:41, Pete Wright <pete@nomadlogic.org> wrote:

> On 4/29/22 11:38, Mark Millard wrote:
>> On 2022-Apr-29, at 11:08, Pete Wright <pete@nomadlogic.org> wrote:
>>
>>> On 4/23/22 19:20, Pete Wright wrote:
>>>>> The developers handbook has a section debugging deadlocks that he
>>>>> referenced in a response to another report (on freebsd-hackers).
>>>>>
>>>>> https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-deadlocks
>>>> d'oh - thanks for the correction!
>>>>
>>>> -pete
>>>>
>>>>
>>> hello, i just wanted to provide an update on this issue. so the good news is that by removing the file backed swap the deadlocks have indeed gone away! thanks for sorting me out on that front Mark!
>> Glad it helped.
>
> d'oh - went out for lunch and workstation locked up. i *knew* i shouldn't have said anything lol.

Any interesting console messages ( or dmesg -a or /var/log/messages )?

>>> i still am seeing a memory leak with either firefox or chrome (maybe both where they create a voltron of memory leaks?). this morning firefox and chrome had been killed when i first logged in. fortunately the system has remained responsive for several hours which was not the case previously.
>>>
>>> when looking at my metrics i see vm.domain.0.stats.inactive take a nose dive from around 9GB to 0 over the course of 1min. the timing seems to align with around the time when firefox crashed, and is preceded by a large spike in vm.domain.0.stats.active from ~1GB to 7GB 40mins before the apps crashed. after the binaries were killed memory metrics seem to have recovered (laundry size grew, and inactive size grew by several gigs for example).
>> Since the form of kill here is tied to sustained low free memory
>> ("failed to reclaim memory"), you might want to report the
>> vm.domain.0.stats.free_count figures from various time frames as
>> well:
>>
>> vm.domain.0.stats.free_count: Free pages
>>
>> (It seems you are converting pages to byte counts in your report,
>> the units I'm not really worried about so long as they are
>> obvious.)
>>
>> There are also figures possibly tied to the handling of the kill
>> activity but some being more like thresholds than usage figures,
>> such as:
>>
>> vm.domain.0.stats.free_severe: Severe free pages
>> vm.domain.0.stats.free_min: Minimum free pages
>> vm.domain.0.stats.free_reserved: Reserved free pages
>> vm.domain.0.stats.free_target: Target free pages
>> vm.domain.0.stats.inactive_target: Target inactive pages
> ok thanks Mark, based on this input and the fact i did manage to lock up my system, i'm going to get some metrics up on my website and share them publicly when i have time. i'll definitely take your input into account when sharing this info.
>
>>
>> Also, what value were you using for:
>>
>> vm.pageout_oom_seq
> $ sysctl vm.pageout_oom_seq
> vm.pageout_oom_seq: 120
> $

Without knowing vm.domain.0.stats.free_count it is hard to
tell, but you might try, say,

sysctl vm.pageout_oom_seq=12000

in hopes of getting notably more time with the
vm.domain.0.stats.free_count staying small. That may give
you more time to notice the low free RAM (if you are
checking periodically, rather than just waiting for
failure to make it obvious).

===
Mark Millard
marklmi at yahoo.com
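
For the periodic checking mentioned above, a minimal Python sketch might look like the following. It is only an illustration, not something posted in the thread: the helper names (parse_sysctl, pages_to_mib, crossed_thresholds, read_sysctl) are hypothetical, and it assumes sysctl(8) prints "name: value" lines and that hw.pagesize reports the page size in bytes. The vm.domain.0.stats.* names are the ones listed in the message.

```python
import subprocess

# Stat names taken from the thread; all are page counts for one VM domain.
STATS = ("free_count", "free_severe", "free_min", "free_reserved", "free_target")


def parse_sysctl(text):
    """Parse 'name: value' lines into a dict, keeping only integer values."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            values[name.strip()] = int(value)
    return values


def pages_to_mib(pages, page_size):
    """Convert a page count to MiB, given the page size in bytes."""
    return pages * page_size / (1024 * 1024)


def crossed_thresholds(values, domain=0):
    """Return the threshold names that free_count is at or below."""
    prefix = f"vm.domain.{domain}.stats."
    free = values[prefix + "free_count"]
    # A threshold missing from the input defaults to -1, so it never matches.
    return [s for s in STATS[1:] if free <= values.get(prefix + s, -1)]


def read_sysctl(names):
    """Run sysctl(8) for the given names (FreeBSD host assumed) and parse it."""
    out = subprocess.run(["sysctl", *names], capture_output=True,
                         text=True, check=True)
    return parse_sysctl(out.stdout)
```

On a FreeBSD host, calling read_sysctl with hw.pagesize plus the vm.domain.0.stats names from a cron job or a loop, and logging pages_to_mib of free_count together with crossed_thresholds, would give a timestamped record of how long free memory stayed low before a kill. Keeping the parsing and arithmetic separate from the sysctl call also makes the units explicit, which is the one thing the message asks for in any report.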
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?33B740AA-A431-49CB-9F27-50B8C49734A2>
