FreeBSD Mail Archives

Date:      Fri, 29 Apr 2022 11:38:13 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Pete Wright <pete@nomadlogic.org>
Cc:        freebsd-current <freebsd-current@freebsd.org>
Subject:   Re: Chasing OOM Issues - good sysctl metrics to use?
Message-ID:  <464ED220-0DE4-4D2F-9DA2-AFD00D8D42B7@yahoo.com>
In-Reply-To: <f00ccd1f-b6f6-bb00-f0a7-2f760c8953a0@nomadlogic.org>
References:  <83A713B9-A973-4C97-ACD6-830DF6A50B76.ref@yahoo.com> <83A713B9-A973-4C97-ACD6-830DF6A50B76@yahoo.com> <a5b2e248-3298-80e3-4bb6-742c8431f064@nomadlogic.org> <94B2E2FD-2371-4FEA-8E01-F37103F63CC0@yahoo.com> <0fcb5a4a-5517-e57b-2b69-4f3b3b10589a@nomadlogic.org> <DD98C932-A07F-4097-AE7F-D9CEF0BB6AEE@yahoo.com> <f43d7276-3718-df89-cbf0-5c1ef3d67e77@nomadlogic.org> <f00ccd1f-b6f6-bb00-f0a7-2f760c8953a0@nomadlogic.org>


On 2022-Apr-29, at 11:08, Pete Wright <pete@nomadlogic.org> wrote:

> On 4/23/22 19:20, Pete Wright wrote:
>> 
>>> The developers handbook has a section on debugging deadlocks that he
>>> referenced in a response to another report (on freebsd-hackers).
>>> 
>>> https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-deadlocks 
>> 
>> d'oh - thanks for the correction!
>> 
>> -pete
>> 
>> 
> 
> hello, i just wanted to provide an update on this issue.  so the good news is that by removing the file backed swap the deadlocks have indeed gone away!  thanks for sorting me out on that front Mark!

Glad it helped.

> i still am seeing a memory leak with either firefox or chrome (maybe both where they create a voltron of memory leaks?).  this morning firefox and chrome had been killed when i first logged in. fortunately the system has remained responsive for several hours which was not the case previously.
> 
> when looking at my metrics i see vm.domain.0.stats.inactive take a nose dive from around 9GB to 0 over the course of 1min.  the timing seems to align with around the time when firefox crashed, and is preceded by a large spike in vm.domain.0.stats.active from ~1GB to 7GB 40mins before the apps crashed.  after the binaries were killed memory metrics seem to have recovered (laundry size grew, and inactive size grew by several gigs for example).

Since the form of kill here is tied to sustained low free memory
("failed to reclaim memory"), you might want to report the
vm.domain.0.stats.free_count figures from various time frames as
well:

vm.domain.0.stats.free_count: Free pages

(It seems you are converting page counts to byte counts in your
report. I'm not really worried about the units, so long as they
are obvious.)
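
For anyone wanting to reproduce the page-to-byte conversion, here is a
minimal sketch. It assumes the usual "name: value" output of sysctl and
a 4096-byte page (typical for amd64; check hw.pagesize on your system).
The helper names parse_sysctl, pages_to_mib and read_sysctl are
hypothetical, made up for this illustration.

```python
import subprocess

# Assumption: 4096-byte pages (typical on amd64); confirm via `sysctl hw.pagesize`.
PAGE_SIZE = 4096


def parse_sysctl(text):
    """Parse `name: value` lines, as printed by sysctl, into {name: int}.

    Lines whose value is not a plain integer are skipped.
    """
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and value.strip().lstrip("-").isdigit():
            values[name.strip()] = int(value)
    return values


def pages_to_mib(pages, page_size=PAGE_SIZE):
    """Convert a page count to MiB so the units are explicit."""
    return pages * page_size / (1024 * 1024)


def read_sysctl(*names):
    """Run sysctl for the given OIDs and return the parsed integer values."""
    out = subprocess.run(
        ["sysctl", *names], capture_output=True, text=True, check=True
    ).stdout
    return parse_sysctl(out)
```

On a FreeBSD system, read_sysctl("vm.domain.0.stats.free_count") would
return the current page count, which pages_to_mib() then turns into MiB.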

There are also figures possibly tied to the handling of the kill
activity, though some are more like thresholds than usage figures,
such as:

vm.domain.0.stats.free_severe: Severe free pages
vm.domain.0.stats.free_min: Minimum free pages
vm.domain.0.stats.free_reserved: Reserved free pages
vm.domain.0.stats.free_target: Target free pages
vm.domain.0.stats.inactive_target: Target inactive pages
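
One way to relate free_count to the threshold-like figures is to note,
per sample, which of them the free page count has dropped below. The
sketch below is only illustrative: it makes no claim about how the
kernel itself uses these values, the function name
thresholds_above_free is hypothetical, and the numbers in the usage
comment are made up. inactive_target is left out because it is a
target for inactive pages, not free pages.

```python
PREFIX = "vm.domain.0.stats."

# Threshold-like OIDs that are expressed in free pages.
FREE_THRESHOLDS = ("free_severe", "free_min", "free_reserved", "free_target")


def thresholds_above_free(stats):
    """Return the threshold names whose value exceeds free_count.

    `stats` maps full sysctl names to integer page counts, for one sample.
    """
    free = stats[PREFIX + "free_count"]
    return [name for name in FREE_THRESHOLDS if free < stats[PREFIX + name]]


# Usage with made-up page counts (not taken from any real system):
# sample = {
#     PREFIX + "free_count": 5000,
#     PREFIX + "free_severe": 3000,
#     PREFIX + "free_min": 6000,
#     PREFIX + "free_reserved": 1000,
#     PREFIX + "free_target": 20000,
# }
# thresholds_above_free(sample)
```

Logging this alongside a timestamp for each sample would show how long
free memory stayed low before the kill happened.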

Also, what value were you using for:

vm.pageout_oom_seq

?

> maybe i'll have to gather data and post it online for anyone who would be interested in seeing this in graph form.  although, frankly i feel like it's a browser problem which i can work around by running them in jails with resource limits in place via rctl.




===
Mark Millard
marklmi at yahoo.com

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?464ED220-0DE4-4D2F-9DA2-AFD00D8D42B7>
