Date:      Fri, 29 Apr 2022 13:57:56 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Pete Wright <pete@nomadlogic.org>
Cc:        freebsd-current <freebsd-current@freebsd.org>
Subject:   Re: Chasing OOM Issues - good sysctl metrics to use?
Message-ID:  <33B740AA-A431-49CB-9F27-50B8C49734A2@yahoo.com>
In-Reply-To: <446d5913-a8c2-7dd0-860b-792fa9fe7c5b@nomadlogic.org>
References:  <83A713B9-A973-4C97-ACD6-830DF6A50B76.ref@yahoo.com> <83A713B9-A973-4C97-ACD6-830DF6A50B76@yahoo.com> <a5b2e248-3298-80e3-4bb6-742c8431f064@nomadlogic.org> <94B2E2FD-2371-4FEA-8E01-F37103F63CC0@yahoo.com> <0fcb5a4a-5517-e57b-2b69-4f3b3b10589a@nomadlogic.org> <DD98C932-A07F-4097-AE7F-D9CEF0BB6AEE@yahoo.com> <f43d7276-3718-df89-cbf0-5c1ef3d67e77@nomadlogic.org> <f00ccd1f-b6f6-bb00-f0a7-2f760c8953a0@nomadlogic.org> <464ED220-0DE4-4D2F-9DA2-AFD00D8D42B7@yahoo.com> <446d5913-a8c2-7dd0-860b-792fa9fe7c5b@nomadlogic.org>

On 2022-Apr-29, at 13:41, Pete Wright <pete@nomadlogic.org> wrote:
>
> On 4/29/22 11:38, Mark Millard wrote:
>> On 2022-Apr-29, at 11:08, Pete Wright <pete@nomadlogic.org> wrote:
>>
>>> On 4/23/22 19:20, Pete Wright wrote:
>>>>> The developers handbook has a section on debugging deadlocks that
>>>>> he referenced in a response to another report (on freebsd-hackers).
>>>>>
>>>>> https://docs.freebsd.org/en/books/developers-handbook/kerneldebug/#kerneldebug-deadlocks
>>>> d'oh - thanks for the correction!
>>>>
>>>> -pete
>>>>
>>>>
>>> hello, i just wanted to provide an update on this issue.  so the
>>> good news is that by removing the file backed swap the deadlocks
>>> have indeed gone away!  thanks for sorting me out on that front Mark!
>> Glad it helped.
>
> d'oh - went out for lunch and workstation locked up.  i *knew* i
> shouldn't have said anything lol.

Any interesting console messages ( or dmesg -a or /var/log/messages )?

>>> i still am seeing a memory leak with either firefox or chrome (maybe
>>> both where they create a voltron of memory leaks?).  this morning
>>> firefox and chrome had been killed when i first logged in.
>>> fortunately the system has remained responsive for several hours
>>> which was not the case previously.
>>>
>>> when looking at my metrics i see vm.domain.0.stats.inactive take a
>>> nose dive from around 9GB to 0 over the course of 1min.  the timing
>>> seems to align with around the time when firefox crashed, and is
>>> preceded by a large spike in vm.domain.0.stats.active from ~1GB to
>>> 7GB 40mins before the apps crashed.  after the binaries were killed
>>> memory metrics seem to have recovered (laundry size grew, and
>>> inactive size grew by several gigs for example).
>> Since the form of kill here is tied to sustained low free memory
>> ("failed to reclaim memory"), you might want to report the
>> vm.domain.0.stats.free_count figures from various time frames as
>> well:
>>
>> vm.domain.0.stats.free_count: Free pages
>>
>> (It seems you are converting pages to byte counts in your report.
>> I'm not really worried about the units so long as they are
>> obvious.)
>>
>> There are also figures possibly tied to the handling of the kill
>> activity, though some are more like thresholds than usage figures,
>> such as:
>>
>> vm.domain.0.stats.free_severe: Severe free pages
>> vm.domain.0.stats.free_min: Minimum free pages
>> vm.domain.0.stats.free_reserved: Reserved free pages
>> vm.domain.0.stats.free_target: Target free pages
>> vm.domain.0.stats.inactive_target: Target inactive pages
> ok thanks Mark, based on this input and the fact i did manage to lock
> up my system, i'm going to get some metrics up on my website and share
> them publicly when i have time.  i'll definitely take your input into
> account when sharing this info.
>
>>
>> Also, what value were you using for:
>>=20
>> vm.pageout_oom_seq
> $ sysctl vm.pageout_oom_seq
> vm.pageout_oom_seq: 120
> $

Without knowing vm.domain.0.stats.free_count it is hard to
tell, but you might try, say, sysctl vm.pageout_oom_seq=12000
in hopes of getting notably more time with the
vm.domain.0.stats.free_count staying small. That may give
you more time to notice the low free RAM (if you are checking
periodically, rather than just waiting for failure to make
it obvious).
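
For the periodic checking mentioned above, a minimal sketch along
these lines might help. It is only an illustration: the helper names
(pages_to_mib, is_low, sample_vm_stats) are hypothetical, the choice
of vm.domain.0.stats.free_min as the warning threshold is an
assumption made for the example rather than a recommendation, and it
assumes hw.pagesize reports the page size in bytes.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not from the thread.

# Convert a page count to MiB. $1 = pages, $2 = page size in bytes.
pages_to_mib() {
    echo $(( $1 * $2 / 1048576 ))
}

# Succeed (exit status 0) when a free page count is below a threshold.
# $1 = free pages, $2 = threshold in pages.
is_low() {
    [ "$1" -lt "$2" ]
}

# Print one timestamped sample line. Needs FreeBSD's sysctl(8).
sample_vm_stats() {
    pagesize=$(sysctl -n hw.pagesize)
    free=$(sysctl -n vm.domain.0.stats.free_count)
    # Assumption: free_min is used as the warning threshold for illustration.
    thresh=$(sysctl -n vm.domain.0.stats.free_min)
    flag=ok
    if is_low "$free" "$thresh"; then
        flag=LOW
    fi
    printf '%s free=%sMiB active=%sMiB inactive=%sMiB %s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$(pages_to_mib "$free" "$pagesize")" \
        "$(pages_to_mib "$(sysctl -n vm.domain.0.stats.active)" "$pagesize")" \
        "$(pages_to_mib "$(sysctl -n vm.domain.0.stats.inactive)" "$pagesize")" \
        "$flag"
}
```

Calling sample_vm_stats from a loop with a sleep between samples, and
appending the output to a log file, would leave a record to look at
after a lockup. The interval and log location are left to the reader.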


===
Mark Millard
marklmi at yahoo.com



