Date:      Sat, 14 May 2022 01:09:30 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Pete Wright <pete@nomadlogic.org>, freebsd-current <freebsd-current@freebsd.org>
Subject:   Re: Chasing OOM Issues - good sysctl metrics to use?
Message-ID:  <8C14A90D-3429-437C-A815-E811B7BFBF05@yahoo.com>
References:  <8C14A90D-3429-437C-A815-E811B7BFBF05.ref@yahoo.com>

Pete Wright <pete@nomadlogic.org> wrote on
Date: Fri, 13 May 2022 13:43:11 -0700 :

> On 5/11/22 12:52, Mark Millard wrote:
> >
> >
> > Relative to avoiding hang-ups, so far it seems that
> > use of vm.swap_enabled=0 with vm.swap_idle_enabled=0
> > makes hang-ups less likely/less frequent/harder to
> > produce examples of. But it is no guarantee of lack of
> > a hang-up. It does change the cause of the hang-up
> > (in that it avoids processes with kernel stacks swapped
> > out being involved).
>
> thanks for the above analysis Mark.  i am going to test these settings
> out now as i'm still seeing the lockup.
>
> this most recent hang-up was using a patch tijl@ asked me to test
> (attached to this email), and the default setting of vm.pageout_oom_seq:
> 12.

I had also been running various tests for tijl@, the same
sort of 'removal of the " + 1"' patch. I had found a basic
way to tell whether a fundamental problem was completely
avoided, without having to wait through long periods of
activity to find out. But that does not mean the test is a
good simulation of the sequence in your context that leads
to the issues. Nor does it indicate how wide a range of
activity is fairly likely to reach the failing conditions.

You could see how vm.pageout_oom_seq=120 does for you with
the patch. I was never patient enough to wait long enough
for this to OOM kill or hang up in my test context.
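
For reference, one way to put such settings in place at runtime
(as root) is via sysctl; the same name=value pairs can go in
/etc/sysctl.conf to persist across reboots:

# sysctl vm.pageout_oom_seq=120
# sysctl vm.swap_enabled=0
# sysctl vm.swap_idle_enabled=0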

I've been reporting the likes of:

# sysctl vm.domain.0.stats # done after the fact
vm.domain.0.stats.inactive_pps: 1037
vm.domain.0.stats.free_severe: 15566
vm.domain.0.stats.free_min: 25759
vm.domain.0.stats.free_reserved: 5374
vm.domain.0.stats.free_target: 86914
vm.domain.0.stats.inactive_target: 130371
vm.domain.0.stats.unswppdpgs: 0
vm.domain.0.stats.unswappable: 0
vm.domain.0.stats.laundpdpgs: 858845
vm.domain.0.stats.laundry: 9
vm.domain.0.stats.inactpdpgs: 1040939
vm.domain.0.stats.inactive: 1063
vm.domain.0.stats.actpdpgs: 407937767
vm.domain.0.stats.active: 1032
vm.domain.0.stats.free_count: 3252526
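
In case it helps with gathering a history of such figures over a
run, a trivial sh loop can log them periodically (a sketch only;
the 10 s interval and the log file path are arbitrary choices):

#!/bin/sh
# Append a timestamp and the per-domain VM stats every 10 seconds.
while true; do
    date
    sysctl vm.domain.0.stats
    sleep 10
done >> /var/log/vm-domain0-stats.log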

But I also have a kernel that reports just before
the call that is to cause an OOM kill, producing
output like:

vm_pageout_mightbe_oom: kill context: v_free_count: 15306, v_inactive_count: 1, v_laundry_count: 64, v_active_count: 3891599
May 11 00:44:11 CA72_Mbin_ZFS kernel: pid 844 (stress), jid 0, uid 0, was killed: failed to reclaim memory

(I was testing main [so: 14].) So I report that as well.

Since I was using stress as part of my test context, there
were also lines like:

stress: FAIL: [843] (415) <-- worker 844 got signal 9
stress: WARN: [843] (417) now reaping child worker processes
stress: FAIL: [843] (451) failed run completed in 119s
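
(To be clear about the style of load: something along the lines
of the following is illustrative of using stress's memory workers,
not the exact command line from my test context:

# stress -m 4 --vm-bytes 2048M -t 120

where -m gives the number of memory-allocating workers, --vm-bytes
how much each worker allocates and touches, and -t a time limit
in seconds.)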

(tijl@ had me add v_laundry_count and v_active_count
to what I have carried forward since 2018, when
Mark J. provided the original extra message.)

Turns out the kernel debugger (db> prompt) can report the
same general sort of figures:

db> show page
vm_cnt.v_free_count: 15577
vm_cnt.v_inactive_count: 1
vm_cnt.v_active_count: 3788852
vm_cnt.v_laundry_count: 0
vm_cnt.v_wire_count: 272395
vm_cnt.v_free_reserved: 5374
vm_cnt.v_free_min: 25759
vm_cnt.v_free_target: 86914
vm_cnt.v_inactive_target: 130371

db> show pageq
pq_free 15577
dom 0 page_cnt 4077116 free 15577 pq_act 3788852 pq_inact 1 pq_laund 0 pq_unsw 0

(Note: pq_unsw is a non-swappable count that excludes
the wired count, apparently matching
vm.domain.0.stats.unswappable.)
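
(For completeness: on a kernel built with the KDB and DDB options,
one way to get to the db> prompt from a still-responsive system is:

# sysctl debug.kdb.enter=1

A hang-up context may instead need a console break signal or the
like, depending on the configuration.)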

The above is the smallest pq_inact+pq_laund total that
I saw at OOM-kill time or during a "hang-up" (what I saw
across example "hang-ups" suggests to me a livelock context,
not a deadlock context).

> interestingly enough with the patch applied i observed a smaller
> amount of memory used for laundry as well as less swap space used
> until right before the crash.

If your logging of values has been made public, I've not
(yet?) looked at it at all.

None of my testing reached a stage of having much swap
space in use. But the test is biased to produce the problems
quickly, rather than to explore a range of ways to reach
conditions with the problem.

I've stopped testing for now and am doing a round of OS
building and upgrading, port (re-)building and installing
and the like, mostly for aarch64 but also for armv7 and
amd64. (This is without the 'remove " + 1"' patch.)

One of the points is to see if I get any evidence of
vm.swap_enabled=0 with vm.swap_idle_enabled=0 ending up
contributing to any problems in my normal usage. So far: no.
vm.pageout_oom_seq=120 is also in use; it has been part of
my normal context since sometime in 2018.

===
Mark Millard
marklmi at yahoo.com



