Date:      Tue, 10 May 2022 17:49:49 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        Jan Mikkelsen <janm@transactionware.com>, Pete Wright <pete@nomadlogic.org>
Cc:        freebsd-current <freebsd-current@freebsd.org>
Subject:   Re: Chasing OOM Issues - good sysctl metrics to use?
Message-ID:  <D429A8ED-011A-4E67-9726-C49937861CCD@yahoo.com>
In-Reply-To: <C992DE63-AE7B-47F7-B679-B76D480AC0B1@yahoo.com>
References:  <83A713B9-A973-4C97-ACD6-830DF6A50B76.ref@yahoo.com> <83A713B9-A973-4C97-ACD6-830DF6A50B76@yahoo.com> <a5b2e248-3298-80e3-4bb6-742c8431f064@nomadlogic.org> <94B2E2FD-2371-4FEA-8E01-F37103F63CC0@yahoo.com> <0fcb5a4a-5517-e57b-2b69-4f3b3b10589a@nomadlogic.org> <DD98C932-A07F-4097-AE7F-D9CEF0BB6AEE@yahoo.com> <f43d7276-3718-df89-cbf0-5c1ef3d67e77@nomadlogic.org> <f00ccd1f-b6f6-bb00-f0a7-2f760c8953a0@nomadlogic.org> <464ED220-0DE4-4D2F-9DA2-AFD00D8D42B7@yahoo.com> <446d5913-a8c2-7dd0-860b-792fa9fe7c5b@nomadlogic.org> <33B740AA-A431-49CB-9F27-50B8C49734A2@yahoo.com> <3C5C183F-1471-4139-A53C-0B3815CFC25E@yahoo.com> <75C02C8C-6A5E-4E19-AC7D-B5DB704E8F16@transactionware.com> <C992DE63-AE7B-47F7-B679-B76D480AC0B1@yahoo.com>

On 2022-May-10, at 11:49, Mark Millard <marklmi@yahoo.com> wrote:

> On 2022-May-10, at 08:47, Jan Mikkelsen <janm@transactionware.com> wrote:
>
>> On 10 May 2022, at 10:01, Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>> On 2022-Apr-29, at 13:57, Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>>> On 2022-Apr-29, at 13:41, Pete Wright <pete@nomadlogic.org> wrote:
>>>>>
>>>>>> . . .
>>>>>
>>>>> d'oh - went out for lunch and workstation locked up.  i *knew* i shouldn't have said anything lol.
>>>>
>>>> Any interesting console messages ( or dmesg -a or /var/log/messages )?
>>>>
>>>
>>> I've been doing some testing of a patch by tijl at FreeBSD.org
>>> and have reproduced both hang-ups (ZFS/ARC context) and kills
>>> (UFS/noARC and ZFS/ARC) for "was killed: failed to reclaim
>>> memory", both with and without the patch. This is with only a
>>> tiny fraction of the enabled swap partition(s) being put to
>>> use. So far, the testing was deliberately with
>>> vm.pageout_oom_seq=12 (the default value). My testing has been
>>> with main [so: 14].
>>>
>>> But I also learned how to avoid the hang-ups that I got, but
>>> at the cost of making kills more likely/quicker, other things
>>> being equal.
>>>
>>> I discovered that the hang-ups that I got were from all the
>>> processes that I use to interact with the system ending up
>>> with their kernel threads swapped out and not being swapped
>>> back in (including sshd, so no new ssh connections). In
>>> some contexts I only had escaping into the kernel debugger
>>> available, not even ^T would work. Other times ^T did work.
>>>
>>> So, when I'm willing to risk kills in order to maintain
>>> the ability to interact normally, I now use in
>>> /etc/sysctl.conf :
>>>
>>> vm.swap_enabled=0
>>
>> I have been looking at an OOM related issue. Ignoring the actual leak, the problem leads to a process being killed because the system was out of memory. This is fine. After that, however, the system console was black with a single block cursor and the console keyboard was unresponsive. Caps lock and num lock didn’t toggle their lights when pressed.
>>
>> Using an ssh session, the system looked fine. USB events for the keyboard being disconnected and reconnected appeared but the keyboard stayed unresponsive.
>>
>> Setting vm.swap_enabled=0, as you did above, resolved this problem. After the process was killed a perfectly normal console returned.
>>
>> The interesting thing is that this test system is configured with no swap space.
>>
>> This is on 13.1-RC5.
>>
>>> This disables swapping out of process kernel stacks. It
>>> is just that, with that option for gaining free RAM removed,
>>> there are fewer options tried before a kill is initiated. It
>>> is not a loader-time tunable but is writable, thus the
>>> /etc/sysctl.conf placement.
>>
>> Is that really what it does? From a quick look at the code in vm/vm_swapout.c, it seems a little more complex.
>
> I was going by its description:
>
> # sysctl -d vm.swap_enabled
> vm.swap_enabled: Enable entire process swapout
>
> Based on the below, it appears that the description
> presumes vm.swap_idle_enabled==0 (the default). In
> my context vm.swap_idle_enabled==0 . Looks like I
> should also list:
>
> vm.swap_idle_enabled=0
>
> in my /etc/sysctl.conf with a reminder comment that the
> pair of =0's are required for avoiding the observed
> hang-ups.
>
>
> The analysis goes like . . .
>
> I see in the code that vm.swap_enabled !=0 causes
> VM_SWAP_NORMAL :
>
> void
> vm_swapout_run(void)
> {
>
>        if (vm_swap_enabled)
>                vm_req_vmdaemon(VM_SWAP_NORMAL);
> }
>
> and that in turn leads vm_daemon to do:
>
>                if (swapout_flags != 0) {
>                        /*
>                         * Drain the per-CPU page queue batches as a deadlock
>                         * avoidance measure.
>                         */
>                        if ((swapout_flags & VM_SWAP_NORMAL) != 0)
>                                vm_page_pqbatch_drain();
>                        swapout_procs(swapout_flags);
>                }
>
> Note: vm.swap_idle_enabled==0 && vm.swap_enabled==0 ends
> up with swapout_flags==0. vm.swap_idle. . . defaults seem
> to be (in my context):
>
> # sysctl -a | grep swap_idle
> vm.swap_idle_threshold2: 10
> vm.swap_idle_threshold1: 2
> vm.swap_idle_enabled: 0
>
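As a rough illustration of the Note above, here is a minimal user-space
sketch (not kernel code) of how the two sysctls map onto the request bits
that vm_daemon sees. The helper names model_swapout_flags and
model_swapout_attempted are hypothetical, and the numeric bit values are
assumptions made for the model; only the VM_SWAP_NORMAL / VM_SWAP_IDLE
names and the two sysctls come from the code quoted in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit values for this model; only the names come from the quoted code. */
#define VM_SWAP_NORMAL 1
#define VM_SWAP_IDLE   2

/*
 * Hypothetical helper: the request bits that can accumulate for one
 * vm_daemon wakeup, given the two sysctl settings.
 */
int
model_swapout_flags(bool swap_enabled, bool swap_idle_enabled)
{
	int flags = 0;

	/* The two requests must be independent bits for the OR to work. */
	assert((VM_SWAP_NORMAL & VM_SWAP_IDLE) == 0);

	if (swap_enabled)		/* vm_swapout_run() path */
		flags |= VM_SWAP_NORMAL;
	if (swap_idle_enabled)		/* vm_swapout_run_idle() path */
		flags |= VM_SWAP_IDLE;
	return (flags);
}

/* Hypothetical helper: would vm_daemon reach swapout_procs() at all? */
bool
model_swapout_attempted(bool swap_enabled, bool swap_idle_enabled)
{
	return (model_swapout_flags(swap_enabled, swap_idle_enabled) != 0);
}
```

In this model only the combination (false, false) yields 0, which
corresponds to vm_daemon skipping swapout_procs() entirely. Setting just
vm.swap_enabled=0 while vm.swap_idle_enabled is nonzero would still leave
a path to process swapout.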
> For reference:
>
> /*
>  * Idle process swapout -- run once per second when pagedaemons are
>  * reclaiming pages.
>  */
> void
> vm_swapout_run_idle(void)
> {
>        static long lsec;
>
>        if (!vm_swap_idle_enabled || time_second == lsec)
>                return;
>        vm_req_vmdaemon(VM_SWAP_IDLE);
>        lsec = time_second;
> }
>
> [So vm.swap_idle_enabled==0 avoids VM_SWAP_IDLE status.]
>
> static void
> vm_req_vmdaemon(int req)
> {
>        static int lastrun = 0;
>
>        mtx_lock(&vm_daemon_mtx);
>        vm_pageout_req_swapout |= req;
>        if ((ticks > (lastrun + hz)) || (ticks < lastrun)) {
>                wakeup(&vm_daemon_needed);
>                lastrun = ticks;
>        }
>        mtx_unlock(&vm_daemon_mtx);
> }
>
> [So VM_SWAP_IDLE and VM_SWAP_NORMAL are independent bits
> in vm_pageout_req_swapout.]
>
> vm_daemon does:
>
>                mtx_lock(&vm_daemon_mtx);
>                msleep(&vm_daemon_needed, &vm_daemon_mtx, PPAUSE, "psleep",
>                    vm_daemon_timeout);
>                swapout_flags = vm_pageout_req_swapout;
>                vm_pageout_req_swapout = 0;
>                mtx_unlock(&vm_daemon_mtx);
>
> So vm_pageout_req_swapout is regenerated after that
> each time.
>
> I'll not show the code for vm.swap_idle_enabled!=0 .
>
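The read-and-clear step can be sketched in the same user-space style. In
this single-threaded model the mutex is omitted and pending_req stands in
for vm_pageout_req_swapout; model_request() and model_daemon_pass() are
hypothetical names, and the bit values are assumptions for illustration.

```c
#include <assert.h>

/* Assumed bit values for this model; only the names come from the quoted code. */
#define VM_SWAP_NORMAL 1
#define VM_SWAP_IDLE   2

/* Stand-in for vm_pageout_req_swapout. */
static int pending_req;

/* Models the OR in vm_req_vmdaemon(): requests accumulate as bits. */
void
model_request(int req)
{
	assert((req & ~(VM_SWAP_NORMAL | VM_SWAP_IDLE)) == 0);
	pending_req |= req;
}

/*
 * Models one vm_daemon pass: take a snapshot of the pending bits and
 * reset them, so the next pass only sees requests posted afterwards.
 */
int
model_daemon_pass(void)
{
	int swapout_flags = pending_req;

	pending_req = 0;
	return (swapout_flags);
}
```

A request made before one pass does not carry over to the next: unless
vm_swapout_run() or vm_swapout_run_idle() posts a new request, the next
pass sees 0.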

Well, with continued experiments I got an example of
a hang-up for which looking via the db> prompt did not
show any swapping out of process kernel stacks
( vm.swap_enabled=0 was the context, so expected ).
The environment was ZFS (so with ARC).

But this was testing with vm.pageout_oom_seq=120 instead
of the default vm.pageout_oom_seq=12 . It may be that,
left to sit long enough, things would have unhung (from
an external perspective).

It is part of what I'm experimenting with so we will see.



===
Mark Millard
marklmi at yahoo.com



