Date:      Sat, 29 Jan 2022 11:23:58 -0800
From:      Mark Millard <marklmi@yahoo.com>
To:        bob prohaska <fbsd@www.zefox.net>
Cc:        Free BSD <freebsd-arm@freebsd.org>
Subject:   Re: devel/llvm13 failed to reclaim memory on 8 GB Pi4 running -current [UFS success context for 4 cores, notes added]
Message-ID:  <E7D83125-1E55-4ABE-9C5E-9AC40501648E@yahoo.com>
In-Reply-To: <BFEC2EC0-3127-49A2-93FD-F059AF7842A7@yahoo.com>
References:  <20220127164512.GA51200@www.zefox.net> <C8BDF77F-5144-4234-A453-8DEC9EA9E227@yahoo.com> <2C7E741F-4703-4E41-93FE-72E1F16B60E2@yahoo.com> <20220127214801.GA51710@www.zefox.net> <5E861D46-128A-4E09-A3CF-736195163B17@yahoo.com> <20220127233048.GA51951@www.zefox.net> <6528ED25-A3C6-4277-B951-1F58ADA2D803@yahoo.com> <10B4E2F0-6219-4674-875F-A7B01CA6671C@yahoo.com> <54CD0806-3902-4B9C-AA30-5ED003DE4D41@yahoo.com> <A4FA4E8B-635B-454E-87D1-C36A84E2C3BA@yahoo.com> <9771EB33-037E-403E-8A77-7E8E98DCF375@yahoo.com> <B12D2AB9-147E-49EF-854F-A3B999ADDECC@yahoo.com> <BA25F969-4DAC-4E5D-88EF-9475139B6B8A@yahoo.com> <6D67BFDF-D786-4BB7-BF2D-CE4D5532D452@yahoo.com> <BFEC2EC0-3127-49A2-93FD-F059AF7842A7@yahoo.com>

On 2022-Jan-29, at 03:59, Mark Millard <marklmi@yahoo.com> wrote:

> On 2022-Jan-28, at 19:20, Mark Millard <marklmi@yahoo.com> wrote:
>
>> On 2022-Jan-28, at 15:05, Mark Millard <marklmi@yahoo.com> wrote:
>>
>>> On 2022-Jan-28, at 00:31, Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>>>> . . .
>>>>
>>>> UFS context:
>>>>
>>>> . . .;  load averages:   . . . MaxObs:   5.47,   4.99,   4.82
>>>> . . . threads:    . . ., 14 MaxObsRunning
>>>> . . .
>>>> Mem: . . ., 6457Mi MaxObsActive, 1263Mi MaxObsWired, 7830Mi MaxObs(Act+Wir+Lndry)
>>>> Swap: 8192Mi Total, 8192Mi Used, K Free, 100% Inuse, 8192Mi MaxObsUsed, 14758Mi MaxObs(Act+Lndry+SwapUsed), 16017Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>>>>
>>>>
>>>> Console:
>>>>
>>>> swap_pager: out of swap space
>>>> swp_pager_getswapspace(4): failed
>>>> swp_pager_getswapspace(1): failed
>>>> swp_pager_getswapspace(1): failed
>>>> swp_pager_getswapspace(2): failed
>>>> swp_pager_getswapspace(2): failed
>>>> swp_pager_getswapspace(4): failed
>>>> swp_pager_getswapspace(1): failed
>>>> swp_pager_getswapspace(9): failed
>>>> swp_pager_getswapspace(4): failed
>>>> swp_pager_getswapspace(7): failed
>>>> swp_pager_getswapspace(29): failed
>>>> swp_pager_getswapspace(9): failed
>>>> swp_pager_getswapspace(1): failed
>>>> swp_pager_getswapspace(2): failed
>>>> swp_pager_getswapspace(1): failed
>>>> swp_pager_getswapspace(4): failed
>>>> swp_pager_getswapspace(1): failed
>>>> swp_pager_getswapspace(10): failed
>>>>
>>>> . . . Then some time with no messages . . .
>>>>
>>>> vm_pageout_mightbe_oom: kill context: v_free_count: 7740, v_inactive_count: 1
>>>> Jan 27 23:01:07 CA72_UFS kernel: pid 57238 (c++), jid 3, uid 0, was killed: failed to reclaim memory
>>>> swp_pager_getswapspace(2): failed
>>>>
>>>>
>>>> Note: The "vm_pageout_mightbe_oom: kill context:"
>>>> notice is one of the few parts of an old reporting
>>>> patch Mark J. had supplied (long ago) that still
>>>> fits in the modern code (or that I was able to keep
>>>> updated enough to fit, anyway). It is another of the
>>>> personal updates that I keep in my source trees,
>>>> such as in /usr/main-src/ .
>>>>
>>>> diff --git a/sys/vm/vm_pageout.c b/sys/vm/vm_pageout.c
>>>> index 36d5f3275800..f345e2d4a2d4 100644
>>>> --- a/sys/vm/vm_pageout.c
>>>> +++ b/sys/vm/vm_pageout.c
>>>> @@ -1828,6 +1828,8 @@ vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
>>>>      * start OOM.  Initiate the selection and signaling of the
>>>>      * victim.
>>>>      */
>>>> +       printf("vm_pageout_mightbe_oom: kill context: v_free_count: %u, v_inactive_count: %u\n",
>>>> +          vmd->vmd_free_count, vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt);
>>>>     vm_pageout_oom(VM_OOM_MEM);
>>>>
>>>>     /*
>>>>
>>>>
>>>> Again, I'd used vm.pfault_oom_attempts inappropriately
>>>> for running out of swap (although with UFS it did do
>>>> a kill fairly soon):
>>>>=20
>>>> # Delay when persistent low free RAM leads to
>>>> # Out Of Memory killing of processes:
>>>> vm.pageout_oom_seq=120
>>>> #
>>>> # For plenty of swap/paging space (will not
>>>> # run out), avoid pageout delays leading to
>>>> # Out Of Memory killing of processes:
>>>> vm.pfault_oom_attempts=-1
>>>> #
>>>> # For possibly insufficient swap/paging space
>>>> # (might run out), increase the pageout delay
>>>> # that leads to Out Of Memory killing of
>>>> # processes (showing defaults at the time):
>>>> #vm.pfault_oom_attempts= 3
>>>> #vm.pfault_oom_wait= 10
>>>> # (The multiplication is the total but there
>>>> # are other potential tradeoffs in the factors
>>>> # multiplied, even for nearly the same total.)
>>>>
>>>> I'll change:
>>>>
>>>> vm.pfault_oom_attempts
>>>> vm.pfault_oom_wait
>>>>
>>>> and reboot --and start the bulk somewhat before
>>>> going to bed.
>>>>
>>>>
>>>> For reference:
>>>>
>>>> [00:02:13] [01] [00:00:00] Building devel/llvm13 | llvm13-13.0.0_3
>>>> [07:37:05] [01] [07:34:52] Finished devel/llvm13 | llvm13-13.0.0_3: Failed: build
>>>>
>>>>
>>>> [ 65% 4728/7265] . . . flang/lib/Evaluate/fold-designator.cpp
>>>> [ 65% 4729/7265] . . . flang/lib/Evaluate/fold-integer.cpp
>>>> FAILED: tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/fold-integer.cpp.o
>>>> [ 65% 4729/7265] . . . flang/lib/Evaluate/fold-logical.cpp
>>>> [ 65% 4729/7265] . . . flang/lib/Evaluate/fold-complex.cpp
>>>> [ 65% 4729/7265] . . . flang/lib/Evaluate/fold-real.cpp
>>>>
>>>> So the flang/lib/Evaluate/fold-integer.cpp one was the one killed.
>>>>
>>>> Notably, the specific sources being compiled are different
>>>> than in the ZFS context report. But this might be because
>>>> of my killing ninja explicitly in the ZFS context, before
>>>> killing the running compilers.
>>>>
>>>> Again, using the options to avoid building the Fortran
>>>> compiler probably avoids such memory use --if you do not
>>>> need the Fortran compiler.
>>>
>>>
>>> UFS based on instead using (not vm.pfault_oom_attempts=-1):
>>>
>>> vm.pfault_oom_attempts= 3
>>> vm.pfault_oom_wait= 10
>>>
>>> It reached swap-space-full:
>>>
>>> . . .;  load averages:   . . . MaxObs:   5.42,   4.98,   4.80
>>> . . . threads:    . . ., 11 MaxObsRunning
>>> . . .
>>> Mem: . . ., 6482Mi MaxObsActive, 1275Mi MaxObsWired, 7832Mi MaxObs(Act+Wir+Lndry)
>>> Swap: 8192Mi Total, 8192Mi Used, K Free, 100% Inuse, 4096B In, 81920B Out, 8192Mi MaxObsUsed, 14733Mi MaxObs(Act+Lndry+SwapUsed), 16007Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>>>
>>>
>>> swap_pager: out of swap space
>>> swp_pager_getswapspace(5): failed
>>> swp_pager_getswapspace(25): failed
>>> swp_pager_getswapspace(1): failed
>>> swp_pager_getswapspace(31): failed
>>> swp_pager_getswapspace(6): failed
>>> swp_pager_getswapspace(1): failed
>>> swp_pager_getswapspace(25): failed
>>> swp_pager_getswapspace(10): failed
>>> swp_pager_getswapspace(17): failed
>>> swp_pager_getswapspace(27): failed
>>> swp_pager_getswapspace(5): failed
>>> swp_pager_getswapspace(11): failed
>>> swp_pager_getswapspace(9): failed
>>> swp_pager_getswapspace(29): failed
>>> swp_pager_getswapspace(2): failed
>>> swp_pager_getswapspace(1): failed
>>> swp_pager_getswapspace(9): failed
>>> swp_pager_getswapspace(20): failed
>>> swp_pager_getswapspace(4): failed
>>> swp_pager_getswapspace(21): failed
>>> swp_pager_getswapspace(11): failed
>>> swp_pager_getswapspace(2): failed
>>> swp_pager_getswapspace(21): failed
>>> swp_pager_getswapspace(2): failed
>>> swp_pager_getswapspace(1): failed
>>> swp_pager_getswapspace(2): failed
>>> swp_pager_getswapspace(3): failed
>>> swp_pager_getswapspace(3): failed
>>> swp_pager_getswapspace(2): failed
>>> swp_pager_getswapspace(1): failed
>>> swp_pager_getswapspace(20): failed
>>> swp_pager_getswapspace(2): failed
>>> swp_pager_getswapspace(1): failed
>>> swp_pager_getswapspace(16): failed
>>> swp_pager_getswapspace(6): failed
>>> swap_pager: out of swap space
>>> swp_pager_getswapspace(4): failed
>>> swp_pager_getswapspace(9): failed
>>> swp_pager_getswapspace(17): failed
>>> swp_pager_getswapspace(30): failed
>>> swp_pager_getswapspace(1): failed
>>>
>>> . . . Then some time with no messages . . .
>>>
>>> vm_pageout_mightbe_oom: kill context: v_free_count: 7875, v_inactive_count: 1
>>> Jan 28 14:36:44 CA72_UFS kernel: pid 55178 (c++), jid 3, uid 0, was killed: failed to reclaim memory
>>> swp_pager_getswapspace(11): failed
>>>
>>>
>>> So, not all that much different from how the
>>> vm.pfault_oom_attempts=-1 example looked.
>>>
>>>
>>> [00:01:00] [01] [00:00:00] Building devel/llvm13 | llvm13-13.0.0_3
>>> [07:41:39] [01] [07:40:39] Finished devel/llvm13 | llvm13-13.0.0_3: Failed: build
>>>
>>> Again it killed:
>>>
>>> FAILED: tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/fold-integer.cpp.o
>>>
>>> So, basically the same stopping area as for the
>>> vm.pfault_oom_attempts=-1 example.
>>>
>>>
>>> I'll set things up for swap totaling 30 GiBytes, reboot,
>>> and start it again. This will hopefully let me see and
>>> report MaxObs??? figures for a successful build when there
>>> is RAM+SWAP: 38 GiBytes. So: more than 9 GiBytes per compiler
>>> instance (mean).
>>
>> The analogous ZFS test with:
>>
>> vm.pfault_oom_attempts= 3
>> vm.pfault_oom_wait= 10
>>
>> got:
>>
>> . . .;  load averages:   . . . MaxObs:   5.90,   5.07,   4.80
>> . . . threads:    . . ., 11 MaxObsRunning
>> . . .
>> Mem: . . ., 6006Mi MaxObsActive
>> . . .
>> Swap: 8192Mi Total, 8192Mi Used, 32768B Free, 99% Inuse, 28984Ki In, 4792Ki Out, 8192Mi MaxObsUsed, 14282Mi MaxObs(Act+Lndry+SwapUsed), 16009Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>>
>> (I got that slightly early, before the 100% showed up.)
>>
>>
>> swap_pager: out of swap space
>> swp_pager_getswapspace(10): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(4): failed
>> swp_pager_getswapspace(16): failed
>> swp_pager_getswapspace(5): failed
>> swp_pager_getswapspace(2): failed
>> swp_pager_getswapspace(8): failed
>> swp_pager_getswapspace(12): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(32): failed
>> swp_pager_getswapspace(4): failed
>> swp_pager_getswapspace(9): failed
>> swp_pager_getswapspace(4): failed
>> swp_pager_getswapspace(17): failed
>> swp_pager_getswapspace(21): failed
>> swp_pager_getswapspace(10): failed
>> swp_pager_getswapspace(18): failed
>> swp_pager_getswapspace(6): failed
>> swp_pager_getswapspace(2): failed
>> swp_pager_getswapspace(14): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(5): failed
>> swp_pager_getswapspace(25): failed
>> swp_pager_getswapspace(12): failed
>> swp_pager_getswapspace(5): failed
>> swp_pager_getswapspace(7): failed
>> swp_pager_getswapspace(10): failed
>> swp_pager_getswapspace(3): failed
>> swp_pager_getswapspace(24): failed
>> swap_pager: out of swap space
>> swp_pager_getswapspace(11): failed
>> swap_pager: out of swap space
>> swp_pager_getswapspace(17): failed
>> swp_pager_getswapspace(5): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(32): failed
>> swp_pager_getswapspace(15): failed
>> swp_pager_getswapspace(19): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(25): failed
>> swp_pager_getswapspace(11): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(15): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(8): failed
>> swp_pager_getswapspace(31): failed
>> swp_pager_getswapspace(26): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(20): failed
>> swp_pager_getswapspace(4): failed
>> swp_pager_getswapspace(3): failed
>> swp_pager_getswapspace(3): failed
>> swp_pager_getswapspace(9): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(15): failed
>> swp_pager_getswapspace(3): failed
>> swp_pager_getswapspace(7): failed
>> swp_pager_getswapspace(8): failed
>> swp_pager_getswapspace(17): failed
>> swp_pager_getswapspace(2): failed
>> swp_pager_getswapspace(10): failed
>> swp_pager_getswapspace(6): failed
>> swp_pager_getswapspace(2): failed
>> swp_pager_getswapspace(11): failed
>> swp_pager_getswapspace(21): failed
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(9): failed
>> swp_pager_getswapspace(32): failed
>> swp_pager_getswapspace(2): failed
>> swp_pager_getswapspace(32): failed
>> swp_pager_getswapspace(25): failed
>> swp_pager_getswapspace(21): failed
>> swp_pager_getswapspace(22): failed
>> swp_pager_getswapspace(14): failed
>> swp_pager_getswapspace(10): failed
>> swap_pager: out of swap space
>> swp_pager_getswapspace(1): failed
>> swp_pager_getswapspace(28): failed
>> swp_pager_getswapspace(2): failed
>> swp_pager_getswapspace(13): failed
>> swp_pager_getswapspace(3): failed
>> swp_pager_getswapspace(31): failed
>> swp_pager_getswapspace(20): failed
>> swp_pager_getswapspace(2): failed
>> vm_pageout_mightbe_oom: kill context: v_free_count: 8186, v_inactive_count: 1
>> Jan 28 18:42:42 CA72_4c8G_ZFS kernel: pid 98734 (c++), jid 3, uid 0, was killed: failed to reclaim memory
>>
>> [00:00:49] [01] [00:00:00] Building devel/llvm13 | llvm13-13.0.0_3
>> [08:06:09] [01] [08:05:20] Finished devel/llvm13 | llvm13-13.0.0_3: Failed: build
>>
>> FAILED: tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/fold-complex.cpp.o
>>
>> and flang/lib/Evaluate/fold-integer.cpp was one of the compiles going on.

The below is about the success case for the 8 GiByte RPi4B:

> Finally, what a successful build of devel/llvm13 on
> UFS was like on the 8 GiByte RPi4B (overclocked,
> USB3 NVMe based SSD):
>
> [00:00:57] [01] [00:00:00] Building devel/llvm13 | llvm13-13.0.0_3
> [12:25:40] [01] [12:24:43] Finished devel/llvm13 | llvm13-13.0.0_3: Success
>
> where its Maximum Observed figures were:
>
> . . .;  load averages:   . . . MaxObs:   6.15,   5.71,   5.31
> . . . threads:    . . ., 11 MaxObsRunning
> . . .
> Mem: . . ., 6465Mi MaxObsActive, 1355Mi MaxObsWired, 7832Mi MaxObs(Act+Wir+Lndry)
> Swap: . . ., 10429Mi MaxObsUsed, 16799Mi MaxObs(Act+Lndry+SwapUsed), 18072Mi MaxObs(Act+Wir+Lndry+SwapUsed)
>
> But 18072Mi MaxObs(Act+Wir+Lndry+SwapUsed) == 17.6484375 GiByte,
> so RAM+SWAP needs to be more than 17.6484375 GiByte, by an amount
> depending on how much room for inactive pages and margin one
> chooses. Probably 20+ GiBytes, so 12+ GiBytes of swap for
> 8 GiBytes of RAM.
>
> (Reminder: maximum of sum <= sum of maximums.)

For folks that might read the above without a lot
of prior context . . .

I forgot to mention above that the RPi4B has 4 cores
and the poudriere ALLOW_PARALLEL_JOB= meant that
there were 4 jobs (processes) much of the time. (Nightly
cron-related activity made the MaxObs load averages
bigger than the 4.? or 5.? that would otherwise have
shown up.)

Having notably more (or fewer) processes active for the
build need not use RAM+SWAP proportionally overall. The
20+ GiBytes figure for 4 active hardware threads in use
is somewhat context specific. So having 5+ GiBytes of
RAM+SWAP per hardware thread that is to be in use may be
significant overkill when there are notably more
hardware threads involved.


===
Mark Millard
marklmi at yahoo.com




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?E7D83125-1E55-4ABE-9C5E-9AC40501648E>