Date:      Fri, 3 Apr 2020 15:34:53 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        bob prohaska <fbsd@www.zefox.net>
Cc:        freebsd-arm <freebsd-arm@freebsd.org>
Subject:   Re: Unexpected OOM kill on rpi2 while building world
Message-ID:  <42583CC7-4650-4F17-8E22-78B02CD47832@yahoo.com>
In-Reply-To: <20200403163313.GA33978@www.zefox.net>
References:  <20200402233359.GA31562@www.zefox.net> <2ECB61DA-1DDA-4BDC-9ABF-5051E7388D20@yahoo.com> <131F8442-02E9-4AAF-B15D-827D775170ED@yahoo.com> <16E9257C-D400-4DF7-BE6C-4D1EA2BA1653@yahoo.com> <20200403163313.GA33978@www.zefox.net>

[Some of this exchange occurred off-list. This
brings it back to the list.]

On 2020-Apr-3, at 09:33, bob prohaska <fbsd at www.zefox.net> wrote:
>
> On Thu, Apr 02, 2020 at 07:23:22PM -0700, Mark Millard wrote:
>> [Not sent to the lists.]
>>
>> On 2020-Apr-2, at 18:36, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>
>>> [Sorry for the earlier accidental send before even editing the
>>> text to reply.]
>>>
>>>>
>>>> On 2020-Apr-2, at 16:33, bob prohaska <fbsd at www.zefox.net> wrote:
>>>>
>>>> Two installations of the
>>>> FreeBSD-12.1-STABLE-arm-armv7-RPI2-20200305-r358659.img
>>>> image have set up and built -j4 world using a single 64GB
>>>> Samsung Evo Plus microSD card with a 2.6 GB swap partition.
>>>> No changes to /boot/loader.conf required.
>>>
>>> The following is from a head -r358966 armv7 example of having
>>> one 3072 MiByte swap/paging partition:
>>=20
>> Actually: head -r359427 . I've progressed the FreeBSD
>> version I'm using since the RPi3 results that I'd
>> reported.
>>
>> I do not see needing a world wide message for
>> this before the build results are available. But,
>> hopefully, this will help remind me to make the
>> correction then.
>>=20
> Agreed.
>>> QUOTE
>>> warning: total configured swap (786432 pages) exceeds maximum recommended amount (468832 pages).
>>> warning: increase kern.maxswzone or reduce amount of swap.
>>> END QUOTE
>>>
>>> 468832 pages is between 1831 MiByte and 1832 MiByte.
>>> 2.6 GB is far beyond the recommendation. I've noticed
>>> some variability between armv7 versions for the
>>> recommended figure, but not large differences. So
>>> your context may not be an exact match.
>>>
>>> (aarch64 for the same size RAM [1 GiByte] allows a much
>>> larger swap space without complaint: 3072 MiByte does
>>> not get a complaint on a RPi3 running aarch64 FreeBSD.)
>>>
>>> Did you leave things configured such that such a message
>>> was produced on the armv7? What did it say (if produced)?
>>> What was its recommended maximum (translated to, say,
>>> MiBytes)?
>>>
> No changes to swap configuration; in the past no problems emerged.
> The warning is:
> warning: total configured swap (675200 pages) exceeds maximum recommended amount (312480 pages).

312480 * 4096 / 1024 / 1024 = 1220.625 MiByte

This is far less than what head reported as its
recommended maximum for the RPi2 V1.2 using head
armv7 FreeBSD (between 1831 MiByte and 1832
MiByte someplace).
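For redoing these conversions on other warning messages, the following Python sketch turns the page counts from the two kern.maxswzone warnings in this thread into MiByte. It assumes 4096-byte pages (as in the calculation above); the helper name pages_to_mib is only illustrative.

```python
# Convert the page counts in FreeBSD's swap-size warnings to MiByte.
# Assumption: 4096-byte pages, as used in the calculation above.
PAGE_SIZE = 4096           # bytes per page
MIB = 1024 * 1024          # bytes per MiByte


def pages_to_mib(pages: int) -> float:
    """Return the size of `pages` pages in MiByte."""
    return pages * PAGE_SIZE / MIB


# (configured, recommended) page counts quoted in this thread.
swap_warnings = {
    "head armv7, RPi2 V1.2, 3072 MiByte swap": (786432, 468832),
    "stable/12 armv7, RPi2 V1.1, 2.6 GB swap": (675200, 312480),
}

for name, (configured, recommended) in swap_warnings.items():
    cfg = pages_to_mib(configured)
    rec = pages_to_mib(recommended)
    print(f"{name}: configured {cfg:.3f} MiB, recommended max {rec:.3f} MiB, "
          f"{cfg / rec:.2f}x over")
```

The ratio column shows how far each configuration is past its recommended maximum, which is the quantity the warning is really about.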

I'll stick in a note here about some context:
buildworld buildkernel takes long enough for
nightly jobs and such to also run with the
build still going on. This can be another
example of variability in memory handling that
could contribute to OOM criteria being met.

>>> Going in a different direction . . .
>>>
>>> I'll note that stable/12 -r358659 includes:
>>>
>>> stable/12/contrib/googletest/googlemock/test/gmock-matchers_test.cc
>>>
>>> which is known to be an issue for OOM activity for
>>> 1 GiByte machines, even for -j1 in some configurations.
>>> See:
>>>
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241848
>>>
>>> My experiment showed that, built by itself (as if
>>> -j1) on an armv7 with 2 GiByte of RAM, it reached
>>> 1146Mi MaxObs(Act+Wir). (1740 MiByte
>>> of swap space, but it stayed all free.)
>>>
>
> I'm confused, was this on a Pi?

armv7, not a RPi* but an OrangePi+ 2e (OPi+2e).
The major difference for the issue is the amount
of RAM. (I did not have RPi2 access at the time.)

In fact, I use the same microsd card, USB SSD, root
file system, msdos file system, and a common swap
partition on the RPi2 V1.2. I do switch between
boot.scr files when I move the media between
machines. The RPi2 ignores the dd'd material on the
microsd card.

The same media would work in the RPi2 V1.1 but that
microsd card slot has a broken retention mechanism
and ejects the microsd card as I take my fingers
away. I've no access to another RPi2 V1.1 .

> My observation was casual
> and I think peaked at about 700MB. Alas, no logging in progress.

For:

stable/12/contrib/googletest/googlemock/test/gmock-matchers_test.cc

Dimitry Andric reported clang90 with assertions disabled used:
"maxrss of 1755320, so ~1714 MiByte". So not much different than
my figure (but measured a different way). With -j4 active, you
would probably need the 3 other cores to be waiting for something
to do during the big memory use time frame of this compile in
order for swap space use to be only 700 MiByte over that
time frame.
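A crude version of the arithmetic behind that point: the sketch below converts the reported maxrss (in KiByte) to MiByte and subtracts the RPi2's 1 GiByte of RAM. It ignores wired memory, the kernel, and the other -j4 jobs (all of which would push the figure up), and it does not distinguish swap-backed pages from file-backed ones that can simply be dropped. Treat the result as a rough estimate under those assumptions, not a measurement.

```python
# Crude estimate: how far one gmock-matchers_test.cc compile overshoots
# a 1 GiByte RPi2. Assumptions: the whole maxrss would have to be either
# resident or paged out, and nothing else (kernel, wired memory, the other
# -j4 jobs) competes for RAM. Real paging needs would likely be larger.
MAXRSS_KIB = 1755320       # maxrss reported for clang with assertions disabled
RAM_MIB = 1024             # RPi2: 1 GiByte of RAM

maxrss_mib = MAXRSS_KIB / 1024
overshoot_mib = maxrss_mib - RAM_MIB

print(f"maxrss: ~{maxrss_mib:.0f} MiByte")
print(f"paged out even in the best case: ~{overshoot_mib:.0f} MiByte")
```

Even this optimistic figure is around 690 MiByte of paging for that one compile, which is why only about 700 MiByte of swap use over that time frame would leave little room for the other three jobs.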

(I'll note that this file and related files were added to
stable/12 in -r344078 , somewhat over a year ago. It was
copied from vendor/ to contrib at -r344082 and integrated
in -r345203 and -r348138 [2019-Mar/May].)

> The OOMA kills seemed to be toward the end of the "building libraries"
> phase but before ld became active.

contrib/googletest/googlemock/test/gmock-matchers_test.cc would
be compiled later than in the libraries part of the build.

So you likely saw an earlier local-peak in the memory use.

>>>
>>>
>>>> On the third installation, the machine stopped with
>>>> pid 68521 (c++), jid 0, uid 0, was killed: out of swap space
>>>> so I set
>>>> vm.pfault_oom_attempts="-1" and restarted buildworld with -DNO_CLEAN
>>>> The machine promptly reported
>>>> pid 93318 (c++), jid 0, uid 0, was killed: out of swap space
>>>
>>> (The following is based on head. I've not compared
>>> 12-STABLE to be sure how close the match is.)
>>>
>>> Possible causes of the OOM kill activity include:
>>>
>>> The swap blk uma zone was exhausted.
>>> The swap pctrie uma zone was exhausted.
>>>
>>> vm.pfault_oom_attempts and vm.pageout_oom_seq make no
>>> direct difference for these, as far as I know.
>>>
>>> Unfortunately, FreeBSD does not specifically report
>>> either cause when it happens, but gives the generic
>>> "out of swap" type of notice.
>
> Ok, that seems like a plausible candidate, but would setting
> vm.pageout_oom_seq="4096"
> subsequently suppress the OOMA kill?

vm.pageout_oom_seq does not change being unable to allocate
from either of the 2 memory areas (uma zones), other than
having possibly prevented some prior potential OOM kills.
So: No.

As near as I can tell, exhausting either of these uma zones
means that the kernel can no longer manage the swap space,
short of killing processes to free up the related zone
contents for other uses.

Nothing ever disables all the OOM kill criteria, only
specific aspects of the overall criteria. (As far as I
know anyway.) The alternatives are things like forced
reboots, deadlocks, and such under various conditions.

>>>
>>> Your 2.6GB swap space configuration may be making one
>>> or both of these exhaustions more likely. For all I
>>> know, "exhaustion" might include something becoming
>>> too fragmented to have individual areas of sufficient
>>> size despite the total free in the involved zone being
>>> seemingly sufficient.
>>>
>
>>> I'm not claiming that -j4 is even possible to do
>>> reliably, much less while staying within the maximum
>>> recommended by default. But exceeding the maximum that
>>> the warning reports might put one outside the
>>> generally-well-understood range of FreeBSD use. Rare
>>> expertise might be involved in understanding what to expect.
>>>
> Up to now -j4 buildworlds were successful on the Pi2, with
> minimal use of swap. 12.1 uses clang9, which seems considerably
> larger than former versions. Perhaps that's part of the trouble.

stable/12 -r356460 (2020-Jan-7) switched to llvm 9.0.0 .
If all your prior 12.x activity predates that, then, yea,
things are likely different this time around.

Interestingly, gmock-matchers_test.cc predates that. If
you used between -r348138 and -r356460 without such large
memory use, clang 9+ would be contributing for the subset
of issues tied to gmock-matchers_test.cc . (Even though
gcc uses even more RAM for the file, as I understand.)

>>>> In neither case were there any "indefinite wait...." or any other
>>>> warning messages.
>>>
>>> Such messages need not be involved in the uma zone
>>> exhaustions.
>>>
>>>> At that point I set
>>>> vm.pageout_oom_seq="4096" and restarted buildworld, again with -DNO_CLEAN.
>>>
>>> Which need not contribute to avoiding uma zone
>>> exhaustions.
>>>
> Even so, buildworld completed....

I thought you said you were having to restart things.
That gets a likely greatly different mix of what is
running at the same time in the detailed timing.
Peak memory use times across the cores likely would
be different after each restart, making OOM activity
related comparisons problematical.

>>>
>>> There is no setting that disables all the OOM kill
>>> criteria.
>
> Ok, this is a surprise, I gathered that
> vm.pfault_oom_attempts="-1"
> turned off all OOMA activity.
>
>>> The two settings together are not enough
>>> to disable all the OOM kill criteria.
>>>
>
> Yet, using both together seems to have suppressed OOMA.

I thought you said you were having to restart things because
of OOM kill activity, even with both set. That would mean
that OOM kill activity had not been suppressed.

>>>> Casual observation suggests swap use peaks at a few hundred MB
>>>> under 12.1 on the Pi2.
>>>
>>> Is 12.1 the version number? The MB figure(s) seem to be missing
>>> from this statement for what the few hundred MB is under.
>
> Yes, 12.1 is the version number from the snapshot name.
>
>>> Did you mean something like 2.6 GiByte (swap), so fairly near
>>> 2.6 GiByte but definitely over 2.0 GiByte of swap in use?
>>>
> Sorry, the few hundred MB was total swap use. IIRC I saw ~700MB in
> use very briefly. Roughly 30% of 2.6GB.

Wired memory is not swapped and there are possibly other
overheads. A quick estimate is that you had between 700
MiByte and 800 MiByte of RAM in use for the CPU time
takers at the time of the around 700 MB swap space use.
So, something like 1400 MiByte to 1500 MiByte overall,
counting related swap space as well.

In other words, having, say, 1100 MiByte to 1200 MiByte of
swap space would have been more than sufficient for that
stage.
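The same back-of-envelope estimate, written out as a sketch. The inputs are the rough figures above (treating the ~700 MB of observed swap use as MiByte), and the recommended maximum is recomputed from the 312480 pages in the stable/12 warning.

```python
# Back-of-envelope version of the estimate above (all figures in MiByte,
# treating the ~700 MB of observed swap use as MiByte).
swap_in_use = 700                  # briefly observed swap use
ram_in_use = (700, 800)            # estimated RAM in use by CPU-time takers

total_low = ram_in_use[0] + swap_in_use
total_high = ram_in_use[1] + swap_in_use
print(f"overall: ~{total_low} to {total_high} MiByte")

# Recommended maximum from the stable/12 warning: 312480 pages of 4096 bytes.
recommended_max_mib = 312480 * 4096 / 1024 / 1024   # 1220.625

for swap_size in (1100, 1200):
    spare = swap_size - swap_in_use
    under = swap_size <= recommended_max_mib
    print(f"{swap_size} MiByte swap: {spare} MiByte spare at that stage, "
          f"under recommended max: {under}")
```

Both candidate sizes leave several hundred MiByte of headroom over the observed use at that stage while staying under the recommended maximum for that configuration.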

This means that seeing if you still have -j4 problems at
that stage with the smaller swap space could be a useful
experiment, even if there are later problems with the
swap size. (You may have to risk eventual deadlock if
you do not stop it after the stage in question.)

Testing, unfortunately, requires repeating from the same
initial conditions, probably always starting a from-scratch
build. (Not that this gives full repeatability.)

>>>> All three installations were run on the same physical Pi, though
>>>> of course the microSD cards were distinct. Even so, all three
>>>> cards are nominally identical and likely from the same batch.
>>>
>>> buildworld buildkernel would still have lots of variations
>>> in the relative timing of activities during the various
>>> builds. Such could matter to the uma zone usage, for example.
>>> If things are marginal for working, such variable results
>>> might well be expected.
>>>
>>>> The one tangible difference is that the Pi2 is now on a private
>>>> network, the two earlier buildworlds were on a public network.
>>>> Can't see how that would matter, however.
>>>
>>>
>>> Note:
>>>
>>> I have a RPi2B V1.2 with armv7 FreeBSD ( head -r358966 )
>>
>
> That might be a significant difference. My pi2's are v1.1,
> unlike the later version in that they're (I think) obligately
> armv7, with an older processor.

Older processor: yes. But, if I understand right, armv7
FreeBSD makes very little internal distinction in the
kernel for the two, including for relevant issues to the
OOM kill criteria and activity. (Those who know more
can correct any misimpression that I have.)

>> Nope: head -r359427 again. Sorry.
>>
>>> doing a -j2 buildworld buildkernel with top watching, top
>>> having my changes that track and report some maximum
>>> observed figures. Maximum Observed swap space usage is
>>> reported, as is MaxObs(Act+Wir).
>>>
>>> But it likely will be a day or two before it completes,
>>> presuming it is successful.
>>>=20
>>> (For reference: It is configured like the RPi3 was
>>> for that -j4 test that I reported to you earlier,
>>> other than the swap partition being set to 1800
>>> MiByte and the use of -j2 for armv7.)

I'll list the results later in this reply.

>
> Back in late January you observed an error in the code handling
> the setting of vm.pfault_oom_attempts="-1" and reported it under
> the subject line: Re: OOMA kill with vm.pfault_oom_attempts="-1"
> on RPi3 at r357147 (a vm_pfault_oom_attempts < 0 handling bug as
> of head -r357026)
>
> Is there any chance the bug survived in the 12 branch?

The change in -r357026 that introduced that issue was not MFC'd
and was not labeled as intended to be MFC'd: it is 13+ specific.
So it is not involved in stable/12 issues.


Going back to:

>>> (For reference: It is configured like the RPi3 was
>>> for that -j4 test that I reported to you earlier,
>>> other than the swap partition being set to 1800
>>> MiByte and the use of -j2 for armv7.)

It completed fine, with my odd top variant showing:

Mem:  758544Ki MaxObsActive, 189972Ki MaxObsWired, 928060Ki MaxObs(Act+Wir)
Swap: 527388Ki MaxObsUsed
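Note that 758544Ki + 189972Ki = 948516Ki, which is more than the 928060Ki MaxObs(Act+Wir). That fits each maximum being tracked independently across samples, since Active and Wired need not peak at the same moment. The sketch below illustrates that idea only: the MaxObs class is hypothetical, it is not the actual top modification used for the build, and the sample values are invented.

```python
# Sketch of running "maximum observed" figures like those reported above.
# Hypothetical: this is not the actual top modification used for the build,
# and the samples are invented for illustration.
from dataclasses import dataclass


@dataclass
class MaxObs:
    active: int = 0        # KiB
    wired: int = 0         # KiB
    act_plus_wir: int = 0  # KiB, maximum of the per-sample sum
    swap_used: int = 0     # KiB

    def update(self, active: int, wired: int, swap_used: int) -> None:
        """Fold one sample into the running maxima."""
        self.active = max(self.active, active)
        self.wired = max(self.wired, wired)
        # Maximized per sample, so it can be less than max(active) + max(wired)
        # when Active and Wired peak at different times.
        self.act_plus_wir = max(self.act_plus_wir, active + wired)
        self.swap_used = max(self.swap_used, swap_used)


obs = MaxObs()
for sample in [(700000, 150000, 100000), (600000, 190000, 500000)]:
    obs.update(*sample)
print(obs)  # act_plus_wir=850000, below 700000 + 190000

# The reported figures show the same pattern:
act, wir, act_plus_wir = 758544, 189972, 928060
print(act + wir, ">", act_plus_wir)  # prints "948516 > 928060"
```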

But it turned out that the high memory use time frame for
gmock-matchers_test.cc was matched with a very low memory
use activity. So the 527388Ki MaxObsUsed is on the low
side for figuring out having margin to cover variability
in what the paired activity might be. Other pairings could
easily have used over 700 MiByte more (say, linking clang),
and so have reached the realm of 1400 to 1500 MiByte
for swap, leaving, say, 400 to 300 MiByte unused.

(I happened to be there to watch the top display over the
period of time at issue, seeing the growth to 527388Ki
MaxObsUsed.)

(Unfortunately, 1400+ MiByte is more than the roughly 1220
MiByte that your context's warning gives as the recommended
maximum. This suggests that even -j2 has risks for total
swap usage in your context but the stable/12 vs. head
consequences are not obvious for such a comparison.)
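The margin arithmetic in the last few paragraphs, as a sketch. The 1400 and 1500 MiByte peaks are the hypothetical worse pairings from the text, not observed values; only the 527388Ki figure was observed.

```python
# Margin arithmetic for the -j2 RPi2 V1.2 build (figures in MiByte).
MAX_OBS_SWAP_KIB = 527388          # observed MaxObsUsed
SWAP_PARTITION_MIB = 1800          # swap partition size for that build

observed_mib = MAX_OBS_SWAP_KIB / 1024
print(f"observed peak: ~{observed_mib:.0f} MiByte")

# Hypothetical worse pairings (e.g. linking clang alongside), from the text.
for peak in (1400, 1500):
    print(f"peak {peak}: {SWAP_PARTITION_MIB - peak} MiByte left unused")

# Recommended maximum from the stable/12 RPi2 V1.1 warning (312480 pages).
recommended_max_mib = 312480 * 4096 / 1024 / 1024
print(f"1400 MiByte exceeds {recommended_max_mib} MiByte:",
      1400 > recommended_max_mib)
```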

Personally, I'd not want any small to non-existent potential
swap space margins from trying -j3. So I'll not try such:
-j2 at most for the 1800 MiByte swapping/paging space for
armv7 head RPi2 with only 1 GiByte of RAM. (More on the
2 GiByte RAM OPi+2e: 2nd swap file to enable when I need
it in that context.)

I'm not claiming that -j3 would usually run out of swap
space for the RPi2. I'm just trying to make it highly
unlikely that I'd ever have a case of running out of swap
space: I want vm.pfault_oom_attempts="-1" to be reasonable
to use, without noticeably risking deadlocking the armv7
in the 1 GiByte RAM context. With 2 GiByte of RAM the
OPi+2e allows 3072 MiByte of swap space without the
warning as well, making -j4 reasonable in that context.


FYI for the -j2 RPi2 v1.2 based build:
Ended:   2020-04-03:10:46:52
Started: 2020-04-01:20:59:24

So somewhat under 38 hours for the from scratch build,
same sort of detailed selections as for the earlier
aarch64 RPi3 example.
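For the record, the elapsed time works out as follows, assuming both timestamps are in the same time zone with no clock change in between.

```python
from datetime import datetime

FMT = "%Y-%m-%d:%H:%M:%S"   # matches the "2020-04-03:10:46:52" style above
started = datetime.strptime("2020-04-01:20:59:24", FMT)
ended = datetime.strptime("2020-04-03:10:46:52", FMT)

elapsed = ended - started
hours, rem = divmod(int(elapsed.total_seconds()), 3600)
minutes, seconds = divmod(rem, 60)
print(f"{hours}h {minutes}m {seconds}s")  # prints "37h 47m 28s"
```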

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



