Date:      Mon, 11 May 2020 16:34:25 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        bob prohaska <fbsd@www.zefox.net>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: Curious buildworld stoppage on RPi3B running 12.1-stable.
Message-ID:  <C4D819F2-2959-4D71-9174-2061740D66B6@yahoo.com>
In-Reply-To: <20200511224130.GA41230@www.zefox.net>
References:  <20200508180252.GA29832@www.zefox.net> <E86257BD-42F5-47F4-A0D7-AD66D41DADF9@yahoo.com> <20200511224130.GA41230@www.zefox.net>

On 2020-May-11, at 15:41, bob prohaska <fbsd at www.zefox.net> wrote:

> On Fri, May 08, 2020 at 03:08:54PM -0700, Mark Millard wrote:
>> On 2020-May-8, at 11:02, bob prohaska <fbsd at www.zefox.net> wrote:
>>
>>> Here's an example of a curious hang on a Pi3 during -j4 buildworld.
>>> The machine uses a 37GB mechanical USB disk running 12.1 r360689.
>>>
>>> /boot/loader.conf contains
>>> kern.cam.boot_delay="20000"
>>> vm.pageout_oom_seq="4096"
>>> vm.pfault_oom_attempts="-1"
>>>
>
> Two attempts to reproduce the buildworld hang on successive revisions
> of 12.1 (now at r360872) failed. Buildworld ran to completion in about
> 27 hours, which is about as fast as it ever ran, at least since the
> introduction of clang 9. The compile did dwindle to a single thread,
> but the thread continued to make progress and eventually four threads
> replaced it.
>
> Swap activity looked different, remaining high (~1GB) until
> buildworld completed. Formerly swap use became insignificant once
> the "building libraries" stage was over.
>
> The "swap_pager_getswapspace(24): failed" messages were still seen,
> but they stopped relatively early in buildworld. Near as I can tell
> they're innocuous, even if they look threatening.

The settings that you use to help prevent OOMs
likely contributed to things managing to continue
without your having to restart them manually
(avoiding OOM kills).

But those messages probably also mean that such builds
are more likely to deadlock from having run out of
usable swap/paging space. They may suggest that backing
off the -jN setting to a smaller N would be appropriate,
say -j3, from what you have described.
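
For illustration only, a hypothetical variant of the
buildscript you show later in this message, with just
the N reduced:

make -j3 -DNO_CLEAN buildworld > buildworld.log && make -j3 buildkernel > buildkernel.log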

> The tempest is back in the teapot 8-)
>
>>> Top reports:
>>>
>>> last pid: 84778;  load averages:  0.00,  0.47,  1.35                 up 0+19:11:52  09:43:58
>>> 55 processes:  1 running, 54 sleeping
>>> CPU:  0.0% user,  0.0% nice,  0.7% system,  0.0% interrupt, 99.3% idle
>>> Mem: 243M Active, 26M Inact, 13M Laundry, 171M Wired, 98M Buf, 453M Free
>>> Swap: 2263M Total, 698M Used, 1565M Free, 30% Inuse, 696K In
>>
>> 453 MiByte Free, 698MiByte Swap/paging space used, and, see
>> below, 263 MiByte RESident for c++ --on a machine with 1 GiByte
>> of RAM with 171 MiByte Wired. The c++ RESident is 20 MiByte or
>> so more than the 243 MiByte Active figure.
>>
>>> PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
>>> 82615 root          1  20    0   726M   263M swread   3   8:43   0.46% c++
>>> 1025 bob           1  20    0    13M  1052K CPU0     0   2:47   0.16% top
>>> 1020 bob           1  20    0    19M   800K select   1   0:18   0.02% sshd
>>> 66244 root          1  20    0    11M   228K select   0   0:03   0.00% make
>>> 656 root          1  20    0    10M   244K select   0   0:00   0.00% devd
>>> 82405 root          1  20    0    17M  2460K select   0   0:25   0.00% make
>>> 27467 root          1  20    0    10M   228K select   0   0:11   0.00% make
>>> 27499 root          1  20    0    11M   232K select   0   0:07   0.00% make
>>> 40948 root          1  20    0    11M   228K select   0   0:06   0.00% make
>>> 928 root          1  20    0    16M  1004K select   0   0:05   0.00% sendmail
>>> 66255 root          1  20    0    14M   236K select   1   0:04   0.00% make
>>> 73291 root          1  20    0    10M   224K select   1   0:02   0.00% make
>>> 932 root          1  20    0    11M   340K nanslp   3   0:01   0.00% cron
>>> 727 root          1  20    0    11M   664K select   3   0:01   0.00% syslogd
>>>
>>> The buildworld output log file has stopped updating:
>>>
>>> -rw-r--r--  1 root  wheel  18178806 May  8 06:34 buildworld.log
>>> bob@nemesis:/usr/src % ls -l *.log
>>> -rw-r--r--  1 root  wheel  18345625 May  8 08:17 buildworld.log
>>> bob@nemesis:/usr/src % ls -l *.log
>>> -rw-r--r--  1 root  wheel  18532738 May  8 09:34 buildworld.log
>>> bob@nemesis:/usr/src % ls -l *.log
>>> -rw-r--r--  1 root  wheel  18532738 May  8 09:34 buildworld.log
>>> bob@nemesis:/usr/src % ls -l *.log
>>> -rw-r--r--  1 root  wheel  18532738 May  8 09:34 buildworld.log
>>> bob@nemesis:/usr/src % ls -l *.log
>>> -rw-r--r--  1 root  wheel  18532738 May  8 09:34 buildworld.log
>>> bob@nemesis:/usr/src %
>>> At this point clock time is 09:46, so it's been stuck 8 minutes.
>>
>> See later below about how long loading from swap/paging
>> space might be taking based on the below gstat information
>> and the earlier top information.
>>
>>> Gstat reports
>>> dT: 1.010s  w: 1.000s
>>> L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
>>>   0      0      0      0    0.0      0      0    0.0    0.0| mmcsd0
>>>   1     80     80    321   12.4      0      0    0.0   99.4| da0
>>>   0      0      0      0    0.0      0      0    0.0    0.0| mmcsd0s1
>>>   0      0      0      0    0.0      0      0    0.0    0.0| mmcsd0s2
>>>   0      0      0      0    0.0      0      0    0.0    0.0| diskid/DISK-9B9A5304
>>>   0      0      0      0    0.0      0      0    0.0    0.0| msdosfs/MSDOSBOOT
>>>   0      0      0      0    0.0      0      0    0.0    0.0| mmcsd0s2a
>>>   0      0      0      0    0.0      0      0    0.0    0.0| mmcsd0s2b
>>>   0      0      0      0    0.0      0      0    0.0    0.0| diskid/DISK-9B9A5304s1
>>>   0      0      0      0    0.0      0      0    0.0    0.0| diskid/DISK-9B9A5304s2
>>>   0      0      0      0    0.0      0      0    0.0    0.0| ufsid/5bbd5a23da8b6195
>>>   0      0      0      0    0.0      0      0    0.0    0.0| ufs/rootfs
>>>   0      0      0      0    0.0      0      0    0.0    0.0| diskid/DISK-9B9A5304s2a
>>>   0      0      0      0    0.0      0      0    0.0    0.0| diskid/DISK-9B9A5304s2b
>>>   0      0      0      0    0.0      0      0    0.0    0.0| da0s1
>>>   1     80     80    321   12.4      0      0    0.0   99.5| da0s2
>>>   0      0      0      0    0.0      0      0    0.0    0.0| da0s2a
>>>   1     80     80    321   12.4      0      0    0.0   99.5| da0s2b
>>> bob@nemesis:~ %
>>
>> Note the lack of write activity, and that top is showing:
>> swread . Sort of like it is mostly doing a large reload
>> of something that had been mostly paged out earlier.
>>
> I was struck by the lack of any activity to da0s2a. The
> machine wasn't waiting for anything _except_ swap.
>
>> This is combined with that top output showing: 453M Free .
>> Was there a time earlier with far less free?
>>
> Yes, swapping had commenced hours earlier.
>
>> Also: top only shows one c++ to match up with your -j4 . This
>> suggests possibly a period of time when other things have not
>> started, pending needing what that c++ is building. Or it just
>> finished a parallel job but has not yet started another.
>> Somewhat earlier there might have been more than one thing
>> active.
>>
> Four threads started, gradually dwindling to a single thread
> by the time I noticed. After disturbing the machine by starting
> the gstat script, the remaining thread became more active and
> eventually was joined by additional threads.
>
>> Can you find out what the c++ is compiling during this period?
> It was part of llvm, though I neglected to record what. I planned
> to observe more carefully on a second experiment, but failed.
>=20
>> That might allow figuring out if it is something that normally
>> spends some time as the only job despite -j4 .
>>
>>> It looks as if the swap device is saturated, but swap is not exhausted.
>>
>> You later mention swap_pager_getswapspace errors, leading to the
>> question if there was a period when swap (or its supporting RAM
>> data structure space) had been exhausted. Were those the only
>> messages in such earlier time frames?
>>
> Yes, they appeared in a flurry in the first half (timewise) of
> buildworld. Then no further console output.
>
>> As for being "saturated":
>>
>> At 321 KB/s (the example in the gstat output), if it was loading
>> something large, such as 512 MiBytes, just basically reading that
>> much material at a mean of such a rate would be over 25 minutes:
>>
>> 512*1024*1024_bytes * (1sec/(321*1024_bytes)) * (1minute/(60sec))
>>
>> (Not that I know the 512 MiByte is realistic or that the 321
>> KiBytes/sec is a realistic mean rate.) This just gives an idea
>> what might be involved and that being beyond seconds or even a
>> small number of minutes for the general time scale may well be
>> possible.
>>
>> Extensive paging/swapping can vastly slow things down.
>>
> That is a startling realization.

Since paging tends to be small, random reads, and spinning
hard drives have seek times that add to the overall time
and slow the mean transfer rate compared to sequential
access, hard drives tend to show such behavior. (No claim
that hard drives are the only media with slow small random
read/write behavior.)
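
In fact, your gstat figures are self-consistent with
seek-bound page-ins (4 KiBytes being the normal page size
here): one read per 12.4 ms is about 80 reads/sec, and 80
reads/sec of 4 KiBytes each is about 322 KiBytes/sec,
close to the 321 kBps shown. For example, with bc(1):

% echo "scale=2; (1000/12.4)*4" | bc
322.56

So the drive looks to be seek-limited, not
bandwidth-limited.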

>>> The machine isn't hung; ssh sessions are responsive, the serial console
>>> refreshes the login prompt when the Enter key is hit.
>>>
>>> Also, there are no _current_ errors on the serial console. However, there
>>> are quite a number of swap_pager_getswapspace(32): failed messages, which
>>> stopped some considerable time earlier. Buildworld didn't seem affected by them.
>>
>> I'm not sure what can be said about the status of the machine
>> after swap_pager_getswapspace failures are reported that do not
>> have matching OOM kills involved. It may be that pages stayed
>> resident that otherwise would have paged out and that other
>> tradeoffs lead to slow-downs compared to having been able to
>> free some RAM for other uses. (No deadlock happened, however.)
>>
>> My expectation here is that you did run out of free swap/paging
>> space for a while. (It may be that internal data structures
>> for managing swap/paging space use are the limit hit, instead of
>> disk space, for all I know.) You did not indicate how big the
>> swap/paging partition is.
>>
> It's 2263M.

Yea, you did report that before: it was in the top output.
I just did not notice at the right point. Sorry.
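
If you want to watch how close to full that 2263M gets
over the course of a build, periodically logging swapinfo
output would show it. A minimal sketch (the swapuse.log
name is just an example):

while sleep 60; do date; swapinfo -h; done >> swapuse.log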

>> You did not mention if booting is suggesting kern.maxswzone
>> adjustments --or if you have done something to control that
>> context to not use the default.
>>
> No gripes from the kernel.

Yep: booted as aarch64, that should be the case.
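
(For reference, the value in use can be read via
"sysctl kern.maxswzone"; as I understand it, 0 there
means the kernel's default sizing is in effect.)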

>>> Perhaps strangest of all, starting up a gstat script to log system
>>> activity seems to slowly revive buildworld. %idle gradually drops to
>>> a few percent.
>>
>> Did you look to see if it was still just the one c++ process (PID
>> 82615) for the -j4 ?
>>=20
> Once PID 82615 got back near 100% it was joined by others.

So 82615 did not exit first. Interesting. It sounds like
82615 spent lots of time hitting page faults and reading
pages back in from the swapping/paging space, without
making much progress between the page faults --until it
finally had enough loaded. This looks to have delayed
other activity that was pending.

Again, this may suggest that a smaller -jN is appropriate
for the builds in this context, say -j3 or possibly even
-j2.

>> Other things may also have changed, not just you starting a gstat
>> script. This sort of judgement requires examining/monitoring the
>> context for a time.
>>
>>> If the gstat script is started before buildworld,
>>> the stall doesn't seem to happen and buildworld finishes in about 27
>>> hours, which is as good as it gets.
>>
>> Are all the comparisons of from-scratch buildworld/buildkernel
>> activity otherwise controlled to be very similar contexts?
>>
> The experiments start with a cleanup script:
> make -j8 cleandir > cleandir.log && make -j8 cleandir > cleandir.log && rm -rf /usr/obj/usr/ && rm *.log
> (probably redundant, kept out of habit)
>
> Then a buildscript:
> make -j4 -DNO_CLEAN buildworld > buildworld.log && make -j4 buildkernel > buildkernel.log
>
> Finally, the installscript:
> make -j4 installworld > installworld.log && make -j4 installkernel > installkernel.log

Looks to be a from-scratch build each time.

> . . .


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



