From owner-freebsd-arm@freebsd.org  Mon May 11 22:41:39 2020
Return-Path: <owner-freebsd-arm@freebsd.org>
Delivered-To: freebsd-arm@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id 670702D929D
 for <freebsd-arm@mailman.nyi.freebsd.org>;
 Mon, 11 May 2020 22:41:39 +0000 (UTC)
 (envelope-from fbsd@www.zefox.net)
Received: from www.zefox.net (www.zefox.net [50.1.20.27])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 server-signature RSA-PSS (4096 bits)
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "www.zefox.com", Issuer "www.zefox.com" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 49LbX204Nrz4KYS
 for <freebsd-arm@freebsd.org>; Mon, 11 May 2020 22:41:37 +0000 (UTC)
 (envelope-from fbsd@www.zefox.net)
Received: from www.zefox.net (localhost [127.0.0.1])
 by www.zefox.net (8.15.2/8.15.2) with ESMTPS id 04BMfVho041343
 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO);
 Mon, 11 May 2020 15:41:31 -0700 (PDT)
 (envelope-from fbsd@www.zefox.net)
Received: (from fbsd@localhost)
 by www.zefox.net (8.15.2/8.15.2/Submit) id 04BMfUom041342;
 Mon, 11 May 2020 15:41:30 -0700 (PDT) (envelope-from fbsd)
Date: Mon, 11 May 2020 15:41:30 -0700
From: bob prohaska <fbsd@www.zefox.net>
To: Mark Millard <marklmi@yahoo.com>
Cc: freebsd-arm@freebsd.org, bob prohaska <fbsd@www.zefox.net>
Subject: Re: Curious buildworld stoppage on RPi3B running 12.1-stable.
Message-ID: <20200511224130.GA41230@www.zefox.net>
References: <20200508180252.GA29832@www.zefox.net>
 <E86257BD-42F5-47F4-A0D7-AD66D41DADF9@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <E86257BD-42F5-47F4-A0D7-AD66D41DADF9@yahoo.com>
X-Rspamd-Queue-Id: 49LbX204Nrz4KYS
X-Spamd-Bar: ++
Authentication-Results: mx1.freebsd.org; dkim=none; dmarc=none;
 spf=none (mx1.freebsd.org: domain of fbsd@www.zefox.net has no SPF policy when
 checking 50.1.20.27) smtp.mailfrom=fbsd@www.zefox.net
X-Spamd-Result: default: False [2.14 / 15.00]; ARC_NA(0.00)[];
 WWW_DOT_DOMAIN(0.50)[]; RCVD_TLS_ALL(0.00)[];
 FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3];
 TO_DN_SOME(0.00)[];
 IP_SCORE(0.05)[ip: (0.21), ipnet: 50.1.16.0/20(0.11), asn: 7065(-0.03),
 country: US(-0.05)]; MIME_GOOD(-0.10)[text/plain];
 DMARC_NA(0.00)[zefox.net]; AUTH_NA(1.00)[];
 NEURAL_SPAM_MEDIUM(0.27)[0.268,0];
 NEURAL_HAM_LONG(-0.08)[-0.081,0];
 TO_MATCH_ENVRCPT_SOME(0.00)[]; R_SPF_NA(0.00)[];
 FREEMAIL_TO(0.00)[yahoo.com]; FROM_EQ_ENVFROM(0.00)[];
 R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+];
 ASN(0.00)[asn:7065, ipnet:50.1.16.0/20, country:US];
 MID_RHS_MATCH_FROM(0.00)[]; MID_RHS_WWW(0.50)[];
 RCVD_COUNT_TWO(0.00)[2]
X-BeenThere: freebsd-arm@freebsd.org
X-Mailman-Version: 2.1.33
Precedence: list
List-Id: "Porting FreeBSD to ARM processors." <freebsd-arm.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-arm>,
 <mailto:freebsd-arm-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arm/>
List-Post: <mailto:freebsd-arm@freebsd.org>
List-Help: <mailto:freebsd-arm-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-arm>,
 <mailto:freebsd-arm-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 11 May 2020 22:41:39 -0000

On Fri, May 08, 2020 at 03:08:54PM -0700, Mark Millard wrote:
> On 2020-May-8, at 11:02, bob prohaska <fbsd at www.zefox.net> wrote:
> 
> > Here's an example of a curious hang on a Pi3 during -j4 buildworld.
> > The machine uses a 37GB mechanical USB disk running 12.1 r360689.
> > 
> > /boot/loader.conf contains
> > kern.cam.boot_delay="20000"
> > vm.pageout_oom_seq="4096"
> > vm.pfault_oom_attempts="-1"
> > 

Two attempts to reproduce the buildworld hang on successive revisions 
of 12.1 (now at r360872) failed. Buildworld ran to completion in about 
27 hours, which is about as fast as it ever ran, at least since the
introduction of clang 9. The compile did dwindle to a single thread,
but the thread continued to make progress  and eventually four threads 
replaced it.

Swap activity looked different, remaining high (~1GB) until 
buildworld completed. Formerly swap use became insignificant once
the "building libraries" stage was over. 

The "swap_pager_getswapspace(24): failed" messages were still seen,
but they stopped relatively early in buildworld. Near as I can tell
they're innocuous, even if they look threatening. 

The tempest is back in the teapot 8-)

> > Top reports:
> > 
> > last pid: 84778;  load averages:  0.00,  0.47,  1.35                      up 0+19:11:52  09:43:58
> > 55 processes:  1 running, 54 sleeping
> > CPU:  0.0% user,  0.0% nice,  0.7% system,  0.0% interrupt, 99.3% idle
> > Mem: 243M Active, 26M Inact, 13M Laundry, 171M Wired, 98M Buf, 453M Free
> > Swap: 2263M Total, 698M Used, 1565M Free, 30% Inuse, 696K In
> 
> 453 MiByte Free, 698MiByte Swap/paging space used, and, see
> below, 263 MiByte RESident for c++ --on a machine with 1 GiByte
> of RAM with 171 MiByte Wired. The c++ RESident is 20 MiByte or
> so more than the 243 MiByte Active figure.
> 
> >  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
> > 82615 root          1  20    0   726M   263M swread   3   8:43   0.46% c++
> > 1025 bob           1  20    0    13M  1052K CPU0     0   2:47   0.16% top
> > 1020 bob           1  20    0    19M   800K select   1   0:18   0.02% sshd
> > 66244 root          1  20    0    11M   228K select   0   0:03   0.00% make
> >  656 root          1  20    0    10M   244K select   0   0:00   0.00% devd
> > 82405 root          1  20    0    17M  2460K select   0   0:25   0.00% make
> > 27467 root          1  20    0    10M   228K select   0   0:11   0.00% make
> > 27499 root          1  20    0    11M   232K select   0   0:07   0.00% make
> > 40948 root          1  20    0    11M   228K select   0   0:06   0.00% make
> >  928 root          1  20    0    16M  1004K select   0   0:05   0.00% sendmail
> > 66255 root          1  20    0    14M   236K select   1   0:04   0.00% make
> > 73291 root          1  20    0    10M   224K select   1   0:02   0.00% make
> >  932 root          1  20    0    11M   340K nanslp   3   0:01   0.00% cron
> >  727 root          1  20    0    11M   664K select   3   0:01   0.00% syslogd
> > 
> > The buildworld output log file has stopped updating:
> > 
> > -rw-r--r--  1 root  wheel  18178806 May  8 06:34 buildworld.log
> > bob@nemesis:/usr/src % ls -l *.log
> > -rw-r--r--  1 root  wheel  18345625 May  8 08:17 buildworld.log
> > bob@nemesis:/usr/src % ls -l *.log
> > -rw-r--r--  1 root  wheel  18532738 May  8 09:34 buildworld.log
> > bob@nemesis:/usr/src % ls -l *.log
> > -rw-r--r--  1 root  wheel  18532738 May  8 09:34 buildworld.log
> > bob@nemesis:/usr/src % ls -l *.log
> > -rw-r--r--  1 root  wheel  18532738 May  8 09:34 buildworld.log
> > bob@nemesis:/usr/src % ls -l *.log
> > -rw-r--r--  1 root  wheel  18532738 May  8 09:34 buildworld.log
> > bob@nemesis:/usr/src % 
> > At this point clock time is 09:46, so it's been stuck 8 minutes.
> 
> See later below about how long loading from swap/paging
> space might be taking based on the below gstat information
> and the earlier top information.
> 
> > Gstat reports
> > dT: 1.010s  w: 1.000s
> > L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
> >    0      0      0      0    0.0      0      0    0.0    0.0| mmcsd0
> >    1     80     80    321   12.4      0      0    0.0   99.4| da0
> >    0      0      0      0    0.0      0      0    0.0    0.0| mmcsd0s1
> >    0      0      0      0    0.0      0      0    0.0    0.0| mmcsd0s2
> >    0      0      0      0    0.0      0      0    0.0    0.0| diskid/DISK-9B9A5304
> >    0      0      0      0    0.0      0      0    0.0    0.0| msdosfs/MSDOSBOOT
> >    0      0      0      0    0.0      0      0    0.0    0.0| mmcsd0s2a
> >    0      0      0      0    0.0      0      0    0.0    0.0| mmcsd0s2b
> >    0      0      0      0    0.0      0      0    0.0    0.0| diskid/DISK-9B9A5304s1
> >    0      0      0      0    0.0      0      0    0.0    0.0| diskid/DISK-9B9A5304s2
> >    0      0      0      0    0.0      0      0    0.0    0.0| ufsid/5bbd5a23da8b6195
> >    0      0      0      0    0.0      0      0    0.0    0.0| ufs/rootfs
> >    0      0      0      0    0.0      0      0    0.0    0.0| diskid/DISK-9B9A5304s2a
> >    0      0      0      0    0.0      0      0    0.0    0.0| diskid/DISK-9B9A5304s2b
> >    0      0      0      0    0.0      0      0    0.0    0.0| da0s1
> >    1     80     80    321   12.4      0      0    0.0   99.5| da0s2
> >    0      0      0      0    0.0      0      0    0.0    0.0| da0s2a
> >    1     80     80    321   12.4      0      0    0.0   99.5| da0s2b
> > bob@nemesis:~ % 
> 
> Note the lack of write activity and top also is reported
> showing: swread . Sort of like it is mostly doing a large
> reload of something that had been mostly paged out earlier.
> 
I was struck by the lack of any activity to da0s2a. The
machine wasn't waiting for anything _except_ swap.

> This is combined with that top output showing: 453M Free .
> Was there a time earlier with far less free?
>
Yes, swapping had commenced hours earlier.
 
> Also: top only shows one c++ to match up with your -j4 . This
> suggests possibly a period of time when other things have not
> started, pending needing what that c++ is building. Or it just
> finished a parallel job but has not yet started another.
> Somewhat earlier there might have been more than one thing
> active.
> 
Four threads started, gradually dwindling to a single thread
by the time I noticed. After disturbing the machine by starting
the gstat script the remaining thread became more active and
eventually was joined by additional threads. 

> Can you find out what the c++ is compiling during this period?
It was part of llvm, though I neglected to record what. I planned
to observe more carefully on a second experiment, but failed.
 
> that might allow figuring out if it is something that normally
> sends some time has the only job despite -j4 .
> 
> > It looks as if the swap device is saturated, but swap is not exhausted.
> 
> You later mention swap_pager_getswapspace errors, leading to the
> question if there was a period when swap (or its supporting RAM
> data structure space) had been exhausted. Were those the only
> messages in such earlier time frames?
> 
Yes, they appeared in a flurry in the first half (timewise) of
buildworld. Then no further console output. 

> As for being "saturated":
> 
> At 321 KB/s (the example in the gstat output), if it was loading
> something large, such as 512 MiBytes, just basically reading that
> much material at a mean of such a rate would be over 25 minutes:
> 
> 512*1024*1024_bytes * (1sec/(321*1024_bytes)) * (1minute/(60sec))
> 
> (Not that I know the 512 MiByte is realistic or that the 321
> KiBYtes/sec is a realistic mean rate.) This just gives an idea
> what might be involved and that being beyond seconds or even a
> small number of minutes for the general time scale may well be
> possible.
> 
> Extensive paging/swapping can vastly slow things down.
>
That is a startling realization. 
 
> > The machine isn't hung; ssh sessions are responsive, the serial console
> > refreshes the login prompt when the Enter key is hit. 
> > 
> > Also, there are no _current_ errors on the serial console. However, there
> > a quite a number of swap_pager_getswapspace(32): failed messages, which
> > stopped some considerable time  earlier. Buildworld didn't seem affected by them. 
> 
> I'm not sure what can be said about the status of the machine
> after swap_pager_getswapspace failures are reported that do not
> have matching OOM kills involved. It may be that pages stayed
> resident that otherwise would have paged out and that other
> tradeoffs lead to slow-downs compared to having been able to
> free some RAM for other uses. (No deadlock happened, however.)
> 
> My expectation here is that you did run out of free swap/paging
> space space for a while. (It may be that internal data structures
> for managing swap/paging space use are the limit hit, instead of
> disk space, for all I know.) You did not indicate how big the
> swap/paging partition is.
>
It's 2263M.  
> You did not mention if booting is suggesting kern.maxswzone
> adjustments --or if you have done something to control that
> context to not use the default.
>
No gripes from the kernel. 
> > Perhaps strangest of all, starting up a gstat script to log system
> > activity seems to slowly revive buildworld. %idle gradually drops to 
> > a few percent.
> 
> Did you look to see if it was still just the one c++ process (PID
> 82615) for the -j4 ?
>
Once PID 82615 got back near 100% it was joined by others.  
> Other things may also have changed, not just you starting a gstat
> script. This sort of judgement requires examining/monitoring the
> context for a time.
> 
> > If the gstat script is started before buildworld 
> > the stall doesn't seem to happen and buildworld finishes in about 27
> > hours, which is as good as it gets.  
> 
> Are all the comparisons of from-scratch buildworld buildkernel
> activity and otherwise controlled to be very similar contexts?
>
The experiments start with a cleanupscript:
make -j8 cleandir > cleandir.log && make -j8 cleandir > cleandir.log && rm -rf /usr/obj/usr/ && rm *.log
(probably redundant, kept out of habit)

Then a buildscript
make -j4 -DNO_CLEAN  buildworld > buildworld.log && make -j4  buildkernel  > buildkernel.log

Finally, the installscript:
make -j4 installworld > installworld.log && make -j4 installkernel > installkernel.log


> > Earlier tests  letting the machine run undisturbed without the gastat 
> > script running suggests it'll eventually finish buildworld in around 
> > 36 hours.
> 
> Also has the question of how well guaranteed is the
> degree of similarity for the compared contexts?
> 
> > This vaguely echoes an incident long ago where Xorg was unusably slow
> > until a small job was started in the background, at which point it
> > became responsive. Can't readily imagine a connection, though.
> 
> Definitely changes what the scheduler has to do. That likely has
> other non-obvious consequences when pushing the system near its
> limits.
> 
> > It's rather counterintituve that giving the machine more work to do
> > should make it work faster....If anybody can suggest what's going
> > on I'd most curious. I could try poking around in the debugger if
> > someone can offer guidance. On my own I'm clueless. 
> 
> ===
> Mark Millard

Thanks to all (especially Mark) who read this far!

bob prohaska