Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 8 May 2020 11:02:52 -0700
From:      bob prohaska <fbsd@www.zefox.net>
To:        freebsd-arm@freebsd.org
Subject:   Curious buildworld stoppage on RPi3B running 12.1-stable.
Message-ID:  <20200508180252.GA29832@www.zefox.net>

next in thread | raw e-mail | index | archive | help
Here's an example of a curious hang on a Pi3 during -j4 buildworld.
The machine uses a 37GB mechanical USB disk running 12.1 r360689.

/boot/loader.conf contains
kern.cam.boot_delay="20000"
vm.pageout_oom_seq="4096"
vm.pfault_oom_attempts="-1"

Top reports:

last pid: 84778;  load averages:  0.00,  0.47,  1.35                      up 0+19:11:52  09:43:58
55 processes:  1 running, 54 sleeping
CPU:  0.0% user,  0.0% nice,  0.7% system,  0.0% interrupt, 99.3% idle
Mem: 243M Active, 26M Inact, 13M Laundry, 171M Wired, 98M Buf, 453M Free
Swap: 2263M Total, 698M Used, 1565M Free, 30% Inuse, 696K In

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
82615 root          1  20    0   726M   263M swread   3   8:43   0.46% c++
 1025 bob           1  20    0    13M  1052K CPU0     0   2:47   0.16% top
 1020 bob           1  20    0    19M   800K select   1   0:18   0.02% sshd
66244 root          1  20    0    11M   228K select   0   0:03   0.00% make
  656 root          1  20    0    10M   244K select   0   0:00   0.00% devd
82405 root          1  20    0    17M  2460K select   0   0:25   0.00% make
27467 root          1  20    0    10M   228K select   0   0:11   0.00% make
27499 root          1  20    0    11M   232K select   0   0:07   0.00% make
40948 root          1  20    0    11M   228K select   0   0:06   0.00% make
  928 root          1  20    0    16M  1004K select   0   0:05   0.00% sendmail
66255 root          1  20    0    14M   236K select   1   0:04   0.00% make
73291 root          1  20    0    10M   224K select   1   0:02   0.00% make
  932 root          1  20    0    11M   340K nanslp   3   0:01   0.00% cron
  727 root          1  20    0    11M   664K select   3   0:01   0.00% syslogd

The buildworld output log file has stopped updating:

-rw-r--r--  1 root  wheel  18178806 May  8 06:34 buildworld.log
bob@nemesis:/usr/src % ls -l *.log
-rw-r--r--  1 root  wheel  18345625 May  8 08:17 buildworld.log
bob@nemesis:/usr/src % ls -l *.log
-rw-r--r--  1 root  wheel  18532738 May  8 09:34 buildworld.log
bob@nemesis:/usr/src % ls -l *.log
-rw-r--r--  1 root  wheel  18532738 May  8 09:34 buildworld.log
bob@nemesis:/usr/src % ls -l *.log
-rw-r--r--  1 root  wheel  18532738 May  8 09:34 buildworld.log
bob@nemesis:/usr/src % ls -l *.log
-rw-r--r--  1 root  wheel  18532738 May  8 09:34 buildworld.log
bob@nemesis:/usr/src % 
At this point clock time is 09:46, so it's been stuck 8 minutes.

Gstat reports
dT: 1.010s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      0      0      0    0.0      0      0    0.0    0.0| mmcsd0
    1     80     80    321   12.4      0      0    0.0   99.4| da0
    0      0      0      0    0.0      0      0    0.0    0.0| mmcsd0s1
    0      0      0      0    0.0      0      0    0.0    0.0| mmcsd0s2
    0      0      0      0    0.0      0      0    0.0    0.0| diskid/DISK-9B9A5304
    0      0      0      0    0.0      0      0    0.0    0.0| msdosfs/MSDOSBOOT
    0      0      0      0    0.0      0      0    0.0    0.0| mmcsd0s2a
    0      0      0      0    0.0      0      0    0.0    0.0| mmcsd0s2b
    0      0      0      0    0.0      0      0    0.0    0.0| diskid/DISK-9B9A5304s1
    0      0      0      0    0.0      0      0    0.0    0.0| diskid/DISK-9B9A5304s2
    0      0      0      0    0.0      0      0    0.0    0.0| ufsid/5bbd5a23da8b6195
    0      0      0      0    0.0      0      0    0.0    0.0| ufs/rootfs
    0      0      0      0    0.0      0      0    0.0    0.0| diskid/DISK-9B9A5304s2a
    0      0      0      0    0.0      0      0    0.0    0.0| diskid/DISK-9B9A5304s2b
    0      0      0      0    0.0      0      0    0.0    0.0| da0s1
    1     80     80    321   12.4      0      0    0.0   99.5| da0s2
    0      0      0      0    0.0      0      0    0.0    0.0| da0s2a
    1     80     80    321   12.4      0      0    0.0   99.5| da0s2b
bob@nemesis:~ % 

It looks as if the swap device is saturated, but swap is not exhausted.

The machine isn't hung; ssh sessions are responsive, the serial console
refreshes the login prompt when the Enter key is hit. 

Also, there are no _current_ errors on the serial console. However, there
a quite a number of swap_pager_getswapspace(32): failed messages, which
stopped some considerable time  earlier. Buildworld didn't seem affected by them. 

Perhaps strangest of all, starting up a gstat script to log system
activity seems to slowly revive buildworld. %idle gradually drops to 
a few percent. If the gstat script is started before buildworld 
the stall doesn't seem to happen and buildworld finishes in about 27
hours, which is as good as it gets.  

Earlier tests  letting the machine run undisturbed without the gastat 
script running suggests it'll eventually finish buildworld in around 
36 hours.

This vaguely echoes an incident long ago where Xorg was unusably slow
until a small job was started in the background, at which point it
became responsive. Can't readily imagine a connection, though.

It's rather counterintituve that giving the machine more work to do
should make it work faster....If anybody can suggest what's going
on I'd most curious. I could try poking around in the debugger if
someone can offer guidance. On my own I'm clueless. 

Thanks for reading,

bob prohaska




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20200508180252.GA29832>