Date: Fri, 8 May 2020 11:02:52 -0700 From: bob prohaska <fbsd@www.zefox.net> To: freebsd-arm@freebsd.org Subject: Curious buildworld stoppage on RPi3B running 12.1-stable. Message-ID: <20200508180252.GA29832@www.zefox.net>
next in thread | raw e-mail | index | archive | help
Here's an example of a curious hang on a Pi3 during -j4 buildworld. The machine uses a 37GB mechanical USB disk running 12.1 r360689. /boot/loader.conf contains kern.cam.boot_delay="20000" vm.pageout_oom_seq="4096" vm.pfault_oom_attempts="-1" Top reports: last pid: 84778; load averages: 0.00, 0.47, 1.35 up 0+19:11:52 09:43:58 55 processes: 1 running, 54 sleeping CPU: 0.0% user, 0.0% nice, 0.7% system, 0.0% interrupt, 99.3% idle Mem: 243M Active, 26M Inact, 13M Laundry, 171M Wired, 98M Buf, 453M Free Swap: 2263M Total, 698M Used, 1565M Free, 30% Inuse, 696K In PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND 82615 root 1 20 0 726M 263M swread 3 8:43 0.46% c++ 1025 bob 1 20 0 13M 1052K CPU0 0 2:47 0.16% top 1020 bob 1 20 0 19M 800K select 1 0:18 0.02% sshd 66244 root 1 20 0 11M 228K select 0 0:03 0.00% make 656 root 1 20 0 10M 244K select 0 0:00 0.00% devd 82405 root 1 20 0 17M 2460K select 0 0:25 0.00% make 27467 root 1 20 0 10M 228K select 0 0:11 0.00% make 27499 root 1 20 0 11M 232K select 0 0:07 0.00% make 40948 root 1 20 0 11M 228K select 0 0:06 0.00% make 928 root 1 20 0 16M 1004K select 0 0:05 0.00% sendmail 66255 root 1 20 0 14M 236K select 1 0:04 0.00% make 73291 root 1 20 0 10M 224K select 1 0:02 0.00% make 932 root 1 20 0 11M 340K nanslp 3 0:01 0.00% cron 727 root 1 20 0 11M 664K select 3 0:01 0.00% syslogd The buildworld output log file has stopped updating: -rw-r--r-- 1 root wheel 18178806 May 8 06:34 buildworld.log bob@nemesis:/usr/src % ls -l *.log -rw-r--r-- 1 root wheel 18345625 May 8 08:17 buildworld.log bob@nemesis:/usr/src % ls -l *.log -rw-r--r-- 1 root wheel 18532738 May 8 09:34 buildworld.log bob@nemesis:/usr/src % ls -l *.log -rw-r--r-- 1 root wheel 18532738 May 8 09:34 buildworld.log bob@nemesis:/usr/src % ls -l *.log -rw-r--r-- 1 root wheel 18532738 May 8 09:34 buildworld.log bob@nemesis:/usr/src % ls -l *.log -rw-r--r-- 1 root wheel 18532738 May 8 09:34 buildworld.log bob@nemesis:/usr/src % At this point clock time is 09:46, so it's been stuck 8 minutes. Gstat reports dT: 1.010s w: 1.000s L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name 0 0 0 0 0.0 0 0 0.0 0.0| mmcsd0 1 80 80 321 12.4 0 0 0.0 99.4| da0 0 0 0 0 0.0 0 0 0.0 0.0| mmcsd0s1 0 0 0 0 0.0 0 0 0.0 0.0| mmcsd0s2 0 0 0 0 0.0 0 0 0.0 0.0| diskid/DISK-9B9A5304 0 0 0 0 0.0 0 0 0.0 0.0| msdosfs/MSDOSBOOT 0 0 0 0 0.0 0 0 0.0 0.0| mmcsd0s2a 0 0 0 0 0.0 0 0 0.0 0.0| mmcsd0s2b 0 0 0 0 0.0 0 0 0.0 0.0| diskid/DISK-9B9A5304s1 0 0 0 0 0.0 0 0 0.0 0.0| diskid/DISK-9B9A5304s2 0 0 0 0 0.0 0 0 0.0 0.0| ufsid/5bbd5a23da8b6195 0 0 0 0 0.0 0 0 0.0 0.0| ufs/rootfs 0 0 0 0 0.0 0 0 0.0 0.0| diskid/DISK-9B9A5304s2a 0 0 0 0 0.0 0 0 0.0 0.0| diskid/DISK-9B9A5304s2b 0 0 0 0 0.0 0 0 0.0 0.0| da0s1 1 80 80 321 12.4 0 0 0.0 99.5| da0s2 0 0 0 0 0.0 0 0 0.0 0.0| da0s2a 1 80 80 321 12.4 0 0 0.0 99.5| da0s2b bob@nemesis:~ % It looks as if the swap device is saturated, but swap is not exhausted. The machine isn't hung; ssh sessions are responsive, the serial console refreshes the login prompt when the Enter key is hit. Also, there are no _current_ errors on the serial console. However, there a quite a number of swap_pager_getswapspace(32): failed messages, which stopped some considerable time earlier. Buildworld didn't seem affected by them. Perhaps strangest of all, starting up a gstat script to log system activity seems to slowly revive buildworld. %idle gradually drops to a few percent. If the gstat script is started before buildworld the stall doesn't seem to happen and buildworld finishes in about 27 hours, which is as good as it gets. Earlier tests letting the machine run undisturbed without the gastat script running suggests it'll eventually finish buildworld in around 36 hours. This vaguely echoes an incident long ago where Xorg was unusably slow until a small job was started in the background, at which point it became responsive. Can't readily imagine a connection, though. It's rather counterintituve that giving the machine more work to do should make it work faster....If anybody can suggest what's going on I'd most curious. I could try poking around in the debugger if someone can offer guidance. On my own I'm clueless. Thanks for reading, bob prohaska
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20200508180252.GA29832>