Date: Wed, 25 Mar 2020 01:18:36 -0700 From: Mark Millard <marklmi@yahoo.com> To: bob prohaska <fbsd@www.zefox.net> Cc: freebsd-arm@freebsd.org Subject: Re: Belated out of swap kill on rpi3 at r359216 Message-ID: <0FF6BC4C-296F-49F3-8FB8-AA87A49349E2@yahoo.com> In-Reply-To: <20200325015633.GA93057@www.zefox.net> References: <20200324155753.GA91922@www.zefox.net> <83E41A13-6C24-4B56-A837-779044038FBC@yahoo.com> <20200324185518.GA92311@www.zefox.net> <75CE3C07-8A0A-4D32-84C6-24BEA967447E@yahoo.com> <20200324224658.GA92726@www.zefox.net> <764D5A86-6A42-44E0-A706-F1C49BB198DA@yahoo.com> <20200325015633.GA93057@www.zefox.net>
next in thread | previous in thread | raw e-mail | index | archive | help
On 2020-Mar-24, at 18:56, bob prohaska <fbsd at www.zefox.net> wrote: > [this thread needs a haircut, new comments start near line 288] > On Tue, Mar 24, 2020 at 05:21:47PM -0700, Mark Millard wrote: >> On 2020-Mar-24, at 15:46, bob prohaska <fbsd at www.zefox.net> wrote: >>=20 >>> . . . >>=20 >> . . . >>=20 >>=20 >> The text before "Mar 24 04:20:50" messages, with notes >> and blank lines mixed in to group things: >>=20 >> . . . >> Tue Mar 24 04:20:38 PDT 2020 Note the time above. >> Device 1K-blocks Used Avail Capacity >> /dev/mmcsd0s2b 4404252 403784 4000468 9% >> Mar 24 02:50:01 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 623056, size: 16384 >> Mar 24 04:17:17 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 1033854, size: 4096 >> 3/773/776/19177 mbuf clusters in use (current/cache/total/max) >>=20 >> procs memory page disks faults = cpu >> r b w avm fre flt re pi po fr sr mm0 da0 in sy = cs us sy id >> 0 0 30 2034696 54180 12942 7 2 4 13432 1374 0 0 14074 = 8102 2849 65 29 5 >> dT: 1.031s w: 1.000s >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps = ms/d %busy Name >> 18 1 0 0 0.0 1 31 26686 0 0 = 0.0 287.1 mmcsd0 >> 18 1 0 0 0.0 1 31 26686 0 0 = 0.0 287.1 mmcsd0s2 >> 10 1 0 0 0.0 1 31 26686 0 0 = 0.0 287.1 mmcsd0s2a >> 10 1 0 0 0.0 1 31 26686 0 0 = 0.0 287.1 ufs/rootfs >> Tue Mar 24 04:20:40 PDT 2020 >> Device 1K-blocks Used Avail Capacity >> /dev/mmcsd0s2b 4404252 403784 4000468 9% >> Mar 24 02:50:01 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 623056, size: 16384 >> Mar 24 04:17:17 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 1033854, size: 4096 >> 6/770/776/19177 mbuf clusters in use (current/cache/total/max) >>=20 >> Note that the above has 26sec+ ms/w figures. No reads. >> No mmcsd0s2b. >>=20 >>=20 >> procs memory page disks faults = cpu >> r b w avm fre flt re pi po fr sr mm0 da0 in sy = cs us sy id >> 0 0 30 2034696 54252 12942 7 2 4 13432 1374 0 0 14074 = 8102 2849 65 29 5 >> dT: 1.064s w: 1.000s >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps = ms/d %busy Name >> Tue Mar 24 04:20:42 PDT 2020 >> Device 1K-blocks Used Avail Capacity >> /dev/mmcsd0s2b 4404252 403784 4000468 9% >> Mar 24 02:50:01 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 623056, size: 16384 >> Mar 24 04:17:17 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 1033854, size: 4096 >> 9/1023/1032/19177 mbuf clusters in use (current/cache/total/max) >>=20 >> For the above, note the lack of any lines with ms/r or ms/w figures, >> just the title rows show. >>=20 >=20 > Same thing happens when gstat -a is run on an idle machine. > I thought it normal, the CPU had all the data it needed.=20 >=20 >=20 >=20 >>=20 >> procs memory page disks faults = cpu >> r b w avm fre flt re pi po fr sr mm0 da0 in sy = cs us sy id >> 0 0 30 2034696 53668 12942 7 2 4 13432 1374 0 0 14074 = 8102 2849 65 29 5 >> dT: 1.046s w: 1.000s >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps = ms/d %busy Name >> 17 1 0 0 0.0 1 31 30042 0 0 = 0.0 320.7 mmcsd0 >> 17 1 0 0 0.0 1 31 30042 0 0 = 0.0 320.7 mmcsd0s2 >> 9 1 0 0 0.0 1 31 30042 0 0 = 0.0 320.7 mmcsd0s2a >> 9 1 0 0 0.0 1 31 30042 0 0 = 0.0 320.7 ufs/rootfs >> Tue Mar 24 04:20:43 PDT 2020 Note the time above compared to the earlier one that I referenced: around 5sec later (=3D=3D 43sec-38sec), despite the ms/w figures involved (over 26sec/w and over 30sec/w with a nothing-listed in the middle). >> Device 1K-blocks Used Avail Capacity >> /dev/mmcsd0s2b 4404252 403784 4000468 9% >> Mar 24 02:50:01 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 623056, size: 16384 >> Mar 24 04:17:17 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 1033854, size: 4096 >> 9/1023/1032/19177 mbuf clusters in use (current/cache/total/max) >>=20 >> Note that the above has 30sec+ ms/w figures. No reads. >>=20 >> Also note that the L(q) figures dropped by only 1 over >> the about 04:20:40 to 04:20:43 interval (18->17 and >> 10->9). (I only take these deltas as suggestive. They >> might be accidental near matches.) No mmcsd0s2b. >>=20 >>=20 >> procs memory page disks faults = cpu >> r b w avm fre flt re pi po fr sr mm0 da0 in sy = cs us sy id >> 0 0 30 2034696 53624 12942 7 2 4 13432 1374 0 0 14074 = 8102 2849 65 29 5 >> dT: 1.003s w: 1.000s >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps = ms/d %busy Name >> Tue Mar 24 04:20:45 PDT 2020 >> Device 1K-blocks Used Avail Capacity >> /dev/mmcsd0s2b 4404252 403784 4000468 9% >> Mar 24 02:50:01 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 623056, size: 16384 >> Mar 24 04:17:17 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 1033854, size: 4096 >> 12/1020/1032/19177 mbuf clusters in use (current/cache/total/max) >>=20 >> For the above, note the lack of any lines with ms/r or ms/w figures, >> just the title rows show. >>=20 >=20 > It's tempting, but maybe not correct, the think the I/O caught up. On this side of the 30sec+ example, things are less clear than the above note, so maybe. (Same below.) >>=20 >> procs memory page disks faults = cpu >> r b w avm fre flt re pi po fr sr mm0 da0 in sy = cs us sy id >> 0 0 30 2034696 53600 12942 7 2 4 13432 1374 0 0 14074 = 8102 2849 65 29 5 >> dT: 1.063s w: 1.000s >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps = ms/d %busy Name >> Tue Mar 24 04:20:47 PDT 2020 >> Device 1K-blocks Used Avail Capacity >> /dev/mmcsd0s2b 4404252 403784 4000468 9% >> Mar 24 02:50:01 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 623056, size: 16384 >> Mar 24 04:17:17 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 1033854, size: 4096 >> 12/1020/1032/19177 mbuf clusters in use (current/cache/total/max) >>=20 >> For the above, note the lack of any lines with ms/r or ms/w figures, >> just the title rows show. >>=20 >=20 > Can't that be interpreted as gstat -a having nothing to report?=20 > Just _why_ it might have nothing to report never crossed my mind.... >=20 >> procs memory page disks faults = cpu >> r b w avm fre flt re pi po fr sr mm0 da0 in sy = cs us sy id >> 0 0 30 2034696 53592 12942 7 2 4 13432 1374 0 0 14074 = 8102 2849 65 29 5 >> dT: 1.006s w: 1.000s >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps = ms/d %busy Name >> Tue Mar 24 04:20:48 PDT 2020 >> Device 1K-blocks Used Avail Capacity >> /dev/mmcsd0s2b 4404252 403784 4000468 9% >> Mar 24 02:50:01 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 623056, size: 16384 >> Mar 24 04:17:17 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 1033854, size: 4096 >> 15/1017/1032/19177 mbuf clusters in use (current/cache/total/max) >>=20 >> For the above, note the lack of any lines with ms/r or ms/w figures, >> just the title rows show. >>=20 >>=20 >>=20 >> procs memory page disks faults = cpu >> r b w avm fre flt re pi po fr sr mm0 da0 in sy = cs us sy id >> 0 0 30 2034696 53672 12942 7 2 4 13432 1374 0 0 14074 = 8102 2849 65 29 5 >> dT: 1.002s w: 1.000s >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps = ms/d %busy Name >> Tue Mar 24 04:20:50 PDT 2020 >> Device 1K-blocks Used Avail Capacity >> /dev/mmcsd0s2b 4404252 403784 4000468 9% >> Mar 24 02:50:01 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 623056, size: 16384 >> Mar 24 04:17:17 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 1033854, size: 4096 >> 15/1017/1032/19177 mbuf clusters in use (current/cache/total/max) >>=20 >> For the above, note the list of any lines with ms/r or ms/w figures, >> just the title rows. >>=20 >>=20 >>=20 >> procs memory page disks faults = cpu >> r b w avm fre flt re pi po fr sr mm0 da0 in sy = cs us sy id >> 2 0 30 2045028 54160 12942 7 2 4 13432 1374 0 0 14074 = 8102 2849 65 29 5 >> dT: 1.001s w: 1.000s >> L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps = ms/d %busy Name >> 2 329 236 3839 13.0 93 1262 18.6 0 0 = 0.0 99.4 mmcsd0 >> 2 329 236 3839 13.2 93 1262 18.7 0 0 = 0.0 101.1 mmcsd0s2 >> 1 153 152 2816 12.5 1 32 366.5 0 0 = 0.0 95.2 mmcsd0s2a >> 1 176 84 1023 14.5 92 1230 14.9 0 0 = 0.0 66.8 mmcsd0s2b >> 1 153 152 2816 12.5 1 32 366.6 0 0 = 0.0 95.3 ufs/rootfs >> Tue Mar 24 04:20:51 PDT 2020 >> Device 1K-blocks Used Avail Capacity >> /dev/mmcsd0s2b 4404252 296976 4107276 7% >> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 21842, size: 12288 >> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: = bufobj: 0, blkno: 25477, size: 4096 >> 0/1032/1032/19177 mbuf clusters in use (current/cache/total/max) >>=20 >> For the above, the lines with ms/r and ms/w figures are back >> but the indefinite wait buffer for "Mar 24 04:20:50" happened >> too, before the "Tue Mar 24 04:20:51 PDT 2020" above. I'd guess >> they are from before the data that the gstat output is based on. >>=20 >> Also there are lots of reads and some writes in the above >> gstat output. mmcsd0s2b is again showing as well. >>=20 >> Note: It looks like ufs/rootfs might be a label >> identifying mmcsd0s2a, so they track up to slight >> time-of-information differences. I'm guessing that >> mmcsd0s2b is just a swap/paging partition. >>=20 >=20 > That's correct. =20 >=20 >> So it looks like activity for ufs/rootfs may have >> blocked/delayed activity for paging/swaping >> (mmcsd0s2b), at least for a time, even if it is >> not directly the cause of the specific OOM >> activity. >>=20 >=20 > How would one distinguish slow swap i/o from slow > filesystem i/o ?=20 The 26sec+ and 30sec+ examples do not list mmcsd0s2b but do list mmcsd0s2a (and ufs/rootfs). Presumably that means no reas/writes to mmcsd0s2b in the time frame spanned. That is what I was referring to. I was not comparing against mmcsd0s2b I/O activity. > Would moving swap to a USB device avoid blocking=20 > by writes to microSD? That's not hard to try, but > it puts a whole new kettle of fish on the fire. Presuming the USB device is well performing, splitting the load across buses might well help. A good USB SSD might handle both activities better than involving the microsd card at all(?). It is possible to have only /boot/ materials on ufs on the microsd card and so to have the (ufs) root file system on the USB. For example, in the Pine64+2GB's /boot/loader.conf on the microsd card I have: vfs.root.mountfrom=3D"ufs:/dev/gpt/PINE642Groot" That label identifies the USB SSD's ufs partition holding the root file system. On that file system is an empty directory /microsd_ufs used as a mount point to get to the microsd card's ufs-based /boot/ . # gpart show -p =3D> 63 249737153 mmcsd0 MBR (119G) 63 16380 - free - (8.0M) 16443 131040 mmcsd0s1 fat32lba [active] (64M) 147483 997 - free - (499K) 148480 241172480 mmcsd0s2 freebsd (115G) 241320960 8416256 - free - (4.0G) =3D> 0 241172480 mmcsd0s2 BSD (115G) 0 230686720 mmcsd0s2a freebsd-ufs (110G) 230686720 10485760 - free - (5.0G) =3D> 40 468862048 da0 GPT (224G) 40 2008 - free - (1.0M) 2048 413138944 da0p1 freebsd-ufs (197G) 413140992 6291456 da0p2 freebsd-swap (3.0G) 419432448 6291456 da0p4 freebsd-swap (3.0G) 425723904 43138184 - free - (21G) # gpart show -pl =3D> 63 249737153 mmcsd0 MBR (119G) 63 16380 - free - (8.0M) 16443 131040 mmcsd0s1 (null) [active] (64M) 147483 997 - free - (499K) 148480 241172480 mmcsd0s2 (null) (115G) 241320960 8416256 - free - (4.0G) =3D> 0 241172480 mmcsd0s2 BSD (115G) 0 230686720 mmcsd0s2a (null) (110G) 230686720 10485760 - free - (5.0G) =3D> 40 468862048 da0 GPT (224G) 40 2008 - free - (1.0M) 2048 413138944 da0p1 PINE642Groot (197G) 413140992 6291456 da0p2 PINE642Gswap (3.0G) 419432448 6291456 da0p4 PINE642Gswp2 (3.0G) 425723904 43138184 - free - (21G) After booting it looks like: # df -m Filesystem 1M-blocks Used Avail Capacity Mounted on /dev/gpt/PINE642Groot 195378 34613 145135 19% / devfs 0 0 0 100% /dev /dev/label/PINE64P2Groot 109101 75 100297 0% /microsd_ufs /dev/label/PINE642GAboot 63 43 20 69% /boot/efi (The Pine64+2GB uses a /boot/efi/ instead of a /boot/msdos/ .) # swapinfo Device 1K-blocks Used Avail Capacity /dev/gpt/PINE642Gswap 3145728 0 3145728 0% /dev/gpt/PINE642Gswp2 3145728 0 3145728 0% Total 6291456 0 6291456 0% (I'm not suggesting that having 2 swap partitions is important, it just is what I happen to have.) # dumpon -l gpt/PINE642Gswp2,gpt/PINE642Gswp2 (The duplication is a result of something I report later.) I also have in the USB SSD's ufs file system's /etc/fstab : /dev/label/PINE64P2Groot /microsd_ufs ufs rw,noatime = 1 1 (Again, I used labeling, in this case glabel based on the microsd card. Having a /boot/ is a subset of having a full root file system and I still used "root" terminology in the label.) I keep a /boot/ on the USB SSD and update the microsd card's copy from the USB copy: # more /boot/copy_boot_to_microsd.sh=20 rsync -axHh --info=3Dprogress2 --exclude=3D/boot/entropy --delete -r = /boot /microsd_ufs/ (I do not have /boot/entropy on the USB SSD, just on the microsd card at its /boot/entropy . The exclude prevents the --delete from touching the /boot/entropy file on the microsd card.) I also have in /etc/rc.conf (so: on the USB SSD): entropy_boot_file=3D"/microsd_ufs/boot/entropy" For reference: # ls -ldT /microsd_ufs/* -r--r--r-- 1 root wheel 6170 Feb 1 04:48:34 2020 = /microsd_ufs/COPYRIGHT drwxr-xr-x 11 root wheel 1024 Mar 24 19:06:27 2020 /microsd_ufs/boot (The COPYRIGHT file is, by content, the normal one that is on a root filesystem.) # ls -ldT /microsd_ufs/*/* -r--r--r-- 1 root wheel 3547 Mar 14 15:05:54 2020 = /microsd_ufs/boot/beastie.4th . . . -rw------- 1 root wheel 4096 Dec 31 16:00:25 2009 = /microsd_ufs/boot/entropy . . . drwxr-xr-x 2 root wheel 512 Dec 23 22:17:51 2019 = /microsd_ufs/boot/zfs (The date/time for /microsd_ufs/boot/entropy and /entropy are from before time is adjusted via ntp, so the dates can be odd.) I also use: dumpdev=3D"/dev/gpt/PINE642Gswp2" in /boot/loader.conf (both media) and in /etc/rc.conf . (This leads to the duplication noted earlier.) The same media-pair (microsd card plus USB SSD) can boot the RPi3. For the Pine64+2GB and RPi3, these days I use a small powered USB hub instead of directly plugging in the USB SSD to the arm-based board. I had power problems otherwise, at least on the Pine64+2GB in recent times. Technically, I use a USB3 hub and a USB3 SSD, both of which happen to allow USB2 use as well. The OPi+2e (armv7) and Rock64 (aarch64) are set up similarly. In all cases the USB SSDs used happen to perform better than the microsd cards used. (On the Rock64, the USB SSD also out performs the e.MMC that is in use --and the e.MMC outperforms the microsd card.) (The Rock64 USB3 is not operational yet. As stands, it too is a USB2 based context in my use.) I do not have other external USB devices involved. Mulitple external USB devices may share an internal hub on some arm boards, leading to a lack of multi-channel performance gains for multiple drives. > It should be added that the original buildworld has been restarted = with > vm.pfault_oom_attempts: -1 and so far, though it still generates the > "indefinite wait" warnings, the ms/w times look better. It's not done = yet, > though. How much overhead comes with OOMA? I doubt the ms/w figures span execution of the code that deals with OOM kill decisions or retries logic. But vm.pfault_oom_attempts=3D=3D-1 would avoid any retries, if I understand right. That in turn might put less of a load on the microsd card? (Guess work on my part.) =3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?0FF6BC4C-296F-49F3-8FB8-AA87A49349E2>