Date: Tue, 24 Mar 2020 11:55:18 -0700
From: bob prohaska <fbsd@www.zefox.net>
To: Mark Millard <marklmi@yahoo.com>
Cc: freebsd-arm@freebsd.org, bob prohaska <fbsd@www.zefox.net>
Subject: Re: Belated out of swap kill on rpi3 at r359216
Message-ID: <20200324185518.GA92311@www.zefox.net>
In-Reply-To: <83E41A13-6C24-4B56-A837-779044038FBC@yahoo.com>
References: <20200324155753.GA91922@www.zefox.net> <83E41A13-6C24-4B56-A837-779044038FBC@yahoo.com>
On Tue, Mar 24, 2020 at 10:09:28AM -0700, Mark Millard wrote:
> On 2020-Mar-24, at 08:57, bob prohaska <fbsd at www.zefox.net> wrote:
>
> > An attempt to buildworld on an rpi3 running r359216 stopped with an
> > OOMA kill. Sources were at 359264, loader.conf contained
> > vm.pageout_oom_seq="4096" .
>
> What of the value of vm.pfault_oom_attempts ?
>
> If vm.pfault_oom_attempts was not -1,
> what of the value of vm.pfault_oom_wait ?
>
bob@www:/usr/src % sysctl vm.pfault_oom_wait
vm.pfault_oom_wait: 10

I didn't change it, this must be the default. Would it be useful to add
something like vm.pfault_oom_wait="20" to /boot/loader.conf? (I've sketched
the settings I'm considering after the log excerpts below.)

> > A snippet of gstat log suggests the worst congestion in the storage I/O
> > happened at Tue Mar 24 04:52:26 PDT 2020 with an L(q) of 37, but the
> > OOMA kill happened at Tue Mar 24 04:53:04 PDT 2020, by which time the
> > L(q) had dropped to one, half a minute later.
> >
> > Is the delay in OOMA action to be expected?
> >
> > Here's the relevant part of the log, I hope the columns display readably:
> >
> > 0/2/2/19177 mbuf clusters in use (current/cache/total/max)
> > procs     memory       page                       disks     faults       cpu
> > r b w     avm    fre   flt  re  pi  po    fr    sr mm0 da0    in    sy   cs us sy id
> > 0 0 29 1897320 47312 12851   9   4   5 13330  1669   0   0 14172  8020 3034 65 29  6
> > dT: 1.056s  w: 1.000s
> >  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
> >    37    367    323   5463    6.4     45   1511   76.8      0      0    0.0   91.7  mmcsd0
> >    37    367    323   5463    6.5     45   1511   76.9      0      0    0.0   93.3  mmcsd0s2
> >    34    133    111   3209    7.6     22    697  134.7      0      0    0.0   74.0  mmcsd0s2a
> >     3    235    212   2254    5.9     23    814   21.5      0      0    0.0   70.0  mmcsd0s2b
> >    34    133    111   3209    7.6     22    697  134.7      0      0    0.0   74.2  ufs/rootfs
> > Tue Mar 24 04:52:26 PDT 2020
> > Device          1K-blocks     Used    Avail Capacity
> > /dev/mmcsd0s2b    4404252   224484  4179768     5%
> > Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21842, size: 12288
> > Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 25477, size: 4096
>
> How bad were things back when the "indefinite wait buffer" notices
> were generated (Mar 24 04:20:50 time frame)?
>
It looks like _new_ indefinite wait messages started coming at around
Mon Mar 23 23:00:28 PDT 2020:

procs     memory       page                       disks     faults       cpu
r b w     avm    fre   flt  re  pi  po    fr    sr mm0 da0    in    sy   cs us sy id
2 0 18 1588824 197676 14108   2   1   0 14759   238   0   0 14055  9263 2668 62 32  5
dT: 1.027s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    9      1      0      0    0.0      1     31  13451      0      0    0.0  326.9  mmcsd0
    9      1      0      0    0.0      1     31  13451      0      0    0.0  326.9  mmcsd0s2
    6      1      0      0    0.0      1     31  13451      0      0    0.0  326.9  mmcsd0s2a
    6      1      0      0    0.0      1     31  13451      0      0    0.0  326.9  ufs/rootfs
Mon Mar 23 23:00:28 PDT 2020
Device          1K-blocks     Used    Avail Capacity
/dev/mmcsd0s2b    4404252   245288  4158964     6%
Mar 23 23:00:28 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 280324, size: 8192
Mar 23 23:00:28 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 285903, size: 16384

Admittedly, 300% busy looks pretty bad. Still the machine didn't quit.....
On the next sample %busy went down to 18% for swap, less for all else.
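For concreteness, this is the sort of /boot/loader.conf change I have in
mind (a sketch only: vm.pageout_oom_seq is the one line actually set today,
and I'm assuming -1 turns off the page-fault OOM path entirely; please
correct me if I've got that wrong):

# already set: give the pagedaemon many more passes before it starts OOM kills
vm.pageout_oom_seq="4096"
# candidate additions, not yet tested:
# either disable OOM kills triggered from the page-fault handler ...
vm.pfault_oom_attempts="-1"
# ... or leave them enabled but allow a longer wait per attempt
#vm.pfault_oom_attempts="3"
#vm.pfault_oom_wait="20"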
> Below I show code changes to be more explicit in the
> output messaging about what contributes to initiating
> OOM kills, without needing boot verbose or the like.
> There are also some messages from Mark J.'s old code
> for reporting on things related to initiating OOM
> kills, or some minor variations of them.
>
> You may want to try such changes to provide more
> context for your OOM kills when they happen. Below
> the 4 reasons reported are (showing the most
> detailed of the related messages from different
> stages):
>
> swp_pager_meta_build: swap blk uma zone exhausted
>
> swp_pager_meta_build: swap pctrie uma zone exhausted
>
> vm_fault_allocate: proc %d (%s) failed to alloc page on fault, starting OOM
>
> vm_pageout_mightbe_oom: kill context: v_free_count: %u, v_inactive_count: %u
>
Could inquiries instead be added to the logging script? Right now it's

#!/bin/sh
while true
do vmstat ; gstat -abd -I 1s ; date ; swapinfo ; tail -n 2 /var/log/messages ; netstat -m | grep "mbuf clusters"
done

I'd much rather tamper with the logging script than the kernel 8-)
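Something along these lines is what I had in mind, in case it's enough
(an untested sketch; the extra sysctl names are my guess at the counters
your kill-context message reports, vm.stats.vm.v_free_count and
vm.stats.vm.v_inactive_count, plus the OOM knobs themselves):

#!/bin/sh
# Same loop as above, with the OOM-related counters and knobs sampled each pass.
while true
do
	vmstat
	gstat -abd -I 1s
	date
	swapinfo
	sysctl vm.stats.vm.v_free_count vm.stats.vm.v_inactive_count
	sysctl vm.pageout_oom_seq vm.pfault_oom_attempts vm.pfault_oom_wait
	tail -n 2 /var/log/messages
	netstat -m | grep "mbuf clusters"
done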
Many thanks for all your help!

bob prohaska

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20200324185518.GA92311>