Date: Tue, 24 Mar 2020 15:46:58 -0700
From: bob prohaska <fbsd@www.zefox.net>
To: Mark Millard <marklmi@yahoo.com>
Cc: freebsd-arm@freebsd.org, bob prohaska <fbsd@www.zefox.net>
Subject: Re: Belated out of swap kill on rpi3 at r359216
Message-ID: <20200324224658.GA92726@www.zefox.net>
In-Reply-To: <75CE3C07-8A0A-4D32-84C6-24BEA967447E@yahoo.com>
References: <20200324155753.GA91922@www.zefox.net> <83E41A13-6C24-4B56-A837-779044038FBC@yahoo.com> <20200324185518.GA92311@www.zefox.net> <75CE3C07-8A0A-4D32-84C6-24BEA967447E@yahoo.com>
The logfile being discussed is at

http://www.zefox.net/~fbsd/rpi3/swaptests/r359216/swapscript.osupdate.log

for convenient reference. More replies below.

On Tue, Mar 24, 2020 at 01:21:42PM -0700, Mark Millard wrote:
>
>
> On 2020-Mar-24, at 11:55, bob prohaska <fbsd at www.zefox.net> wrote:
>
> > On Tue, Mar 24, 2020 at 10:09:28AM -0700, Mark Millard wrote:
> >>
> >>
> >> On 2020-Mar-24, at 08:57, bob prohaska <fbsd at www.zefox.net> wrote:
> >>
> >>> An attempt to buildworld on an rpi3 running r359216 stopped with an
> >>> OOMA kill. Sources were at 359264, loader.conf contained
> >>> vm.pageout_oom_seq="4096" .
> >>
> >> What of the value of vm.pfault_oom_attempts ?
> >>
> >> If vm.pfault_oom_attempts was not -1,
> >> what of the value of vm.pfault_oom_wait ?
> >>
> > bob@www:/usr/src % sysctl vm.pfault_oom_wait
> > vm.pfault_oom_wait: 10
> > I didn't change it, this must be the default.
> > Would it be useful to add something like
> > vm.pfault_oom_wait="20"
> > to /boot/loader.conf?
>
> I assume that this answer means vm.pfault_oom_attempts
> was not -1. But you did not list what it was.
>

Sorry, the variable names are running together in my head.

bob@www:/usr/src % sysctl vm.pfault_oom_attempts
vm.pfault_oom_attempts: 3

It's now reset to -1, hopefully it'll work better than last time 8-)

> vm.pfault_oom_attempts=-1 is the means of avoiding
> the vm.pfault_oom_attempts, vm.pfault_oom_wait
> pair from causing OOM at all. (vm.pfault_oom_wait
> becomes irrelevant.)
>
> When vm.pfault_oom_attempts != -1 , then there are
> potential time-outs that overall involve:
>
> vm.pfault_oom_attempts * vm.pfault_oom_wait
>
> but there are some tradeoffs in the partitioning between
> the 2 factors:
>
> # sysctl -d vm.pfault_oom_attempts
> vm.pfault_oom_attempts: Number of page allocation attempts in page fault handler before it triggers OOM handling
>
> # sysctl -d vm.pfault_oom_wait
> vm.pfault_oom_wait: Number of seconds to wait for free pages before retrying the page fault handler
>
> As I understand, the following cases could each have
> very different results depending on what the latencies
> are like and such:
>
> vm.pfault_oom_attempts==20 && vm.pfault_oom_wait==1
> vs.
> vm.pfault_oom_attempts==1 && vm.pfault_oom_wait==20
> vs.
> vm.pfault_oom_attempts==4 && vm.pfault_oom_wait==5
>
> As I remember, Konstantin Belousov once asked someone
> that was having a repeatable problem to try some
> alternatives that explored this but, as I remember,
> he got no reply to the request.
>

Konstantin wrote to both me and the list in a thread on
Re: panic: non-current pmap. If it's relevant please indicate.

He also wrote to Don Lewis, in a thread on
Re: spurious out of swap kills
but that was on a "16-thread Ryzen machine, with 64 GB of RAM and
40 GB of swap", which is surely unlike my predicament 8-)

> The person might have just used: vm.pfault_oom_attempts=-1
> and continued with their primary activity, for all I know.
>
> vm.pfault_oom_attempts=-1 is only recommended when one
> is sure that they will not run out of swap/paging space,
> if I understand right.
>
> For reference, for other settings:
>
> # sysctl -d vm.pageout_oom_seq
> vm.pageout_oom_seq: back-to-back calls to oom detector to start OOM
>
> There is also:
>
> # sysctl -d vm.oom_pf_secs
> vm.oom_pf_secs:
>
> but it seems to have an empty description. May be
> it reports vm.pfault_oom_attempts * vm.pfault_oom_wait
> when vm.pfault_oom_attempts != -1 . (I've not looked.)
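For my own notes, a minimal /boot/loader.conf sketch covering the two
alternatives described above (the 4 * 5 split is just one of the
illustrative partitionings, not a recommendation):

# pick one approach
# (a) turn off the page-fault OOM path entirely (only sensible if swap cannot run out):
vm.pfault_oom_attempts="-1"
# (b) or keep the path and pick a partitioning of the overall window, e.g. 4 * 5 = 20 s:
#vm.pfault_oom_attempts="4"
#vm.pfault_oom_wait="5"
# and keep the pageout OOM detector patient, as already set here:
vm.pageout_oom_seq="4096"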
>
> >>> A snippet of gstat log suggests the worst congestion in the storage I/O
> >>> happened at Tue Mar 24 04:52:26 PDT 2020 with an L(q) of 37, but the
> >>> OOMA kill happened at Tue Mar 24 04:53:04 PDT 2020, by which time the
> >>> L(q) had dropped to one, half a minute later.
> >>>
> >>> Is the delay in OOMA action to be expected?
> >>>
> >>> Here's the relevant part of the log, I hope the columns display readably:
> >>>
> >>> 0/2/2/19177 mbuf clusters in use (current/cache/total/max)
> >>> procs     memory        page                      disks     faults       cpu
> >>> r b w     avm    fre   flt  re  pi  po    fr   sr mm0 da0    in   sy   cs us sy id
> >>> 0 0 29 1897320  47312 12851   9   4   5 13330 1669   0   0 14172 8020 3034 65 29  6
> >>> dT: 1.056s  w: 1.000s
> >>>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
> >>>    37    367    323   5463    6.4     45   1511   76.8      0      0    0.0    91.7 mmcsd0
> >>>    37    367    323   5463    6.5     45   1511   76.9      0      0    0.0    93.3 mmcsd0s2
> >>>    34    133    111   3209    7.6     22    697  134.7      0      0    0.0    74.0 mmcsd0s2a
> >>>     3    235    212   2254    5.9     23    814   21.5      0      0    0.0    70.0 mmcsd0s2b
> >>>    34    133    111   3209    7.6     22    697  134.7      0      0    0.0    74.2 ufs/rootfs
> >>> Tue Mar 24 04:52:26 PDT 2020
> >>> Device          1K-blocks     Used    Avail Capacity
> >>> /dev/mmcsd0s2b    4404252   224484  4179768     5%
> >>> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 21842, size: 12288
> >>> Mar 24 04:20:50 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 25477, size: 4096
> >>
> >> How bad were things back when the "indefinite wait buffer" notices
> >> were generated (Mar 24 04:20:50 time frame)?
> >>
> > It looks like _new_ indefinite wait messages started coming at around Mon Mar 23 23:00:28 PDT 2020:
> > procs     memory        page                      disks     faults       cpu
> > r b w     avm    fre   flt  re  pi  po    fr   sr mm0 da0    in   sy   cs us sy id
> > 2 0 18 1588824 197676 14108   2   1   0 14759  238   0   0 14055 9263 2668 62 32  5
> > dT: 1.027s  w: 1.000s
> >  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
> >     9      1      0      0    0.0      1     31  13451      0      0    0.0   326.9 mmcsd0
> >     9      1      0      0    0.0      1     31  13451      0      0    0.0   326.9 mmcsd0s2
> >     6      1      0      0    0.0      1     31  13451      0      0    0.0   326.9 mmcsd0s2a
> >     6      1      0      0    0.0      1     31  13451      0      0    0.0   326.9 ufs/rootfs
> > Mon Mar 23 23:00:28 PDT 2020
> > Device          1K-blocks     Used    Avail Capacity
> > /dev/mmcsd0s2b    4404252   245288  4158964     6%
> > Mar 23 23:00:28 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 280324, size: 8192
> > Mar 23 23:00:28 www kernel: swap_pager: indefinite wait buffer: bufobj: 0, blkno: 285903, size: 16384
> >
> > Admittedly, 300% busy looks pretty bad. Still the machine didn't quit.....
> > On the next sample %busy went down to 18% for swap, less for all else.
>
> As I understand, the ms/w value 13451 means that there was
> for a time (a mean of ?) somewhat under 14 seconds from a
> write being queued to it being completed.
>
> If a read could be blocked for such time frames by such
> writes, it suggests that vm.pfault_oom_wait might need to
> be larger than 13 if vm.pfault_oom_attempts=-1 is not in
> use.
>

Ok, I'm starting to get it. On this machine it's 10. But the 13 second
delay appeared at Mon Mar 23 23:00:28 PDT 2020. The kill occurred around
Tue Mar 24 04:53:08 PDT 2020 under what look like much more benign circumstances.

> Or vm.pfault_oom_attempts=-1 could be used to avoid large
> time frames from directly leading to OOM activity.
>

Understood. I simply forgot to restore the setting after the initial
troubles with it.
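As a quick cross-check (a sketch; the product is only meaningful while
vm.pfault_oom_attempts != -1), something like this puts the configured
fault-retry window next to the ~13.5 second ms/w figure above:

attempts=$(sysctl -n vm.pfault_oom_attempts)   # was 3 here before resetting to -1
waitsec=$(sysctl -n vm.pfault_oom_wait)        # 10 here
echo "fault retry window: $((attempts * waitsec)) s vs. worst observed write latency ~13.5 s"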
> >>
> >> Below I show code changes to be more explicit in the
> >> output messaging about what contributes to initiating
> >> OOM kills, without needing boot verbose or the like.
> >> There are also some messages from Mark J.'s old code
> >> for reporting on things related to initiating OOM
> >> kills, or some minor variations of them.
> >>
> >
> >> You may want to try such changes to provide more
> >> context for your OOM kills when they happen. Below
> >> the 4 reasons reported are (showing the most
> >> detailed of the related messages from different
> >> stages):
> >>
> >> swp_pager_meta_build: swap blk uma zone exhausted
> >>
> >> swp_pager_meta_build: swap pctrie uma zone exhausted
> >>
> >> vm_fault_allocate: proc %d (%s) failed to alloc page on fault, starting OOM
> >>
> >> vm_pageout_mightbe_oom: kill context: v_free_count: %u, v_inactive_count: %u
> >>
> >>
> >
> > Could inquiries instead be added to the logging script?
> > Right now it's
> > #!/bin/sh
> > while true
> > do vmstat ; gstat -abd -I 1s ; date ; swapinfo ; tail -n 2 /var/log/messages ; netstat -m | grep "mbuf clusters"
> > done
> >
> > I'd much rather tamper with the logging script than the kernel 8-)
>
> Unfortunately, the reason I made the kernel messaging
> changes is that, as far as I know, the normal kernel
> simply does not publish the information anywhere when
> the internal criteria leads to OOM activity: No
> inquiry available without kernel changes.

If all else fails I'll try to apply your patches to the kernel and recompile.

With my thanks,
bob prohaska
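P.S. In the meantime the logging script could at least record the related
tunables and the free/inactive page counters on each pass; a sketch (it
logs settings and counters only, not the kernel's internal OOM reasoning,
per your point above):

#!/bin/sh
# same loop as before, plus the OOM-related sysctl values each pass
while true
do vmstat ; gstat -abd -I 1s ; date ; swapinfo ; tail -n 2 /var/log/messages ; netstat -m | grep "mbuf clusters"
   sysctl vm.pfault_oom_attempts vm.pfault_oom_wait vm.pageout_oom_seq
   sysctl vm.stats.vm.v_free_count vm.stats.vm.v_inactive_count
done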