Date: Sun, 12 Aug 2018 20:36:01 -0700
From: Mark Millard <marklmi@yahoo.com>
To: bob prohaska <fbsd@www.zefox.net>
Cc: Mark Johnston <markj@FreeBSD.org>, John Kennedy <warlock@phouka.net>,
    freebsd-arm <freebsd-arm@freebsd.org>
Subject: Re: RPI3 swap experiments ["was killed: out of swap space" with: "v_free_count: 5439, v_inactive_count: 1"]
Message-ID: <0D8B9A29-DD95-4FA3-8F7D-4B85A3BB54D7@yahoo.com>
In-Reply-To: <20180813021226.GA46750@www.zefox.net>
References: <EC74A5A6-0DF4-48EB-88DA-543FD70FEA07@yahoo.com>
    <20180806155837.GA6277@raichu> <20180808153800.GF26133@www.zefox.net>
    <20180808204841.GA19379@raichu> <2DC1A479-92A0-48E6-9245-3FF5CFD89DEF@yahoo.com>
    <20180809033735.GJ30738@phouka1.phouka.net> <20180809175802.GA32974@www.zefox.net>
    <20180812173248.GA81324@phouka1.phouka.net> <20180812224021.GA46372@www.zefox.net>
    <B81E53A9-459E-4489-883B-24175B87D049@yahoo.com> <20180813021226.GA46750@www.zefox.net>
On 2018-Aug-12, at 7:12 PM, bob prohaska <fbsd at www.zefox.net> wrote:

> On Sun, Aug 12, 2018 at 04:23:31PM -0700, Mark Millard wrote:
>> On 2018-Aug-12, at 3:40 PM, bob prohaska <fbsd at www.zefox.net> wrote:
>>
>>> On Sun, Aug 12, 2018 at 10:32:48AM -0700, John Kennedy wrote:
>>>> . . .
>>> Setting vm.pageout_oom_seq to 120 made a decisive improvement, almost
>>> allowing buildworld to finish. By the time I tried CAM_IOSCHED_DYNAMIC
>>> buildworld was getting only about half as far, so it seems the patches
>>> were harmful to a degree. Changes were applied in the order
>>
>> You could experiment with figures bigger than 120 for
>> vm.pageout_oom_seq .
>>
> Could anybody hazard a guess as to how much? The leap from 12 to 120
> rather startled me; I thought a factor of two was a big adjustment.
> Maybe go to 240, or is that insignificant?

I'd keep multiplying by 10 until it works (or fails some other way),
then back off by smaller factors if you want a narrower range to be
known between failing and working (or failing differently).

>> I'll note that the creation of this mechanism seems
>> to be shown for -r290920 at:
>>
>> https://lists.freebsd.org/pipermail/svn-src-head/2015-November/078968.html
>>
>> In part it says:
>>
>> . . . only raise OOM when pagedaemon is unable to produce a free
>> page in several back-to-back passes.  Track the failed passes per
>> pagedaemon thread.
>>
>> The number of passes to trigger OOM was selected empirically and
>> tested both on small (32M-64M i386 VM) and large (32G amd64)
>> configurations.  If the specifics of the load require tuning, sysctl
>> vm.pageout_oom_seq sets the number of back-to-back passes which must
>> fail before OOM is raised.  Each pass takes 1/2 of seconds.  Less the
>> value, more sensible the pagedaemon is to the page shortage.
>>
>> The code shows:
>>
>> int vmd_oom_seq
>>
>> and it looks like fairly large values would be
>> tolerated. You may be able to scale beyond
>> the problem showing up in your context.
>
> Would 1024 be enough to turn OOMA off completely? That's what I
> originally wanted to try.

As far as I know it scales until arithmetic fails for the sizes
involved. The factor-of-10 rule makes the number of tests logarithmic
for finding a sufficient upper bound (if there is an upper bound).
After that, with high and low bounds, binary searching is a
possibility. (That ignores any effort at determining repeatability.)
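If scripting the search helps, vm.pageout_oom_seq can be read and set
from a small C program via sysctlbyname(3) as well as with sysctl(8).
A minimal sketch of one factor-of-10 probing step (illustrative only,
not part of any of the patches discussed here):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>

int
main(void)
{
	int cur, next;
	size_t len = sizeof(cur);

	/* Read the current back-to-back failed-pass threshold. */
	if (sysctlbyname("vm.pageout_oom_seq", &cur, &len, NULL, 0) == -1)
		err(1, "reading vm.pageout_oom_seq");
	printf("vm.pageout_oom_seq is %d\n", cur);

	/* One factor-of-10 probing step (needs root to take effect). */
	next = cur * 10;
	if (sysctlbyname("vm.pageout_oom_seq", NULL, NULL, &next,
	    sizeof(next)) == -1)
		err(1, "setting vm.pageout_oom_seq");
	printf("vm.pageout_oom_seq raised to %d\n", next);
	return (0);
}

Whatever figure survives the search can then go in /etc/sysctl.conf
(e.g. vm.pageout_oom_seq=120) so it persists across reboots between
buildworld attempts.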
>>
>>> pageout
>>> batchqueue
>>> slow_swap
>>> iosched
>>
>> For my new Pine64+ 2GB experiments I've only applied
>> the Mark J. reporting patches, not the #define one.
>> Nor have I involved CAM_IOSCHED_DYNAMIC.
>>
>> But with 2 GiBytes of RAM and the default 12 for
>> vm.pageout_oom_seq I got:
>>
>> v_free_count: 7773, v_inactive_count: 1
>> Aug 12 09:30:13 pine64 kernel: pid 80573 (c++), uid 0, was killed: out of swap space
>>
>> with no other reports from Mark Johnston's reporting
>> patches.
>>
>> It appears that long I/O latencies as seen by the
>> subsystem are not necessary to ending up with OOM
>> kills, even if they can contribute when they occur.
>>
>
> It has seemed to me in the past that OOMA kills aren't closely-tied
> to busy swap. They do seem closely-related to busy storage (swap and
> disk).

My Pine64+ 2GB experiment suggests to me that 4 cores running 4
processes (threads) at basically 100% per core, with the
processes/threads actively allocating and using ever more memory over
time, without freeing until near the end, will lead to the OOM kills
if they run long enough. (I'm taking the rest of the processes as
being relatively idle, not explicitly freeing up very much memory very
often. This is much like the -j4 buildworld buildkernel in my
context.) I'd not be surprised if programs (threads) that do no
explicit I/O would get the same result if the memory use and the
"compute/memory bound" property were similar.
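For what it is worth, a toy allocate-and-touch program along the
following lines should approximate that kind of load with no explicit
I/O at all. (Just an illustrative sketch, not something from the
actual tests; the 16 MiByte step and 4 GiByte cap are arbitrary.)
Under the default overcommit it is more likely to be OOM-killed than
to see malloc fail, which is the behavior in question:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK		(16UL * 1024 * 1024)	/* grow by 16 MiBytes per step */
#define MAXCHUNKS	256			/* arbitrary 4 GiByte cap */

int
main(void)
{
	char *chunks[MAXCHUNKS];
	size_t n, i;

	for (n = 0; n < MAXCHUNKS; n++) {
		chunks[n] = malloc(CHUNK);
		if (chunks[n] == NULL) {
			fprintf(stderr, "malloc failed at %zu MiBytes\n",
			    n * CHUNK / (1024 * 1024));
			break;
		}
		/* Keep every chunk allocated so far actively in use. */
		for (i = 0; i <= n; i++)
			memset(chunks[i], 0xa5, CHUNK);
		printf("in use: %zu MiBytes\n",
		    (n + 1) * CHUNK / (1024 * 1024));
	}
	for (i = 0; i < n; i++)		/* free only near the end */
		free(chunks[i]);
	return (0);
}

Running one copy per core (4 on the RPI3 or Pine64+) mimics the -j4
buildworld pressure described above.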
>> (7773 * 4 KiBytes = 31,838,208 Bytes, by the way.)
>>
> The RPI3 seems to start adding to swap use when free memory drops
> below about 20 MB. Does that seem consistent with your observations?

I did not record anything that would show when that happened for the
first Pine64+ 2GB experiment.

There were around 19 MiBytes of in-use swap left around from before at
the start of the 2nd test. Also not the best for finding when things
start. But the first increment beyond 19M was (two lines from top
output for each time):

Sun Aug 12 16:58:19 PDT 2018
Mem: 1407M Active, 144M Inact, 18M Laundry, 352M Wired, 202M Buf, 43M Free
Swap: 3072M Total, 19M Used, 3053M Free

Sun Aug 12 16:58:20 PDT 2018
Mem: 1003M Active, 147M Inact, 15M Laundry, 350M Wired, 202M Buf, 453M Free
Swap: 3072M Total, 22M Used, 3050M Free

>>> My RPI3 is now updating to 337688 with no patches/config changes.
>>> I'll start the sequence over and would be grateful if anybody could
>>> suggest a better sequence.
>>
> It seems rather clear that turning up vm.pageout_oom_seq is the first
> thing to try. The question is how much: 240 (double Mark J.'s number),
> 1024 (small for an int on a 64 bit machine)?

I made a recommendation earlier above. I'm still at the 120 test in my
context.

> If in fact the reporting patches do increase the load on the machine,
> is the slow swap patch the next thing to try, or the iosched option?
> Maybe something else altogether?

The slow_swap.patch material is reporting material, and so is one of
the patches that I put in place so that I might see messages about:

waited ?s for swap buffer [happens for 3 <= s]
waited ?s for async swap write [happens for 3 <= s]
thread ? waiting for memory

(None of which were produced in my test. As far as I know no one has
gotten the thread one.)

CAM_IOSCHED_DYNAMIC does not seem to apply to my Pine64+ 2GB test,
which did not report any I/O latency problems for the subsystem. I've
no reason to go that direction from the evidence available. And my
tests do not help with identifying how to survive I/O latency problems
(so far).

For now vm.pageout_oom_seq variation is all the control that seems to
fit my context. (This presumes your negative result for
VM_BATCHQUEUE_SIZE making an improvement applies.) Other goals/contexts
get into doing other things.

I've no clue if there is anything interesting to control for
CAM_IOSCHED_DYNAMIC. Nor for variations on the VM_BATCHQUEUE_SIZE
figure beyond the 1 and 7 that did not help your I/O latency context.

It does appear to me that you have a bigger problem, more difficult to
control, because of the I/O latency involvement. What might work for
me might not be sufficient for you, even if it is involved for you.

> There's no immediate expectation of fixing things; just to shed a
> little light.
>

For now, as far as I know, Mark Johnston's reporting patches are the
means of exposing useful information for whatever range of
contexts/configurations.

For now I'm just exploring vm.pageout_oom_seq value variations and
what is reported (or if it finished without an OOM kill).

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)