Date:      Mon, 13 Aug 2018 08:48:04 -0700 (PDT)
From:      "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net>
To:        Mark Millard <marklmi@yahoo.com>
Cc:        bob prohaska <fbsd@www.zefox.net>, freebsd-arm <freebsd-arm@freebsd.org>,  Mark Johnston <markj@freebsd.org>
Subject:   Re: RPI3 swap experiments ["was killed: out of swap space" with: "v_free_count: 5439, v_inactive_count: 1"]
Message-ID:  <201808131548.w7DFm4e8037721@pdx.rh.CN85.dnsmgr.net>
In-Reply-To: <0D8B9A29-DD95-4FA3-8F7D-4B85A3BB54D7@yahoo.com>

> On 2018-Aug-12, at 7:12 PM, bob prohaska <fbsd at www.zefox.net> wrote:
> 
> > On Sun, Aug 12, 2018 at 04:23:31PM -0700, Mark Millard wrote:
> >> On 2018-Aug-12, at 3:40 PM, bob prohaska <fbsd at www.zefox.net> wrote:
> >> 
> >>> On Sun, Aug 12, 2018 at 10:32:48AM -0700, John Kennedy wrote:
> >>>> . . .
> >>> Setting vm.pageout_oom_seq to 120 made a decisive improvement, almost allowing
> >>> buildworld to finish. By the time I tried CAM_IOSCHED_DYNAMIC buildworld was
> >>> getting only about half as far, so it seems the patches were harmful to a degree.
> >>> Changes were applied in the order 
> >> 
> >> You could experiment with figures bigger than 120 for
> >> vm.pageout_oom_seq .
> >> 
> > Could anybody hazard a guess as to how much? The leap from 12 to 120 rather
> > startled me; I thought a factor of two was a big adjustment. Maybe go to 240,
> > or is that insignificant?
> 
> I'd keep multiplying by 10 until it works (or fails some
> other way), then back off by smaller factors if you want
> to narrow the known range between failing and working
> (or failing differently).
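> 
> Something like this untested sh sketch is what I have in
> mind (it assumes the only failure mode is the OOM kill,
> which is worth checking by hand after each attempt):
> 
>   # keep multiplying vm.pageout_oom_seq by 10 until a
>   # buildworld survives
>   seq=12
>   while :; do
>       sysctl vm.pageout_oom_seq=$seq
>       (cd /usr/src && make -j4 buildworld) && break
>       seq=$((seq * 10))
>   done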
> 
> >> I'll note that the creation of this mechanism seems
> >> to be shown for -r290920 at:
> >> 
> >> https://lists.freebsd.org/pipermail/svn-src-head/2015-November/078968.html
> >> 
> >> In part it says:
> >> 
> >>  . . . only raise OOM when pagedaemon is unable to produce a free
> >>  page in several back-to-back passes.  Track the failed passes per
> >>  pagedaemon thread.
> >> 
> >>  The number of passes to trigger OOM was selected empirically and
> >>  tested both on small (32M-64M i386 VM) and large (32G amd64)
> >>  configurations.  If the specifics of the load require tuning, sysctl
> >>  vm.pageout_oom_seq sets the number of back-to-back passes which must
> >>  fail before OOM is raised.  Each pass takes 1/2 of a second.  The lower
> >>  the value, the more sensitive the pagedaemon is to the page shortage.
> >> 
> >> The code shows:
> >> 
> >> int vmd_oom_seq
> >> 
> >> and it looks like fairly large values would be
> >> tolerated. You may be able to scale beyond
> >> the problem showing up in your context.
> > 
> > Would 1024 be enough to turn OOMA off completely?  That's what I originally wanted to 
> > try.
> 
> As far as I know it scales until arithmetic fails for
> the sizes involved.
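> 
> (Going by the commit message's 1/2 second per pass, the
> wait before OOM is raised is roughly vm.pageout_oom_seq/2
> seconds of back-to-back failed passes: about 6 s at the
> default 12, 60 s at 120, and around 512 s at 1024, if I
> have the arithmetic right.)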
> 
> The factor of 10 rule makes the number of tests
> logarithmic to find a sufficient upper bound (if
> there is an upper bound). After that with high/low
> bounds binary searching is a possibility.
> 
> (That ignores any effort at determining repeatability.)
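> 
> The later bisection stage might look something like this
> (again untested, and with hypothetical bounds):
> 
>   # bisect between a known-failing lo value and a
>   # known-working hi value of vm.pageout_oom_seq
>   lo=120
>   hi=1200
>   while [ $((hi - lo)) -gt 10 ]; do
>       mid=$(((lo + hi) / 2))
>       sysctl vm.pageout_oom_seq=$mid
>       if (cd /usr/src && make -j4 buildworld); then
>           hi=$mid
>       else
>           lo=$mid
>       fi
>   done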

Perhaps a binary search of make -j1 buildworld on
an AMD64 system, varying the memory size, to find
the smallest size that can complete this job without
OOM.  I bet once you find that value you will find
that make -jN scales pretty well to requiring that
amount of hard memory to complete a buildworld.

My reasoning is based on the "can not swap runnable
processes" behavior that Mark found in the description
of how the FreeBSD VM system works.

Swap size/space does not matter for this condition, as the
system is not going to swap out the large runnable compilers
and linkers that run during buildworld.


> 
> >> 
> >>> pageout 
> >>> batchqueue
> >>> slow_swap
> >>> iosched
> >> 
> >> For my new Pine64+ 2GB experiments I've only applied
> >> the Mark J. reporting patches, not the #define one.
> >> Nor have I involved CAM_IOSCHED_DYNAMIC.
> >> 
> >> But with 2 GiBytes of RAM and the default 12 for
> >> vm.pageout_oom_seq I got:
> >> 
> >> v_free_count: 7773, v_inactive_count: 1
> >> Aug 12 09:30:13 pine64 kernel: pid 80573 (c++), uid 0, was killed: out of swap space
> >> 
> >> with no other reports from Mark Johnston's reporting
> >> patches.
> >> 
> >> It appears that long I/O latencies as seen by the
> >> subsystem are not necessary for ending up with OOM
> >> kills, even if they can contribute when they occur.
> >> 
> > 
> > It has seemed to me in the past that OOMA kills aren't closely tied to busy
> > swap. They do seem closely related to busy storage (swap and disk).
> 
> My Pine64+ 2GB experiment suggests to me that 4 cores
> running 4 processes (threads) at basically 100% per core,
> with the processes/threads allocating and actively using
> ever more memory over time, without freeing until near
> the end, will lead to OOM kills if they run long enough.
> 
> (I'm taking the rest of the processes as being relatively
> idle, not freeing up very much memory explicitly very
> often. This is much like the -j4 buildworld buildkernel
> in my context.)
> 
> I'd not be surprised if programs (threads) that do no
> explicit I/O would get the same result if the memory
> use and the "compute/memory bound" property were similar.
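> 
> A crude way to fake that load pattern with no file I/O at
> all might be something like the following (hypothetical,
> and it grows memory far faster than a real build does):
> 
>   # four CPU-bound processes whose memory use only grows
>   for i in 1 2 3 4; do
>       awk 'BEGIN { s = "x"; while (1) s = s s }' &
>   done
>   wait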
> 
> >> (7773 * 4 KiBytes = 31,838,208 Bytes, by the way.)
> >> 
> > The RPI3 seems to start adding to swap use when free memory drops below
> > about 20 MB. Does that seem consistent with your observations?
> 
> I did not record anything that would show when swap use
> starts for the first Pine64+ 2GB experiment.
> 
> There were around 19 MiBytes of swap still in use, left
> over from before, at the start of the 2nd test. Also not
> the best for finding when things start. But the first
> increment beyond 19M was (two lines of top output for
> each time):
> 
> Sun Aug 12 16:58:19 PDT 2018
> Mem: 1407M Active, 144M Inact, 18M Laundry, 352M Wired, 202M Buf, 43M Free
> Swap: 3072M Total, 19M Used, 3053M Free
> 
> Sun Aug 12 16:58:20 PDT 2018
> Mem: 1003M Active, 147M Inact, 15M Laundry, 350M Wired, 202M Buf, 453M Free
> Swap: 3072M Total, 22M Used, 3050M Free
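> 
> (Snapshots like those can be collected with a loop along
> these lines, assuming top's batch mode works as I expect;
> I've not rechecked the exact flags:
> 
>   while :; do
>       date
>       top -b | egrep '^(Mem|Swap):'
>       sleep 1
>   done
> )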
> 
> 
> >>> My RPI3 is now updating to 337688 with no patches/config changes. I'll start the
> >>> sequence over and would be grateful if anybody could suggest a better sequence.
> >> 
> > It seems rather clear that turning up vm.pageout_oom_seq is the first thing to try.
> > The question is how much: 240 (double Mark J.'s number), 1024 (small for an int on
> > a 64 bit machine)?
> 
> I made a recommendation earlier above. I'm still at the 120 test
> in my context.
> 
> > If in fact the reporting patches do increase the load on the machine, is the 
> > slow swap patch the next thing to try, or the iosched option? Maybe something else
> > altogether?
> 
> The slow_swap.patch material is reporting-only material,
> and is one of the patches that I put in place so that
> I might see messages about:
> 
> waited ?s for swap buffer      [happens for 3 <= s]
> waited ?s for async swap write [happens for 3 <= s]
> thread ? waiting for memory
> 
> (None of which were produced in my test. As far as
> I know no one has gotten the thread one.)
> 
> CAM_IOSCHED_DYNAMIC does not seem to apply to my
> Pine64+ 2GB test that did not report any I/O latency
> problems for the subsystem. I've no reason to go
> that direction from the evidence available. And my
> tests do not help with identifying how to survive
> I/O latency problems (so far).
> 
> For now vm.pageout_oom_seq variation is all the control
> that seems to fit my context. (This presumes your negative
> result for VM_BATCHQUEUE_SIZE making an improvement
> applies.)
> 
> Other goals/contexts get into doing other things. I've
> no clue if there is anything interesting to control for
> CAM_IOSCHED_DYNAMIC. Nor for variations on the
> VM_BATCHQUEUE_SIZE figure beyond the 1 and 7 that did
> not help your I/O latency context.
> 
> It does appear to me that you have a bigger problem,
> more difficult to control, because of the I/O latency
> involvement. What works for me might not be sufficient
> for you, even if the same mechanism is involved in your
> case.
> 
> > There's no immediate expectation of fixing things; just to shed a little light.
> > 
> 
> For now, as far as I know, Mark Johnston's reporting patches
> are the means of exposing useful information for whatever
> range of contexts/configurations. For now I'm just
> exploring vm.pageout_oom_seq value variations and what is
> reported (or whether it finishes without an OOM kill).
> 
> ===
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)
> 

-- 
Rod Grimes                                                 rgrimes@freebsd.org


