From: "Rodney W. Grimes"
Message-Id: <201808131548.w7DFm4e8037721@pdx.rh.CN85.dnsmgr.net>
Subject: Re: RPI3 swap experiments ["was killed: out of swap space" with: "v_free_count: 5439, v_inactive_count: 1"]
In-Reply-To: <0D8B9A29-DD95-4FA3-8F7D-4B85A3BB54D7@yahoo.com>
To: Mark Millard
CC: bob prohaska, freebsd-arm, Mark Johnston
Date: Mon, 13 Aug 2018 08:48:04 -0700 (PDT)

> On 2018-Aug-12, at 7:12 PM, bob prohaska wrote:
>
> > On Sun, Aug 12, 2018 at 04:23:31PM -0700, Mark Millard wrote:
> >> On 2018-Aug-12, at 3:40 PM, bob prohaska wrote:
> >>
> >>> On Sun, Aug 12, 2018 at 10:32:48AM -0700, John Kennedy wrote:
> >>>> . . .
> >>> Setting vm.pageout_oom_seq to 120 made a decisive improvement, almost allowing
> >>> buildworld to finish. By the time I tried CAM_IOSCHED_DYNAMIC buildworld was
> >>> getting only about half as far, so it seems the patches were harmful to a degree.
> >>> Changes were applied in the order
> >>
> >> You could experiment with figures bigger than 120 for
> >> vm.pageout_oom_seq .
> >>
> > Could anybody hazard a guess as to how much? The leap from 12 to 120 rather
> > startled me; I thought a factor of two was a big adjustment. Maybe go to 240,
> > or is that insignificant?
>
> I'd keep multiplying by 10 until it works (or fails some
> other way), then back off by smaller factors if you want
> a narrower range to be known between failing and working
> (or failing differently).
>
> >> I'll note that the creation of this mechanism seems
> >> to be shown for -r290920 at:
> >>
> >> https://lists.freebsd.org/pipermail/svn-src-head/2015-November/078968.html
> >>
> >> In part it says:
> >>
> >> . . . only raise OOM when pagedaemon is unable to produce a free
> >> page in several back-to-back passes. Track the failed passes per
> >> pagedaemon thread.
> >>
> >> The number of passes to trigger OOM was selected empirically and
> >> tested both on small (32M-64M i386 VM) and large (32G amd64)
> >> configurations.
> >> If the specifics of the load require tuning, sysctl
> >> vm.pageout_oom_seq sets the number of back-to-back passes which must
> >> fail before OOM is raised. Each pass takes 1/2 of seconds. Less the
> >> value, more sensible the pagedaemon is to the page shortage.
> >>
> >> The code shows:
> >>
> >> int vmd_oom_seq
> >>
> >> and it looks like fairly large values would be
> >> tolerated. You may be able to scale beyond
> >> the problem showing up in your context.
> >
> > Would 1024 be enough to turn OOMA off completely? That's what I originally wanted to
> > try.
>
> As far as I know it scales, until arithmetic fails for the
> sizes involved.
>
> The factor of 10 rule makes the number of tests
> logarithmic to find a sufficient upper bound (if
> there is an upper bound). After that, with high/low
> bounds, binary searching is a possibility.
>
> (That ignores any effort at determining repeatability.)

Perhaps do a binary search with make -j1 buildworld on an AMD64 system
to find the memory size that can complete this job without OOM.  I bet
once you find that value you will find that make -jN scales pretty well
to requiring that amount of hard memory to complete a buildworld.

My reasoning is the "can not swap runable processes" behavior that Mark
found in the description of how the FreeBSD VM system works.  Swap
size/space does not matter for this condition, as the system is not
going to swap the large runnable compilers and linkers that occur
during buildworld.

>
> >>
> >>> pageout
> >>> batchqueue
> >>> slow_swap
> >>> iosched
> >>
> >> For my new Pine64+ 2GB experiments I've only applied
> >> the Mark J. reporting patches, not the #define one.
> >> Nor have I involved CAM_IOSCHED_DYNAMIC.
> >>
> >> But with 2 GiBytes of RAM and the default 12 for
> >> vm.pageout_oom_seq I got:
> >>
> >> v_free_count: 7773, v_inactive_count: 1
> >> Aug 12 09:30:13 pine64 kernel: pid 80573 (c++), uid 0, was killed: out of swap space
> >>
> >> with no other reports from Mark Johnston's reporting
> >> patches.
> >>
> >> It appears that long I/O latencies as seen by the
> >> subsystem are not necessary to ending up with OOM
> >> kills, even if they can contribute when they occur.
> >>
> >
> > It has seemed to me in the past that OOMA kills aren't closely tied to busy
> > swap. They do seem closely related to busy storage (swap and disk).
>
> My Pine64+ 2GB experiment suggests to me that 4 cores
> running 4 processes (threads) at basically 100% per core,
> with the processes/threads allocating and using ever
> more memory actively over time, without freeing
> until near the end, will lead to OOM kills if they
> run long enough.
>
> (I'm taking the rest of the processes as being relatively
> idle, not freeing up very much memory explicitly very
> often. This is much like the -j4 buildworld buildkernel
> in my context.)
>
> I'd not be surprised if programs (threads) that do no
> explicit I/O would get the same result if the memory
> use and the "compute/memory bound" property were similar.
>
> >> (7773 * 4 KiBytes = 31,838,208 Bytes, by the way.)
> >>
> > The RPI3 seems to start adding to swap use when free memory drops below about 20 MB.
> > Does that seem consistent with your observations?
>
> I did not record anything that would show when, for
> the first Pine64+ 2GB experiment.
>
> There were around 19 MiBytes of in-use swap left around
> from before at the start of the 2nd test. Also not the
> best for finding when things start.
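As an aside, not having a record of when free memory starts to fall is
easy enough to fix for a next run: a tiny poller of the
vm.stats.vm.v_free_count sysctl (the same counter the OOM kill lines
above report) can timestamp the crossings.  A rough, untested sketch
from me; the 20 MB threshold, 1 second interval, and file name are only
illustrative, not anything taken from Mark's or Bob's setups:

/*
 * freelog.c (name arbitrary): poll vm.stats.vm.v_free_count and print a
 * timestamp whenever free memory falls below a threshold.  Sketch only;
 * the threshold and poll interval are illustrative.
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <err.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int
main(void)
{
        u_int free_pages;
        size_t len;
        long pagesize = sysconf(_SC_PAGESIZE);
        unsigned long threshold = 20UL * 1024 * 1024;   /* ~20 MB, pick your own */
        int below = 0;

        for (;;) {
                len = sizeof(free_pages);
                if (sysctlbyname("vm.stats.vm.v_free_count", &free_pages,
                    &len, NULL, 0) == -1)
                        err(1, "sysctlbyname(vm.stats.vm.v_free_count)");

                unsigned long free_bytes = (unsigned long)free_pages *
                    (unsigned long)pagesize;
                if (free_bytes < threshold && !below) {
                        time_t now = time(NULL);

                        below = 1;
                        /* %.24s drops ctime(3)'s trailing newline. */
                        printf("%.24s: free fell to %lu bytes (%u pages)\n",
                            ctime(&now), free_bytes, free_pages);
                        fflush(stdout);
                } else if (free_bytes >= threshold)
                        below = 0;
                sleep(1);
        }
        /* NOTREACHED */
}

Left running in the background during a buildworld, its output can be
lined up against the top output already being collected; it only marks
when the free count crosses whatever line you choose.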
> But the first increment
> beyond 19M was (two lines from top output for each time):
>
> Sun Aug 12 16:58:19 PDT 2018
> Mem: 1407M Active, 144M Inact, 18M Laundry, 352M Wired, 202M Buf, 43M Free
> Swap: 3072M Total, 19M Used, 3053M Free
>
> Sun Aug 12 16:58:20 PDT 2018
> Mem: 1003M Active, 147M Inact, 15M Laundry, 350M Wired, 202M Buf, 453M Free
> Swap: 3072M Total, 22M Used, 3050M Free
>
>
> >>> My RPI3 is now updating to 337688 with no patches/config changes. I'll start the
> >>> sequence over and would be grateful if anybody could suggest a better sequence.
> >>
> > It seems rather clear that turning up vm.pageout_oom_seq is the first thing to try.
> > The question is how much: 240 (double Mark J.'s number), 1024 (small for an int on
> > a 64 bit machine)?
>
> I made a recommendation earlier above. I'm still at the 120 test
> in my context.
>
> > If in fact the reporting patches do increase the load on the machine, is the
> > slow swap patch the next thing to try, or the iosched option? Maybe something else
> > altogether?
>
> The slow_swap.patch material is reporting material,
> and so is one of the patches that I put in place so
> that I might see messages about:
>
> waited ?s for swap buffer [happens for 3 <= s]
> waited ?s for async swap write [happens for 3 <= s]
> thread ? waiting for memory
>
> (None of which were produced in my test. As far as
> I know no one has gotten the thread one.)
>
> CAM_IOSCHED_DYNAMIC does not seem to apply to my
> Pine64+ 2GB test that did not report any I/O latency
> problems for the subsystem. I've no reason to go
> that direction from the evidence available. And my
> tests do not help with identifying how to survive
> I/O latency problems (so far).
>
> For now vm.pageout_oom_seq variation is all the control
> that seems to fit my context. (Presumes your negative
> result for VM_BATCHQUEUE_SIZE making an improvement
> applies.)
>
> Other goals/contexts get into doing other things. I've
> no clue if there is anything interesting to control for
> CAM_IOSCHED_DYNAMIC. Nor for variations on the
> VM_BATCHQUEUE_SIZE figure beyond the 1 and 7 that did
> not help your I/O latency context.
>
> It does appear to me that you have a bigger problem,
> more difficult to control, because of the I/O latency
> involvement. What might work for me might not be
> sufficient for you, even if it is involved for you.
>
> > There's no immediate expectation of fixing things; just to shed a little light.
> >
>
> For now, as far as I know, Mark Johnston's reporting patches
> are the means of exposing useful information for whatever
> range of contexts/configurations. For now I'm just
> exploring vm.pageout_oom_seq value variations and what is
> reported (or if it finished without an OOM kill).
>
> ===
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)
>

-- 
Rod Grimes                                                 rgrimes@freebsd.org