Date: Sat, 4 Aug 2018 22:56:42 -0700
From: Mark Millard <marklmi@yahoo.com>
To: John-Mark Gurney <jmg@funkthat.com>
Cc: Jamie Landeg-Jones <jamie@catflap.org>, bob prohaska <fbsd@www.zefox.net>, freebsd-arm <freebsd-arm@freebsd.org>, markj@freebsd.org
Subject: Re: RPI3 swap experiments ["was killed: out of swap space" with: "v_free_count: 5439, v_inactive_count: 1"]
Message-ID: <AD5BF62D-DE8F-488A-8F10-75C7485D0061@yahoo.com>
In-Reply-To: <20180805014545.GK2884@funkthat.com>
References: <6BFE7B77-A0E2-4FAF-9C68-81951D2F6627@yahoo.com> <20180802002841.GB99523@www.zefox.net> <20180802015135.GC99523@www.zefox.net> <EC74A5A6-0DF4-48EB-88DA-543FD70FEA07@yahoo.com> <201808030034.w730YURL034270@donotpassgo.dyslexicfish.net> <F788BDD8-80DC-441A-AA3E-2745F50C3B56@yahoo.com> <201808040355.w743tPsF039729@donotpassgo.dyslexicfish.net> <8CC5DF53-F950-495C-9DC8-56FCA0087259@yahoo.com> <20180804140816.GJ2884@funkthat.com> <20180805014545.GK2884@funkthat.com>
On 2018-Aug-4, at 6:45 PM, John-Mark Gurney <jmg at funkthat.com> wrote:

> Mark Millard wrote this message on Sat, Aug 04, 2018 at 09:08 -0700:
>> On 2018-Aug-4, at 7:08 AM, John-Mark Gurney <jmg at funkthat.com> wrote:
>> 
>>> Mark Millard via freebsd-arm wrote this message on Sat, Aug 04, 2018 at 00:14 -0700:
>>>> On 2018-Aug-3, at 8:55 PM, Jamie Landeg-Jones <jamie at catflap.org> wrote:
>>>> 
>>>>> Mark Millard <marklmi at yahoo.com> wrote:
>>>>> 
>>>>>> If Inact+Laundry+Buf(?)+Free was not enough to provide sufficient
>>>>>> additional RAM, I would have guessed that some Active Real Memory
>>>>>> should then have been paged/swapped out and so RAM would be made
>>>>>> available. (This requires the system to have left itself sufficient
>>>>>> room in RAM for that guessed activity.)
>>>>>> 
>>>>>> But I'm no expert at the intent or actual operation.
>>>>>> 
>>>>>> Bob P.'s reports (for having sufficient swap space)
>>>>>> also indicate the likes of:
>>>>>> 
>>>>>> v_free_count: 5439, v_inactive_count: 1
>>>>>> 
>>>>>> So all the examples have: "v_inactive_count: 1".
>>>>>> (So: vmd->vmd_pagequeues[PQ_INACTIVE].pq_cnt == 1 )
>>>>> 
>>>>> Thanks for the feedback. I'll do a few more runs and other stress tests
>>>>> to see if that result is consistent. I'm open to any other idea too!
>>>> 
>>>> The book "The Design and Implementation of the FreeBSD Operating System"
>>>> (2nd edition, 2014) states (page labeled 296):
>>>> 
>>>> QUOTE:
>>>> The FreeBSD swap-out daemon will not select a runnable process to swap
>>>> out. So, if the set of runnable processes does not fit in memory, the
>>>> machine will effectively deadlock. Current machines have enough memory
>>>> that this condition usually does not arise. If it does, FreeBSD avoids
>>>> deadlock by killing the largest process. If the condition begins to arise
>>>> in normal operation, the 4.4BSD algorithm will need to be restored.
>>>> END QUOTE.
>>>> 
>>>> As near as I can tell, for the likes of rpi3's and rpi2's, the condition
>>>> is occurring during buildworld "normal operation" that tries to use the
>>>> available cores to advantage. (Your context does not have the I/O
>>>> problems that Bob P.'s have had in at least some of your OOM process
>>>> kill examples, if I understand right.)
>>>> 
>>>> (4.4BSD used to swap out the runnable process that had been resident
>>>> the longest, followed by the processes taking turns being swapped out.
>>>> I'll not quote the exact text about such.)
>>>> 
>>>> So I guess the question becomes: is there a reasonable way to enable
>>>> the 4.4BSD style of "Swapping" for "small" memory machines in order to
>>>> avoid having to figure out how to not end up with OOM process kills
>>>> while also not just wasting cores by using -j1 for buildworld?
>>>> 
>>>> In other words: enable swapping out active RAM when it eats nearly
>>>> all the non-wired RAM.
>>>> 
>>>> But it might be discovered that the performance is not better than
>>>> using fewer cores during buildworld. (Experiments needed and
>>>> possibly environment specific for the tradeoffs.) Avoiding having
>>>> to figure out the maximum -j? that avoids OOM process kills, while
>>>> also avoiding just sticking to -j1, seems an advantage for some rpi3
>>>> and rpi2 folks.
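[Side note for anyone running such experiments: the counters quoted
above can be watched from a shell while a buildworld runs. Something
like the following should be close (a sketch I've not verified on an
rpi3; I believe these are the sysctl names behind the kill-message
figures, but check on the system in question):

# while true; do sysctl vm.stats.vm.v_free_count vm.stats.vm.v_inactive_count vm.stats.vm.v_active_count; sleep 10; done

That might help correlate the kill messages with what top shows at
the time.]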
>>> 
>>> Interesting observation, maybe playing w/:
>>> vm.swap_idle_threshold2: Time before a process will be swapped out
>>> vm.swap_idle_threshold1: Guaranteed swapped in time for a process
>>> 
>>> will help things... lowering 2 will likely make the processes available
>>> for swap sooner...
>> 
>> Looking up related information:
>> 
>> https://www.freebsd.org/doc/handbook/configtuning-disk.html
>> 
>> says vm.swap_idle_enabled is also involved with those two. In fact
>> it indicates the two are not even used until vm.swap_idle_enabled=1 .
>> 
>> QUOTE
>> 11.10.1.4. vm.swap_idle_enabled
>> The vm.swap_idle_enabled sysctl(8) variable is useful in large
>> multi-user systems with many active login users and lots of idle
>> processes. Such systems tend to generate continuous pressure on free
>> memory reserves. Turning this feature on and tweaking the swapout
>> hysteresis (in idle seconds) via vm.swap_idle_threshold1 and
>> vm.swap_idle_threshold2 depresses the priority of memory pages
>> associated with idle processes more quickly than the normal pageout
>> algorithm. This gives a helping hand to the pageout daemon. Only turn
>> this option on if needed, because the tradeoff is essentially
>> pre-paging memory sooner rather than later, which eats more swap and
>> disk bandwidth. In a small system this option will have a determinable
>> effect, but in a large system that is already doing moderate paging,
>> this option allows the VM system to stage whole processes into and
>> out of memory easily.
>> END QUOTE
>> 
>> The defaults seem to be:
>> 
>> # sysctl vm.swap_idle_enabled vm.swap_idle_threshold1 vm.swap_idle_threshold2
>> vm.swap_idle_enabled: 0
>> vm.swap_idle_threshold1: 2
>> vm.swap_idle_threshold2: 10
>> 
>> Quoting the book again:
>> 
>> QUOTE
>> If the swapping of idle processes is enabled and the pageout daemon can
>> find any processes that have been sleeping for more than 10 seconds
>> (swap_idle_threshold2, the cutoff for considering the time sleeping to
>> be "a long time"), it will swap them all out. [. . .] If none of these
>> processes are available, the pageout daemon will swap out all processes
>> that have been sleeping for as briefly as 2 seconds
>> (swap_idle_threshold1).
>> END QUOTE.
>> 
>> I'd not normally expect a compile or link to sleep for such long periods
>> (unless I/O has long delays). Having, say, 4 such processes active at the
>> same time may be unlikely to have any of them swap out on the default
>> scale. (Clang is less I/O bound and more memory bound than GCC, as I
>> remember what I've observed. That statement ignores paging/swapping by
>> the system.)
>> 
>> Such would likely be true on the scale of any positive-integer-seconds
>> figures?
> 
> The point is to more aggressively swap out OTHER processes so that
> there is more memory available.

I guess I'm relying on what I've seen in top to indicate that most all
of the space from other processes has already been paged out: not much
of Active is for non-compiles/non-links during the problem times.
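[Side note: if someone wants to try John-Mark's suggestion, I believe
the experiment amounts to something like (a sketch; the threshold
figure is just an example value, not a recommendation):

# sysctl vm.swap_idle_enabled=1
# sysctl vm.swap_idle_threshold2=5

with the matching lines in /etc/sysctl.conf if it is to survive a
reboot. Whether it helps the buildworld case is exactly what would
need testing.]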
For an example of what I mean from top:
http://www.catflap.org/jamie/rpi3/rpi3-mmc-swap-failure-stats.txt
lists (last before the kill):

last pid: 30806;  load averages: 4.05, 4.04, 4.00    up 0+02:03:06  10:39:59
42 processes:  5 running, 37 sleeping
CPU: 88.5% user,  0.0% nice,  6.1% system,  0.4% interrupt,  5.0% idle
Mem: 564M Active, 2M Inact, 68M Laundry, 162M Wired, 97M Buf, 104M Free
Swap: 4G Total, 76M Used, 4G Free, 1% Inuse

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
30762 root          1 101    0   175M   119M CPU2    2   0:39  99.07% c++
30613 root          1 101    0   342M   191M CPU0    0   2:02  95.17% c++
30689 root          1 101    0   302M   226M CPU3    3   1:28  94.48% c++
22226 root          1  20    0    19M     2M select  0   0:31   0.00% make
 1021 root          1  20    0    12M   340K wait    2   0:07   0.00% sh

Rule-of-thumb figures:

564M Active vs. RES for the 3 c++'s: 119M + 191M + 226M = 536M total.
So: 564M - 536M = 28M (approximate Active for other processes).

It appears to me that some c++ would likely need to swap out, given
that this context led to OOM kills. (It might be that this rule of
thumb is not good enough for such judgments.)

[Personally I normally limit myself to -jN figures that have N*512
MiBytes or more on the board. -j4 on an rpi3 or rpi2 has only 4*256
MiBytes.]

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
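P.S. For the N*512 MiBytes rule of thumb, a quick way to get a
suggested -jN figure on a given board would be something like (a
sketch, untested; the 512 MiByte-per-job figure is just my personal
rule, not anything official):

# echo $(( $(sysctl -n hw.physmem) / (512 * 1024 * 1024) ))

On a 1 GiByte rpi3 or rpi2 that prints 1 or 2 (depending on how much
RAM the firmware reserves), consistent with avoiding -j4 there.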