From owner-freebsd-arm@freebsd.org Thu Sep  6 08:04:49 2018
Subject: Re: RPI3 swap experiments (r338342 with vm.pageout_oom_seq="1024" and 6 GB swap)
From: Mark Millard
Date: Thu, 6 Sep 2018 01:04:37 -0700
To: bob prohaska
Cc: "Rodney W. Grimes", freebsd-arm@freebsd.org
In-Reply-To: <20180906070828.GC3482@www.zefox.net>
References: <20180906003829.GC818@www.zefox.net>
 <201809060243.w862hq7o058504@pdx.rh.CN85.dnsmgr.net>
 <20180906042353.GA3482@www.zefox.net>
 <20180906070828.GC3482@www.zefox.net>
List-Id: "Porting FreeBSD to ARM processors."

On 2018-Sep-6, at 12:08 AM, bob prohaska wrote:

> On Wed, Sep 05, 2018 at 11:20:14PM -0700, Mark Millard wrote:
>>
>> On 2018-Sep-5, at 9:23 PM, bob prohaska wrote:
>>
>>> On Wed, Sep 05, 2018 at 07:43:52PM -0700, Rodney W. Grimes wrote:
>>>>
>>>> What makes you believe that the VM system has any concept of
>>>> the speed of swap devices? IIRC it simply uses them in a
>>>> round-robin fashion with no knowledge of them being fast or
>>>> slow, or shared with file systems or other stuff.
>>>>
>>>
>>> Mostly the assertion that the OOMA kills happening while the system
>>> had plenty of free swap were caused by the swap being "too slow".
>>> If the machine knows some swap is slow, it seems capable of
>>> discerning that other swap is faster.
>>
>> If an RPI3 magically had a full-speed/low-latency optane context
>> as its swap space, it would still get process kills for buildworld
>> buildkernel with vm.pageout_oom_seq=12 and -j4, as I understand
>> things at this point. (Presumes still having 1 GiByte of RAM.)
>>
>> In other words: the long-latency issues you have in your rpi3
>> configuration may contribute to the detailed "just when did it
>> fail", but low-latency/high-speed I/O would be unlikely to prevent
>> kills from eventually happening during the llvm parts of buildworld.
>> Free RAM would still be low for "long periods". Increasing
>> vm.pageout_oom_seq is essential from what I can tell.
>>
> Understood and accepted. I'm using vm.pageout_oom_seq=1024 at present.
> The system struggles mightily, but it keeps going and finishes.
>
>> vm.pageout_oom_seq is about controlling "how long". -j1 builds are
>> about keeping less RAM active. (That is also the intent for use of
>> LDFLAGS.lld+=-Wl,--no-threads .) Of course, for the workload
>> involved, using a context with more RAM can avoid having "low RAM"
>> for as long. An aarch64 board with 4 GiByte of RAM and 4 cores
>> possibly has no problem with -j4 buildworld buildkernel for head at
>> this point: free RAM might well never be low during such a build in
>> such a context.
>>
>> (The quotes like "how long" are because I refer to the time
>> consequences; the units are not time, but I'm avoiding the detail.)
>>
>> The killing criteria do not directly measure and test swapping I/O
>> latencies or other such, as far as I know. Such things are only
>> involved indirectly, via other consequences of the delays involved
>> (when they are involved at all). That is my understanding.
>>
> Perhaps I'm being naive here, but when one sees two devices holding
> swap, one at ~25% busy and one at ~150% busy, it seems to beg for
> a little selective pressure to divert traffic to the less busy
> device from the more busy one. Maybe it's impossible, maybe it's
> more trouble than the VM folks want to invest. Just maybe, it's
> doable and worthwhile, to take advantage of a cheap, power-efficient
> platform.

Continuously reorganize where things page out to in the swap
partitions on the fly? (The reads have to be from where the data was
paged out.) Which virtual memory pages will be written out and read
back in, and how often, is not known up front, is not predictable, and
varies over time. That some partitions end up with more active paging
than others is not all that surprising.

Avoiding such would add overhead that was always involved, and would
use even more RAM for the extra tracking. It would also involve trying
to react as fast as the demand changes: the full speed of the programs
that are keeping "cores" and RAM in active use.
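For what it's worth, here is a minimal sketch in C of the round-robin
shape Rodney described (purely illustrative, my reading of the
behavior, not the kernel's actual swap_pager code; the device names
and block counts are made up):

#include <stdio.h>
#include <stddef.h>

/* Illustrative only: round-robin swap-device selection with no
   notion of per-device speed or busyness. */
struct swdev {
    const char *name;
    long free_blocks;   /* blocks still unallocated on this device */
};

static struct swdev devs[] = {
    { "da0b",    1024 },    /* made-up names, loosely after da0 */
    { "mmcsd0b", 1024 },    /* and mmcsd0 in Bob's setup */
};
static size_t ndevs = sizeof(devs) / sizeof(devs[0]);
static size_t next_dev = 0;

/* Pick the next device that still has room, in strict rotation:
   how busy a device currently is never enters into it. */
static struct swdev *
pick_swdev(void)
{
    for (size_t tried = 0; tried < ndevs; tried++) {
        struct swdev *sd = &devs[next_dev];
        next_dev = (next_dev + 1) % ndevs;
        if (sd->free_blocks > 0) {
            sd->free_blocks--;  /* allocate one block here */
            return sd;
        }
    }
    return NULL;                /* all swap devices full */
}

int
main(void)
{
    for (int i = 0; i < 4; i++) {
        struct swdev *sd = pick_swdev();
        if (sd != NULL)
            printf("pageout %d -> %s\n", i, sd->name);
    }
    return 0;
}

Nothing in that rotation looks at how busy a device currently is,
which is why a 25%-busy versus 150%-busy split does not steer anything
by itself.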
> I too am unsure of the metric for "too slow". From earlier discussion
> I got the impression it was something like a count of how many cycles
> of request and rejection (more likely, deferral) for swap space were
> made; after a certain count is reached, OOMA is invoked. That picture
> is sure to be simplistic, and may well be flat-out wrong.

Note that writing out a now-dirty RAM page that was already written
out before need not allocate any new swap space: it can reuse the
place it used before. But if the RAM page is in active enough use,
writing it out and freeing the page could just lead to it being read
back in (allocating a free RAM page nearly immediately). Such activity
can be viewed as a waste of time. Another wording for this is that
"the system working set" can be so large that paging becomes
ineffective (performs too poorly). Having virtual memory "thrashing"
can slow things by orders of magnitude.

It has more to do with something like counting attempts at moving out
dirty RAM pages that have not been used recently, in order to
hopefully free the RAM pages written out. (But the RAM pages may
become active again before they are freed.) Clean RAM pages can be
freed more directly, but there is still the issue of such a page being
in active enough use that freeing it would just lead to reading it
back in to a newly allocated RAM page.

This handling can stop making progress at freeing RAM, and so fail to
keep a sufficient amount of it free, without first using up all the
swap space. Thrashing does not require that all the swap space be
used.

There might be no swap space set up at all. (I've done this on a
128 GiByte RAM configuration.) "OOM" kills still can happen: dirty
pages then have no place to be written out to in order to increase
free memory. If free RAM gets low, OOM kills start.
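As a rough sketch of that counting idea in C (only my understanding of
it, not the kernel's actual code; the names here are made up, and
kill_largest_process() is a hypothetical stand-in):

#include <stdio.h>

/* Hypothetical stand-in: the real kernel picks the process with
   the largest size and kills it. */
static void
kill_largest_process(void)
{
    printf("OOM: killing the largest process\n");
}

/* vm.pageout_oom_seq: how many back-to-back failed pageout passes
   are tolerated before an OOM kill. The default is 12; you are
   running with 1024. */
static int oom_seq_threshold = 12;
static int failed_passes = 0;

/* Called (conceptually) after each pagedaemon pass. */
static void
after_pageout_pass(int reached_free_target)
{
    if (reached_free_target) {
        failed_passes = 0;      /* made progress: start over */
        return;
    }
    if (++failed_passes >= oom_seq_threshold) {
        kill_largest_process();
        failed_passes = 0;
    }
}

int
main(void)
{
    /* Simulate 15 consecutive passes that fail to free enough RAM:
       the kill fires once the threshold of 12 is reached. */
    for (int pass = 0; pass < 15; pass++)
        after_pageout_pass(0);
    return 0;
}

So the count is not of swap-space requests being rejected as such: it
is of passes that fail to free enough RAM. Slow swap I/O matters only
indirectly, by stretching out each pass without it making progress.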
> If my picture is not wholly incorrect, it isn't a huge leap to ask
> for swap device-by-device, and accept swap from the device that
> offers it first. In the da0 vs mmcsd0 case, ask for swap on each in
> turn; the first to say yes gets the business. The busier one will get
> beaten in the race by the more idle device, relieving the bottleneck
> to the extent of the faster device's capacity. It isn't perfect, but
> it's an improvement.

See my comments above about needing to reorganize where the RAM pages
go in the swap partitions on the fly over time.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)