From owner-freebsd-arm@freebsd.org Thu Aug 9 16:21:47 2018
From: Mark Millard
Date: Thu, 9 Aug 2018 09:21:38 -0700
To: Mark Johnston
Cc: bob prohaska, freebsd-arm@freebsd.org
Subject: Re: RPI3 swap experiments ["was killed: out of swap space" with: "v_free_count: 5439, v_inactive_count: 1"]
Message-Id: <915DFC7F-7AC9-484F-8619-C386FF077769@yahoo.com>
In-Reply-To: <20180809152152.GC68459@raichu>
List-Id: "Porting FreeBSD to ARM processors."

On 2018-Aug-9, at 8:21 AM, Mark Johnston wrote:

> On Wed, Aug 08, 2018 at 11:56:48PM -0700, bob prohaska wrote:
>> On Wed, Aug 08, 2018 at 04:48:41PM -0400, Mark Johnston wrote:
>>> On Wed, Aug 08, 2018 at 08:38:00AM -0700, bob prohaska wrote:
>>>> The patched kernel ran longer than default but OOMA still halted buildworld around
>>>> 13 MB. That's considerably farther than a default build world have run but less than
>>>> observed when setting vm.pageout_oom_seq=120 alone. Log files are at
>>>> http://www.zefox.net/~fbsd/rpi3/swaptests/r337226M/1gbsdflash_1gbusbflash/batchqueue/
>>>>
>>>> Both changes are now in place and -j4 buildworld has been restarted.
>>>
>>> Looking through the gstat output, I'm seeing some pretty abysmal average
>>> write latencies for da0, the flash drive. I also realized that my
>>> reference to r329882 lowering the pagedaemon sleep period was wrong -
>>> things have been this way for much longer than that. Moreover, as you
>>> pointed out, bumping oom_seq to a much larger value wasn't quite
>>> sufficient.
>>>
>>> I'm curious as to what the worst case swap I/O latencies are in your
>>> test, since the average latencies reported in your logs are high enough
>>> to trigger OOM kills even with the increased oom_seq value. When the
>>> current test finishes, could you try repeating it with this patch
>>> applied on top? https://people.freebsd.org/~markj/patches/slow_swap.diff
>>> That is, keep the non-default oom_seq setting and modification to
>>> VM_BATCHQUEUE_SIZE, and apply this patch on top. It'll cause the kernel
>>> to print messages to the console under certain conditions, so a log of
>>> console output will be interesting.
>>
>> The run finished with a panic, I've collected the logs and terminal output at
>> http://www.zefox.net/~fbsd/rpi3/swaptests/r337226M/1gbsdflash_1gbusbflash/batchqueue/pageout120/slow_swap/
>>
>> There seems to be a considerable discrepancy between the wait times reported
>> by the patch and the wait times reported by gstat in the first couple of
>> occurrences. The fun begins at timestamp Wed Aug 8 21:26:03 PDT 2018 in
>> swapscript.log.
>
> The reports of "waited for swap buffer" are especially bad: during those
> periods, the laundry thread is blocked waiting for in-flight swap writes
> to finish before sending any more. Because the system is generally
> quite starved for clean pages that it can reuse, it's relying on swap
> I/O to clean more. If that fails, the system eventually has no choice
> but to start killing processes (where the time period corresponding to
> "eventually" is determined by vm.pageout_oom_seq).
>
> Based on these latencies, I think the system is behaving more or less as
> expected from the VM's perspective. I do think the default oom_seq value
> is too low and will get that addressed in 12.0.

Would something like the patch that produced messages such as:

waited 3s for async swap write
waited 3s for swap buffer

be appropriate to have available via a sysctl or in some other way?
In other words: present in the standard source code, off by default,
but able to be enabled without patching, possibly without rebuilding?

I ask because I've been thinking of having such reporting on the
FreeBSD systems where I run buildworld and buildkernel and use
poudriere-devel for port builds. It might warn me of marginal
contexts and help explain any OOM kills that might occur.
(Sometimes things are difficult or time consuming to reproduce.)

If monitored at the time, it might even help identify contexts that
"machine-gun down requests" in environments where such can be a
problem for swapping.
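As a rough illustration of the kind of monitoring meant here, the
sketch below summarizes such messages from a captured console log.
It assumes the messages keep exactly the two shapes quoted above;
the function name summarize_waits and the threshold_s parameter are
made up for the example and are not part of the patch or of FreeBSD.

```python
import re
from collections import defaultdict

# Matches the two message shapes quoted above, e.g. "waited 3s for swap buffer".
# Assumption: the patch's output keeps exactly this wording.
WAIT_RE = re.compile(r"waited (\d+)s for (async swap write|swap buffer)")


def summarize_waits(lines, threshold_s=1):
    """Return {kind: (count, longest_wait_s)} for waits of at least threshold_s."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        m = WAIT_RE.search(line)
        if m is None:
            continue  # not one of the swap-wait messages
        secs, kind = int(m.group(1)), m.group(2)
        if secs < threshold_s:
            continue  # below the reporting threshold
        stats[kind][0] += 1
        stats[kind][1] = max(stats[kind][1], secs)
    return {kind: tuple(counts) for kind, counts in stats.items()}
```

Feeding it the lines of a saved console log (any iterable of strings
will do) gives, per message kind, how many reports there were and the
longest wait seen, which could then be lined up against the timing of
any OOM kills.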

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
  early 2018-Mar)