Date: Thu, 9 Aug 2018 09:42:59 -0600
From: Warner Losh <imp@bsdimp.com>
To: bob prohaska <fbsd@www.zefox.net>
Cc: Mark Johnston <markj@freebsd.org>, "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>
Subject: Re: RPI3 swap experiments ["was killed: out of swap space" with: "v_free_count: 5439, v_inactive_count: 1"]
Message-ID: <CANCZdfrC0s8X-LxJmrDmkxmz%2BGUMNsHSMpBEQmp1S5ahcvptpg@mail.gmail.com>
In-Reply-To: <20180809153710.GC30347@www.zefox.net>
References: <6BFE7B77-A0E2-4FAF-9C68-81951D2F6627@yahoo.com>
 <20180802002841.GB99523@www.zefox.net>
 <20180802015135.GC99523@www.zefox.net>
 <EC74A5A6-0DF4-48EB-88DA-543FD70FEA07@yahoo.com>
 <20180806155837.GA6277@raichu>
 <20180808153800.GF26133@www.zefox.net>
 <20180808204841.GA19379@raichu>
 <20180809065648.GB30347@www.zefox.net>
 <20180809152152.GC68459@raichu>
 <CANCZdfpKOTBrxiNhaeHHRp-2iw5a4eXt%2Bmd_1LTD-c0%2BAE6qxg@mail.gmail.com>
 <20180809153710.GC30347@www.zefox.net>

On Thu, Aug 9, 2018 at 9:37 AM, bob prohaska <fbsd@www.zefox.net> wrote:

> On Thu, Aug 09, 2018 at 09:28:09AM -0600, Warner Losh wrote:
> > On Thu, Aug 9, 2018 at 9:21 AM, Mark Johnston <markj@freebsd.org> wrote:
> >
> > > On Wed, Aug 08, 2018 at 11:56:48PM -0700, bob prohaska wrote:
> > > > On Wed, Aug 08, 2018 at 04:48:41PM -0400, Mark Johnston wrote:
> > > > > On Wed, Aug 08, 2018 at 08:38:00AM -0700, bob prohaska wrote:
> > > > > > The patched kernel ran longer than default but OOMA still
> > > > > > halted buildworld around 13 MB. That's considerably farther
> > > > > > than a default build world have run but less than observed
> > > > > > when setting vm.pageout_oom_seq=120 alone. Log files are at
> > > > > > http://www.zefox.net/~fbsd/rpi3/swaptests/r337226M/1gbsdflash_1gbusbflash/batchqueue/
> > > > > >
> > > > > > Both changes are now in place and -j4 buildworld has been
> > > > > > restarted.
> > > > >
> > > > > Looking through the gstat output, I'm seeing some pretty
> > > > > abysmal average write latencies for da0, the flash drive. I
> > > > > also realized that my reference to r329882 lowering the
> > > > > pagedaemon sleep period was wrong - things have been this way
> > > > > for much longer than that. Moreover, as you pointed out,
> > > > > bumping oom_seq to a much larger value wasn't quite
> > > > > sufficient.
> > > > >
> > > > > I'm curious as to what the worst case swap I/O latencies are
> > > > > in your test, since the average latencies reported in your
> > > > > logs are high enough to trigger OOM kills even with the
> > > > > increased oom_seq value. When the current test finishes, could
> > > > > you try repeating it with this patch applied on top?
> > > > > https://people.freebsd.org/~markj/patches/slow_swap.diff
> > > > > That is, keep the non-default oom_seq setting and modification
> > > > > to VM_BATCHQUEUE_SIZE, and apply this patch on top. It'll
> > > > > cause the kernel to print messages to the console under
> > > > > certain conditions, so a log of console output will be
> > > > > interesting.
> > > >
> > > > The run finished with a panic, I've collected the logs and
> > > > terminal output at
> > > > http://www.zefox.net/~fbsd/rpi3/swaptests/r337226M/1gbsdflash_1gbusbflash/batchqueue/pageout120/slow_swap/
> > > >
> > > > There seems to be a considerable discrepancy between the wait
> > > > times reported by the patch and the wait times reported by gstat
> > > > in the first couple of occurrences. The fun begins at timestamp
> > > > Wed Aug 8 21:26:03 PDT 2018 in swapscript.log.
> > >
> > > The reports of "waited for swap buffer" are especially bad: during
> > > those periods, the laundry thread is blocked waiting for in-flight
> > > swap writes to finish before sending any more. Because the system
> > > is generally quite starved for clean pages that it can reuse, it's
> > > relying on swap I/O to clean more. If that fails, the system
> > > eventually has no choice but to start killing processes (where the
> > > time period corresponding to "eventually" is determined by
> > > vm.pageout_oom_seq).
> > >
> > > Based on these latencies, I think the system is behaving more or
> > > less as expected from the VM's perspective. I do think the default
> > > oom_seq value is too low and will get that addressed in 12.0.
> >
> > Yea. I think we need to take a more active role in managing
> > latencies on some cards. Properly managed, they won't climb that
> > high. Since there's no tagged queueing to these devices, there's an
> > I/O depth of one. The default policy is to do them in order (since
> > it's flash) which means that processes that machine-gun down
> > requests swamp everybody else and do back-to-back-to-back writes
> > which, at least for the few drives I have looked at in detail tends
> > to induce pathological behavior.
>
> There's a kernel building now with
> options CAM_IOSCHED_DYNAMIC
> in the config file. Is it still worth trying? Anything else to try?

It won't be a cure-all, out of the box, I don't think. However, the read
biasing code may help sneak a few 'reads' in between writes which may help
keep away from the pathological behavior.... Or not, it's hard to say...
I've not looked at swapping to super-crappy nand (I mean thumb drives) in
as much detail as the drives we use for work.

Warner
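
The difference between in-order dispatch and read biasing on a device with an I/O depth of one can be illustrated with a toy model. The sketch below is not the CAM I/O scheduler and makes no claim about how CAM_IOSCHED_DYNAMIC is actually implemented. The names `read_biased_order`, `reads_per_write`, `READ_COST` and `WRITE_COST` are hypothetical, and the service costs are made-up units chosen only so that writes are slower than reads.

```python
from collections import deque

# Assumed, illustrative service costs (arbitrary time units); not measured.
READ_COST = 1
WRITE_COST = 5


def read_biased_order(queue, reads_per_write):
    """Reorder a queue so up to `reads_per_write` reads go before each write.

    `queue` is a list of (kind, tag) tuples in arrival order, where kind is
    "R" or "W". Order is preserved within reads and within writes.
    """
    reads = deque(req for req in queue if req[0] == "R")
    writes = deque(req for req in queue if req[0] == "W")
    order = []
    budget = reads_per_write
    while reads or writes:
        if reads and (budget > 0 or not writes):
            order.append(reads.popleft())
            budget -= 1
        else:
            order.append(writes.popleft())
            budget = reads_per_write  # a write was issued, refill the budget
    return order


def mean_read_wait(order):
    """Average time a read spends queued on a depth-one device."""
    clock = 0
    waits = []
    for kind, _tag in order:
        if kind == "R":
            waits.append(clock)
        clock += READ_COST if kind == "R" else WRITE_COST
    return sum(waits) / len(waits)


# A burst of eight writes arrives ahead of two reads.
queue = [("W", i) for i in range(8)] + [("R", 0), ("R", 1)]
fifo = list(queue)                    # in-order dispatch
biased = read_biased_order(queue, 1)  # one read allowed per write

print("FIFO mean read wait:  ", mean_read_wait(fifo))
print("Biased mean read wait:", mean_read_wait(biased))
```

In this model the reads queued behind a write burst wait for the whole burst under in-order dispatch, while the biased order lets them through early at the cost of delaying the writes slightly. Whether that helps on a real thumb drive depends on the device's behavior, which the model does not capture.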