Date: Sun, 12 Aug 2018 20:05:06 -0700
From: John Kennedy <warlock@phouka.net>
To: bob prohaska <fbsd@www.zefox.net>
Cc: freebsd-arm <freebsd-arm@freebsd.org>
Subject: Re: RPI3 swap experiments ["was killed: out of swap space" with: "v_free_count: 5439, v_inactive_count: 1"]
Message-ID: <20180813030506.GC81324@phouka1.phouka.net>
In-Reply-To: <20180812224021.GA46372@www.zefox.net>
References: <20180802015135.GC99523@www.zefox.net> <EC74A5A6-0DF4-48EB-88DA-543FD70FEA07@yahoo.com> <20180806155837.GA6277@raichu> <20180808153800.GF26133@www.zefox.net> <20180808204841.GA19379@raichu> <2DC1A479-92A0-48E6-9245-3FF5CFD89DEF@yahoo.com> <20180809033735.GJ30738@phouka1.phouka.net> <20180809175802.GA32974@www.zefox.net> <20180812173248.GA81324@phouka1.phouka.net> <20180812224021.GA46372@www.zefox.net>
On Sun, Aug 12, 2018 at 03:40:21PM -0700, bob prohaska wrote:
> The closest thing to clever is the logging script, started by Mark M., ...

I was thinking more of the heavily distilled contents of top.

> The script surely isn't "lightweight", but in my case the crashes came
> before the script and haven't changed much since it arrived. Still, you
> make a good point and I should do a test occasionally to see if the
> script contributes to the crashes. I don't think the script has ever
> been killed by OOMA.

I think we're probably chasing this the wrong way around. Going OOM is
to be expected in some types of situations. I think we're mostly saying
that the buildworld/buildkernel process shouldn't be one of those
places, and for most of us (at least the verbal ones) it presumably
isn't. Bob P has an interesting situation that triggers it when it
arguably shouldn't, which perhaps reveals a problem: OOMing when swap is
unresponsive? And then we need to decide what counts as reasonably
responsive (with possibly a "tweak this tunable knob" note if you have
some hardware that isn't tall enough to ride the rollercoaster).

In my case, I didn't have my normal resources available, so I was
basically watching it swap a lot more than run. If I was having issues
and my swap wasn't fast enough (assuming a swap-speed issue), that might
be helpful and I should have left it as-is. To that end, I've applied
the patches that tell me more about what was going on when things were
going OOM, without necessarily trying to avoid it. Once I can get things
to fail reliably, figuring out how to fix it reliably starts.

So for my part, can I guarantee that some arbitrary process kicked off
on my box during build*, used up all the swap, and kicked off an OOM
massacre? The solution there is to not do it (or re-engineer it). The
build* process seems like a pretty constant load, but I bet you that if
you looked at it from the scheduler or the swap, it isn't.

For you, CAM_IOSCHED_DYNAMIC seems to hurt.
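(For what it's worth, the kind of "heavily distilled" logging I had in
mind could be as small as a shell loop over stock FreeBSD tools. This is
only a sketch, not Mark's actual script; the sysctl OIDs match the
counters quoted in the subject line, and /var/log/swaplog is just a
made-up destination:)

```shell
#!/bin/sh
# Sketch of a lightweight swap/memory logger for FreeBSD (not the
# script discussed in this thread). Appends one snapshot per minute.
while :; do
    date '+%Y-%m-%d %H:%M:%S'
    # Same counters that appear in the OOM kill message:
    sysctl vm.stats.vm.v_free_count vm.stats.vm.v_inactive_count
    # Swap devices and how full they are, in kilobyte blocks:
    swapinfo -k
    sleep 60
done >> /var/log/swaplog 2>&1
```

Something that small should add negligible load, which would help rule
the logger itself out as a contributor to the crashes.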
That looks like it might tweak the balance of read-vs-write traffic.
They were worrying about SSDs; I can only imagine how much worse SD
cards or USB2 devices must seem. I guess if you're cutting corners on
price, you might fine-tune the suck that far down the line.

Tuning vm.pageout_oom_seq increases the number of back-to-back passes
the pagedaemon (?) makes while waiting for usable pages. That sounds
like it lets us dig a deeper hole, which is fine as long as we can dig
ourselves out of it. You might just be (un)lucky, which can't be
reproduced reliably.
(https://lists.freebsd.org/pipermail/svn-src-head/2015-November/078968.html)

I'm not sure what Bob's ultimate problem is. My gut feeling is a slow
disk, but I had the impression that he's tried similar hardware. I've
got an RPI3B+ in a 77-degree-F room, a SanDisk Extreme Plus (V30-rated)
SD card with the swap on it, and a heatsink + pi-fan case mod to keep my
system cool. That would seem easy enough to reproduce. Counterfeit
hardware? Bad sectors that cause unpredictable delays when wear-leveling
shuffles data over them? Dodgy hardware doing the same? Thermal
throttling that gets him closer to some invisible performance dropoff?

How do we divide and conquer this problem? What can we do to split it in
half so we can figure out which of the two halves has the problem (and
then rinse-n-repeat)?
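(For anyone wanting to experiment with that knob: it can be read and set
at runtime with sysctl, or made persistent in /etc/sysctl.conf. A sketch
assuming a stock FreeBSD install; 120 is only an example value, not a
recommendation:)

```shell
# Inspect the current value of the knob.
sysctl vm.pageout_oom_seq

# Raise it for this boot only: the pagedaemon makes more back-to-back
# passes hunting for usable pages before it declares OOM and starts
# killing processes. Needs root.
sysctl vm.pageout_oom_seq=120

# Make the setting survive reboots.
echo 'vm.pageout_oom_seq=120' >> /etc/sysctl.conf
```

A bigger value buys a slow swap device more time to catch up, at the
cost of the system thrashing longer before anything gets killed.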