Date: Fri, 8 Dec 2017 08:18:21 +0000
From: Johannes Lundberg <johalun0@gmail.com>
To: Larry McVoy <lm@mcvoy.com>
Cc: freebsd-hackers@freebsd.org
Subject: Re: OOM problem?
Message-ID: <CAECmPwtcsHwiZpmx4%2BT_w3njEdUAjGZiRZKEX53m-QVJLSuY9Q@mail.gmail.com>
In-Reply-To: <20171208011430.GA16016@mcvoy.com>
References: <20171208011430.GA16016@mcvoy.com>

Regarding a potential OOM overhaul.

Personally I like the idea of an OOM signal. The idea comes from iOS, where applications get a callback when system memory is low and are given a chance to free unused resources, or resources that can easily be recreated, before getting killed completely.

On FreeBSD, my Firefox occasionally gets killed by the OOM routine when it uses more than 5 GB and I run other heavy stuff like poudriere (yes, as I move around a lot I don't own a desktop, all work is done on a laptop). Wouldn't it be nice if Firefox could instead get a signal, so that it can free old tabs' contents and stay alive, or at least shut down cleanly instead of being forcibly killed? Processes like poudriere could throttle down the number of jails in an OOM situation.

Having a rather small SSD, like many laptops, I don't want to waste a lot of space on swap. Actually, I'd rather not use swap at all on an SSD due to wear.

Just an idea for how to improve the FreeBSD laptop experience, and I think it could also solve some of OP's issues...

On Fri, Dec 8, 2017 at 1:14 AM, Larry McVoy <lm@mcvoy.com> wrote:

> Hi hackers,
>
> I've been playing around on a box that Netflix loaned me, I'm thinking
> about novel ways to deal with NUMA issues.
>
> I ran into a problem with the kernel, wanted to check in and see if
> anyone cares (I've got a couple different ways that it could be fixed
> but if no one cares I'll drop it). It's sort of an ugly problem in that
> when it happens your only recourse is to power cycle the machine, you
> can't kill off the processes causing the problem.
>
> I was trying to create benchmarks that would show what the system could do
> if you locked things down to different NUMA domains (BTW, the NUMA stuff
> is a complete red herring, the problem I'm about to describe happens even
> if NUMA support isn't enabled).
>
> The machine is running 12.0-CURRENT FreeBSD 12.0-CURRENT #13 ce7b9882181
> with a few diffs I did for debugging and a tweak to the pageout daemon
> suggested by Jeff. It is a machine with 256GB of RAM configured with no
> swap space (that detail is important).
>
> I created a set of 10 processes that malloced 25GB each and read it
> repeatedly. That was enough memory pressure to use up all of free mem.
>
> Here is the problem. All of these "misbehaved" (by using lots of ram)
> processes go to sleep, I believe in vm_wait(). They are all waiting
> for more ram so the pageout daemon is kicked but to no avail, all the
> ram is tied up in the processes that want more ram. The pageout daemon
> kicks out what it can but it quickly gets to the point that it scans
> everything and finds nothing (I know this because I added debugging to
> show that's what it is doing).
>
> The OOM code kicks in and it behaves poorly. It doesn't kill any of
> the big processes, those are all sleeping without PCATCH on so they are
> skipped. The OOM code starts killing off anything it can find, it was
> killing getty, ssh, bash, dhclient. One buglet is that, in my opinion,
> it finds stuff to kill that it probably shouldn't. Anything that init
> will respawn is fine, anything that would not be respawned should be
> run as not killable. Seems like an audit of those processes might be
> in order.
>
> I know that you'll ask why no swap? Just add swap and the problem
> goes away. Does it? I don't think so, that's just kicking the can
> down the road. If we add 256GB of swap now we have a 512GB bag to fill,
> fill that and I think we're right back to where we started.
>
> What are the ideas for fixing it? I've got two. I think the first
> one is a bit hard to get right and I'm not sure if the second one will
> work (sorry, it's been a long time since I was a kernel hack, like SunOS
> 4.x long time).
>
> A) Don't allocate more mem than you have. This problem exists simply
>    because the system allowed malloc to return more space than the
>    system had. If the system kept track of all the mem it has (ram
>    plus swap), then when a process asked for an allocation that pushed
>    it over that limit, it could fail that allocation. It's yet another
>    globally locked thing (though Jeff's NUMA stuff may make that better),
>    you have to keep track of allocations and frees (as in on exit(2) not
>    free(3)), that's why I think it's detail oriented to do it this way.
>    Probably the right way but has to be done carefully and someone has
>    to care enough to keep watching that this doesn't get broken.
>
> B) Sleep with PCATCH, if that doesn't work, loop sleeping for a period,
>    wake up and see if you are signaled. I'm rusty enough that I don't
>    remember if msleep() with PCATCH will catch signals or not (I don't
>    remember a msleep(), that might be a BSD thing and not a SunOS thing).
>    But whatever, either it catches signals or you replace that sleep with
>    a loop that sleeps for a second or so, wakes up and looks to see if it's
>    been signaled and if so dies, else goes back to sleep waiting for pageout
>    and/or OOM to free some mem.
>
> I kinda like B better because it seems harder to have that approach bit rot.
> I'm wondering if anyone cares about this problem. If no, fine. If yes,
> I can cons up a test case and hand that off to someone who wants to fix
> the problem. If no one wants to fix it, I'll give it a try but I'd like
> feedback on the above approaches, not interested in going down a rathole
> for no good reason.
>
> Thanks,
>
> --lm
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
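
For readers who want to see the bookkeeping behind approach A in the quoted message, here is a minimal user-space sketch in Python. It is only a toy model of the idea (refuse any request that would push total commitments past RAM plus swap, and return memory when a process exits), not kernel code. The names CommitLedger, reserve() and release_on_exit() are made up for illustration and do not correspond to any FreeBSD interface, and the model ignores memory used by the kernel itself.

```python
"""Toy model of strict commit accounting (approach A in the quoted message).

Everything here is hypothetical: CommitLedger, reserve() and
release_on_exit() are illustration names, not FreeBSD kernel interfaces.
"""

GIB = 1024 ** 3


class CommitLedger:
    """Tracks how much memory has been promised to each process."""

    def __init__(self, ram_bytes, swap_bytes=0):
        # The ceiling is everything the system has: RAM plus swap.
        self.limit = ram_bytes + swap_bytes
        self.committed = 0
        self.by_pid = {}

    def reserve(self, pid, nbytes):
        """Record the reservation and return True, or return False on overcommit."""
        if nbytes < 0:
            raise ValueError("nbytes must be non-negative")
        if self.committed + nbytes > self.limit:
            return False  # fail the allocation instead of overcommitting
        self.committed += nbytes
        self.by_pid[pid] = self.by_pid.get(pid, 0) + nbytes
        return True

    def release_on_exit(self, pid):
        """Give back everything a process reserved; models exit(2), not free(3)."""
        freed = self.by_pid.pop(pid, 0)
        self.committed -= freed
        return freed


if __name__ == "__main__":
    # Sizes taken from the quoted report: 256GB of RAM, no swap, 25GB per process.
    ledger = CommitLedger(ram_bytes=256 * GIB)
    results = [ledger.reserve(pid, 25 * GIB) for pid in range(1, 12)]
    print(results)                        # the eleventh request is refused
    ledger.release_on_exit(1)
    print(ledger.reserve(12, 25 * GIB))   # succeeds once pid 1 has exited
```

Because the toy counts only user reservations, ten 25GB requests fit under the 256GB ceiling here, while on the real machine the same ten processes were enough to exhaust free memory. A real implementation would also have to account for kernel memory and would need the global locking the quoted message mentions.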