Date:      Fri, 8 Dec 2017 07:01:21 -0800
From:      Larry McVoy <lm@mcvoy.com>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        Larry McVoy <lm@mcvoy.com>, freebsd-hackers@freebsd.org
Subject:   Re: OOM problem?
Message-ID:  <20171208150121.GH16028@mcvoy.com>
In-Reply-To: <20171208101543.GC2272@kib.kiev.ua>
References:  <20171208011430.GA16016@mcvoy.com> <20171208101543.GC2272@kib.kiev.ua>

On Fri, Dec 08, 2017 at 12:15:43PM +0200, Konstantin Belousov wrote:
> > The OOM code kicks in and it behaves poorly.  It doesn't kill any of
> > the big processes, those are all sleeping without PCATCH on so they are
> > skipped.
> What is the proof for this statement ?

I let the system run overnight trying to find more memory and it never
killed any of the big processes.

I was able to log in, but kill -9 would not kill them.

I tried a reboot and that hung.

It took a power cycle to get the machine back.

I've done this multiple times and always get the same result.

> A process waiting for a page in the fault handler must receive the page
> to get out of the handler, even if the system is in OOM.  

I may be confusing you because this is not the normal page-fault-on-a-file
code path (at least I think it is not).  The process is indeed faulting
in pages, but they are pages that were allocated via whatever malloc calls
these days.  (In SunOS it mmapped /dev/zero, before that it was sbrk(2);
I dunno what FreeBSD does.  I couldn't find malloc in src/lib; I see that
it's jemalloc but /usr/src/lib/libc/stdlib/jemalloc has no files?)

I think we are landing in vm_wait() but I can put some debugging in there
and confirm that if that helps.

> > A) Don't allocate more mem than you have.  This problem exists simply
> >    because the system allowed malloc to return more space than the
> >    system had.  If the system kept track of all the mem it has (ram
> >    plus swap) and when processes asked for an allocation that pushed it
> >    over that limit, fail that allocation.  It's yet another globally
> >    locked thing (though Jeff's NUMA stuff may make that better), you
> >    have to keep track of allocations and frees (as in on exit(2) not
> >    free(3)), that's why I think it's detail oriented to do it this way.
> >    Probably the right way but has to be done carefully and someone has
> >    to care enough to keep watching that this doesn't get broken.
> This behaviour can be requested by disabling overcommit.   See tuning(7).
> The code may have rotted since it was written, because this feature is
> often asked for but rarely used for real.

Seems like that should be on by default, no?
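
For concreteness, the bookkeeping described in (A) can be sketched in a
few lines of userland C.  This is only an illustration under stated
assumptions: struct commit_acct, commit_reserve() and commit_release()
are hypothetical names, not FreeBSD's actual overcommit code, and the
locking mentioned in (A) is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping; everything is counted in pages. */
struct commit_acct {
    size_t limit;      /* RAM plus swap, in pages */
    size_t committed;  /* pages already promised to processes */
};

/* Try to reserve `pages`; refuse rather than overcommit. */
bool commit_reserve(struct commit_acct *a, size_t pages)
{
    /* Written as a subtraction so the comparison cannot overflow. */
    if (pages > a->limit - a->committed)
        return false;  /* the allocation request would fail here */
    a->committed += pages;
    return true;
}

/* Return pages to the pool, e.g. when a process exits. */
void commit_release(struct commit_acct *a, size_t pages)
{
    assert(pages <= a->committed);
    a->committed -= pages;
}
```

The point of the sketch is that a request is refused at reservation
time, so a later page fault never has to find memory that was not
accounted for.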

> > B) Sleep with PCATCH, if that doesn't work, loop sleeping for a period, 
> >    wake up and see if you are signaled.  I'm rusty enough that I don't
> >    remember if msleep() with PCATCH will catch signals or not (I don't
> >    remember a msleep(), that might be a BSD thing and not a SunOS thing).
> >    But whatever, either it catches signals or you replace that sleep with
> >    a loop that sleeps for a second or so, wakes up and looks to see if it's
> >    been signaled and if so dies, else goes back to sleep waiting for pageout
> >    and/or OOM to free some mem.
> Not exactly this, but something close, was done by the patch I provided to
> you already.
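
The fallback loop in (B) can be sketched in userland C as below.  This
is not the kernel code and not the patch mentioned above: struct
mem_waiter, wait_for_memory() and available_after_polls() are
hypothetical names, the `killed` flag stands in for "a fatal signal is
pending", and nanosleep() stands in for a timed kernel sleep.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() */
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

enum wait_result { WAIT_GOT_MEMORY, WAIT_KILLED };

struct mem_waiter {
    volatile sig_atomic_t killed;         /* stand-in: fatal signal pending */
    bool (*memory_available)(void *arg);  /* hypothetical "can we go on?" check */
    void *arg;
};

/* Sleep in short intervals instead of one uninterruptible sleep. */
enum wait_result wait_for_memory(struct mem_waiter *w, long interval_ms)
{
    struct timespec ts;

    assert(w->memory_available != NULL && interval_ms > 0);
    ts.tv_sec = interval_ms / 1000;
    ts.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (;;) {
        if (w->killed)
            return WAIT_KILLED;       /* die instead of sleeping again */
        if (w->memory_available(w->arg))
            return WAIT_GOT_MEMORY;   /* pageout or OOM freed something */
        nanosleep(&ts, NULL);         /* bounded sleep, then re-check */
    }
}

/* Example predicate: reports memory once *arg more polls have passed. */
bool available_after_polls(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

The trade-off is a periodic wakeup per sleeping process in exchange for
a bound on how long a signaled process can stay stuck.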

I need to double check, but I'm pretty sure I'm running with your patch,
or at least some version of it.  It doesn't help.  Would it help if I
packaged up a test case?  Right now I'm using something like this:

    cd LMbench2+/src
    for i in 1 2 3 4 5 6 7 8 9 0
    do	../bin/*/lat_mem_rd 25g 4096 &
    done

but I could make something simpler.  I'm willing to keep pushing on this
if that's helpful but if you'd prefer to debug it yourself I can package
up a test case.  Should probably do that anyway.
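
A simpler test case might look roughly like the C sketch below: malloc a
region and write one byte per stride so every page really gets faulted
in.  touch_pages() and alloc_touched() are hypothetical helpers, not
part of LMbench, and whether this reproduces the hang the way
lat_mem_rd does is an assumption.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Write one byte every `stride` bytes across `bytes` of `buf`, forcing
 * each touched page to be backed by real memory.  Returns the number of
 * writes performed, or 0 for a NULL buffer or a zero stride.
 */
size_t touch_pages(volatile char *buf, size_t bytes, size_t stride)
{
    size_t touched = 0;

    if (buf == NULL || stride == 0)
        return 0;
    for (size_t off = 0; off < bytes; off += stride) {
        buf[off] = 1;
        touched++;
    }
    return touched;
}

/* Allocate `bytes` and touch them; the caller frees. NULL on failure. */
char *alloc_touched(size_t bytes, size_t stride)
{
    char *buf = malloc(bytes);

    if (buf == NULL)
        return NULL;
    touch_pages(buf, bytes, stride);
    return buf;
}
```

A small main() that calls alloc_touched() with the size and a 4096-byte
stride and then sleeps, run ten times in the background as in the loop
above, would be the intended use.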

The diffs against head are in http://mcvoy.com/lm/D if you want to see if 
I am running the right patch.

--lm


