From owner-freebsd-hackers@freebsd.org  Fri Dec  8 08:19:04 2017
In-Reply-To: <20171208011430.GA16016@mcvoy.com>
References: <20171208011430.GA16016@mcvoy.com>
From: Johannes Lundberg <johalun0@gmail.com>
Date: Fri, 8 Dec 2017 08:18:21 +0000
Message-ID: <CAECmPwtcsHwiZpmx4+T_w3njEdUAjGZiRZKEX53m-QVJLSuY9Q@mail.gmail.com>
Subject: Re: OOM problem?
To: Larry McVoy <lm@mcvoy.com>
Cc: freebsd-hackers@freebsd.org
Content-Type: text/plain; charset="UTF-8"

Regarding a potential OOM overhaul: personally, I like the idea of an
OOM signal. The idea comes from iOS, where applications get a callback
when system memory is low and are given a chance to free unused
resources, or resources that can easily be recreated, before getting
killed outright.

On FreeBSD, my Firefox occasionally gets killed by the OOM routine
when it uses more than 5 GB and I run other heavy stuff like poudriere
(yes, since I move around a lot I don't own a desktop; all my work is
done on a laptop). Wouldn't it be nice if Firefox could instead get a
signal, so that it can free old tabs' contents and stay alive, or at
least shut down cleanly instead of being forcibly killed? Processes
like poudriere could throttle down the number of jails in an OOM
situation.
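
A minimal sketch of what the application side of such a signal could
look like. No low-memory signal exists today, so SIGUSR1 is used purely
as a stand-in, and the cache and all function names are made up for
illustration. The handler only sets a flag; the actual freeing happens
from the main loop, since free() is not async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: SIGUSR1 stands in for a low-memory signal that does not exist. */
#define LOWMEM_SIGNAL SIGUSR1
#define CACHE_SLOTS   8

static volatile sig_atomic_t lowmem_pending;
static void *cache[CACHE_SLOTS];   /* hypothetical data that can be recreated */

/* Signal handler: only record that the notification arrived. */
static void on_lowmem(int signo)
{
    (void)signo;
    lowmem_pending = 1;
}

/* Install the handler; returns 0 on success, -1 on error. */
int lowmem_install(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_lowmem;
    sigemptyset(&sa.sa_mask);
    return sigaction(LOWMEM_SIGNAL, &sa, NULL);
}

/* Populate empty cache slots; returns the number of populated slots. */
size_t cache_fill(size_t bytes_per_slot)
{
    size_t filled = 0;

    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i] == NULL)
            cache[i] = malloc(bytes_per_slot);
        if (cache[i] != NULL)
            filled++;
    }
    return filled;
}

/* Free every populated slot; returns the number of slots released. */
size_t cache_trim(void)
{
    size_t freed = 0;

    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i] != NULL) {
            free(cache[i]);
            cache[i] = NULL;
            freed++;
        }
    }
    return freed;
}

/* Call from the main loop: trims the cache if a notification is pending. */
size_t lowmem_poll(void)
{
    if (!lowmem_pending)
        return 0;
    lowmem_pending = 0;
    return cache_trim();
}
```

An application would call lowmem_install() at startup and
lowmem_poll() once per iteration of its event loop.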

Like many laptops, mine has a rather small SSD, so I don't want to
waste a lot of space on swap. Actually, I'd rather not use swap on the
SSD at all, due to wear.

This is just an idea for improving the FreeBSD laptop experience, and
I think it could also solve some of the OP's issues...

On Fri, Dec 8, 2017 at 1:14 AM, Larry McVoy <lm@mcvoy.com> wrote:
> Hi hackers,
>
> I've been playing around on a box that Netflix loaned me, I'm thinking
> about novel ways to deal with NUMA issues.
>
> I ran into a problem with the kernel, wanted to check in and see if
> anyone cares (I've got a couple different ways that it could be fixed
> but if no one cares I'll drop it).  It's sort of an ugly problem in that
> when it happens your only recourse is to power cycle the machine, you
> can't kill off the processes causing the problem.
>
> I was trying to create benchmarks that would show what the system could do
> if you locked things down to different NUMA domains (BTW, the NUMA stuff
> is a complete red herring, the problem I'm about to describe happens even
> if NUMA support isn't enabled).
>
> The machine is running 12.0-CURRENT FreeBSD 12.0-CURRENT #13 ce7b9882181
> with a few diffs I did for debugging and a tweak to the pageout daemon
> suggested by Jeff.  It is a machine with 256GB of RAM, configured with no
> swap space (that detail is important).
>
> I created a set of 10 processes that malloced 25GB each and read it
> repeatedly.  That was enough memory pressure to use up all of free mem.
>
> Here is the problem.  All of these "misbehaved" (by using lots of ram)
> processes go to sleep, I believe in vm_wait().  They are all waiting
> for more ram so the pageout daemon is kicked but to no avail, all the
> ram is tied up in the processes that want more ram.  The pageout daemon
> kicks out what it can but it quickly gets to the point that it scans
> everything and finds nothing (I know this because I added debugging to
> show that's what it is doing).
>
> The OOM code kicks in and it behaves poorly.  It doesn't kill any of
> the big processes, those are all sleeping without PCATCH on so they are
> skipped.  The OOM code starts killing off anything it can find, it was
> killing getty, ssh, bash, dhclient.  One buglet is that, in my opinion,
> it finds stuff to kill that it probably shouldn't.  Anything that init
> will respawn is fine, anything that would not be respawned should be
> run as not killable.  Seems like an audit of those processes might be
> in order.
>
> I know that you'll ask why no swap?  Just add swap and the problem
> goes away.  Does it?  I don't think so, that's just kicking the can
> down the road.  If we add 256GB of swap now we have a 512GB bag to fill,
> fill that and I think we're right back to where we started.
>
> What are the ideas for fixing it?  I've got two.  I think the first
> one is a bit hard to get right and I'm not sure if the second one will
> work (sorry, it's been a long time since I was a kernel hack, like SunOS
> 4.x long time).
>
> A) Don't allocate more mem than you have.  This problem exists simply
>    because the system allowed malloc to return more space than the
>    system had.  The system would keep track of all the mem it has (ram
>    plus swap) and, when a process asked for an allocation that pushed it
>    over that limit, fail that allocation.  It's yet another globally
>    locked thing (though Jeff's NUMA stuff may make that better), you
>    have to keep track of allocations and frees (as in on exit(2) not
>    free(3)), that's why I think it's detail oriented to do it this way.
>    Probably the right way but has to be done carefully and someone has
>    to care enough to keep watching that this doesn't get broken.
>
> B) Sleep with PCATCH, if that doesn't work, loop sleeping for a period,
>    wake up and see if you are signaled.  I'm rusty enough that I don't
>    remember if msleep() with PCATCH will catch signals or not (I don't
>    remember a msleep(), that might be a BSD thing and not a SunOS thing).
>    But whatever, either it catches signals or you replace that sleep with
>    a loop that sleeps for a second or so, wakes up and looks to see if it's
>    been signaled and if so dies, else goes back to sleep waiting for pageout
>    and/or OOM to free some mem.
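
The bookkeeping behind approach A can be sketched in userspace as a
single global counter with a limit. This is only an illustration of the
idea, not kernel code: the names are invented, the limit would be
ram plus swap in whatever unit is convenient, and a C11 atomic stands
in for the "globally locked thing" mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names: the total that may be promised, and what is promised. */
static size_t commit_limit;
static _Atomic size_t committed;

/* Set the total amount that may ever be handed out (e.g. ram plus swap). */
void commit_set_limit(size_t limit)
{
    commit_limit = limit;
}

/* Amount currently promised to processes. */
size_t commit_in_use(void)
{
    return atomic_load(&committed);
}

/*
 * Try to reserve n units. Fails, changing nothing, if the reservation
 * would push the total over the limit.
 */
bool commit_reserve(size_t n)
{
    size_t old = atomic_load(&committed);

    do {
        if (old > commit_limit || n > commit_limit - old)
            return false;
        /* On failure the CAS reloads old, and the check is repeated. */
    } while (!atomic_compare_exchange_weak(&committed, &old, old + n));
    return true;
}

/* Return n units, e.g. when a process exits. */
void commit_release(size_t n)
{
    size_t old = atomic_fetch_sub(&committed, n);

    assert(n <= old);   /* releasing more than was reserved is a bug */
    (void)old;
}
```

With a limit of 256, ten reservations of 25 succeed and an eleventh is
refused until something is released.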
>
> I kinda like B better because it seems harder to have that approach bit rot.
> I'm wondering if anyone cares about this problem.  If no, fine.  If yes,
> I can cons up a test case and hand that off to someone who wants to fix
> the problem.  If no one wants to fix it, I'll give it a try but I'd like
> feedback on the above approaches, not interested in going down a rathole
> for no good reason.
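
The fallback in approach B (sleep for a short period, wake up, check
for a signal, go back to sleep) can be illustrated with a userspace
analogue. This is a sketch under stated assumptions, not the kernel
change itself: nanosleep() stands in for the kernel sleep, SIGTERM for
the kill request, and fake_mem_available() is a made-up placeholder for
"did pageout or OOM free some mem".

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

static volatile sig_atomic_t killed;

/* Signal handler: only record that we were asked to die. */
static void on_term(int signo)
{
    (void)signo;
    killed = 1;
}

/* Install the SIGTERM handler; returns 0 on success, -1 on error. */
int install_term_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_term;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical stand-in: reports memory as free after a number of polls. */
static int fake_polls_until_free;

bool fake_mem_available(void)
{
    if (fake_polls_until_free > 0) {
        fake_polls_until_free--;
        return false;
    }
    return true;
}

/*
 * Wait until mem_available() reports success, waking up every interval
 * to check whether we have been signaled.
 * Returns 0 if memory became available, -1 if signaled first.
 */
int wait_for_memory(bool (*mem_available)(void),
    const struct timespec *interval)
{
    for (;;) {
        if (mem_available())
            return 0;
        if (killed)
            return -1;
        /* Returns early with EINTR if a signal arrives while sleeping. */
        nanosleep(interval, NULL);
    }
}
```

A signal that arrives between the check and the sleep is noticed one
interval late, which is the price of polling instead of an
interruptible sleep.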
>
> Thanks,
>
> --lm