From owner-freebsd-current@FreeBSD.ORG  Thu Oct 16 06:11:07 2014
Message-ID: <543F612A.3060304@FreeBSD.org>
Date: Thu, 16 Oct 2014 09:09:46 +0300
From: Andriy Gapon <avg@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:31.0) Gecko/20100101 Thunderbird/31.1.2
MIME-Version: 1.0
To: "Justin T. Gibbs" <gibbs@FreeBSD.org>, freebsd-current@FreeBSD.org
Subject: Re: OOM killer and kernel cache reclamation rate limit in
 vm_pageout_scan()
References: <C64FB06B-AC9D-4A84-9CBB-8ED45CA6A315@FreeBSD.org>
In-Reply-To: <C64FB06B-AC9D-4A84-9CBB-8ED45CA6A315@FreeBSD.org>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
Cc: alc@FreeBSD.org

On 16/10/2014 08:56, Justin T. Gibbs wrote:
> avg pointed out the rate limiting code in vm_pageout_scan() during discussion
> about PR 187594.  While it certainly can contribute to the problems discussed
> in that PR, a bigger problem is that it can allow the OOM killer to be
> triggered even though there is plenty of reclaimable memory available in the
> system.  Any load that can consume enough pages within the polling interval
> to hit the v_free_min threshold (e.g. multiple 'dd if=/dev/zero
> of=/file/on/zfs') can make this happen.
> 
> The product I’m working on does not have swap configured and treats any OOM
> trigger as fatal, so it is very obvious when this happens. :-)
> 
> I’ve tried several things to mitigate the problem.  The first was to ignore
> rate limiting for pass 2.  However, even though ZFS is guaranteed to receive
> some feedback prior to OOM being declared, my testing showed that a trivial
> load (a couple dd operations) could still consume enough of the reclaimed
> space to leave the system below its target at the end of pass 2.  After
> removing the rate limiting entirely, I’ve so far been unable to kill the
> system via a ZFS induced load.
> 
> I understand the motivation behind the rate limiting, but the current
> implementation seems too simplistic to be safe.  The documentation for the
> Solaris slab allocator provides good motivation for their approach of using a
> “sliding average” to rein in temporary bursts of usage without unduly
> harming efficient service for the recorded steady-state memory demand.
> Regardless of the approach taken, I believe that the OOM killer must be a
> last resort and shouldn’t be called when there are caches that can be
> culled.
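
To make the "sliding average" idea above concrete, here is a minimal userland
sketch in C.  It is not the Solaris algorithm and not code from
vm_pageout_scan(); the struct, the function names and the smoothing shift are
all hypothetical.  It only illustrates sizing a cache trim from smoothed
demand while always freeing at least the current shortage, instead of
applying a fixed time-based rate limit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state; none of these names exist in the FreeBSD VM code. */
struct demand_avg {
	uint64_t avg_pages;	/* smoothed pages consumed per interval */
	unsigned shift;		/* a new sample is weighted 1 / 2^shift */
};

static void
demand_avg_init(struct demand_avg *da, unsigned shift)
{
	assert(shift > 0 && shift < 16);
	da->avg_pages = 0;
	da->shift = shift;
}

/* Fold one polling interval's consumption into the running average. */
static void
demand_avg_update(struct demand_avg *da, uint64_t consumed)
{
	/* avg += (sample - avg) / 2^shift, written for unsigned math. */
	if (consumed >= da->avg_pages)
		da->avg_pages += (consumed - da->avg_pages) >> da->shift;
	else
		da->avg_pages -= (da->avg_pages - consumed) >> da->shift;
}

/*
 * Decide how many cached pages to give back.  Anything above the smoothed
 * demand is treated as a burst and trimmed; under a shortage, at least the
 * shortage is freed (bounded by what is actually cached).
 */
static uint64_t
pages_to_reclaim(const struct demand_avg *da, uint64_t cached,
    uint64_t shortage)
{
	uint64_t keep, excess, must_free;

	keep = cached < da->avg_pages ? cached : da->avg_pages;
	excess = cached - keep;
	must_free = shortage < cached ? shortage : cached;
	return (excess > must_free ? excess : must_free);
}
```

A caller would run demand_avg_update() once per polling interval and
pages_to_reclaim() from the low-memory path.  The shift trades reaction time
against smoothing, and a sensible value would have to come from measurement.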

FWIW, I have this toy branch:
https://github.com/avg-I/freebsd/compare/experiment/uma-cache-trimming

Not all commits are relevant to the problem and some things are unfinished.
Not sure if the changes would help your case either...

> One other thing I’ve noticed in my testing with ZFS is that it needs feedback
> and a little time to react to memory pressure.  Calling its lowmem handler
> just once isn’t enough for it to limit in-flight writes so it can avoid reuse
> of pages that it just freed up.  But, it doesn’t take too long to react (>

I've been thinking about this and maybe we need to make arc_memory_throttle()
more aggressive on FreeBSD.  I can't say that I really follow the logic of that
code, though.

> 1sec in the profiling I’ve done).  Is there a way in vm_pageout_scan() that
> we can better record that progress is being made (pages were freed in the
> pass, even if some/all of them were consumed again) and allow more passes
> before the OOM killer is invoked in this case?
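
One way to picture the "record that progress is being made" question above is
a small gate in front of the OOM decision.  The sketch below is plain
userland C under stated assumptions: the struct, the function and the
threshold of 3 stalled passes are hypothetical and are not taken from
vm_pageout_scan().  A pass that freed any pages counts as progress and resets
the counter, even if those pages were consumed again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration only. */
#define	STALLED_PASSES_BEFORE_OOM	3

/* Hypothetical bookkeeping kept across pageout passes. */
struct oom_gate {
	unsigned stalled;	/* consecutive passes that freed nothing */
};

/*
 * Call at the end of each pass with the number of pages that pass freed and
 * whether the system is still below its free-page target.  Returns true
 * only after STALLED_PASSES_BEFORE_OOM consecutive passes that freed
 * nothing while still below target.
 */
static bool
oom_gate_should_kill(struct oom_gate *g, uint64_t freed, bool below_target)
{
	assert(g != NULL);
	if (!below_target || freed > 0) {
		/* Target met, or reclaim is still producing pages. */
		g->stalled = 0;
		return (false);
	}
	if (g->stalled < STALLED_PASSES_BEFORE_OOM)
		g->stalled++;
	return (g->stalled >= STALLED_PASSES_BEFORE_OOM);
}
```

With this shape, a load that keeps eating the reclaimed pages delays OOM for
as long as each pass still frees something.  Whether that is acceptable, or
needs an upper bound on total passes, is a separate policy question.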

-- 
Andriy Gapon