Message-ID: <543F612A.3060304@FreeBSD.org>
Date: Thu, 16 Oct 2014 09:09:46 +0300
From: Andriy Gapon
To: "Justin T. Gibbs", freebsd-current@FreeBSD.org
Cc: alc@FreeBSD.org
Subject: Re: OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()
List-Id: Discussions about the use of FreeBSD-current

On 16/10/2014 08:56, Justin T. Gibbs wrote:
> avg pointed out the rate limiting code in vm_pageout_scan() during discussion
> about PR 187594.
> While it certainly can contribute to the problems discussed in that PR, a
> bigger problem is that it can allow the OOM killer to be triggered even
> though there is plenty of reclaimable memory available in the system. Any
> load that can consume enough pages within the polling interval to hit the
> v_free_min threshold (e.g. multiple 'dd if=/dev/zero of=/file/on/zfs') can
> make this happen.
>
> The product I’m working on does not have swap configured and treats any OOM
> trigger as fatal, so it is very obvious when this happens. :-)
>
> I’ve tried several things to mitigate the problem. The first was to ignore
> rate limiting for pass 2. However, even though ZFS is guaranteed to receive
> some feedback prior to OOM being declared, my testing showed that a trivial
> load (a couple dd operations) could still consume enough of the reclaimed
> space to leave the system below its target at the end of pass 2. After
> removing the rate limiting entirely, I’ve so far been unable to kill the
> system via a ZFS induced load.
>
> I understand the motivation behind the rate limiting, but the current
> implementation seems too simplistic to be safe. The documentation for the
> Solaris slab allocator provides good motivation for their approach of using a
> “sliding average” to rein in temporary bursts of usage without unduly
> harming efficient service for the recorded steady-state memory demand.
> Regardless of the approach taken, I believe that the OOM killer must be a
> last resort and shouldn’t be called when there are caches that can be
> culled.

FWIW, I have this toy branch:
https://github.com/avg-I/freebsd/compare/experiment/uma-cache-trimming
Not all commits are relevant to the problem and some things are unfinished.
Not sure if the changes would help your case either...

> One other thing I’ve noticed in my testing with ZFS is that it needs feedback
> and a little time to react to memory pressure.
> Calling its lowmem handler just once isn’t enough for it to limit in-flight
> writes so it can avoid reuse of pages that it just freed up. But, it doesn’t
> take too long to react (>

I've been thinking about this and maybe we need to make arc_memory_throttle()
more aggressive on FreeBSD. I can't say that I really follow the logic of that
code, though.

> 1sec in the profiling I’ve done). Is there a way in vm_pageout_scan() that
> we can better record that progress is being made (pages were freed in the
> pass, even if some/all of them were consumed again) and allow more passes
> before the OOM killer is invoked in this case?

-- 
Andriy Gapon