From: "Justin T. Gibbs"
To: freebsd-current@freebsd.org
Cc: alc@FreeBSD.org, Andriy Gapon
Subject: OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()
Date: Wed, 15 Oct 2014 23:56:33 -0600

avg pointed out the rate limiting code in vm_pageout_scan() during discussion about PR 187594. While it certainly can contribute to the problems discussed in that PR, a bigger problem is that it can allow the OOM killer to be triggered even though there is plenty of reclaimable memory available in the system.
Any load that can consume enough pages within the polling interval to hit the v_free_min threshold (e.g. multiple 'dd if=/dev/zero of=/file/on/zfs') can make this happen.

The product I'm working on does not have swap configured and treats any OOM trigger as fatal, so it is very obvious when this happens. :-)

I've tried several things to mitigate the problem. The first was to ignore rate limiting for pass 2. However, even though ZFS is then guaranteed to receive some feedback before OOM is declared, my testing showed that a trivial load (a couple of dd operations) could still consume enough of the reclaimed space to leave the system below its target at the end of pass 2. After removing the rate limiting entirely, I've so far been unable to kill the system via a ZFS-induced load.

I understand the motivation behind the rate limiting, but the current implementation seems too simplistic to be safe. The documentation for the Solaris slab allocator provides good motivation for their approach of using a "sliding average" to rein in temporary bursts of usage without unduly harming efficient service for the recorded steady-state memory demand. Regardless of the approach taken, I believe that the OOM killer must be a last resort and shouldn't be called when there are caches that can be culled.

One other thing I've noticed in my testing with ZFS is that it needs feedback and a little time to react to memory pressure. Calling its lowmem handler just once isn't enough for it to limit in-flight writes so that it can avoid reusing the pages it just freed up. But it doesn't take too long to react (> 1sec in the profiling I've done). Is there a way in vm_pageout_scan() that we can better record that progress is being made (pages were freed in the pass, even if some or all of them were consumed again) and allow more passes before the OOM killer is invoked in this case?

--
Justin
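
To make the closing question concrete, here is a minimal user-space sketch in C of one way a scan loop could record progress between passes. It is an illustration only and not the FreeBSD VM code: struct oom_tracker, oom_tracker_init(), oom_pass_done() and OOM_STALL_LIMIT are hypothetical names, and the limit of 3 is an arbitrary assumption. The idea is that a pass which freed pages counts as progress even if the system is still below its target, so only consecutive passes that free nothing move toward declaring OOM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: consecutive zero-progress passes tolerated before OOM. */
#define OOM_STALL_LIMIT 3

/* Hypothetical per-scanner state; not a real kernel structure. */
struct oom_tracker {
    unsigned stalled_passes;    /* consecutive passes that freed nothing */
};

static void
oom_tracker_init(struct oom_tracker *t)
{
    assert(t != NULL);
    t->stalled_passes = 0;
}

/*
 * Record the outcome of one scan pass.
 *
 * pages_freed:  pages reclaimed during the pass, counted even if other
 *               consumers have already used them up again.
 * below_target: true if the free page count is still under its target
 *               at the end of the pass.
 *
 * Returns true only when OOM_STALL_LIMIT passes in a row have ended
 * below target without freeing a single page.
 */
static bool
oom_pass_done(struct oom_tracker *t, size_t pages_freed, bool below_target)
{
    assert(t != NULL);

    if (!below_target || pages_freed > 0) {
        /* Target met, or caches are still giving memory back. */
        t->stalled_passes = 0;
        return false;
    }

    t->stalled_passes++;
    return t->stalled_passes >= OOM_STALL_LIMIT;
}
```

With this shape, a cache that needs a pass or two to throttle its in-flight writes keeps resetting the counter as long as each pass frees something. A real implementation would presumably also need an upper bound on total passes or elapsed time, so that a load which re-consumes every freed page cannot postpone OOM forever.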