Date: Wed, 15 Oct 2014 23:56:33 -0600
From: "Justin T. Gibbs" <gibbs@FreeBSD.org>
To: freebsd-current@freebsd.org
Cc: alc@FreeBSD.org, Andriy Gapon <avg@freebsd.org>
Subject: OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()
Message-ID: <C64FB06B-AC9D-4A84-9CBB-8ED45CA6A315@FreeBSD.org>

avg pointed out the rate limiting code in vm_pageout_scan() during discussion about PR 187594. While it certainly can contribute to the problems discussed in that PR, a bigger problem is that it can allow the OOM killer to be triggered even though there is plenty of reclaimable memory available in the system. Any load that can consume enough pages within the polling interval to hit the v_free_min threshold (e.g. multiple 'dd if=/dev/zero of=/file/on/zfs') can make this happen.

The product I'm working on does not have swap configured and treats any OOM trigger as fatal, so it is very obvious when this happens. :-)

I've tried several things to mitigate the problem. The first was to ignore rate limiting for pass 2. However, even though ZFS is guaranteed to receive some feedback prior to OOM being declared, my testing showed that a trivial load (a couple of dd operations) could still consume enough of the reclaimed space to leave the system below its target at the end of pass 2. After removing the rate limiting entirely, I've so far been unable to kill the system via a ZFS-induced load.

I understand the motivation behind the rate limiting, but the current implementation seems too simplistic to be safe. The documentation for the Solaris slab allocator provides good motivation for their approach of using a "sliding average" to rein in temporary bursts of usage without unduly harming efficient service for the recorded steady-state memory demand. Regardless of the approach taken, I believe that the OOM killer must be a last resort and shouldn't be called when there are caches that can be culled.

One other thing I've noticed in my testing with ZFS is that it needs feedback and a little time to react to memory pressure. Calling its lowmem handler just once isn't enough for it to limit in-flight writes so it can avoid reuse of pages that it just freed up. But, it doesn't take too long to react (> 1sec in the profiling I've done). Is there a way in vm_pageout_scan() that we can better record that progress is being made (pages were freed in the pass, even if some/all of them were consumed again) and allow more passes before the OOM killer is invoked in this case?

-- Justin
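
To make the closing question concrete, here is a rough, standalone sketch of one way "progress was made" could be recorded between passes. It is not FreeBSD code and not a proposed patch: struct reclaim_state, should_invoke_oom() and MAX_IDLE_PASSES are hypothetical names invented for illustration, and the threshold of two idle passes is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical illustration only: none of these names exist in the
 * FreeBSD VM code, and the threshold below is an arbitrary assumption.
 */
#define MAX_IDLE_PASSES 2	/* no-progress passes tolerated before OOM */

struct reclaim_state {
	int idle_passes;	/* consecutive passes that freed nothing */
};

/*
 * Called once at the end of each scan pass.
 *   freed:    pages reclaimed during the pass, counted even if other
 *             consumers took them again before the pass finished.
 *   shortage: pages still missing from the free target after the pass.
 * Returns true only after MAX_IDLE_PASSES consecutive passes that ended
 * short of the target without reclaiming anything.
 */
static bool
should_invoke_oom(struct reclaim_state *rs, long freed, long shortage)
{
	assert(rs != NULL);

	if (shortage <= 0) {
		/* Target met: no memory pressure left to act on. */
		rs->idle_passes = 0;
		return (false);
	}
	if (freed > 0) {
		/* Caches are still giving pages back: allow another pass. */
		rs->idle_passes = 0;
		return (false);
	}
	/* Still short and nothing was reclaimed in this pass. */
	rs->idle_passes++;
	return (rs->idle_passes >= MAX_IDLE_PASSES);
}
```

A caller would keep one struct reclaim_state across passes and consult should_invoke_oom() where the OOM decision is made today. Note that this sketch defers OOM indefinitely as long as every pass frees at least one page, so a real implementation would presumably also want an upper bound on total passes or elapsed time.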