Date:      Wed, 15 Oct 2014 23:56:33 -0600
From:      "Justin T. Gibbs" <gibbs@FreeBSD.org>
To:        freebsd-current@freebsd.org
Cc:        alc@FreeBSD.org, Andriy Gapon <avg@freebsd.org>
Subject:   OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()
Message-ID:  <C64FB06B-AC9D-4A84-9CBB-8ED45CA6A315@FreeBSD.org>

avg pointed out the rate limiting code in vm_pageout_scan() during
discussion about PR 187594.  While it certainly can contribute to the
problems discussed in that PR, a bigger problem is that it can allow the
OOM killer to be triggered even though there is plenty of reclaimable
memory available in the system.  Any load that can consume enough pages
within the polling interval to hit the v_free_min threshold (e.g.
multiple 'dd if=/dev/zero of=/file/on/zfs') can make this happen.
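To make the failure mode concrete, here is a minimal userland model of a fixed-period rate limit on the lowmem handlers. It assumes the limit is a plain "at most once per N seconds, and only once the scan is under pressure" check; struct lowmem_limiter, lowmem_try_invoke() and period_secs are hypothetical names for illustration, not the actual code in vm_pageout_scan().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of a fixed-period rate limit on kernel cache
 * reclamation.  "ticks" is a monotonically increasing clock, "hz" is
 * ticks per second.  Names are illustrative, not FreeBSD's.
 */
struct lowmem_limiter {
	int last_ticks;   /* tick count at the last handler invocation */
	int hz;           /* ticks per second */
	int period_secs;  /* minimum seconds between invocations */
};

/*
 * Returns true, and records the call, if the lowmem handlers may run
 * now.  Returns false if the scan is not under pressure (pass 0) or if
 * the last call was less than period_secs ago.
 */
static bool
lowmem_try_invoke(struct lowmem_limiter *l, int now_ticks, int pass)
{
	assert(l->hz > 0);
	if (pass == 0)
		return false;   /* no memory pressure yet */
	if ((now_ticks - l->last_ticks) / l->hz < l->period_secs)
		return false;   /* rate limited: caches keep their pages */
	l->last_ticks = now_ticks;
	return true;
}
```

While this model returns false, caches such as the ZFS ARC are never asked to shrink, so a burst that drains free pages within a single period leaves the page daemon with only the OOM killer as a fallback.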

The product I'm working on does not have swap configured and treats
any OOM trigger as fatal, so it is very obvious when this happens. :-)

I've tried several things to mitigate the problem.  The first was to
ignore rate limiting for pass 2.  However, even though ZFS is guaranteed
to receive some feedback prior to OOM being declared, my testing showed
that a trivial load (a couple of dd operations) could still consume enough
of the reclaimed space to leave the system below its target at the end
of pass 2.  After removing the rate limiting entirely, I've so far
been unable to kill the system via a ZFS-induced load.

I understand the motivation behind the rate limiting, but the current
implementation seems too simplistic to be safe.  The documentation for
the Solaris slab allocator provides good motivation for their approach
of using a "sliding average" to rein in temporary bursts of usage
without unduly harming efficient service for the recorded steady-state
memory demand.  Regardless of the approach taken, I believe that the OOM
killer must be a last resort and shouldn't be called when there are
caches that can be culled.
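One way to read the "sliding average" idea is an exponentially weighted moving average of each cache's demand, with anything held above that average treated as a burst that may be reclaimed immediately. This is a swapped-in technique for illustration, not the Solaris allocator's actual algorithm; struct demand_avg, demand_update(), demand_excess() and the 1/8 weight are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-cache demand tracker: an exponentially weighted
 * moving average where each new sample carries a weight of 1/8.
 * Integer arithmetic only, as kernel code would require.
 */
struct demand_avg {
	uint64_t avg_pages;  /* smoothed steady-state demand, in pages */
};

/* Fold one interval's observed demand into the running average. */
static void
demand_update(struct demand_avg *d, uint64_t sample_pages)
{
	d->avg_pages = (d->avg_pages * 7 + sample_pages) / 8;
}

/*
 * Pages the cache holds beyond its smoothed demand.  Under memory
 * pressure these could be released right away, with no fixed rate
 * limit, while the steady-state working set is left untouched.
 */
static uint64_t
demand_excess(const struct demand_avg *d, uint64_t cache_pages)
{
	return cache_pages > d->avg_pages ? cache_pages - d->avg_pages : 0;
}
```

The design choice here is that the limit adapts to each cache's recorded demand rather than to wall-clock time, so a short burst is trimmed promptly without repeatedly flushing a working set the cache genuinely needs.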

One other thing I've noticed in my testing with ZFS is that it needs
feedback and a little time to react to memory pressure.  Calling its
lowmem handler just once isn't enough for it to limit in-flight writes
so it can avoid reusing pages that it just freed up.  But it doesn't
take too long to react (> 1 sec in the profiling I've done).  Is there
a way in vm_pageout_scan() that we can better record that progress is
being made (pages were freed in the pass, even if some or all of them were
consumed again) and allow more passes before the OOM killer is invoked
in this case?
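A minimal sketch of the progress accounting asked about above, assuming the page daemon can report how many pages each pass freed: OOM is declared only after several consecutive passes that freed nothing, rather than after one pass that ends below the target. struct oom_progress, oom_should_fire() and max_stalled are hypothetical names, not existing FreeBSD interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical progress tracker: a pass that freed any pages counts as
 * progress even if a writer consumed those pages again before the pass
 * ended.
 */
struct oom_progress {
	unsigned stalled_passes;  /* consecutive passes that freed nothing */
	unsigned max_stalled;     /* stalled passes tolerated before OOM */
};

/* Returns true only when the OOM killer should be invoked. */
static bool
oom_should_fire(struct oom_progress *p, uint64_t freed_this_pass,
    uint64_t free_count, uint64_t free_min)
{
	assert(p->max_stalled > 0);
	if (free_count >= free_min || freed_this_pass > 0) {
		/*
		 * Target met, or caches are still giving pages back:
		 * reset the counter and give them more time to react.
		 */
		p->stalled_passes = 0;
		return false;
	}
	p->stalled_passes++;
	return p->stalled_passes >= p->max_stalled;
}
```

With this scheme a ZFS-induced load that keeps re-consuming freed pages never advances the counter, while a system that truly has nothing left to reclaim still reaches the OOM killer after max_stalled passes.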

--
Justin



