Date: Mon, 23 Aug 2010 00:46:33 +0300
From: Andriy Gapon <avg@freebsd.org>
To: zfs-devel@freebsd.org, freebsd-hackers@freebsd.org
Subject: ZFS arc_reclaim_needed: better cooperation with pagedaemon
Message-ID: <4C719AB9.9020006@freebsd.org>

I propose that the following code in arc_reclaim_needed
(sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c)

	/*
	 * If pages are needed or we're within 2048 pages
	 * of needing to page need to reclaim
	 */
	if (vm_pages_needed || (vm_paging_target() > -2048))

be changed to

	if (vm_paging_needed())

Rationale.

1. Why not the current checks.

ARC sizing should cooperate with pagedaemon in freeing pages.

If ARC starts shrinking "prematurely", before pagedaemon is woken up, then
no potentially eligible inactive pages will be recycled and no potentially
eligible active pages will be deactivated (subject to v_inactive_target).
This would lead to the ARC size going to its minimum value (which could hurt
ZFS performance). Only after that is there a chance that pagedaemon would be
woken up to do its cleaning.

Conversely, if ARC doesn't shrink in time, then pagedaemon would have to
recycle pages with data that could be needed again soon, and that would lead
to excessive swapping and disk I/O.

vm_paging_target() is used only by pagedaemon internally; it effectively
sets an _upper_ limit on how many pages pagedaemon would free when it is
activated. It is no indication of whether pagedaemon should be
scanning/freeing pages. Thus the check of vm_paging_target() leads to
premature ARC shrinking. I believe that many people observe this behavior on
sufficiently active systems (not dedicated file servers) with a few GB of
RAM (1-8).

The vm_pages_needed check is redundant, because this is a flag that is used
to wake up pagedaemon. So when it is set, vm_paging_needed() is true and
vm_paging_target() is "way" above zero. And this flag is reset to zero when
vm_page_count_min() becomes false, which corresponds to even fewer free
pages than when vm_paging_needed() is true.

2. Why the new check.

vm_paging_needed() is the (earliest) condition that wakes up pagedaemon (see
vm_page_alloc). pagedaemon would first of all run the vm_lowmem event, for
which ARC already has a handler and which would cause the ARC size to shrink.
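To make the difference between the two checks concrete, here is a minimal
user-space sketch, not kernel code. It assumes that vm_paging_target() and
vm_paging_needed() compare free+cache against v_free_target+v_cache_min and
v_free_reserved+v_cache_min respectively (the thresholds named in the legend
notes at the end of this message). struct vm_model, make_model() and every
numeric value are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's page counters. */
struct vm_model {
	int v_free_target;
	int v_free_reserved;
	int v_cache_min;
	int v_free_count;
	int v_cache_count;
	bool pages_needed;	/* stands in for the vm_pages_needed flag */
};

/* Build a model with made-up thresholds and the given page counts. */
static struct vm_model
make_model(int free_count, int cache_count)
{
	struct vm_model m = {
		.v_free_target = 20000,		/* assumed value */
		.v_free_reserved = 5000,	/* assumed value */
		.v_cache_min = 1000,		/* assumed value */
		.v_free_count = free_count,
		.v_cache_count = cache_count,
		.pages_needed = false,
	};

	assert(free_count >= 0 && cache_count >= 0);
	return (m);
}

/* Assumed shape of vm_paging_target(): pages short of the soft target. */
static int
model_paging_target(const struct vm_model *m)
{
	return ((m->v_free_target + m->v_cache_min) -
	    (m->v_free_count + m->v_cache_count));
}

/* Assumed shape of vm_paging_needed(): the pagedaemon wakeup condition. */
static bool
model_paging_needed(const struct vm_model *m)
{
	return ((m->v_free_reserved + m->v_cache_min) >
	    (m->v_free_count + m->v_cache_count));
}

/* The check currently in arc_reclaim_needed. */
static bool
old_reclaim_needed(const struct vm_model *m)
{
	return (m->pages_needed || model_paging_target(m) > -2048);
}

/* The proposed replacement. */
static bool
new_reclaim_needed(const struct vm_model *m)
{
	return (model_paging_needed(m));
}
```

With these made-up numbers, a system sitting at free+cache = 22000 pages is
1000 pages above the soft target, so the old check already asks ARC to
shrink, while the new check stays quiet until free+cache drops below 6000
pages, which is the point where pagedaemon itself would be woken up.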
Since ARC already handles the vm_lowmem event, it would seem that having the
vm_paging_needed() check is redundant. Almost - if memory pressure is
significant, then vm_paging_needed() may stay true for a while, and that
would cause additional ARC reduction by arc_reclaim_thread.

Final notes.

I think that the vm_paging_target() > -2048 check was modeled after the
check in the original OpenSolaris code:

	freemem < lotsfree + needfree + extra

The issue is that, in my understanding, the OpenSolaris pagedaemon works
differently from the FreeBSD pagedaemon.

The OpenSolaris pagedaemon is activated when freemem (the equivalent of our
free + cache) falls to a certain higher mark (lotsfree). Initially it scans
pages at a slow rate. If freemem falls further, the rate increases linearly
until it reaches its maximum when freemem goes to or below a certain lower
mark.

Our pagedaemon is activated when free + cache falls to a value at which
vm_paging_needed() is true (see the definition of this function). When it is
activated it makes a scan pass through inactive and active pages, setting a
certain target for free+cache, but that target is "soft" and is actually an
upper limit on how many pages could be freed during the pass. pagedaemon
would make a second (or subsequent) pass only if free+cache falls to a value
that is even lower than the threshold in vm_paging_needed(), which means
significant (severe even) memory pressure/shortage.

So on a sufficiently active system free+cache would typically oscillate
between v_free_reserved+v_cache_min at the bottom and some semi-random
values "near" v_free_target+v_cache_min at the top. That's when excluding
ARC from the picture.

And about pictures :-)
Behavior of free+cache with the current arc_reclaim_needed code:
http://people.freebsd.org/~avg/avail-mem-before.png
and its behavior after the patch:
http://people.freebsd.org/~avg/avail-mem-after.png

The legends on the pictures are incorrect, sorry, my mastery of drraw is not
good yet.
Correct legends:
"aqua" color    - v_free_target+v_cache_min (vm_paging_target() == 0)
"fuchsia" color - v_free_reserved+v_cache_min (vm_paging_needed() threshold)
"lime" color    - v_free_count+v_cache_count indeed :)
Y axis          - % of total page count.

I think the graphs speak for themselves.

-- 
Andriy Gapon