Date: Sun, 22 Aug 2010 16:52:00 -0700
From: Artem Belevich <fbsdlist@src.cx>
To: Andriy Gapon <avg@freebsd.org>
Cc: freebsd-hackers@freebsd.org, zfs-devel@freebsd.org
Subject: Re: ZFS arc_reclaim_needed: better cooperation with pagedaemon
Message-ID: <AANLkTinreSt_Dk_J5vpZ6xrs=snqYu8zKfO0X6H-x_n3@mail.gmail.com>
In-Reply-To: <4C719AB9.9020006@freebsd.org>
References: <4C719AB9.9020006@freebsd.org>
Do you by any chance have a graph showing kstat.zfs.misc.arcstats.size
behavior in addition to the stuff included on your graphs now?

All I can tell from your graphs is that v_free_count+v_cache_count
shifted a bit lower relative to v_free_target+v_cache_min.

It would be interesting to see what effect your patch has on ARC
itself, especially when ARC starts giving up memory and when it
stops shrinking.

--Artem

On Sun, Aug 22, 2010 at 2:46 PM, Andriy Gapon <avg@freebsd.org> wrote:
>
> I propose that the following code in arc_reclaim_needed
> (sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c)
> /*
>  * If pages are needed or we're within 2048 pages
>  * of needing to page need to reclaim
>  */
> if (vm_pages_needed || (vm_paging_target() > -2048))
>
> be changed to
>
> if (vm_paging_needed())
>
> Rationale.
>
> 1. Why not current checks.
>
> ARC sizing should cooperate with pagedaemon in freeing pages.
> If ARC starts shrinking "prematurely", before pagedaemon is waked up then no
> potentially eligible inactive pages will be recycled and no potentially eligible
> active pages will be inactive (subject to v_inactive_target).
> This would lead to ARC size going to its minimum value (which could hurt ZFS
> performance).  Only after that there is a chance that pagedaemon would be waked
> up to do its cleaning.
> And conversely, if ARC doesn't shrink in time, then pagedaemon would have to
> recycle pages with data that could be needed again soon and that would lead to
> excessive swapping and disk I/O.
>
> vm_paging_target() is used only by pagedaemon internally, it effectively sets
> _upper_ limit on how many pages pagedaemon would free when it's activated.
> It is no indication of whether pagedaemon should be scanning/freeing pages.
> Thus check of vm_paging_target() leads to premature ARC shrinking.
> I believe that many people observe this behavior on sufficiently active systems
> (not dedicated file servers) with few GB of RAM (1-8).
>
> vm_pages_needed check is redundant, because this is a flag that is used to wake
> up pagedaemon.  So when it is set vm_paging_needed() is true and
> vm_paging_target() is "way" above zero.  And this flag is reset to zero when
> vm_page_count_min() becomes false, which corresponds to even fewer free pages
> than when vm_paging_needed() is true.
>
>
> 2. Why the new check.
>
> vm_paging_needed() is the (earliest) condition that wakes up pagedaemon (see
> vm_page_alloc).  pagedaemon would first of all run vm_lowmem event for which ARC
> already has a handler and which would cause ARC size to shrink.
> It would seem like having vm_paging_needed() check would be redundant then.
> Almost - if memory pressure is significant, then vm_paging_needed() may stay
> true for a while and that would cause additional ARC reduction by
> arc_reclaim_thread.
>
>
> Final notes.
>
> I think that
> vm_paging_target() > -2048
> check was modeled after the check in the original OpenSolaris code:
> freemem < lotsfree + needfree + extra
>
> The issue is that in my understanding OpenSolaris pagedaemon works differently
> from FreeBSD pagedaemon.
>
> OpenSolaris pagedaemon is activated when freemem (equivalent of our free +
> cache) falls down to a certain higher mark (lotsfree).  Initially it scans pages
> at a slow rate.  If freemem falls further the rate linearly increases until it
> reaches its maximum when freemem goes to or below certain lower mark.
>
> Our pagedaemon is activated when free + cache falls down to a value when
> vm_paging_needed() is true (see definition of this function).
> When it is activated it makes a scan pass through inactive and active pages
> setting a certain target for free+cache, but that target is "soft" and actually
> is an upper limit of how many pages could be freed during the pass.  pagedaemon
> would make the second (or subsequent) pass only if free+cache falls to value
> that is even lower than the threshold in vm_paging_needed(), which means
> significant (severe even) memory pressure/shortage.
> So on sufficiently active system free+cache would typically oscillate between
> v_free_reserved+v_cache_min at the bottom and some semi-random values "near"
> v_free_target+v_cache_min at the tops.  That's when excluding ARC from the picture.
>
> And about pictures :-)
> Behavior of free+cache with current arc_reclaim_needed code:
> http://people.freebsd.org/~avg/avail-mem-before.png
> and its behavior after the patch:
> http://people.freebsd.org/~avg/avail-mem-after.png
>
> The legends on the pictures are incorrect, sorry, my mastery of drraw is not
> good yet.
> Correct legends:
> "aqua" color - v_free_target+v_cache_min (vm_paging_target() == 0)
> "fuchsia" color - v_free_reserved+v_cache_min (vm_paging_needed() threshold)
> "lime" color - v_free_count+v_cache_count indeed :)
> Y axis - % of total page count.
>
> I think the graphs speak for themselves.
>
> --
> Andriy Gapon
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>
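
For readers who want to see where the two checks from the quoted message part ways, here is a small userland model. It is only a sketch under stated assumptions: the struct and every model_* name are hypothetical stand-ins rather than kernel code, the two threshold formulas are taken from the graph legends in the quoted message (vm_paging_target() == 0 at v_free_target+v_cache_min, and the vm_paging_needed() threshold at v_free_reserved+v_cache_min), and the page counts in the demo are invented for illustration, not measurements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userland stand-in for the kernel's page counters. */
struct model_vmmeter {
	int v_free_target;
	int v_free_reserved;
	int v_cache_min;
	int v_free_count;
	int v_cache_count;
};

/* Pages missing to reach the soft target; negative when above it. */
static int
model_paging_target(const struct model_vmmeter *c)
{
	return ((c->v_free_target + c->v_cache_min) -
	    (c->v_free_count + c->v_cache_count));
}

/* True once free+cache drops below the pagedaemon wakeup threshold. */
static bool
model_paging_needed(const struct model_vmmeter *c)
{
	return ((c->v_free_reserved + c->v_cache_min) >
	    (c->v_free_count + c->v_cache_count));
}

/* Model of the check quoted as the current arc_reclaim_needed code. */
static bool
model_reclaim_old(const struct model_vmmeter *c, bool pages_needed)
{
	return (pages_needed || model_paging_target(c) > -2048);
}

/* Model of the check proposed in the quoted message. */
static bool
model_reclaim_new(const struct model_vmmeter *c)
{
	return (model_paging_needed(c));
}

#ifdef MODEL_DEMO_MAIN	/* hypothetical guard so the demo is opt-in */
int
main(void)
{
	/* Invented numbers: free+cache sits just under the soft target. */
	struct model_vmmeter c = { 20000, 5000, 10000, 18000, 11000 };

	assert(model_paging_target(&c) == 1000);
	printf("old check: %d, new check: %d\n",
	    model_reclaim_old(&c, false), model_reclaim_new(&c));
	return (0);
}
#endif
```

Built with something like cc -DMODEL_DEMO_MAIN, the demo covers the case the message calls premature shrinking: free+cache is far above the wakeup threshold, yet the old check already asks ARC to reclaim while the new one does not. Graphing kstat.zfs.misc.arcstats.size next to these two predicates on a real system would show whether ARC actually follows the model.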
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?AANLkTinreSt_Dk_J5vpZ6xrs=snqYu8zKfO0X6H-x_n3>