From: Artem Belevich
To: Andriy Gapon
Date: Sun, 22 Aug 2010 16:52:00 -0700
Cc: freebsd-hackers@freebsd.org, zfs-devel@freebsd.org
Subject: Re: ZFS arc_reclaim_needed: better cooperation with pagedaemon

Do you by any chance have a graph showing kstat.zfs.misc.arcstats.size
behavior in addition to what is on your graphs now? All I can tell from
your graphs is that v_free_count+v_cache_count shifted a bit lower
relative to v_free_target+v_cache_min.

It would be interesting to see what effect your patch has on ARC itself,
especially when ARC starts giving up memory and when it stops shrinking.

--Artem

On Sun, Aug 22, 2010 at 2:46 PM, Andriy Gapon wrote:
>
> I propose that the following code in arc_reclaim_needed
> (sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c)
>         /*
>          * If pages are needed or we're within 2048 pages
>          * of needing to page need to reclaim
>          */
>         if (vm_pages_needed || (vm_paging_target() > -2048))
>
> be changed to
>
>         if (vm_paging_needed())
>
> Rationale.
>
> 1. Why not current checks.
>
> ARC sizing should cooperate with pagedaemon in freeing pages.
> If ARC starts shrinking "prematurely", before pagedaemon is woken up,
> then no potentially eligible inactive pages will be recycled and no
> potentially eligible active pages will be made inactive (subject to
> v_inactive_target).
> This would lead to ARC size going to its minimum value (which could hurt
> ZFS performance).  Only after that is there a chance that pagedaemon
> would be woken up to do its cleaning.
> And conversely, if ARC doesn't shrink in time, then pagedaemon would
> have to recycle pages with data that could be needed again soon and that
> would lead to excessive swapping and disk I/O.
>
> vm_paging_target() is used only by pagedaemon internally; it effectively
> sets an _upper_ limit on how many pages pagedaemon would free when it's
> activated.
> It is no indication of whether pagedaemon should be scanning/freeing
> pages.
> Thus the check of vm_paging_target() leads to premature ARC shrinking.
> I believe that many people observe this behavior on sufficiently active
> systems (not dedicated file servers) with a few GB of RAM (1-8).
>
> The vm_pages_needed check is redundant, because this is a flag that is
> used to wake up pagedaemon.  So when it is set, vm_paging_needed() is
> true and vm_paging_target() is "way" above zero.  And this flag is reset
> to zero when vm_page_count_min() becomes false, which corresponds to
> even fewer free pages than when vm_paging_needed() is true.
>
>
> 2. Why the new check.
>
> vm_paging_needed() is the (earliest) condition that wakes up pagedaemon
> (see vm_page_alloc).  pagedaemon would first of all run the vm_lowmem
> event, for which ARC already has a handler and which would cause ARC
> size to shrink.
> It would seem that having the vm_paging_needed() check would be
> redundant then.
> Almost - if memory pressure is significant, then vm_paging_needed() may
> stay true for a while and that would cause additional ARC reduction by
> arc_reclaim_thread.
>
>
> Final notes.
>
> I think that
>         vm_paging_target() > -2048
> check was modeled after the check in the original OpenSolaris code:
>         freemem < lotsfree + needfree + extra
>
> The issue is that, in my understanding, OpenSolaris pagedaemon works
> differently from FreeBSD pagedaemon.
>
> OpenSolaris pagedaemon is activated when freemem (equivalent of our
> free + cache) falls down to a certain higher mark (lotsfree).  Initially
> it scans pages at a slow rate.  If freemem falls further, the rate
> linearly increases until it reaches its maximum when freemem goes to or
> below a certain lower mark.
>
> Our pagedaemon is activated when free + cache falls down to a value at
> which vm_paging_needed() is true (see definition of this function).
> When it is activated it makes a scan pass through inactive and active
> pages, setting a certain target for free+cache, but that target is
> "soft" and actually is an upper limit of how many pages could be freed
> during the pass.  pagedaemon would make the second (or subsequent) pass
> only if free+cache falls to a value that is even lower than the
> threshold in vm_paging_needed(), which means significant (severe even)
> memory pressure/shortage.
> So on a sufficiently active system free+cache would typically oscillate
> between v_free_reserved+v_cache_min at the bottom and some semi-random
> values "near" v_free_target+v_cache_min at the tops.  That's when
> excluding ARC from the picture.
>
> And about pictures :-)
> Behavior of free+cache with current arc_reclaim_needed code:
> http://people.freebsd.org/~avg/avail-mem-before.png
> and its behavior after the patch:
> http://people.freebsd.org/~avg/avail-mem-after.png
>
> The legends on the pictures are incorrect, sorry, my mastery of drraw is
> not good yet.
> Correct legends:
> "aqua" color - v_free_target+v_cache_min (vm_paging_target() == 0)
> "fuchsia" color - v_free_reserved+v_cache_min (vm_paging_needed()
>   threshold)
> "lime" color - v_free_count+v_cache_count indeed :)
> Y axis - % of total page count.
>
> I think the graphs speak for themselves.
>
> --
> Andriy Gapon
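
To make the difference between the two checks in the quoted proposal
concrete, here is a rough userspace model in C. It is only a sketch, not
kernel code: struct vm_counters and all function names are hypothetical
stand-ins, and the two threshold formulas are assumptions derived from the
corrected graph legends in the quoted message (vm_paging_target() == 0 at
v_free_target+v_cache_min, vm_paging_needed() threshold at
v_free_reserved+v_cache_min). They should be checked against the real
definitions in the kernel headers before drawing conclusions.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's page counters. */
struct vm_counters {
	int v_free_target;	/* free pages pagedaemon aims for */
	int v_free_reserved;	/* low-water mark for free pages */
	int v_cache_min;	/* minimum cache pages */
	int v_free_count;	/* current free pages */
	int v_cache_count;	/* current cache pages */
	int vm_pages_needed;	/* flag used to wake up pagedaemon */
};

/* Assumed model: pages still missing to reach the "soft" target. */
static int
paging_target(const struct vm_counters *c)
{
	return ((c->v_free_target + c->v_cache_min) -
	    (c->v_free_count + c->v_cache_count));
}

/* Assumed model: true once free+cache drops below the wakeup threshold. */
static bool
paging_needed(const struct vm_counters *c)
{
	return ((c->v_free_reserved + c->v_cache_min) >
	    (c->v_free_count + c->v_cache_count));
}

/* The check currently in arc_reclaim_needed, as quoted in the proposal. */
static bool
reclaim_needed_old(const struct vm_counters *c)
{
	return (c->vm_pages_needed || paging_target(c) > -2048);
}

/* The proposed replacement check. */
static bool
reclaim_needed_new(const struct vm_counters *c)
{
	return (paging_needed(c));
}
```

With made-up counters such as v_free_target = 20000, v_cache_min = 20000,
v_free_reserved = 5000 and free+cache = 38000, the old check reports that
reclaim is needed (the target is 2000 pages away, which is above -2048),
while the new check does not, because free+cache is still well above
v_free_reserved+v_cache_min = 25000. In this model the old check fires
anywhere below v_free_target+v_cache_min+2048, which is the "premature"
shrinking described in the quoted rationale.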