From owner-freebsd-arch@FreeBSD.ORG Mon Oct 22 15:20:53 2012
Message-ID: <50856452.40902@rice.edu>
Date: Mon, 22 Oct 2012 10:20:50 -0500
From: Alan Cox
To: Marcel Moolenaar
Subject: Re: Behavior of madvise(MADV_FREE)
References: <9FEBC10C-C453-41BE-8829-34E830585E90@xcllnt.net>
 <4835.1350062021@critter.freebsd.dk> <5082F0F3.1070102@rice.edu>
 <8D2E1B5A-6DD1-49E3-8F55-B3B816449FFB@xcllnt.net>
In-Reply-To: <8D2E1B5A-6DD1-49E3-8F55-B3B816449FFB@xcllnt.net>
Cc: Poul-Henning Kamp, "freebsd-arch@freebsd.org Arch", Tim LaBerge,
 Jason Evans
List-Id: Discussion related to FreeBSD architecture

On 10/20/2012 21:33, Marcel Moolenaar wrote:
> On Oct 20, 2012, at 11:44 AM, Alan Cox wrote:
>>> Also, moving the complexity of exactly which hint to give the
>>> kernel under different scenarios isn't that appealing at all.
>>> It just doesn't scale.
>>
>> I think that you're being a bit too pessimistic here. If your use
>> case really corresponds to "this memory is free and will not be
>> reused (or reallocated for a very long time)", then that is
>> qualitatively very different from the way malloc(3) uses MADV_FREE.
>> malloc(3)'s use of MADV_FREE is highly speculative. It doesn't
>> really know what the application is going to do in the future. I
>> don't think that having two distinct hints that distinguish between
>> "speculative" and "non-speculative" uses would be problematic. The
>> distinction is real and also easy to explain. The only danger is
>> that application writers really don't understand their application
>> and use the wrong hint.
> Maybe. I need to think about this. On the surface it's hard to
> believe that any allocator can reliably predict the future, so
> all hints are speculative in that sense. I do buy into the fact
> that malloc(3) has no a priori knowledge of the behavior of an
> application and an application with a special-purpose allocator
> has an allocator with more knowledge of the behavior. I'm just
> not sure this warrants different hints.
>
> I agree that the more complicated the hints the more likely they
> are not being used at all or they are used the wrong way.
>
>>> ... If some VM changes warrant a new hint
>>> to madvise(), you may end up changing multiple daemons. It
>>> seems better to have just 1 hint (i.e.
>>> MADV_FREE) and have the
>>> kernel change its behaviour depending on the situation. When
>>> there's plenty of memory, you may even ignore the hint. Under
>>> severe memory pressure you may want to free up the page right
>>> away so that you can give it to some thread that's waiting
>>> for a page.
>>
>> How is this really different from the existing behavior? If a
>> thread is waiting for a page, then the page daemon is running. In
>> particular, it is moving pages from the head of the inactive queue,
>> where they were placed by MADV_FREE, to the cache/free queue and
>> waking up the waiting thread when the aggregate cache/free target
>> is met.
> What we see with FreeBSD 6.1 is that memory remains inactive
> indefinitely. If the behaviour has changed in more recent
> versions, then we'll reap the benefits soon. If not, then we
> (= Juniper) may want to look into this.

You might have a look at the commit message for svn revision 172317.
It describes a problem that existed in FreeBSD from 5.x up until 7.0,
when the physical memory allocator was replaced. Effectively, the page
daemon could wind up spinning its wheels.

However, one thing hasn't changed: we only move pages from the inactive
queue to the cache queue as required to meet the aggregate target on
cache and free pages. If the demand for memory is static, we won't move
pages from the inactive queue to the cache queue.

>
>>> At the edge of needing to swap, complex algorithms
>>> may be worthwhile -- or maybe not. I don't know.
>>>
>>> This leads to:
>>> 1. Keep MADV_FREE as it behaves in FreeBSD right now or make
>>>    it even more sloppy.
>>
>> I'm not sure that I understand what you mean by "sloppy" here. Can
>> you elaborate?
> It's just a sloppy way of saying that the hint can be ignored
> altogether or that we simply mark the page as clean and not
> do anything else. The point was mostly that the performance
> argument is more important.
>
>
>>> 2.
>>>    Have an idle thread that moves inactive pages to the cache
>>>    or free queue if they've been inactive for X minutes, for
>>>    some tunable X. Have it back off when the pageout daemon
>>>    kicks in.
>>
>> The existing page daemon already wakes up periodically and looks
>> around for something to do. In particular, have a look at
>> vm_pageout_page_stats(). That function tries to do something
>> analogous to what you propose. In part, it tries to prevent
>> munmap(2)ed file-backed pages from getting stuck in the active
>> queue.
> I'll take a look. That's good to know.
>
> \begin{disclaimer}
> Juniper's problem is being stuck with an obsolete version of
> FreeBSD and we're likely to look for solutions to problems that
> don't exist anymore in recent versions. Just bear with us for
> a while :-)
> \end{disclaimer}
>
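
The point above about the aggregate target can be illustrated with a
toy model. This is only a sketch and not FreeBSD kernel code: the
struct toy_vm, its fields, and toy_pagedaemon_scan() are hypothetical
names invented for illustration, and plain page counts stand in for
the real page queues. It assumes the page daemon moves inactive pages
to the cache queue only to cover the shortfall against the cache + free
target, as described in the reply.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model only: hypothetical names, not FreeBSD kernel code.
 * Page counts stand in for the real page queues.
 */
struct toy_vm {
	size_t free_pages;	/* pages on the free queue */
	size_t cache_pages;	/* pages on the cache queue */
	size_t inactive_pages;	/* includes pages hinted with MADV_FREE */
	size_t target;		/* aggregate target for cache + free */
};

/*
 * Move just enough inactive pages to the cache queue to meet the
 * aggregate cache + free target. Returns the number of pages moved.
 */
size_t
toy_pagedaemon_scan(struct toy_vm *vm)
{
	size_t avail, shortfall, moved;

	assert(vm != NULL);
	avail = vm->free_pages + vm->cache_pages;

	/* Target already met: inactive pages are left where they are. */
	if (avail >= vm->target)
		return (0);

	/* Cover the shortfall, limited by what the inactive queue holds. */
	shortfall = vm->target - avail;
	moved = shortfall < vm->inactive_pages ? shortfall : vm->inactive_pages;
	vm->inactive_pages -= moved;
	vm->cache_pages += moved;
	return (moved);
}
```

In this model, a system whose cache + free count already meets the
target moves nothing, however many pages sit on the inactive queue.
That matches the observation that memory can remain inactive
indefinitely when the demand for memory is static.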