Date:      Sat, 20 Oct 2012 13:44:03 -0500
From:      Alan Cox <alc@rice.edu>
To:        Marcel Moolenaar <marcel@xcllnt.net>
Cc:        Poul-Henning Kamp <phk@phk.freebsd.dk>, Tim LaBerge <tlaberge@juniper.net>, Jason Evans <jasone@FreeBSD.org>, Alan Cox <alc@rice.edu>, "freebsd-arch@freebsd.org Arch" <freebsd-arch@FreeBSD.org>
Subject:   Re: Behavior of madvise(MADV_FREE)
Message-ID:  <5082F0F3.1070102@rice.edu>
In-Reply-To: <F67D539D-8BE3-4817-8466-C76DE43AE252@xcllnt.net>
References:  <9FEBC10C-C453-41BE-8829-34E830585E90@xcllnt.net> <4835.1350062021@critter.freebsd.dk> <E6A52D27-0D6A-4175-9ECA-ADE25BFF35C2@xcllnt.net> <F71ACE9D-297E-4565-BB8D-D95D46D90708@freebsd.org> <F67D539D-8BE3-4817-8466-C76DE43AE252@xcllnt.net>

On 10/15/2012 11:01, Marcel Moolenaar wrote:
> On Oct 12, 2012, at 3:05 PM, Jason Evans <jasone@FreeBSD.org> wrote:
>
>> On Oct 12, 2012, at 1:54 PM, Marcel Moolenaar wrote:
>>> BTW: MADV_DONTNEED in Linux seems to behave like MADV_FREE
>>> in FreeBSD -- at least according to the manpage. Which makes
>>> me wonder how standard madvise(2) is anyway.
>> MADV_DONTNEED on Linux immediately dissociates the physical page from the VM mapping, such that subsequent access results in a zero-filled page being soft-faulted into place.
>>
>> MADV_FREE is *way* nicer than MADV_DONTNEED in the context of malloc.  jemalloc has a really discouraging amount of complexity that is directly a result of working around the performance overhead of MADV_DONTNEED.
> I've been letting this thread sink in -- responding to last.
>
> Vendors like Juniper want reliable VM statistics to prevent
> over-provisioning. While the stats don't need to be exact at
> all times (i.e. instantaneous), having the stats catch up to
> a new steady state is very desirable. In other words: it's
> not that helpful to have lots of memory on the inactive queue
> indefinitely.


I'm sympathetic.  Once upon a time, I was often called upon to explain 
to network administrators why their idle web cache didn't have oodles of 
"free" memory and how this wasn't a problem.


> Also, moving the complexity of exactly which hint to give the
> kernel under different scenarios isn't that appealing at all.
> It just doesn't scale.


I think that you're being a bit too pessimistic here.  If your use case 
really corresponds to "this memory is free and will not be reused (or 
reallocated for a very long time)", then that is qualitatively very 
different from the way malloc(3) uses MADV_FREE.  malloc(3)'s use of 
MADV_FREE is highly speculative.  It doesn't really know what the 
application is going to do in the future.  I don't think that having two 
distinct hints that distinguish between "speculative" and 
"non-speculative" uses would be problematic.  The distinction is real 
and also easy to explain.  The only danger is that application writers 
who don't really understand their application will use the wrong hint.


> ... If some VM changes warrant a new hint
> to madvise(), you may end up changing multiple daemons. It
> seems better to have just 1 hint (i.e. MADV_FREE) and have the
> kernel change its behaviour depending on the situation. When
> there's plenty of memory, you may even ignore the hint. Under
> severe memory pressure you may want to free up the page right
> away so that you can give it to some thread that's waiting
> for a page.


How is this really different from the existing behavior?  If a thread is 
waiting for a page, then the page daemon is running.  In particular, it 
is moving pages from the head of the inactive queue, where they were 
placed by MADV_FREE, to the cache/free queue and waking up the waiting 
thread when the aggregate cache/free target is met.
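
The behavior described above can be pictured with a toy model in C.  
This is not kernel code: the struct toy_vm and toy_pagedaemon_pass() 
names are hypothetical, the queues are reduced to page counts, and the 
reference and dirty checks a real page daemon performs are left out.

```c
#include <stddef.h>

/* Toy model only: each queue is represented by a page count. */
struct toy_vm {
    size_t inactive;     /* pages on the inactive queue */
    size_t cache_free;   /* pages on the cache and free queues combined */
    size_t free_target;  /* aggregate cache/free target */
    int    waiters;      /* threads sleeping while waiting for a page */
};

/*
 * One pass of the toy page daemon: move pages from the head of the
 * inactive queue to cache/free until the target is met or the inactive
 * queue is empty.  Returns the number of pages moved.
 */
static size_t
toy_pagedaemon_pass(struct toy_vm *vm)
{
    size_t moved = 0;

    while (vm->cache_free < vm->free_target && vm->inactive > 0) {
        vm->inactive--;      /* take the page at the head */
        vm->cache_free++;    /* it is now cache/free */
        moved++;
    }
    if (vm->cache_free >= vm->free_target)
        vm->waiters = 0;     /* target met: wake the waiting threads */
    return (moved);
}
```

With 10 inactive pages, 2 cache/free pages and a target of 5, one pass 
moves 3 pages and wakes the waiter.  The other 7 pages stay on the 
inactive queue because nothing needs them yet.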


>   At the edge of needing to swap, complex algorithms
> may be worthwhile -- or maybe not. I don't know.
>
> This leads to:
> 1.  Keep MADV_FREE as it behaves in FreeBSD right now or make
>      it even more sloppy.


I'm not sure that I understand what you mean by "sloppy" here.  Can you 
elaborate?


> 2.  Have an idle thread that moves inactive pages to the cache
>      or free queue if they've been inactive for X minutes, for
>      some tunable X. Have it back off when the pageout daemon
>      kicks in.


The existing page daemon already wakes up periodically and looks around 
for something to do.  In particular, have a look at 
vm_pageout_page_stats().  That function tries to do something analogous 
to what you propose.  In part, it tries to prevent munmap(2)ed 
file-backed pages from getting stuck in the active queue.


> 3.  Have MADV_FREE behave like Linux's MADV_DONTNEED when the
>      machine is under (significant/severe/some) memory pressure.
>
> Thoughts?
>




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5082F0F3.1070102>