Date:      Fri, 12 Oct 2012 14:00:20 -0700
From:      Marcel Moolenaar <marcel@xcllnt.net>
To:        Alan Cox <alc@rice.edu>
Cc:        Tim LaBerge <tlaberge@juniper.net>, "freebsd-arch@freebsd.org Arch" <freebsd-arch@freebsd.org>
Subject:   Re: Behavior of madvise(MADV_FREE)
Message-ID:  <186E5ECB-120E-4E49-96B4-485E2676C05F@xcllnt.net>
In-Reply-To: <50785A62.5050603@rice.edu>
References:  <9FEBC10C-C453-41BE-8829-34E830585E90@xcllnt.net> <50785A62.5050603@rice.edu>


On Oct 12, 2012, at 10:58 AM, Alan Cox <alc@rice.edu> wrote:

>> Now on to the questions:
>> 1.  madvise(MADV_FREE) marks the pages as clean and moves
>>     them to the inactive queue. Why isn't the reference
>>     state cleared on either the page or the TLB?
>
> It is, at least 31 out of 32 times that vm_page_dontneed() is called.
> From vm_page_dontneed(), which is called by madvise(MADV_FREE):
>
>        /*
>         * Clear any references to the page.  Otherwise, the page daemon will
>         * immediately reactivate the page.
>         *
>         * Perform the pmap_clear_reference() first.  Otherwise, a concurrent
>         * pmap operation, such as pmap_remove(), could clear a reference in
>         * the pmap and set PGA_REFERENCED on the page before the
>         * pmap_clear_reference() had completed.  Consequently, the page would
>         * appear referenced based upon an old reference that occurred before
>         * this function ran.
>         */
>        pmap_clear_reference(m);
>        vm_page_aflag_clear(m, PGA_REFERENCED);

Ah... I missed this. I didn't look in vm_page_dontneed() for
this. I thought current FreeBSD behaved the same as 6.1-ish.

>> 2.  Why aren't the pages moved to the cache queue in the
>>     first place?
>
> Because this would make madvise(MADV_FREE) considerably more expensive;
> for example, the pages would have to be unmapped.  Your situation may
> be different, but more often than not, people call madvise(MADV_FREE)
> when memory is plentiful, and there is no need to do anything.  In
> other words, the page daemon isn't going to need to run anytime soon.
> For example, when madvise(MADV_FREE) is used in implementations of
> malloc() and free(), the vast majority of calls to madvise(MADV_FREE)
> are pointless.  They are pointless in that soon after the
> madvise(MADV_FREE) call by the free() implementation, either (1) the
> application turns around and allocates more memory, causing the
> MADV_FREE'd memory to be used once again, or (2) the process terminates
> before the page daemon runs.  Consequently, the implementation of
> madvise(MADV_FREE) does the minimal necessary work so that, if memory
> does become scarce and the page daemon has to run, the MADV_FREE'd
> pages are first in line for reclamation.

Understood. Thanks.

>> Ad 2:
>> MADV_DONTNEED is there to signal that the pages contain
>> valid data, but that the page is not needed right now.
>> Using this, pages get moved to the inactive queue. That
>> makes sense. But MADV_FREE signals that there's no valid
>> data anymore and that the page may be demand zeroed on
>> next reference. The page is not inactive. It's free. If
>> the page was zeroed before calling MADV_FREE, the page
>> really caches contents that can be recreated later
>> (the demand zero).
>
> There is also another way of looking at it.  By leaving the pages
> allocated and mapped, you are saving time, i.e., CPU cycles, for the
> all too common case that the MADV_FREE'd pages are used again in the
> near future.
>
> It wouldn't be illogical to have two variants of MADV_FREE.  One for
> use by folks like yourself who can say definitively that the pages
> won't be accessed again and should really be freed, and the current
> implementation for more speculative uses like in the malloc() and
> free() implementation.  Better yet, the second case would be replaced
> by a notification from the kernel to the process when memory is
> actually becoming scarce so that we won't waste cycles on any
> pointless madvise() calls by the process.

This aligns with phk@'s suggestion of MADV_RECYCLE. We may
want to play with this.

-- 
Marcel Moolenaar
marcel@xcllnt.net




