Date: Fri, 12 Oct 2012 12:58:58 -0500
From: Alan Cox <alc@rice.edu>
To: Marcel Moolenaar <marcel@xcllnt.net>
Cc: Tim LaBerge <tlaberge@juniper.net>, "freebsd-arch@freebsd.org Arch" <freebsd-arch@freebsd.org>
Subject: Re: Behavior of madvise(MADV_FREE)
Message-ID: <50785A62.5050603@rice.edu>
In-Reply-To: <9FEBC10C-C453-41BE-8829-34E830585E90@xcllnt.net>
References: <9FEBC10C-C453-41BE-8829-34E830585E90@xcllnt.net>

On 10/12/2012 11:25, Marcel Moolenaar wrote:
> All,
>
> Juniper has been intrigued for a while about the behaviour
> of madvise(MADV_FREE). Let me give a bit of context before
> asking questions:
>
> 1. We have an important daemon that needs lots of memory.
>    It uses sbrk()/brk() to extend its address space and
>    uses madvise(MADV_FREE) to inform the kernel when a
>    chunk of memory is effectively unused (the chunk is
>    first zeroed).
> 2. Most of the time memory usage of the daemon is pretty
>    stable, but under certain conditions it can spike,
>    after which it drops back to a new stability point
>    (either higher or lower).
> 3. Obviously the daemon is not the only component in the
>    system, so whatever it doesn't need we very likely
>    need somewhere else -- badly in some cases, so we do
>    like immediate recycling then.
>
> Now on to the questions:
> 1. madvise(MADV_FREE) marks the pages as clean and moves
>    them to the inactive queue. Why isn't the reference
>    state cleared on either the page or the TLB?

It is, at least 31 out of 32 times that vm_page_dontneed() is called.
From vm_page_dontneed(), which is called by madvise(MADV_FREE):

    /*
     * Clear any references to the page.  Otherwise, the page daemon will
     * immediately reactivate the page.
     *
     * Perform the pmap_clear_reference() first.  Otherwise, a concurrent
     * pmap operation, such as pmap_remove(), could clear a reference in
     * the pmap and set PGA_REFERENCED on the page before the
     * pmap_clear_reference() had completed.  Consequently, the page would
     * appear referenced based upon an old reference that occurred before
     * this function ran.
     */
    pmap_clear_reference(m);
    vm_page_aflag_clear(m, PGA_REFERENCED);

> 2. Why aren't the pages moved to the cache queue in the
>    first place?

Because this would make madvise(MADV_FREE) considerably more expensive;
for example, the pages would have to be unmapped.

Your situation may be different, but more often than not, people call
madvise(MADV_FREE) when memory is plentiful, and there is no need to do
anything. In other words, the page daemon isn't going to need to run
anytime soon. For example, when madvise(MADV_FREE) is used in
implementations of malloc() and free(), the vast majority of calls to
madvise(MADV_FREE) are pointless. They are pointless in that soon after
the madvise(MADV_FREE) call by the free() implementation, either (1) the
application turns around and allocates more memory, causing the
MADV_FREE'd memory to be used once again, or (2) the process terminates
before the page daemon runs. Consequently, the implementation of
madvise(MADV_FREE) does the minimal necessary work so that, if memory
does become scarce and the page daemon has to run, the MADV_FREE'd pages
are first in line for reclamation.

> 3. What would be the impact or consequence of changing
>    the behaviour of madvise(MADV_FREE) to mark the page
>    as clean and unreferenced and have the page moved to
>    the cache queue (or free queue even)?
>
> Ad 1:
> When the system is under memory pressure, the pageout
> daemon scans the inactive queue in order to try to move
> pages to the cache or free queue. With the MADV_FREE'd
> pages still having PG_REFERENCE or the underlying TLBs
> still having the access flag set, these pages actually
> get bumped to the active queue.

I don't see this in recent versions. Maybe this is a bug in the version
you're looking at. Perhaps a bug in your pmap_clear_reference()?

> Ad 2:
> MADV_DONTNEED is there to signal that the pages contain
> valid data, but that the page is not needed right now.
> Using this, pages get moved to the inactive queue. That
> makes sense. But MADV_FREE signals that there's no valid
> data anymore and that the page may be demand zeroed on
> next reference. The page is not inactive. It's free.
> If the page was zeroed before calling MADV_FREE, the page
> really caches contents that can be recreated later
> (the demand zero).

There is also another way of looking at it. By leaving the pages
allocated and mapped, you are saving time, i.e., CPU cycles, for the all
too common case that the MADV_FREE'd pages are used again in the near
future.

It wouldn't be illogical to have two variants of MADV_FREE: one for use
by folks like yourself who can say definitively that the pages won't be
accessed again and should really be freed, and the current
implementation for more speculative uses like in the malloc() and free()
implementation. Better yet, the second case would be replaced by a
notification from the kernel to the process when memory is actually
becoming scarce, so that we won't waste cycles on any pointless madvise()
calls by the process.

Regards,
Alan