Date: Fri, 24 Feb 2012 16:39:11 -0500
From: Ryan Stone <rysto32@gmail.com>
To: freebsd-hackers@freebsd.org
Subject: vm_pageout_page_stats() calling pmap_remove_all() on pages that it deactivates
Message-ID: <CAFMmRNwe5MK5q46xsdRg=SE_7tuEneHv2Ad8zauSEDn3n9-eaQ@mail.gmail.com>
Near the end of vm_pageout_page_stats() there is the following code:

	if (m->act_count == 0) {
		/*
		 * We turn off page access, so that we have
		 * more accurate RSS stats.  We don't do this
		 * in the normal page deactivation when the
		 * system is loaded VM wise, because the
		 * cost of the large number of page protect
		 * operations would be higher than the value
		 * of doing the operation.
		 */
		pmap_remove_all(m);
		vm_page_deactivate(m);
	}

I question how useful it is to remove m from every pmap. The stated reasoning in the comment above is that it makes for more accurate RSS statistics. However, vm_pageout_page_stats() only does anything at all when there is a shortage of inactive+cache+free pages, so I find the assertion that this leads to a "more accurate" RSS accounting a bit specious when the rest of the VM subsystem isn't trying to provide accurate RSS stats at all. Besides, the page is still resident in memory if we've only deactivated it. This code seems to be conflating "resident set" with "working set". The page still being resident in memory is why I think removing the page from the pmap is the wrong thing to do here.

The situation that led me to look at this code was pretty simple: I had a daemon running on a swapless system leaking memory. I was running a script that logged the output of top periodically. What I saw was the VSS and RSS of the daemon growing steadily over time until, all of a sudden, its RSS dropped dramatically and the system's inactive page count dropped (this was vm_pageout_page_stats() kicking in due to the memory shortage). I was misled into thinking that the daemon had just freed a lot of memory and that malloc had called madvise(..., MADV_FREE) to free the pages back to the kernel. Of course the newly deactivated pages could never be freed, and I spent a day going in entirely the wrong direction before I figured out what vm_pageout_page_stats() was doing and stopped looking for bugs in madvise.
In investigating this I did stumble upon one situation where removing the pages on deactivation led to very bad behaviour. I had a system with a lot of wired memory and about 200MB free. I ran a test program to allocate nearly all of the free memory and then sit there sleeping, never touching it again. vm_pageout_page_stats() duly kicked in and deactivated the test program's memory, bringing its resident set down to a couple of KB. However, that didn't actually succeed in freeing any memory, and so the next time I tried to ssh to it or something the OOM killer ended up having to be invoked. It went on a mass killing spree, but never even considered killing my test program because its RSS was so small (despite the fact that it was holding on to most of the memory in the system).

It's a bit of a corner case: I think that you have to have a ton of wired memory, no swap and just the right amount of free memory (the fact that most of the unwired-but-allocated memory on that system was allocated to daemons that restarted themselves automatically probably didn't help the situation at all, as they kept running the system back to the edge).

Anyway, given that I can't see any value in removing a page from a pmap just because we are deactivating it, and it seems to cause confusion and even less-than-ideal (and arguably incorrect) behaviour in certain corner cases, should it just be removed? Or is there some subtlety to this that I'm missing?
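
The allocate-and-then-idle test program described above can be approximated with a short C sketch. This is not the original program: the helper name hold_memory() and its signature are made up for illustration, and it assumes that writing one byte per page is enough to make the whole allocation resident.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the original message.
 *
 * Allocate `bytes` of memory and write one byte into every page so
 * that each page is actually backed by physical memory.  Returns the
 * buffer (which the caller keeps and never touches again), or NULL on
 * failure.  *pages_touched receives the number of pages written.
 */
static char *
hold_memory(size_t bytes, size_t *pages_touched)
{
	long pagesize;
	char *buf;
	size_t off;

	assert(pages_touched != NULL);
	*pages_touched = 0;

	pagesize = sysconf(_SC_PAGESIZE);
	if (pagesize <= 0 || bytes == 0)
		return (NULL);

	buf = malloc(bytes);
	if (buf == NULL)
		return (NULL);

	/* One write per page is enough to fault the page in. */
	for (off = 0; off < bytes; off += (size_t)pagesize) {
		buf[off] = 1;
		(*pages_touched)++;
	}
	return (buf);
}
```

Calling hold_memory() from main() with a size just under the amount of free memory, and then blocking in pause() or a long sleep(), gives the "allocate nearly everything and never touch it again" pattern. Whether the RSS collapse and the OOM behaviour actually reproduce will depend on how much memory is wired and on the absence of swap, as noted above.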