From: Ryan Stone
To: freebsd-hackers@freebsd.org
Date: Fri, 24 Feb 2012 16:39:11 -0500
Subject: vm_pageout_page_stats() calling pmap_remove_all() on pages that it deactivates

Near the end of vm_pageout_page_stats() there is the following code:

    if (m->act_count == 0) {
            /*
             * We turn off page access, so that we have
             * more accurate RSS stats.  We don't do this
             * in the normal page deactivation when the
             * system is loaded VM wise, because the
             * cost of the large number of page protect
             * operations would be higher than the value
             * of doing the operation.
             */
            pmap_remove_all(m);
            vm_page_deactivate(m);
    }

I question how useful it is to remove m from every pmap. The reasoning
stated in the comment is that it makes for more accurate RSS statistics.
However, vm_pageout_page_stats() only does anything at all when there is
a shortage of inactive+cache+free pages, so I find the assertion that
this leads to "more accurate" RSS accounting a bit specious when the
rest of the VM subsystem isn't trying to provide accurate RSS stats at
all. Besides, the page is still resident in memory if we've only
deactivated it. This code seems to be conflating "resident set" with
"working set".

The page still being resident in memory is why I think removing the page
from the pmap is the wrong thing to do here. The situation that led me
to look at this code was pretty simple: I had a daemon running on a
swapless system that was leaking memory, and I was running a script that
periodically logged the output of top. What I saw was the VSS and RSS of
the daemon growing steadily over time until, all of a sudden, its RSS
dropped dramatically and the system's inactive page count dropped (this
was vm_pageout_page_stats() kicking in due to the memory shortage). I
was misled into thinking that the daemon had just freed a lot of memory
and that malloc had called madvise(..., MADV_FREE) to free the pages
back to the kernel.
Of course, on a swapless system the newly deactivated pages could never
be freed, and I spent a day going in entirely the wrong direction before
I figured out what vm_pageout_page_stats() was doing and stopped looking
for bugs in madvise.

While investigating this I did stumble upon one situation where removing
the pages on deactivation led to very bad behaviour. I had a system with
a lot of wired memory and about 200MB free. I ran a test program that
allocated nearly all of the free memory and then sat there sleeping,
never touching it again. vm_pageout_page_stats() duly kicked in and
deactivated the test program's memory, bringing its resident set down to
a couple of KB. However, that didn't actually free any memory, so the
next time I tried to ssh to the machine (or do anything else that needed
memory) the OOM killer had to be invoked. It went on a mass killing
spree, but never even considered killing my test program because its RSS
was so small (despite the fact that it was holding on to most of the
memory in the system).

It's a bit of a corner case: I think that you have to have a ton of
wired memory, no swap and just the right amount of free memory. (The
fact that most of the unwired-but-allocated memory on that system was
allocated to daemons that restarted themselves automatically probably
didn't help the situation at all, as they kept pushing the system back
to the edge.)

Anyway, given that I can't see any value in removing a page from a pmap
just because we are deactivating it, and given that doing so seems to
cause confusion and even less-than-ideal (arguably incorrect) behaviour
in certain corner cases, should it just be removed? Or is there some
subtlety to this that I'm missing?
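
For anyone who wants to try reproducing the second scenario, here is a minimal sketch of a test program of the kind described above. It is not the program that was actually used: the names alloc_and_touch() and hold_and_sleep() are hypothetical, and it assumes that writing one byte per page is enough to make the allocation resident.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate `bytes` and write one byte to every
 * page so that each page is backed by physical memory and counted in
 * the process's RSS. Returns NULL on failure or if bytes is 0.
 */
unsigned char *
alloc_and_touch(size_t bytes)
{
	unsigned char *buf;
	size_t off;
	long page;

	if (bytes == 0)
		return (NULL);

	page = sysconf(_SC_PAGESIZE);
	assert(page > 0);

	buf = malloc(bytes);
	if (buf == NULL)
		return (NULL);

	/* Touch each page once; the memory is never written again. */
	for (off = 0; off < bytes; off += (size_t)page)
		buf[off] = 1;

	return (buf);
}

/*
 * Hypothetical driver: grab the memory, then sleep forever without
 * touching it again, so the pages are candidates for deactivation.
 */
void
hold_and_sleep(size_t bytes)
{
	unsigned char *buf;

	buf = alloc_and_touch(bytes);
	if (buf == NULL)
		return;

	for (;;)
		pause();	/* returns only when a signal is caught */
}
```

A main() that calls hold_and_sleep() with a size close to the amount of free memory would complete the program. On a system set up like the one described (lots of wired memory, no swap), logging top output while it sleeps should show whether its RSS collapses once the page shortage begins.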