From: "John S. Dyson"
Reply-To: dyson@iquest.net
To: dillon@apollo.backplane.com (Matthew Dillon)
Cc: dyson@iquest.net, hackers@FreeBSD.ORG
Date: Fri, 22 Jan 1999 03:31:37 -0500 (EST)
Subject: Re: Error in vm_fault change
Message-Id: <199901220831.DAA00487@y.dyson.net>
In-Reply-To: <199901220656.WAA48081@apollo.backplane.com> from Matthew Dillon at "Jan 21, 99 10:56:49 pm"
Sender: owner-freebsd-hackers@FreeBSD.ORG

Matthew Dillon said:
>
> The problem is that in a heavy paging situation, which can be
> brought about by a single memory-hogging process, OTHER
> process's pages can wind up in the cache very easily.
>

There are other mechanisms that really do work (but are soft) in severe cases by setting the memory limit. The other purpose of that wakeup is to tell the pageout daemon that memory is low. If the other process's pages are placed onto the cache queue, then you have a severe thrashing situation.
In that case, by eliminating this code, you are pessimizing the normal case of a slight amount of paging, and making it work better for thrashing by a hoggy process. However, thrashing by a hoggy process should be managed by limits. :-(. You are breaking any notion of the pageout daemon keeping stats in normal low memory conditions, but thrashing is hopeless anyway without limits (am I repeating myself?)

>
> The memory-hogging process, on the other hand, tends to be
> allocating new pages or touching very recently allocated
> pages. This code does nothing to block the memory hogging
> process because those pages are not in the cache.
>

In that case it won't help, but the soft memory limit code does (or perhaps a new set of hard memory limit code.) In severe thrashing conditions, the argument that you make is silly anyway -- you do want to deactivate the hoggy process's pages then. In normal paging conditions, the simple block of the process is fine.

Again, it seems that you might be testing in severe conditions, and applying a fix that breaks something for less severe conditions. (If you didn't play with the memory limits mechanisms, then any change that you have made like this is premature!!!) I think that you'll find that hard limits will kill system performance, but the soft limit code does work too softly. Maybe a scheme of high and low water marks will work.

You really do NOT want to apply too many weird heuristics, because they quickly become policy (and bad policy in some cases.) Given that, all you can do is to make the code work well for total throughput, and then add the biases with other mechanisms. You are breaking the "total throughput" situation, when you should be applying a specific bias to the bad process.

Think global management regarding the VM code, and let policy reside at another layer. If you try to put policy at the layer that you are trying to (by breaking an important wakeup), you'll just pessimize other applications.
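The high and low water mark idea mentioned above can be sketched as a small hysteresis loop. This is only an illustration under stated assumptions, not code from the FreeBSD VM system: the names `rss_trimmer`, `rss_trimmer_init`, `rss_trimmer_step`, and the `max_batch` parameter are all hypothetical. Trimming starts only once the resident set exceeds the high mark and continues, in bounded batches, until it has fallen back to the low mark, so the process is neither trimmed constantly nor hit all at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process trimming state; not a FreeBSD structure. */
struct rss_trimmer {
    size_t low_water;   /* stop trimming once RSS is at or below this (pages) */
    size_t high_water;  /* start trimming once RSS exceeds this (pages) */
    bool   trimming;    /* true while we are between "start" and "stop" */
};

void rss_trimmer_init(struct rss_trimmer *t, size_t low, size_t high)
{
    assert(low <= high);        /* the marks must be ordered */
    t->low_water = low;
    t->high_water = high;
    t->trimming = false;
}

/*
 * Given the current resident set size in pages, return how many pages
 * to deactivate on this pass (0 if none). At most max_batch pages are
 * requested per call, which keeps the impact on the process gradual.
 */
size_t rss_trimmer_step(struct rss_trimmer *t, size_t rss_pages,
                        size_t max_batch)
{
    if (rss_pages > t->high_water)
        t->trimming = true;         /* crossed the high mark: start */
    else if (rss_pages <= t->low_water)
        t->trimming = false;        /* reached the low mark: stop */

    if (!t->trimming)
        return 0;

    /* While trimming, rss_pages is always above low_water. */
    size_t excess = rss_pages - t->low_water;
    return excess < max_batch ? excess : max_batch;
}
```

With marks of 100 and 200 pages, a process sitting at 150 pages is left alone until it first exceeds 200; after that it keeps being trimmed, even at 150, until it is back at 100.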
You want to push the knee out as far as you can, but total thrashing can be dealt with by limiting the errant process. When doing your test, set the soft limit to 1/5 or less of system memory. Memory will still get stolen from "nice" processes, but not as severely. The maximum memory per process shouldn't be set very high anyway -- it doesn't do anything until paging is needed.

>
> Additionally, by sleeping, these process's pages then became eligible
> for getting thrown back into the cache.
>

Only in severe thrashing situations. If you have pushed the pages through 10000's of pages of inactive and active queue, then you are really running programs like testswap. That needs to be solved by a limits thing. The VM code is doing the correct thing by maximizing global performance. If you want to do policy things, then don't do it by breaking the rest of the system.

>
> When I removed this piece of code, the machine remained useable
> with four memory hogging processes running.
>

QED. You are not running a real load. This is a perfect example of optimizing for the wrong thing. At this level, you want to maximize system throughput, even if the evil processes are the ones that provide that. Limiting the RSS of the evil processes should be done explicitly in a different way. (It could be called by vm_fault, but effectively, it would be at a different layer.)

This is exactly the opposite kind of problem from Linux -- they have historically optimized for low level latency, and now you are making the mistake that I made at first: optimizing for a synthetic, ridiculously high load.

Think of the VM code as trying to maximize GLOBAL throughput. If you want to diddle things, that should be done differently (with an explicit RSS limiting code section.) My assertion stands that by removing that code, you are pessimizing the performance of the system under normal, light paging conditions. You have a good goal, but the wrong fix.
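For the test setup suggested above (a soft limit of 1/5 of system memory), here is a minimal sketch of how a test program might lower its own limit. It assumes that the "soft limit" discussed here corresponds to the soft value of the `RLIMIT_RSS` resource limit, and that `sysconf(_SC_PHYS_PAGES)` is available to report physical memory. The helper names `soft_rss_target` and `apply_soft_rss_limit` are hypothetical. Whether and how the kernel acts on this limit depends on the system.

```c
#include <sys/types.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: one fifth of physical memory, in bytes,
 * clamped so it never exceeds the existing hard limit.
 */
rlim_t soft_rss_target(rlim_t phys_bytes, rlim_t hard_limit)
{
    rlim_t want = phys_bytes / 5;

    if (hard_limit != RLIM_INFINITY && want > hard_limit)
        want = hard_limit;
    return want;
}

/*
 * Set this process's soft RLIMIT_RSS to the target above, leaving the
 * hard limit unchanged. Returns 0 on success, -1 on failure.
 */
int apply_soft_rss_limit(void)
{
    struct rlimit rl;
    long pages = sysconf(_SC_PHYS_PAGES);     /* assumption: supported here */
    long page_size = sysconf(_SC_PAGESIZE);

    if (pages <= 0 || page_size <= 0)
        return -1;
    if (getrlimit(RLIMIT_RSS, &rl) != 0)
        return -1;

    rl.rlim_cur = soft_rss_target((rlim_t)pages * (rlim_t)page_size,
                                  rl.rlim_max);
    return setrlimit(RLIMIT_RSS, &rl);
}
```

A memory-hogging test program would call `apply_soft_rss_limit()` once at startup, before it begins allocating and touching pages.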
If you want to play with that, then you should consider trimming the RSS of the process while faulting the new page in. There is a soft trimming capability in the VM daemon, but it is slow. With a higher hard limit, the problem is solved (but as I implied above, it might be tricky, due to the direct and sudden impact on the evil process's performance.)

This is a perfect example of working around an (artificial) problem with something that on the surface helps, but actually hurts in other ways. The negative feedback in the VM code (which, upon code inspection, might also be damaged by the way that the swap pager doesn't appear to block) is a way that the system can properly maximize global performance. Again, local performance (truly penalizing really really bad processes) needs to be handled with explicit resource mgmt. Properly implemented, the explicit resource mgmt will leave more of the global resource available for the rest of the system, and again the system will adjust properly. By removing that piece of code, you will have removed a part of that proper adjustment mechanism.

> This code is punishing the wrong processes.
>

I don't think so, but you have forgotten about the soft limit code. (BTW, I did have hard limit code, but it was so severe that the evil process paged too heavily.) Since we were mostly interested in global performance and not in penalizing evil, errant processes that malloc and touch large numbers of pages, I never installed the hard limit code.

You really do have to wake up the pagedaemon when free memory is low, or you'll cause severe problems. If you have an evil process, then set the memory limit to 64M or less (much less than 1/2 of memory.) I just played with that earlier today, and it did work.

-- 
John                  | Never try to teach a pig to sing,
dyson@iquest.net      | it makes one look stupid
jdyson@nc.com         | and it irritates the pig.