From owner-freebsd-hackers Fri Jan 22 15:45:18 1999
Return-Path:
Received: (from majordom@localhost)
	by hub.freebsd.org (8.8.8/8.8.8) id PAA03192
	for freebsd-hackers-outgoing; Fri, 22 Jan 1999 15:45:18 -0800 (PST)
	(envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131])
	by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id PAA03160
	for ; Fri, 22 Jan 1999 15:45:14 -0800 (PST)
	(envelope-from tlambert@usr09.primenet.com)
Received: (from daemon@localhost)
	by smtp01.primenet.com (8.8.8/8.8.8) id QAA15051;
	Fri, 22 Jan 1999 16:45:02 -0700 (MST)
Received: from usr09.primenet.com(206.165.6.209) via SMTP
	by smtp01.primenet.com, id smtpd014934; Fri Jan 22 16:44:47 1999
Received: (from tlambert@localhost)
	by usr09.primenet.com (8.8.5/8.8.5) id QAA12241;
	Fri, 22 Jan 1999 16:44:35 -0700 (MST)
From: Terry Lambert
Message-Id: <199901222344.QAA12241@usr09.primenet.com>
Subject: Re: Error in vm_fault change
To: dillon@apollo.backplane.com (Matthew Dillon)
Date: Fri, 22 Jan 1999 23:44:35 +0000 (GMT)
Cc: dyson@iquest.net, hackers@FreeBSD.ORG
In-Reply-To: <199901220656.WAA48081@apollo.backplane.com> from "Matthew Dillon" at Jan 21, 99 10:56:49 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I think that both the approach Matt has suggested and the current RSS
code that John suggested as a replacement are contrary to the goals
that John and David have put forth.

The big issue here seems to be data cache thrashing.  In a non-unified
VM and buffer cache system, this was hard-limited by the quota placed
on the total physical memory in use by the VM and the buffer cache;
frequently, this was implemented using watermarks, and the high
watermarks on each, when aggregated, almost always exceeded the total
available physical memory.

The question is really one of "how do I make processes behave", not
one of "how do I punish/reward badly/well behaved processes".

I think the RSS fix is needlessly complex.  I offer a suggestion that
is vastly simpler, amenable to policy exceptions via madvise(), and
otherwise altogether more in line with a real solution to the problem.

What I suggest is that vnodes with more than a certain number of pages
associated with them be forced to steal pages from their own usage,
instead of obtaining them from the system page pool.  The limit should
be based on the available memory divided by the number of active
vnodes, plus some additional fudge factors.  In this way, vnodes do
not compete with each other for real resources, except in low memory
conditions.

In general, when we talk about badly behaved processes, we are talking
about processes with large working sets that are directly mapped to
vnode backing objects.  In effect, the suggested solution is a soft
working set quota that attempts to minimize swap usage under normal
circumstances.  The fudge factors are there to account for non-vnode
page usage, and for the average fill of a vnode-associated VM object's
page list.
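To make that concrete, here's a rough sketch of the allocation-time
check (names like OBJ_WSEXEMPT, VNODE_FUDGE, vnode_page_quota(), and
vnode_page_steal() are invented for illustration, and the quota math
is only approximate; the rest is in terms of the current VM
interfaces):

/*
 * Sketch only, not working code: a soft per-vnode working set quota,
 * enforced when a vnode-backed object wants another page.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/vmmeter.h>
#include <vm/vm.h>
#include <vm/vm_object.h>
#include <vm/vm_page.h>

extern int numvnodes;		/* count of active vnodes */

#define VNODE_FUDGE	2	/* covers non-vnode page usage and the
				 * average fill of vnode object page
				 * lists; tune to taste */
#define OBJ_WSEXEMPT	0x8000	/* invented flag: object exempt from
				 * the soft quota (set via madvise()) */

/* Invented helper: recycle one of this object's own pages instead. */
vm_page_t vnode_page_steal(vm_object_t object, vm_pindex_t pindex);

static int
vnode_page_quota(void)
{
	/* Free and cache pages, split among the active vnodes. */
	return ((cnt.v_free_count + cnt.v_cache_count) * VNODE_FUDGE /
	    (numvnodes > 0 ? numvnodes : 1));
}

vm_page_t
vnode_page_alloc(vm_object_t object, vm_pindex_t pindex)
{
	/*
	 * Exempt objects (e.g. ones flagged by ld.so via madvise())
	 * never hit the quota; everyone else steals from their own
	 * pages once they exceed their share.
	 */
	if ((object->flags & OBJ_WSEXEMPT) == 0 &&
	    object->resident_page_count > vnode_page_quota())
		return (vnode_page_steal(object, pindex));

	/* Under quota: compete normally for the system page pool. */
	return (vm_page_alloc(object, pindex, VM_ALLOC_NORMAL));
}

The point of the exempt flag is that the policy escape costs one bit
test in the common path; an exempt object simply never trips the
quota check.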
This solution was tried, and worked very well, in a UnixWare 2.0
kernel, in an attempt to resolve the "bad" ld behaviour of mapping the
object files to be linked and then randomly accessing the symbol
tables, which effectively thrashed everything but clean object file
pages out of the cache, at the expense of backing store for things
like the mouse management code in the X server.

The end result was a big disconnect in the "move mouse, wiggle cursor"
feedback that a human needs to be confident that the system is working
(and, in effect, made X an impossible-to-use development environment
on UnixWare).  Note that this is unrelated to the actual "fix" that
was eventually part of the UnixWare release (a "fixed" scheduling
class that gives the X server a certain percentage of the CPU to let
it thrash its own pages back in).

Obviously, this won't resolve the "huge number of files in one badly
behaved process" problem.  A more general solution would require
process-based limits instead, and would need to consider process
"vesting" in files (e.g., one file, say libc.so, is opened by two
processes; do you thrash pages of libc.so out of core because one of
the processes is an idiot, and has therefore exceeded its per-process
working set quota with a bunch of other files?  No...).

The simplest "best case" that can be arrived at with a small amount of
code is to set per-vnode limits, and then allow certain madvise()
parameters (like the one used by ld.so) to ignore the limits on a
per-vnode basis.  Hell, you could do it with a chflags on
/usr/lib/*.so, if you wanted to approach it that way...

Anyway, that's my 2 cents; back on my head... er, back to work.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message