From owner-freebsd-hackers Fri Jan 22 15:45:18 1999
Return-Path:
Received: (from majordom@localhost)
	by hub.freebsd.org (8.8.8/8.8.8) id PAA03192
	for freebsd-hackers-outgoing; Fri, 22 Jan 1999 15:45:18 -0800 (PST)
	(envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131])
	by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id PAA03160
	for ; Fri, 22 Jan 1999 15:45:14 -0800 (PST)
	(envelope-from tlambert@usr09.primenet.com)
Received: (from daemon@localhost)
	by smtp01.primenet.com (8.8.8/8.8.8) id QAA15051;
	Fri, 22 Jan 1999 16:45:02 -0700 (MST)
Received: from usr09.primenet.com(206.165.6.209) via SMTP
	by smtp01.primenet.com, id smtpd014934; Fri Jan 22 16:44:47 1999
Received: (from tlambert@localhost)
	by usr09.primenet.com (8.8.5/8.8.5) id QAA12241;
	Fri, 22 Jan 1999 16:44:35 -0700 (MST)
From: Terry Lambert
Message-Id: <199901222344.QAA12241@usr09.primenet.com>
Subject: Re: Error in vm_fault change
To: dillon@apollo.backplane.com (Matthew Dillon)
Date: Fri, 22 Jan 1999 23:44:35 +0000 (GMT)
Cc: dyson@iquest.net, hackers@FreeBSD.ORG
In-Reply-To: <199901220656.WAA48081@apollo.backplane.com> from "Matthew Dillon" at Jan 21, 99 10:56:49 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I think that both the approach Matt has suggested and the current RSS
code that John suggested as a replacement are contrary to the goals
that John and David have put forth.

The big issue here seems to be data cache thrashing.  In a non-unified
VM and buffer cache system, this was hard-limited by the quota placed
on the total physical memory in use by the VM and the buffer cache;
frequently, this was implemented using watermarks, and the high
watermarks on each, when aggregated, almost always exceeded the total
available physical memory.

The question is really one of "how do I make processes behave", not
one of "how do I punish/reward badly/well behaved processes".

I think the RSS fix is needlessly complex.  I offer a suggestion that
is vastly simpler, amenable to policy exceptions via madvise(), and
otherwise altogether more in line with a real solution to the problem.

What I suggest is that vnodes with more than a certain number of pages
associated with them be forced to steal pages from their own usage,
instead of obtaining them from the system page pool.  The limit should
be based on the available memory divided by the number of active
vnodes, plus some additional fudge factors.  In this way, vnodes do
not compete with each other for real resources, except in low memory
conditions.

In general, when we talk about badly behaved processes, we are talking
about processes with large working sets that are directly mapped to
vnode backing objects.  In effect, the suggested solution is a soft
working set quota that attempts to minimize swap usage under normal
circumstances.  The fudge factors are there to account for non-vnode
page usage, and for the average fill of a vnode-associated VM object's
page list.
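To make that concrete, here's a rough sketch of the allocation-time
check (names like OBJ_WSEXEMPT, VNODE_FUDGE, vnode_page_quota(), and
vnode_page_steal() are invented for illustration, and the quota math
is only approximate; the rest is in terms of the current VM
interfaces):

/*
 * Sketch only, not working code: a soft per-vnode working set quota,
 * enforced when a vnode-backed object wants another page.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/vmmeter.h>
#include <vm/vm.h>
#include <vm/vm_object.h>
#include <vm/vm_page.h>

extern int numvnodes;		/* count of active vnodes */

#define VNODE_FUDGE	2	/* covers non-vnode page usage and the
				 * average fill of vnode object page
				 * lists; tune to taste */
#define OBJ_WSEXEMPT	0x8000	/* invented flag: object exempt from
				 * the soft quota (set via madvise()) */

/* Invented helper: recycle one of this object's own pages instead. */
vm_page_t vnode_page_steal(vm_object_t object, vm_pindex_t pindex);

static int
vnode_page_quota(void)
{
	/* Free and cache pages, split among the active vnodes. */
	return ((cnt.v_free_count + cnt.v_cache_count) * VNODE_FUDGE /
	    (numvnodes > 0 ? numvnodes : 1));
}

vm_page_t
vnode_page_alloc(vm_object_t object, vm_pindex_t pindex)
{
	/*
	 * Exempt objects (e.g. ones flagged by ld.so via madvise())
	 * never hit the quota; everyone else steals from their own
	 * pages once they exceed their share.
	 */
	if ((object->flags & OBJ_WSEXEMPT) == 0 &&
	    object->resident_page_count > vnode_page_quota())
		return (vnode_page_steal(object, pindex));

	/* Under quota: compete normally for the system page pool. */
	return (vm_page_alloc(object, pindex, VM_ALLOC_NORMAL));
}

The point of the exempt flag is that the policy escape costs one bit
test in the common path; an exempt object simply never trips the
quota check.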
This solution was tried, and worked very well, in a UnixWare 2.0
kernel, in an attempt to resolve the "bad" ld behaviour of mapping the
object files to be linked and then randomly accessing the symbol
tables, which effectively thrashed everything but clean object file
pages out of the cache, at the expense of backing store for things
like the mouse management code in the X server.

The end result was a big disconnect in the "move mouse, wiggle cursor"
feedback that a human needs to be confident that the system is working
(and, in effect, made X an impossible-to-use development environment
on UnixWare).  Note that this is unrelated to the actual "fix" that
was eventually part of the UnixWare release (a "fixed" scheduling
class that gives the X server a certain percentage of the CPU to let
it thrash its own pages back in).

Obviously, this won't resolve the "huge number of files in one badly
behaved process" problem.  A more general solution would require
process-based limits instead, and would need to consider process
"vesting" in files (e.g., one file, say libc.so, is opened by two
processes; do you thrash pages of libc.so out of core because one of
the processes is an idiot, and has therefore exceeded its per-process
working set quota with a bunch of other files?  No...).

The simplest "best case" that can be arrived at with a small amount of
code is to set per-vnode limits, and then allow certain madvise()
parameters (like the one used by ld.so) to ignore the limits on a
per-vnode basis.  Hell, you could do it with a chflags on
/usr/lib/*.so, if you wanted to approach it that way...

Anyway, that's my 2 cents; back on my head... er, back to work.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message