From owner-freebsd-hackers  Tue Jul 13 16: 2:11 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from mail.netbsd.org (redmail.netbsd.org [155.53.200.193])
	by hub.freebsd.org (Postfix) with SMTP id 45356152CC
	for <freebsd-hackers@FreeBSD.ORG>; Tue, 13 Jul 1999 16:02:00 -0700 (PDT)
	(envelope-from cgd@netbsd.org)
Received: (qmail 17606 invoked by uid 1000); 13 Jul 1999 23:01:48 -0000
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: Jason Thorpe <thorpej@nas.nasa.gov>,
	"Brian F. Feldman" <green@FreeBSD.ORG>,
	Noriyuki Soda <soda@sra.co.jp>, bright@rush.net, dcs@newsguy.com,
	freebsd-hackers@FreeBSD.ORG, jon@oaktree.co.uk,
	tech-userlevel@netbsd.org
Subject: Re: Replacement for grep(1) (part 2)
References: <199907132110.OAA23817@lestat.nas.nasa.gov> <199907132114.OAA80781@apollo.backplane.com> <877lo4z0pe.fsf@redmail.redback.com> <199907132212.PAA81234@apollo.backplane.com>
From: cgd@netbsd.org (Chris G. Demetriou)
Date: 13 Jul 1999 16:01:47 -0700
In-Reply-To: Matthew Dillon's message of Tue, 13 Jul 1999 15:12:14 -0700 (PDT)
Message-ID: <871zecyx0k.fsf@redmail.redback.com>
Lines: 211
X-Mailer: Gnus v5.5/Emacs 20.2
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Matthew Dillon <dillon@apollo.backplane.com> writes:
>     The text size of a program is irrelevant, because swap is never
>     allocated for it.  The data and BSS are only relevant when they
>     are modified.
> 
>     The only thing swap is ever used for is the dynamic allocation of memory.
>     There are three ways to do it:  sbrk(), mmap(... MAP_ANON), or
>     mmap(... MAP_PRIVATE).

yup, almost: not all MAP_PRIVATE mappings need backing store, only
MAP_PRIVATE and writeable mappings.  (MAP_PRIVATE does _not_ guarantee
that you won't see modifications made via other MAP_SHARED mappings.)
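
A minimal sketch of that distinction, assuming a POSIX system with
mmap() and a writable /tmp; the helper name is made up for
illustration.  A read-only private mapping can always be refilled
from the file, but once a writable private mapping is dirtied, the
dirty page is a private copy that has to live somewhere:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a scratch file MAP_PRIVATE and writable,
 * dirty the mapping, and check that the file itself is unchanged.
 * The dirtied page is a private copy, which is why this kind of
 * mapping needs backing store.  Returns 1 if the file is untouched
 * and the mapping shows the write, 0 if not, -1 on error.
 */
int
private_write_leaves_file_alone(void)
{
	char path[] = "/tmp/cowdemoXXXXXX";
	char buf[4] = "";
	char *p;
	int fd, ok = -1;

	if ((fd = mkstemp(path)) == -1)
		return (-1);
	unlink(path);			/* file goes away when fd closes */
	if (write(fd, "abc", 4) != 4)	/* includes the trailing NUL */
		goto out;

	p = mmap(NULL, 4, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		goto out;
	p[0] = 'X';			/* copy-on-write happens here */

	if (pread(fd, buf, 4, 0) == 4)
		ok = (strcmp(buf, "abc") == 0 && p[0] == 'X');
	munmap(p, 4);
out:
	close(fd);
	return (ok);
}
```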


>     Dynamic allocation of memory can occur under a huge number of 
>     conditions.  The actual physical allocation of dynamic memory - what is
>     known as a copy-on-write - cannot be predicted from the potential
>     allocation of memory.  The most obvious example of this is a fork().

yup.


>     There is a lot of hidden 'potential' VM that you haven't considered.
>     For example, if the resource limit for a process's stack is 8MB, then
>     the process can potentially allocate 8MB of stack even though it may
>     actually only allocate 32K of stack.

Yes, this is a good example.  In general, however, I believe it's safe
to say that it's easier to constrain stack usage than heap usage.
Many large consumers of stack are bad style to begin with ('real'
embedded environments typically have very limited stacks), and
mechanisms such as alloca() can be frobbed to use heap allocations
(with some run-time cost).
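
As a rough sketch of the alloca() point (the function is
hypothetical, not taken from any real code base): a scratch buffer
that would otherwise come from the stack is taken from the heap
instead, which costs a malloc()/free() pair per call but makes
failure visible to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int
cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return ((x > y) - (x < y));
}

/*
 * Hypothetical example: store the median of v[0..n-1] in *out
 * without disturbing v.  The scratch copy is heap-allocated where an
 * alloca() version would have grown the stack.  Returns 0 on
 * success, -1 if n is 0 or the scratch buffer can't be had.
 */
int
median_of(const int *v, size_t n, int *out)
{
	int *tmp;

	assert(v != NULL && out != NULL);
	if (n == 0)
		return (-1);
	tmp = malloc(n * sizeof(*tmp));	/* was: alloca(n * sizeof(*tmp)) */
	if (tmp == NULL)
		return (-1);		/* alloca() has no way to report this */
	memcpy(tmp, v, n * sizeof(*tmp));
	qsort(tmp, n, sizeof(*tmp), cmp_int);
	*out = tmp[n / 2];
	free(tmp);			/* the run-time cost mentioned above */
	return (0);
}
```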


> When a process forks, the child
>     process can potentially modify just about every single non-text page that
>     was owned by the parent process, causing a copy-on-write to occur.
>     The dynamic potential can run into the megabytes but most child processes
>     only modify a small percentage of those pages.

... and well-written applications which are just going to execve()
_should_ use vfork() for this case.  If they use fork(), they want the
ability to COW the entire address space, and should be charged for
that data.
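
A minimal sketch of the vfork()-then-execve() pattern, assuming
/bin/sh exists; run_shell() is a made-up name.  The child touches
nothing between vfork() and execve() except to _exit() on failure,
so there is no address space for the system to be prepared to copy:

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper: run "sh -c cmd" and return its exit status,
 * or -1 on failure.
 */
int
run_shell(const char *cmd)
{
	char *argv[] = { "sh", "-c", (char *)cmd, NULL };
	int status;
	pid_t pid;

	pid = vfork();
	if (pid == -1)
		return (-1);
	if (pid == 0) {
		execve("/bin/sh", argv, environ);
		_exit(127);		/* exec failed; never return from here */
	}
	if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
		return (-1);
	return (WEXITSTATUS(status));
}
```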


> :Nowhere did I see what amounts to anything other than hand-waving
> :claims that you'll have to allocate much, much more backing store than
> :you currently need to, and claims that that's unacceptable for general
> :purpose computing environments.  If you have a more specific analysis
> :that you'd like me/us to read, please point us at it more specifically.
> 
>     You are welcome to bring up real-life situations as examples.  That
>     is what I do.  But look who is doing the hand-waving now?

Huh?  You've made the claim that non-overcommit is useless and should
not be implemented because, for normal workloads, it requires 8 or
more times the backing store that is actually used.

You have yet to justify it.  What workload are you talking about?
What systems?  You start to allude to this later, but you simply say
"SGIs."


> :* while you certainly need to allocate more backing store than you
> :would with overcommit, it's _not_ ridiculously more for most
> :applications(+), and, finally,
> 
>     Based on what?  I am basing my information on the VM reservation made
>     by programs, and I am demonstrating specific points showing how those
>     reservations occur.  For example, the default 8MB resource limit for
>     the stack segment.

Actually, only now have you brought that up.  And that's very
system-dependent.  On NetBSD/i386 the default is 2MB, and it's worth
noting that you only need to reserve as much as the current stack
limit allows.  (Past that limit you're going to get a signal anyway.
If more reservations need to be made, they can be made page by page:
if one fails you deliver a signal, and if it still fails you deliver
a nastier signal.)

Stack limits are pretty much the one odd case (and they're already
handled oddly by the VM code.)


> :* even if you are not willing to pay that price, there _are_ people
> :who are quite willing to pay that price to get the benefits that they
> :see (whether it's a matter of perception or not, from their
> :perspective they may as well be real) of such a scheme.
> 
>     Quite true.  In the embedded world we preallocate memory and shape
>     the programs to what is available in the system.  But if we run out
>     of memory we usually panic and reboot - because the code is designed
>     to NOT run out of memory and thus running out of memory is a catastrophic
>     situation.

There's a whole spectrum of embedded devices, and applications that
run on them.  That definition works for some of them, but definitely
not all.

A controller in my car's engine behaves as you describe.  That's one
type of embedded system, and can have very well defined memory
requirements.  There, if you're out of memory, you have a Problem.

A web server appliance is another type of embedded system.  Its memory
requirements are quantifiable, but there's much more parameterization
necessary.  (# of clients being served vs. # of management sessions
open vs. background tasks that may need to be run periodically, for
instance.)  Basically, for this type of thing you need various types
of memory reservation and accounting for the various functions, and
indeed, it's not best done (entirely) with a no-overcommit-based or
resource-limit-based strategy.  However, for 'background tasks' that
might involve user-supplied code or that might have highly variable
memory requirements, devoting some set of the memory to be managed by
a no-overcommit or resource-limit strategy may be the right thing.
(and i'd probably prefer the latter.)

However, a web browser on the front of your microwave or on a handheld
tablet is also a type of embedded system, it's an 'appliance.'  The
user should never have to worry about it rebooting, or hanging, or
killing the program that they're looking at.  (It may be reasonable to
tell them that they're doing too much, and that they therefore need to
shut something down, or prevent them from starting up new tasks when
they're too close to a system limit.)  In this type of world, memory
needs are just too varied to control via blanket resource limits.
Further, an application's needs may be sufficiently variable that they
can't even be reasonably precomputed.  You're faced with two choices:
punt, and don't let the user exploit their system to its potential, or
un-"embed" parts of it and expose memory allocation limits to the user
(and do admission control based on them), like e.g. Macintoshes used
to do.


In the latter two cases, no-overcommit and the proper "committed
memory" accounting that comes with it is a very useful, perhaps
critical, tool.  (Note that the difference between a no-overcommit
policy and correct tracking of committed memory is, in a proper
design, simply a boolean switch: you should always be tracking
committed memory, and flipping the switch enables or disables
overcommit.)
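
A toy sketch of that "boolean switch" design.  The structure and
function names are invented for illustration and are not from any
real VM system.  The commit total is maintained either way; the
switch only decides whether exceeding the backing store is an error:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting state; all sizes are in pages. */
struct vm_commit {
	size_t	backing;	/* RAM + swap available for private pages */
	size_t	committed;	/* pages promised to writable private memory */
	bool	overcommit;	/* the switch: true allows overcommit */
};

/*
 * Charge npages against the commit total.  Returns true if the
 * reservation stands, false if it is refused.
 */
bool
vm_reserve(struct vm_commit *vc, size_t npages)
{
	if (!vc->overcommit &&
	    (vc->committed > vc->backing ||
	     npages > vc->backing - vc->committed))
		return (false);		/* caller is told up front */
	vc->committed += npages;
	return (true);
}

/* Give back a reservation made with vm_reserve(). */
void
vm_release(struct vm_commit *vc, size_t npages)
{
	assert(npages <= vc->committed);
	vc->committed -= npages;
}
```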

(From personal experience, I've built 'embedded' systems in the latter
two categories, none in the former.  Ones like the latter have
_seriously_ suffered from an inability to disallow overcommit.)


>     It is not appropriate to test every single allocation for a failure...
>     that results in bulky code and provides no real benefit since the
>     code is already designed to not overcommit the embedded system's memory.
>     If it does, you reboot.  The most critical embedded systems, such as
>     those used in satellites, avoid dynamic allocation as much as possible
>     in order to guarentee that no memory failures occur.  This works under
>     UNIX as well.  If you run your programs and touch all your potentially
>     modifiable pages you have effectively reserved the memory.

Yup.  However, even in this case, having to manually touch the pages
is a wasteful detail that the programmer should not have to think
about.  Resource reservation should take care of it for them.  8-)
(the "wastefulness" there: (1) the programmer has to think about
"effectively reserving" the memory, (2) the actual code to
"effectively reserve" the memory in the application, and (3) any side
effects the system may suffer because of this, e.g. immediate and
unnecessary paging of a whole bunch of data, if it's a
workstation/server system.)
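
For concreteness, a sketch of the kind of "touch everything" code
being discussed, assuming sysconf(_SC_PAGESIZE) reports the VM page
size; alloc_and_touch() is a hypothetical name.  Note that on an
overcommitting system the touching loop is exactly where the
process may get killed if the memory isn't there:

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: allocate len bytes and write one byte per
 * page so that every page is instantiated before the caller depends
 * on it.  Returns NULL if len is 0 or the allocation fails.
 */
void *
alloc_and_touch(size_t len)
{
	long pgsz = sysconf(_SC_PAGESIZE);
	volatile char *p;
	size_t off;

	if (pgsz <= 0 || len == 0 || (p = malloc(len)) == NULL)
		return (NULL);
	for (off = 0; off < len; off += (size_t)pgsz)
		p[off] = 0;		/* dirties one page per iteration */
	p[len - 1] = 0;			/* malloc() isn't page-aligned; get the tail */
	return ((void *)p);
}
```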

Certainly, it's better for many applications to allocate all the
memory that they'll use in advance.  But that's orthogonal to the
issue of resource reservation, except inasmuch as kludges are
necessary to get proper resource reservations.  8-)

> :I would honestly love to know: what do you see huge numbers of
> :reserved pages being reserved for, if they're not actually being
> :committed, by 'average' UNIX applications (for any definition of
> :average that excludes applications which do memory based computation
> :on sparse dasta).
>
>     Stack, hidden VM objects after a multi-way fork, MAP_PRIVATE memory maps,
>     private memory pools (e.g. for web servers, java VM's, and so forth).

And of those, for properly written applications (i.e. those that use
vfork() correctly), the _only_ actual differences between the pages
that are considered committed and those which are actually committed
are:

	* stack.  Yes, this can be a significant, but manageable,
	  source of difference.

	* writable private memory which hasn't yet been written
	  (e.g. data, bss), which compared to many/most applications'
	  dynamic allocations is insignificant.

	* intentional sparse data allocation, which is the reason that
	  any general-purpose OS which supports a no-overcommit policy
	  _must_ be capable of supporting an overcommit-allowed policy
	  for some people's usage.

Note that I've never said that there aren't situations in which
allowing overcommit is correct.  In some situations, it's necessary.

However, you've claimed that nobody should _ever_ have the ability to
prevent overcommit, and that's simply unreasonable in certain
situations.


cgd
-- 
Chris Demetriou - cgd@netbsd.org - http://www.netbsd.org/People/Pages/cgd.html
Disclaimer: Not speaking for NetBSD, just expressing my own opinion.

