Date:      Mon, 13 Mar 2006 11:48:06 -0800 (PST)
From:      Jon Dama <jd@ugcs.caltech.edu>
To:        Peter Jeremy <peterjeremy@optushome.com.au>
Cc:        Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>, Kostik Belousov <kostikbel@gmail.com>, freebsd-stable@freebsd.org, Michael Proto <mike@jellydonut.org>
Subject:   Re: RELENG_4 on flash disk and swap
Message-ID:  <Pine.LNX.4.53.0603131119220.30166@hurl.ugcs.caltech.edu>
In-Reply-To: <20060310193248.GC688@turion.vk2pj.dyndns.org>
References:  <20060302181625.I3905@atlantis.atlantis.dp.ua> <76FAD2DB-CD18-42D4-95C8-F016CFB17B00@segpub.com.au> <20060303110936.R86586@atlantis.atlantis.dp.ua> <20060303185157.GB692@turion.vk2pj.dyndns.org> <20060304001224.G356@atlantis.atlantis.dp.ua> <20060304065138.GD692@turion.vk2pj.dyndns.org> <20060310121758.S80837@atlantis.atlantis.dp.ua> <20060310123942.GI37572@deviant.kiev.zoral.com.ua> <20060310153737.X40396@atlantis.atlantis.dp.ua> <20060310193248.GC688@turion.vk2pj.dyndns.org>

If you feel this situation is undesirable, the first thing to do is to put
together the patches necessary to allow the kernel to actually track how
much RAM+swap might be needed to cover the address-space allocations
that have been granted.  This isn't trivial: just start thinking about
shared allocations, forking, copy-on-write, etc.
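To make the difficulty concrete, here is a toy sketch of that bookkeeping.
It is not FreeBSD code: the struct and every function name are invented
for illustration, and it assumes the simplest possible charging rules
(private writable pages are charged in full, shared pages once at
creation, and a fork charges the child for every page it might copy).

```c
#include <stddef.h>

/* Hypothetical toy model of commit accounting; not kernel code. */
struct commit_acct {
    size_t committed;   /* pages promised to all processes so far */
};

/* A private writable mapping may eventually touch every one of its pages. */
void charge_private(struct commit_acct *a, size_t pages)
{
    a->committed += pages;
}

/* A shared mapping is charged once, when it is created.  Processes that
 * attach to it later add nothing, so there is no charge function for them. */
void charge_shared_create(struct commit_acct *a, size_t pages)
{
    a->committed += pages;
}

/* After fork, copy-on-write means the child could end up with its own copy
 * of every private writable page the parent had. */
void charge_fork(struct commit_acct *a, size_t parent_private_pages)
{
    a->committed += parent_private_pages;
}

/* Unmapping or process exit returns the charge. */
void uncharge(struct commit_acct *a, size_t pages)
{
    a->committed -= pages;
}
```

Even in this toy form the fork case shows the cost: a large process that
forks briefly doubles its charge, whether or not the child ever writes to
a single page.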

In order to make this change "costless" I suspect you'll have to hide it
behind a kernel config option.  Maybe you'll bill it as mere
instrumentation.

Then worry about convincing people that overcommit shouldn't be the only
option.  But once you have your kernel config option to enable proper
accounting, it should be a short hop to making a sysctl that can disable
overcommit and enforce limits based on the previously mentioned
"accounting".
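A rough sketch of what that enforcement step might look like, assuming the
accounting already exists.  The struct, its fields and vm_reserve() are
made-up names, and the "overcommit" field only stands in for whatever
sysctl would actually be added:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy state; "overcommit" stands in for a sysctl knob. */
struct vm_policy {
    size_t ram_pages;
    size_t swap_pages;
    size_t committed;    /* pages already promised */
    bool   overcommit;   /* true: grant everything, as today */
};

/* Try to reserve "pages" more pages.  Returns true if the request is
 * granted.  With overcommit disabled, a request is refused when it would
 * push the total promised past RAM+swap. */
bool vm_reserve(struct vm_policy *p, size_t pages)
{
    size_t limit = p->ram_pages + p->swap_pages;

    if (!p->overcommit &&
        (p->committed > limit || pages > limit - p->committed))
        return false;

    p->committed += pages;
    return true;
}
```

With the knob left on, the check never fires and behaviour is unchanged,
which is the point: the default stays as it is.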

Most importantly, though, you won't need to convince anyone that the
default ought to be changed.

SIGDANGER has essentially been rejected universally by everyone but its
creators (IBM), and as it is unusual, don't expect anyone to write a
program that uses it.  Ditto for any solution that involves madvise or expecting
programs to prefault their pages.

Other suggestion: build a time machine to go back to 1990 and get early
(pages guaranteed) and late (overcommitted) allocation written into POSIX.

A somewhat accepted approach is to require that allocations be backed, while
also supporting a MAP_NORESERVE flag in mmap to permit overcommitted
allocations.  Either way, you must first give the kernel the necessary
accounting code.
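For reference, this is roughly how a program asks for such an allocation
on systems that already offer the flag, where it is spelled MAP_NORESERVE
(Linux and Solaris, for example).  map_lazy() is a name made up for this
sketch, and the #ifndef fallback is an assumption so that it still
compiles where the flag is missing:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif
#ifndef MAP_NORESERVE
#define MAP_NORESERVE 0   /* flag not offered here: use the default policy */
#endif

/* Map "len" bytes of anonymous memory without asking the kernel to set
 * aside backing store up front.  Returns NULL on failure. */
void *map_lazy(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON | MAP_NORESERVE, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

A caller that uses this has explicitly accepted that touching a page may
fail later, which is the trade a sparse-array user wants and a critical
daemon does not.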

For the record: I believe in overcommit, but I recognize that it violates
the semantics people were (foolishly) taught in school.

Also, when the system is page-starved it kills the largest consumer of
pages that has the same UID as the process that pushed the system over the
limit---not merely the largest consumer of pages.  So you see, running
critical services that carefully pre-allocate and fault their memory is
possible within the overcommit framework.
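A minimal sketch of the "pre-allocate and fault" part, for a service that
wants its memory backed before it starts doing real work.
alloc_prefaulted() is an invented helper, and the 4096-byte fallback is
only an assumption for the case where sysconf() cannot report a page size:

```c
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate "len" bytes and write to one byte in every page, so the pages
 * are faulted in now rather than at some unpredictable later point.
 * Returns NULL if the allocation fails. */
void *alloc_prefaulted(size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096;   /* assumed fallback */
    volatile unsigned char *p = malloc(len);

    if (p == NULL)
        return NULL;

    /* volatile keeps the compiler from discarding the stores */
    for (size_t off = 0; off < len; off += page)
        p[off] = 0;

    return (void *)p;
}
```

Doing this once at startup moves any shortage to a point where the
service can still refuse to start cleanly.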

           Jon
