From owner-freebsd-stable@FreeBSD.ORG  Mon Dec 20 23:14:33 2010
Date: Mon, 20 Dec 2010 15:00:23 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <201012202300.oBKN0Nrx087627@apollo.backplane.com>
To: freebsd-stable@freebsd.org
Subject: Re: vm.swap_reserved toooooo large?

    One of the problems with resource management in general is
    that it has traditionally been per-process, and due to the
    multiplicative effect (e.g. max-descriptors * limit-per-descriptor),
    per-process limits cannot be set such that any given user is
    prevented from DDOSing the system without making them so low that
    normal programs begin to fail for no good reason.
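
    As a rough illustration of the multiplicative effect, the sketch
    below multiplies out a set of limits.  The function name and all of
    the numbers (100 processes, 1024 descriptors, 64 KiB buffered per
    descriptor) are hypothetical, picked only to show the arithmetic.

```python
def worst_case_per_user(max_procs, max_descriptors, bytes_per_descriptor):
    """Upper bound on buffer space pinned if every process opens its full
    quota of descriptors and fills each one to its per-descriptor limit."""
    return max_procs * max_descriptors * bytes_per_descriptor


# Hypothetical limits, for illustration only.
per_process = worst_case_per_user(1, 1024, 64 * 1024)
per_user = worst_case_per_user(100, 1024, 64 * 1024)

print(f"one process: {per_process / 2**20:.0f} MiB")
print(f"one user:    {per_user / 2**30:.2f} GiB")
```

    With these made-up numbers a single process tops out at 64 MiB, which
    looks harmless, but one user running 100 such processes reaches
    6.25 GiB.  Lowering the per-process limits far enough to cap that
    total is what starts breaking normal programs.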

    Hence the advent of per-user and other more suitable resource
    limits, nominally set via sysctl.  Even with these, however,
    it is virtually impossible to protect against a user DDOS.
    The kernel itself has resource limitations which are fairly easy
    to blow out... mbufs are usually the easiest to blow up, followed
    by pipe KVM memory.  Filesystems can be blown up too by creating
    sparse files and mmap()ing them (thus circumventing normal overcommit
    limitations).
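
    For anyone unfamiliar with sparse files, here is a minimal sketch of
    the first half of that trick, assuming a POSIX system and a filesystem
    that supports holes.  The helper name and the 64 MiB size are
    hypothetical.  It only creates the file and compares its apparent size
    with the space actually allocated; it does not mmap() or touch any
    pages.

```python
import os
import tempfile


def make_sparse_file(path, size_bytes):
    """Create a file with apparent size size_bytes without writing any data.

    Returns (apparent_size, allocated_bytes)."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)  # extends the file; no data is written
    st = os.stat(path)
    # st_blocks counts 512-byte units on POSIX systems.
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    apparent, allocated = make_sparse_file(
        os.path.join(tmp, "sparse.bin"), 64 * 2**20)
    print(f"apparent size: {apparent} bytes, allocated: {allocated} bytes")
```

    On a filesystem with hole support the allocated figure is typically
    far below the apparent size.  The storage only gets committed later,
    when the pages are actually dirtied.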

    Paging by itself, without running the system out of VM, can destroy
    a machine's performance and be just as effective a DDOS attack as
    resource starvation is.

    Virtual memory resources are similarly impacted.  Overcommit limiting
    features have as many downsides as they have upsides.  It's an endless
    argument but I've seen systems blow up with overcommit limits set even
    more readily than with no (overcommit) limits set.  Theoretically
    overcommit limits make the system more manageable but in actual practice
    they only work when the application base is written with such limits
    in mind (and most are not).  So for a general purpose unix environment
    putting limits on overcommit tends to create headaches.  To be sure, in
    a turn-key environment overcommit limits serve a very important
    function.  In a non-turn-key environment, however, they will likely
    create more problems than they solve.

    The only way to realistically deal with the mess, if it is important
    to you, is to partition the system's real resources and run stuff
    inside their own virtualized kernels, each of which does its own
    independent resource management and whose I/O on the real system can
    be well-controlled as an aggregate.

    Alternatively, creating very large swap partitions works very well to
    mitigate the more common problems.  Swap itself is changing its function.
    Swap is no longer just used for real memory overcommit (in fact,
    real memory overcommit is quite uncommon these days).  It is now also
    used for things like tmpfs, temporary virtual disks, meta-data
    caching, and so forth.  These days the minimum amount of swap I
    configure is 32G, and as efficient swap storage gets more cost-effective
    (e.g. SSDs), significantly more: 70G, 110G, etc.

    It becomes more a matter of being able to detect and act on the
    DDOS/resource issue BEFORE it gets to the point of killing important
    processes (definition: whatever is important for the functioning of
    that particular machine, user-run or root-run), and less a matter of
    hoping the system will do the right thing when the resource limit is
    actually reached.  Having a lot of swap gives you more time to act.
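
    One way to get that early warning is to poll swap usage and raise a
    flag well before swap is exhausted.  What follows is only a sketch:
    the function names, the 25% threshold and the sample text are all
    hypothetical.  The sample is laid out like `swapinfo -k` output but
    was written by hand, and the parser assumes device rows begin with
    /dev/.

```python
def parse_swapinfo(text):
    """Sum (total_kib, used_kib) over device rows of swapinfo-style text."""
    total = used = 0
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and any "Total" summary row.
        if len(fields) < 3 or not fields[0].startswith("/dev/"):
            continue
        total += int(fields[1])
        used += int(fields[2])
    return total, used


def swap_alert(total_kib, used_kib, threshold=0.25):
    """True once the used fraction of swap reaches the threshold."""
    if total_kib == 0:
        return True  # no swap configured means no headroom at all
    return used_kib / total_kib >= threshold


# Hand-written sample, not captured from a real machine.
SAMPLE = """\
Device          1K-blocks     Used    Avail Capacity
/dev/ada0p3      33554432  9437184 24117248    28%
"""

total_kib, used_kib = parse_swapinfo(SAMPLE)
print(total_kib, used_kib, swap_alert(total_kib, used_kib))
```

    With 32G of swap and a low threshold like this, the alert fires with
    tens of gigabytes still free, which is the time-to-act that a large
    swap partition buys you.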

						-Matt