From owner-freebsd-stable@FreeBSD.ORG Mon Dec 20 23:14:33 2010 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 73AC81065670 for ; Mon, 20 Dec 2010 23:14:33 +0000 (UTC) (envelope-from dillon@apollo.backplane.com) Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2]) by mx1.freebsd.org (Postfix) with ESMTP id 162BF8FC0A for ; Mon, 20 Dec 2010 23:14:32 +0000 (UTC) Received: from apollo.backplane.com (localhost [127.0.0.1]) by apollo.backplane.com (8.14.4/8.14.1) with ESMTP id oBKN0NCN087628 for ; Mon, 20 Dec 2010 15:00:23 -0800 (PST) Received: (from dillon@localhost) by apollo.backplane.com (8.14.4/8.13.4/Submit) id oBKN0Nrx087627; Mon, 20 Dec 2010 15:00:23 -0800 (PST) Date: Mon, 20 Dec 2010 15:00:23 -0800 (PST) From: Matthew Dillon Message-Id: <201012202300.oBKN0Nrx087627@apollo.backplane.com> To: freebsd-stable@freebsd.org Subject: Re: vm.swap_reserved toooooo large? X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Dec 2010 23:14:33 -0000 One of the problems with resource management in general is that it has traditionally been per-process, and due to the multiplicative effect (e.g. max-descriptors * limit-per-descriptor), per-process resources cannot be set such that any given user is prevented from DDOSing the system without making them so low that normal programs begin to fail for no good reason. Hence the advent of per-user and other more suitable resource limits, nominally set via sysctl. Even with these, however, it is virtually impossible to protect against a user DDOS. The kernel itself has resource limitations which are fairly easy to blow out... mbufs are usually the easiest to blow up, followed by pipe KVM memory. Filesytems can be blown up too by creating sparse files and mmap()ing them (thus circumventing normal overcommit limitations). Paging just itself, without running the system out of VM, can destroy a machine's performance and be just as effective a DDOS attack as resource starvation is. Virtual memory resources are similarly impacted. Overcommit limiting features have as many downsides as they have upsides. Its an endless argument but I've seen systems blow up with overcommit limits set even more readily than with no (overcommit) limits set. Theoretically overcommit limits make the system more manageable but in actual practice they only work when the application base is written with such limits in mind (and most are not). So for a general purpose unix environment putting limits on overcommit tends to create headaches. To be sure, in a turn-key environment overcommit serves a very important function. In a non-turn-key environment however it will likely create more problems than it will solve. The only way to realistically deal with the mess, if it is important to you, is to partition the systems' real resources and run stuff inside their own virtualized kernels each of which does its own independent resource management and whos I/O on the real system can be well-controlled as an aggregate. Alternatively, creating very large swap partitions work very well to mitigate the more common problems. Swap itself is changing its function. Swap is no longer just used for real memory overcommit (in fact, real memory overcommit is quite uncommon these days). It is now also used for things like tmpfs, temporary virtual disks, meta-data caching, and so forth. These days the minimum amount of swap I configure is 32G and as efficient swap storage gets more cost effective (e.g. SSDs), significantly more. 70G, 110G, etc. It becomes more a matter of being able to detact and act on the DDOS/resource issue BEFORE it gets to the point of killing important processes (definition: whatever is important for the functioning of that particular machine, user-run or root-run), and less a matter of hoping the system will do the right thing when the resource limit is actually reached. Having a lot of swap gives you more time to act. -Matt