From owner-svn-src-user@FreeBSD.ORG Mon Nov 12 10:49:08 2012
Message-ID: <50A0D420.4030106@freebsd.org>
Date: Mon, 12 Nov 2012 11:49:04 +0100
From: Andre Oppermann <andre@freebsd.org>
To: Andre Oppermann
Cc: src-committers@freebsd.org, svn-src-user@freebsd.org
Subject: Re: svn commit: r242910 - in user/andre/tcp_workqueue/sys: kern sys
References: <201211120847.qAC8lEAM086331@svn.freebsd.org>
In-Reply-To: <201211120847.qAC8lEAM086331@svn.freebsd.org>

On 12.11.2012 09:47, Andre Oppermann wrote:
> Author: andre
> Date: Mon Nov 12 08:47:13 2012
> New Revision: 242910
> URL: http://svnweb.freebsd.org/changeset/base/242910
>
> Log:
>   Base the mbuf related limits on the available physical memory or
>   kernel memory, whichever is lower.

The commit message is a bit terse, so I'm going to explain in more detail:

The overall mbuf-related memory limit must be set so that mbufs (and
clusters of the various sizes) can't exhaust physical RAM or KVM.

I've chosen half of the physical RAM or KVM (whichever is lower) as the
baseline. In any normal scenario we want to leave at least half of
physmem/KVM for other kernel functions and for userspace, to keep the
system from swapping like hell. Via a tunable the limit can be raised to
at most 3/4 of physmem/KVM.

Out of the overall mbuf memory limit, 2K clusters and 4K (page size)
clusters get 1/4 each, because these are the most heavily used mbuf
sizes. 2K clusters are used for MTU 1500 ethernet inbound packets; 4K
clusters are used whenever possible for socket sends and thus for
outbound packets.

The larger cluster sizes of 9K and 16K are each limited to 1/6 of the
overall mbuf memory limit. Again, when jumbo MTUs are used these large
clusters end up only on the inbound path. They are not used on the
outbound path; there it's still 4K clusters. Yes, that will stay that
way, because otherwise we run into lots of complications in the stack.
And it really isn't a problem, so don't make a scene.

Previously the normal mbufs (256B) weren't limited at all. That was
wrong, because certain places in the kernel try to piece a packet
together from smaller mbufs when cluster allocation fails. The mbuf
limit is the count of all the other mbuf sizes together, plus some more
to allow for standalone mbufs (ACKs, for example) and for sending off a
copy of a cluster. FYI: every cluster eventually also has an mbuf
associated with it.
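To make the arithmetic concrete, here is a rough user-space sketch of the
derivation described above. It is illustrative, not the actual kernel
code from the commit: the constants mirror FreeBSD's MCLBYTES,
MJUMPAGESIZE, MJUM9BYTES and MJUM16BYTES on common configurations, while
the variable names and the headroom factor for plain mbufs are only
illustrative.

  /*
   * Illustrative sketch of the mbuf limit derivation (not kernel code).
   * Compile with: cc -o mbuflimits mbuflimits.c
   */
  #include <stdio.h>
  #include <stdint.h>

  #define MCLBYTES     2048          /* 2K cluster */
  #define MJUMPAGESIZE 4096          /* page-sized (4K) cluster */
  #define MJUM9BYTES   (9 * 1024)    /* 9K jumbo cluster */
  #define MJUM16BYTES  (16 * 1024)   /* 16K jumbo cluster */

  static uint64_t
  min64(uint64_t a, uint64_t b)
  {
          return (a < b ? a : b);
  }

  int
  main(void)
  {
          uint64_t physmem = 1ULL << 34;  /* 16GB RAM, example value */
          uint64_t kvm     = 1ULL << 39;  /* ample KVM, example value */
          uint64_t tunable = 0;           /* optional override, in bytes */
          uint64_t maxmbufmem, n2k, n4k, n9k, n16k, nmbufs;

          /* Baseline: half of the smaller of physical RAM and KVM. */
          maxmbufmem = min64(physmem, kvm) / 2;

          /* A tunable may raise the limit, but never above 3/4. */
          if (tunable > maxmbufmem)
                  maxmbufmem = min64(tunable, min64(physmem, kvm) * 3 / 4);

          /* 2K and 4K clusters each get a quarter of the total ... */
          n2k  = maxmbufmem / 4 / MCLBYTES;
          n4k  = maxmbufmem / 4 / MJUMPAGESIZE;
          /* ... 9K and 16K clusters a sixth each. */
          n9k  = maxmbufmem / 6 / MJUM9BYTES;
          n16k = maxmbufmem / 6 / MJUM16BYTES;

          /*
           * Plain mbufs: at least one per cluster plus headroom for
           * standalone mbufs (pure ACKs) and copies of cluster data.
           * The extra 50% here is purely illustrative.
           */
          nmbufs = (n2k + n4k + n9k + n16k) * 3 / 2;

          printf("maxmbufmem %ju\n2K %ju\n4K %ju\n9K %ju\n16K %ju\n"
              "mbufs %ju\n", (uintmax_t)maxmbufmem, (uintmax_t)n2k,
              (uintmax_t)n4k, (uintmax_t)n9k, (uintmax_t)n16k,
              (uintmax_t)nmbufs);
          return (0);
  }

With physmem set to 16GB as above, this reproduces the cluster counts of
the 16GB example below; the plain-mbuf counts in the examples come from
the kernel's own formula rather than the illustrative headroom factor.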
Unfortunately there is no way to set one overall limit for all mbuf
memory together, because UMA doesn't support that kind of limit.

Let's work out a few sizing examples:

1GB KVM:
      512MB limit for mbuf memory
    419,430 mbufs
     65,536 2K mbuf clusters
     32,768 4K mbuf clusters
      9,709 9K mbuf clusters
      5,461 16K mbuf clusters

16GB RAM:
        8GB limit for mbuf memory
 33,554,432 mbufs
  1,048,576 2K mbuf clusters
    524,288 4K mbuf clusters
    155,344 9K mbuf clusters
     87,381 16K mbuf clusters

These defaults should be sufficient for even the most demanding network
loads. If you do run into these limits you probably know exactly what you
are doing, and you are expected to tune the values for your particular
purpose.

There is a side issue with maxfiles, as it limits the maximum number of
sockets that can be open at the same time. With today's web servers and
proxy caches there may be 100K or more sockets open. Hence I've divorced
maxfiles from maxusers as well. There is a relationship between maxfiles
and the callout callwheel, though, which has to be investigated some more
to prevent ridiculous values from being chosen.

-- 
Andre