From owner-svn-src-all@FreeBSD.ORG Mon Jan 14 15:00:54 2013
Delivered-To: svn-src-all@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 73183926
 for ; Mon, 14 Jan 2013 15:00:54 +0000 (UTC)
 (envelope-from andre@freebsd.org)
Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2])
 by mx1.freebsd.org (Postfix) with ESMTP id C0F4EE02
 for ; Mon, 14 Jan 2013 15:00:53 +0000 (UTC)
Received: (qmail 61599 invoked from network); 14 Jan 2013 16:23:47 -0000
Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2])
 (envelope-sender )
 by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP
 for ; 14 Jan 2013 16:23:47 -0000
Message-ID: <50F41DA3.8060300@freebsd.org>
Date: Mon, 14 Jan 2013 16:00:51 +0100
From: Andre Oppermann
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Thunderbird/17.0
MIME-Version: 1.0
To: Alan Cox
Subject: Re: svn commit: r243631 - in head/sys: kern sys
References: <201211272119.qARLJxXV061083@svn.freebsd.org>
 <50C1BC90.90106@freebsd.org> <50C25A27.4060007@bluezbox.com>
 <50C26331.6030504@freebsd.org> <50C26AE9.4020600@bluezbox.com>
 <50C3A3D3.9000804@freebsd.org> <50C3AF72.4010902@rice.edu>
 <330405A1-312A-45A5-BB86-4969478D8BBD@bluezbox.com>
 <50D03E83.8060908@rice.edu> <50DD081E.8000409@bluezbox.com>
 <50EB1841.5030006@bluezbox.com> <50F28806.10505@rice.edu>
In-Reply-To: <50F28806.10505@rice.edu>
Content-Type: multipart/mixed; boundary="------------060703050804030003000002"
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
 src-committers@freebsd.org, Oleksandr Tymoshenko
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)"
X-List-Received-Date: Mon, 14 Jan 2013 15:00:54 -0000

This is a multi-part message in MIME format.
--------------060703050804030003000002
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit

On 13.01.2013 11:10, Alan Cox wrote:
> On 01/07/2013 12:47, Oleksandr Tymoshenko wrote:
>> On 12/27/2012 6:46 PM, Oleksandr Tymoshenko wrote:
>>> On 12/18/2012 1:59 AM, Alan Cox wrote:
>>>> On 12/17/2012 23:40, Oleksandr Tymoshenko wrote:
>>>>> On 2012-12-08, at 1:21 PM, Alan Cox wrote:
>>>> That makes sense. However, "virtual_avail" isn't the start of the
>>>> kernel address space. The kernel map always starts at
>>>> VM_MIN_KERNEL_ADDRESS. (See kmem_init().) "virtual_avail" represents
>>>> the next unallocated virtual address in the kernel address space at an
>>>> early point in initialization. "virtual_avail" and "virtual_end" aren't
>>>> used after that, or outside the VM system. Please use
>>>> vm_map_min(kernel_map) and vm_map_max(kernel_map) instead.
>>>
>>> I checked: kernel_map is not available (NULL) at this point. So we
>>> can't use it to determine real KVA size. Closest thing we can get is
>>> virtual_avail/virtual_end pair.
>>>
>>> Andre, could you approve attached patch for commit or suggest better
>>> solution?
>>
>> Any update on this one? Can I proceed with commit?
>>
>
> Yes, I've now spent a little bit of time looking at this, and I don't
> see why these calculations and tunable_mbinit() need to be performed
> before the kernel map is initialized.
>
> Let me summarize what I found:
>
> 1. The function tunable_mbinit() now has a dependency on the global
> variable maxmbufmem. tunable_mbinit() is executed under
> SI_SUB_TUNABLES. tunable_mbinit() defines the global variable
> nmbclusters. The statements made in the comment at the head of
> tunable_mbinit() all appear to be false:
>
> /*
>  * tunable_mbinit() has to be run before init_maxsockets() thus
>  * the SYSINIT order below is SI_ORDER_MIDDLE while init_maxsockets()
>  * runs at SI_ORDER_ANY.
>  *
>  * NB: This has to be done before VM init.
>  */
>
> I don't see anything in init_maxsockets() that depends on
> tunable_mbinit(). Moreover, the statement about "VM init" is only
> correct if you regard the initialization of the kernel's malloc as "VM
> init".

This seems to be historic cruft. The dependency on maxsockets was
removed recently with the autotuning improvements.

A patch moving the maxmbufmem calculation into tunable_mbinit() and
changing it to SI_SUB_KMEM which comes after the VM initialization
is attached.

> 2. The function kmeminit() in kern/kern_malloc.c has a dependency on the
> global variable nmbclusters. kmeminit() is executed under SI_SUB_KMEM,
> which comes after the initialization of the virtual memory system,
> including the kernel map.

The use of nmbclusters in kmeminit seems to be bogus. I think it comes
from the times when the mbuf allocator was directly layered on top of
the VM, that is before UMA. kmeminit() should not use nmbclusters.

The computations done in kmeminit() do not make a whole lot of sense
to me. But I'm no expert in that area.

> 3. The function vm_ksubmap_init() has a dependency on the global
> variable maxpipekva. vm_ksubmap_init() is executed under SI_SUB_CPU,
> which comes after SI_SUB_KMEM.
>
> Am I missing anything?
>
> I'm attaching a patch that defers the calculation of maxpipekva until we
> actually need it in vm_ksubmap_init(). Any comments on this patch are
> welcome.

Looks good to me. Perhaps the whole calculation and setup of the pipe_map
could be moved to kern/sys_pipe.c:pipeinit() to have it all together.

-- 
Andre

--------------060703050804030003000002
Content-Type: text/plain; charset=windows-1252; name="maxmbufmem.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="maxmbufmem.diff"

Index: sys/mbuf.h
===================================================================
--- sys/mbuf.h	(revision 245423)
+++ sys/mbuf.h	(working copy)
@@ -384,7 +384,6 @@
  *
  * The rest of it is defined in kern/kern_mbuf.c
  */
-extern quad_t	maxmbufmem;
 extern uma_zone_t	zone_mbuf;
 extern uma_zone_t	zone_clust;
 extern uma_zone_t	zone_pack;
Index: kern/kern_mbuf.c
===================================================================
--- kern/kern_mbuf.c	(revision 245423)
+++ kern/kern_mbuf.c	(working copy)
@@ -47,6 +47,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -104,16 +105,25 @@
 struct mbstat mbstat;
 
 /*
- * tunable_mbinit() has to be run before init_maxsockets() thus
- * the SYSINIT order below is SI_ORDER_MIDDLE while init_maxsockets()
- * runs at SI_ORDER_ANY.
- *
- * NB: This has to be done before VM init.
+ * tunable_mbinit() has to be run before any mbuf allocations are done.
  */
 static void
 tunable_mbinit(void *dummy)
 {
+	quad_t realmem, maxmbufmem;
 
+	/*
+	 * The default limit for all mbuf related memory is 1/2 of all
+	 * available kernel memory (physical or kmem).
+	 * At most it can be 3/4 of available kernel memory.
+	 */
+	realmem = qmin((quad_t)physmem * PAGE_SIZE,
+	    vm_map_max(kernel_map) - vm_map_min(kernel_map));
+	maxmbufmem = realmem / 2;
+	TUNABLE_QUAD_FETCH("kern.maxmbufmem", &maxmbufmem);
+	if (maxmbufmem > realmem / 4 * 3)
+		maxmbufmem = realmem / 4 * 3;
+
 	TUNABLE_INT_FETCH("kern.ipc.nmbclusters", &nmbclusters);
 	if (nmbclusters == 0)
 		nmbclusters = maxmbufmem / MCLBYTES / 4;
@@ -139,7 +149,7 @@
 		nmbufs = lmax(maxmbufmem / MSIZE / 5,
 		    nmbclusters + nmbjumbop + nmbjumbo9 + nmbjumbo16);
 }
-SYSINIT(tunable_mbinit, SI_SUB_TUNABLES, SI_ORDER_MIDDLE, tunable_mbinit, NULL);
+SYSINIT(tunable_mbinit, SI_SUB_KMEM, SI_ORDER_MIDDLE, tunable_mbinit, NULL);
 
 static int
 sysctl_nmbclusters(SYSCTL_HANDLER_ARGS)
@@ -279,16 +289,14 @@
 static void	mb_zfini_pack(void *, int);
 
 static void	mb_reclaim(void *);
-static void	mbuf_init(void *);
 static void    *mbuf_jumbo_alloc(uma_zone_t, int, uint8_t *, int);
 
-/* Ensure that MSIZE must be a power of 2. */
+/* Ensure that MSIZE is a power of 2. */
 CTASSERT((((MSIZE - 1) ^ MSIZE) + 1) >> 1 == MSIZE);
 
 /*
  * Initialize FreeBSD Network buffer allocation.
  */
-SYSINIT(mbuf, SI_SUB_MBUF, SI_ORDER_FIRST, mbuf_init, NULL);
 static void
 mbuf_init(void *dummy)
 {
@@ -396,6 +404,7 @@
 	mbstat.sf_iocnt = 0;
 	mbstat.sf_allocwait = mbstat.sf_allocfail = 0;
 }
+SYSINIT(mbuf, SI_SUB_MBUF, SI_ORDER_FIRST, mbuf_init, NULL);
 
 /*
  * UMA backend page allocator for the jumbo frame zones.
Index: kern/subr_param.c
===================================================================
--- kern/subr_param.c	(revision 245423)
+++ kern/subr_param.c	(working copy)
@@ -93,7 +93,6 @@
 int	nbuf;
 int	ngroups_max;			/* max # groups per process */
 int	nswbuf;
-quad_t	maxmbufmem;			/* max mbuf memory */
 pid_t	pid_max = PID_MAX;
 long	maxswzone;			/* max swmeta KVA storage */
 long	maxbcache;			/* max buffer cache KVA storage */
@@ -272,7 +271,6 @@
 void
 init_param2(long physpages)
 {
-	quad_t realmem;
 
 	/* Base parameters */
 	maxusers = MAXUSERS;
@@ -329,18 +327,6 @@
 	TUNABLE_INT_FETCH("kern.ncallout", &ncallout);
 
 	/*
-	 * The default limit for all mbuf related memory is 1/2 of all
-	 * available kernel memory (physical or kmem).
-	 * At most it can be 3/4 of available kernel memory.
-	 */
-	realmem = qmin((quad_t)physpages * PAGE_SIZE,
-	    VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS);
-	maxmbufmem = realmem / 2;
-	TUNABLE_QUAD_FETCH("kern.maxmbufmem", &maxmbufmem);
-	if (maxmbufmem > (realmem / 4) * 3)
-		maxmbufmem = (realmem / 4) * 3;
-
-	/*
 	 * The default for maxpipekva is min(1/64 of the kernel address space,
 	 * max(1/64 of main memory, 512KB)).  See sys_pipe.c for more details.
 	 */

--------------060703050804030003000002--
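
Illustrative note, not part of the message or the attached patch: the sizing rule that maxmbufmem.diff moves into tunable_mbinit() can be tried out in userland. The sketch below is a rough stand-in under stated assumptions. mbuf_mem_limit() is a hypothetical helper name, plain int64_t byte counts replace quad_t, physmem * PAGE_SIZE and the kernel_map bounds, and a tunable value of 0 is assumed to mean "kern.maxmbufmem was not set" (the kernel's TUNABLE_QUAD_FETCH simply leaves the default in place when the tunable is absent).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userland helper (not a kernel function): reproduces the
 * arithmetic the patch places in tunable_mbinit().
 * Assumption: tunable_bytes == 0 stands for "kern.maxmbufmem not set".
 */
static int64_t
mbuf_mem_limit(int64_t physmem_bytes, int64_t kva_bytes, int64_t tunable_bytes)
{
	int64_t realmem, limit, cap;

	assert(physmem_bytes >= 0 && kva_bytes >= 0 && tunable_bytes >= 0);

	/* "Available kernel memory": the smaller of RAM and kernel VA. */
	realmem = physmem_bytes < kva_bytes ? physmem_bytes : kva_bytes;

	/* Default is 1/2 of realmem unless the tunable overrides it. */
	limit = tunable_bytes != 0 ? tunable_bytes : realmem / 2;

	/* Never more than 3/4 of realmem, even when tuned higher. */
	cap = realmem / 4 * 3;
	if (limit > cap)
		limit = cap;
	return (limit);
}
```

With 8 GiB of RAM but only 1 GiB of kernel address space and no tunable, the helper yields 512 MiB, which shows why the size of the kernel map matters for this calculation and not just physmem.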