From owner-svn-src-all@FreeBSD.ORG Sun Jan 13 10:10:17 2013 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id A78C730B; Sun, 13 Jan 2013 10:10:17 +0000 (UTC) (envelope-from alc@rice.edu) Received: from mh11.mail.rice.edu (mh11.mail.rice.edu [128.42.199.30]) by mx1.freebsd.org (Postfix) with ESMTP id 7C37AE71; Sun, 13 Jan 2013 10:10:17 +0000 (UTC) Received: from mh11.mail.rice.edu (localhost.localdomain [127.0.0.1]) by mh11.mail.rice.edu (Postfix) with ESMTP id 81BE34C0245; Sun, 13 Jan 2013 04:10:16 -0600 (CST) Received: from mh11.mail.rice.edu (localhost.localdomain [127.0.0.1]) by mh11.mail.rice.edu (Postfix) with ESMTP id 62F924C026C; Sun, 13 Jan 2013 04:10:16 -0600 (CST) X-Virus-Scanned: by amavis-2.7.0 at mh11.mail.rice.edu, auth channel Received: from mh11.mail.rice.edu ([127.0.0.1]) by mh11.mail.rice.edu (mh11.mail.rice.edu [127.0.0.1]) (amavis, port 10026) with ESMTP id YV5ZnCWvqjQm; Sun, 13 Jan 2013 04:10:16 -0600 (CST) Received: from adsl-216-63-78-18.dsl.hstntx.swbell.net (adsl-216-63-78-18.dsl.hstntx.swbell.net [216.63.78.18]) (using TLSv1 with cipher RC4-MD5 (128/128 bits)) (No client certificate requested) (Authenticated sender: alc) by mh11.mail.rice.edu (Postfix) with ESMTPSA id B26D84C0245; Sun, 13 Jan 2013 04:10:15 -0600 (CST) Message-ID: <50F28806.10505@rice.edu> Date: Sun, 13 Jan 2013 04:10:14 -0600 From: Alan Cox User-Agent: Mozilla/5.0 (X11; FreeBSD i386; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: Oleksandr Tymoshenko Subject: Re: svn commit: r243631 - in head/sys: kern sys References: <201211272119.qARLJxXV061083@svn.freebsd.org> <50C1BC90.90106@freebsd.org> <50C25A27.4060007@bluezbox.com> <50C26331.6030504@freebsd.org> <50C26AE9.4020600@bluezbox.com> <50C3A3D3.9000804@freebsd.org> <50C3AF72.4010902@rice.edu> <330405A1-312A-45A5-BB86-4969478D8BBD@bluezbox.com> <50D03E83.8060908@rice.edu> <50DD081E.8000409@bluezbox.com> <50EB1841.5030006@bluezbox.com> In-Reply-To: <50EB1841.5030006@bluezbox.com> Content-Type: multipart/mixed; boundary="------------060309000103040501070209" Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Andre Oppermann X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 13 Jan 2013 10:10:17 -0000 This is a multi-part message in MIME format. --------------060309000103040501070209 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit On 01/07/2013 12:47, Oleksandr Tymoshenko wrote: > On 12/27/2012 6:46 PM, Oleksandr Tymoshenko wrote: >> On 12/18/2012 1:59 AM, Alan Cox wrote: >>> On 12/17/2012 23:40, Oleksandr Tymoshenko wrote: >>>> On 2012-12-08, at 1:21 PM, Alan Cox wrote: >>>> >>>>> On 12/08/2012 14:32, Andre Oppermann wrote: >>>> .. skipped .. >>>> >>>>>> The trouble seems to come from NSFBUFS which is (512 + maxusers * >>>>>> 16) >>>>>> resulting in a kernel map of (512 + 400 * 16) * PAGE_SIZE = >>>>>> 27MB. This >>>>>> seem to be pushing it with the smaller ARM kmap layout. >>>>>> >>>>>> Does it boot and run when you set the tunable kern.ipc.nsfbufs=3500? >>>>>> >>>>>> ARM does have a direct map mode as well which doesn't require the >>>>>> allocation >>>>>> of sfbufs. I'm not sure which other problems that approach has. >>>>>> >>>>> Only a few (3?) platforms use it. It reduces the size of the user >>>>> address space, and translation between physical addresses and >>>>> direct map >>>>> addresses is not computationally trivial as it is on other >>>>> architectures, e.g., amd64, ia64. However, it does try to use large >>>>> page mappings. >>>>> >>>>> >>>>>> Hopefully alc@ (added to cc) can answer that and also why the >>>>>> kmap of >>>>>> 27MB >>>>>> manages to wrench the ARM kernel. >>>>>> >>>>> Arm does not define caps on either the buffer map size (param.h) >>>>> or the >>>>> kmem map size (vmparam.h). It would probably make sense to copy >>>>> these >>>>> definitions from i386. >>>> Adding caps didn't help. I did some digging and found out that >>>> although address range >>>> 0xc0000000 .. 0xffffffff is indeed valid for ARM in general actual >>>> KVA space varies for >>>> each specific hardware platform. This "real" KVA is defined by >>>> >>>> pair and ifI use them instead of >>> VM_MAX_KERNEL_ADDRESS> >>>> in init_param2 function my pandaboard successfully boots. Since >>>> former pair is used for defining >>>> kernel_map boundaries I believe it should be used for auto tuning >>>> as well. >>> >>> That makes sense. However, "virtual_avail" isn't the start of the >>> kernel address space. The kernel map always starts at >>> VM_MIN_KERNEL_ADDRESS. (See kmem_init().) "virtual_avail" represents >>> the next unallocated virtual address in the kernel address space at an >>> early point in initialization. "virtual_avail" and "virtual_end" >>> aren't >>> used after that, or outside the VM system. Please use >>> vm_map_min(kernel_map) and vm_map_max(kernel_map) instead. >> >> I checked: kernel_map is not available (NULL) at this point. So we >> can't use it to >> determine real KVA size. Closest thing we can get is >> virtual_avail/virtual_end pair. >> >> Andre, could you approve attached patch for commit or suggest better >> solution? > > Any update on this one? Can I proceed with commit? > Yes, I've now spent a little bit of time looking at this, and I don't see why these calculations and tunable_mbinit() need to be performed before the kernel map is initialized. Let me summarize what I found: 1. The function tunable_mbinit() now has a dependency on the global variable maxmbufmem. tunable_mbinit() is executed under SI_SUB_TUNABLES. tunable_mbinit() defines the global variable nmbclusters. The statements made in the comment at the head of tunable_mbinit() all appear to be false: /* * tunable_mbinit() has to be run before init_maxsockets() thus * the SYSINIT order below is SI_ORDER_MIDDLE while init_maxsockets() * runs at SI_ORDER_ANY. * * NB: This has to be done before VM init. */ I don't see anything in init_maxsockets() that depends on tunable_mbinit(). Moreover, the statement about "VM init" is only correct if you regard the initialization of the kernel's malloc as "VM init". 2. The function kmeminit() in kern/kern_malloc.c has a dependency on the global variable nmbclusters. kmeminit() is executed under SI_SUB_KMEM, which comes after the initialization of the virtual memory system, including the kernel map. 3. The function vm_ksubmap_init() has a dependency on the global variable maxpipekva. vm_ksubmap_init() is executed under SI_SUB_CPU, which comes after SI_SUB_KMEM. Am I missing anything? I'm attaching a patch that defers the calculation of maxpipekva until we actually need it in vm_ksubmap_init(). Any comments on this patch are welcome. Alan --------------060309000103040501070209 Content-Type: text/plain; charset=ISO-8859-15; name="maxpipekva2.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="maxpipekva2.patch" Index: kern/subr_param.c =================================================================== --- kern/subr_param.c (revision 245346) +++ kern/subr_param.c (working copy) @@ -97,7 +97,6 @@ quad_t maxmbufmem; /* max mbuf memory */ pid_t pid_max = PID_MAX; long maxswzone; /* max swmeta KVA storage */ long maxbcache; /* max buffer cache KVA storage */ -long maxpipekva; /* Limit on pipe KVA */ int vm_guest; /* Running as virtual machine guest? */ u_long maxtsiz; /* max text size */ u_long dfldsiz; /* initial data size limit */ @@ -339,18 +338,6 @@ init_param2(long physpages) TUNABLE_QUAD_FETCH("kern.maxmbufmem", &maxmbufmem); if (maxmbufmem > (realmem / 4) * 3) maxmbufmem = (realmem / 4) * 3; - - /* - * The default for maxpipekva is min(1/64 of the kernel address space, - * max(1/64 of main memory, 512KB)). See sys_pipe.c for more details. - */ - maxpipekva = (physpages / 64) * PAGE_SIZE; - TUNABLE_LONG_FETCH("kern.ipc.maxpipekva", &maxpipekva); - if (maxpipekva < 512 * 1024) - maxpipekva = 512 * 1024; - if (maxpipekva > (VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS) / 64) - maxpipekva = (VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS) / - 64; } /* Index: kern/sys_pipe.c =================================================================== --- kern/sys_pipe.c (revision 245346) +++ kern/sys_pipe.c (working copy) @@ -207,6 +207,8 @@ static int pipeallocfail; static int piperesizefail; static int piperesizeallowed = 1; +long maxpipekva; + SYSCTL_LONG(_kern_ipc, OID_AUTO, maxpipekva, CTLFLAG_RDTUN, &maxpipekva, 0, "Pipe KVA limit"); SYSCTL_LONG(_kern_ipc, OID_AUTO, pipekva, CTLFLAG_RD, Index: vm/vm_init.c =================================================================== --- vm/vm_init.c (revision 245346) +++ vm/vm_init.c (working copy) @@ -132,12 +132,14 @@ vm_ksubmap_init(struct kva_md_info *kmi) { vm_offset_t firstaddr; caddr_t v; - vm_size_t size = 0; + vm_size_t kernel_map_size, size = 0; long physmem_est; vm_offset_t minaddr; vm_offset_t maxaddr; vm_map_t clean_map; + kernel_map_size = kernel_map->max_offset - kernel_map->min_offset; + /* * Allocate space for system data structures. * The first available kernel virtual address is in "v". @@ -163,8 +165,7 @@ again: * Discount the physical memory larger than the size of kernel_map * to avoid eating up all of KVA space. */ - physmem_est = lmin(physmem, btoc(kernel_map->max_offset - - kernel_map->min_offset)); + physmem_est = lmin(physmem, btoc(kernel_map_size)); v = kern_vfs_bio_buffer_alloc(v, physmem_est); @@ -195,6 +196,18 @@ again: pager_map->system_map = 1; exec_map = kmem_suballoc(kernel_map, &minaddr, &maxaddr, exec_map_entries * round_page(PATH_MAX + ARG_MAX), FALSE); + + /* + * The default size for the pipe submap, "maxpipekva", is min(1/64 of + * the kernel virtual address space, max(1/64 of the physical memory, + * 512KB)). See sys_pipe.c for more details. + */ + maxpipekva = ctob(physmem / 64); + TUNABLE_LONG_FETCH("kern.ipc.maxpipekva", &maxpipekva); + if (maxpipekva < 512 * 1024) + maxpipekva = 512 * 1024; + if (maxpipekva > kernel_map_size / 64) + maxpipekva = kernel_map_size / 64; pipe_map = kmem_suballoc(kernel_map, &minaddr, &maxaddr, maxpipekva, FALSE); --------------060309000103040501070209--