Date: Sun, 13 Jan 2013 04:10:14 -0600
From: Alan Cox <alc@rice.edu>
To: Oleksandr Tymoshenko <gonzo@bluezbox.com>
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Andre Oppermann <andre@freebsd.org>
Subject: Re: svn commit: r243631 - in head/sys: kern sys
Message-ID: <50F28806.10505@rice.edu>
In-Reply-To: <50EB1841.5030006@bluezbox.com>
References: <201211272119.qARLJxXV061083@svn.freebsd.org>
 <ABB3E29B-91F3-4C25-8FAB-869BBD7459E1@bluezbox.com>
 <50C1BC90.90106@freebsd.org> <50C25A27.4060007@bluezbox.com>
 <50C26331.6030504@freebsd.org> <50C26AE9.4020600@bluezbox.com>
 <50C3A3D3.9000804@freebsd.org> <50C3AF72.4010902@rice.edu>
 <330405A1-312A-45A5-BB86-4969478D8BBD@bluezbox.com>
 <50D03E83.8060908@rice.edu> <50DD081E.8000409@bluezbox.com>
 <50EB1841.5030006@bluezbox.com>
This is a multi-part message in MIME format.
--------------060309000103040501070209
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit

On 01/07/2013 12:47, Oleksandr Tymoshenko wrote:
> On 12/27/2012 6:46 PM, Oleksandr Tymoshenko wrote:
>> On 12/18/2012 1:59 AM, Alan Cox wrote:
>>> On 12/17/2012 23:40, Oleksandr Tymoshenko wrote:
>>>> On 2012-12-08, at 1:21 PM, Alan Cox <alc@rice.edu> wrote:
>>>>
>>>>> On 12/08/2012 14:32, Andre Oppermann wrote:
>>>> .. skipped ..
>>>>
>>>>>> The trouble seems to come from NSFBUFS which is (512 + maxusers * 16)
>>>>>> resulting in a kernel map of (512 + 400 * 16) * PAGE_SIZE = 27MB. This
>>>>>> seem to be pushing it with the smaller ARM kmap layout.
>>>>>>
>>>>>> Does it boot and run when you set the tunable kern.ipc.nsfbufs=3500?
>>>>>>
>>>>>> ARM does have a direct map mode as well which doesn't require the
>>>>>> allocation of sfbufs. I'm not sure which other problems that approach
>>>>>> has.
>>>>>>
>>>>> Only a few (3?) platforms use it. It reduces the size of the user
>>>>> address space, and translation between physical addresses and direct
>>>>> map addresses is not computationally trivial as it is on other
>>>>> architectures, e.g., amd64, ia64. However, it does try to use large
>>>>> page mappings.
>>>>>
>>>>>
>>>>>> Hopefully alc@ (added to cc) can answer that and also why the kmap of
>>>>>> 27MB manages to wrench the ARM kernel.
>>>>>>
>>>>> Arm does not define caps on either the buffer map size (param.h) or the
>>>>> kmem map size (vmparam.h). It would probably make sense to copy these
>>>>> definitions from i386.
>>>> Adding caps didn't help. I did some digging and found out that although
>>>> address range 0xc0000000 .. 0xffffffff is indeed valid for ARM in general
>>>> actual KVA space varies for each specific hardware platform.
>>>> This "real" KVA is defined by <virtual_avail, virtual_end> pair and if I
>>>> use them instead of <VM_MIN_KERNEL_ADDRESS, VM_MAX_KERNEL_ADDRESS> in
>>>> init_param2 function my pandaboard successfully boots. Since former pair
>>>> is used for defining kernel_map boundaries I believe it should be used
>>>> for auto tuning as well.
>>>
>>> That makes sense. However, "virtual_avail" isn't the start of the kernel
>>> address space. The kernel map always starts at VM_MIN_KERNEL_ADDRESS.
>>> (See kmem_init().) "virtual_avail" represents the next unallocated
>>> virtual address in the kernel address space at an early point in
>>> initialization. "virtual_avail" and "virtual_end" aren't used after
>>> that, or outside the VM system. Please use vm_map_min(kernel_map) and
>>> vm_map_max(kernel_map) instead.
>>
>> I checked: kernel_map is not available (NULL) at this point. So we can't
>> use it to determine real KVA size. Closest thing we can get is
>> virtual_avail/virtual_end pair.
>>
>> Andre, could you approve attached patch for commit or suggest better
>> solution?
>
> Any update on this one? Can I proceed with commit?
>

Yes, I've now spent a little bit of time looking at this, and I don't
see why these calculations and tunable_mbinit() need to be performed
before the kernel map is initialized. Let me summarize what I found:

1. The function tunable_mbinit() now has a dependency on the global
variable maxmbufmem. tunable_mbinit() is executed under
SI_SUB_TUNABLES. tunable_mbinit() defines the global variable
nmbclusters.

The statements made in the comment at the head of tunable_mbinit() all
appear to be false:

/*
 * tunable_mbinit() has to be run before init_maxsockets() thus
 * the SYSINIT order below is SI_ORDER_MIDDLE while init_maxsockets()
 * runs at SI_ORDER_ANY.
 *
 * NB: This has to be done before VM init.
 */

I don't see anything in init_maxsockets() that depends on
tunable_mbinit().
Moreover, the statement about "VM init" is only correct if
you regard the initialization of the kernel's malloc as "VM init".

2. The function kmeminit() in kern/kern_malloc.c has a dependency on the
global variable nmbclusters. kmeminit() is executed under SI_SUB_KMEM,
which comes after the initialization of the virtual memory system,
including the kernel map.

3. The function vm_ksubmap_init() has a dependency on the global
variable maxpipekva. vm_ksubmap_init() is executed under SI_SUB_CPU,
which comes after SI_SUB_KMEM.

Am I missing anything?

I'm attaching a patch that defers the calculation of maxpipekva until we
actually need it in vm_ksubmap_init(). Any comments on this patch are
welcome.

Alan


--------------060309000103040501070209
Content-Type: text/plain; charset=ISO-8859-15; name="maxpipekva2.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="maxpipekva2.patch"

Index: kern/subr_param.c
===================================================================
--- kern/subr_param.c	(revision 245346)
+++ kern/subr_param.c	(working copy)
@@ -97,7 +97,6 @@ quad_t maxmbufmem;		/* max mbuf memory */
 pid_t	pid_max = PID_MAX;
 long	maxswzone;			/* max swmeta KVA storage */
 long	maxbcache;			/* max buffer cache KVA storage */
-long	maxpipekva;			/* Limit on pipe KVA */
 int	vm_guest;			/* Running as virtual machine guest? */
 u_long	maxtsiz;			/* max text size */
 u_long	dfldsiz;			/* initial data size limit */
@@ -339,18 +338,6 @@ init_param2(long physpages)
 	TUNABLE_QUAD_FETCH("kern.maxmbufmem", &maxmbufmem);
 	if (maxmbufmem > (realmem / 4) * 3)
 		maxmbufmem = (realmem / 4) * 3;
-
-	/*
-	 * The default for maxpipekva is min(1/64 of the kernel address space,
-	 * max(1/64 of main memory, 512KB)). See sys_pipe.c for more details.
-	 */
-	maxpipekva = (physpages / 64) * PAGE_SIZE;
-	TUNABLE_LONG_FETCH("kern.ipc.maxpipekva", &maxpipekva);
-	if (maxpipekva < 512 * 1024)
-		maxpipekva = 512 * 1024;
-	if (maxpipekva > (VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS) / 64)
-		maxpipekva = (VM_MAX_KERNEL_ADDRESS - VM_MIN_KERNEL_ADDRESS) /
-		    64;
 }
 
 /*
Index: kern/sys_pipe.c
===================================================================
--- kern/sys_pipe.c	(revision 245346)
+++ kern/sys_pipe.c	(working copy)
@@ -207,6 +207,8 @@ static int pipeallocfail;
 static int piperesizefail;
 static int piperesizeallowed = 1;
 
+long maxpipekva;
+
 SYSCTL_LONG(_kern_ipc, OID_AUTO, maxpipekva, CTLFLAG_RDTUN,
 	   &maxpipekva, 0, "Pipe KVA limit");
 SYSCTL_LONG(_kern_ipc, OID_AUTO, pipekva, CTLFLAG_RD,
Index: vm/vm_init.c
===================================================================
--- vm/vm_init.c	(revision 245346)
+++ vm/vm_init.c	(working copy)
@@ -132,12 +132,14 @@ vm_ksubmap_init(struct kva_md_info *kmi)
 {
 	vm_offset_t firstaddr;
 	caddr_t v;
-	vm_size_t size = 0;
+	vm_size_t kernel_map_size, size = 0;
 	long physmem_est;
 	vm_offset_t minaddr;
 	vm_offset_t maxaddr;
 	vm_map_t clean_map;
 
+	kernel_map_size = kernel_map->max_offset - kernel_map->min_offset;
+
 	/*
 	 * Allocate space for system data structures.
 	 * The first available kernel virtual address is in "v".
@@ -163,8 +165,7 @@ again:
 	 * Discount the physical memory larger than the size of kernel_map
 	 * to avoid eating up all of KVA space.
 	 */
-	physmem_est = lmin(physmem, btoc(kernel_map->max_offset -
-	    kernel_map->min_offset));
+	physmem_est = lmin(physmem, btoc(kernel_map_size));
 
 	v = kern_vfs_bio_buffer_alloc(v, physmem_est);
 
@@ -195,6 +196,18 @@ again:
 	pager_map->system_map = 1;
 	exec_map = kmem_suballoc(kernel_map, &minaddr, &maxaddr,
 	    exec_map_entries * round_page(PATH_MAX + ARG_MAX), FALSE);
+
+	/*
+	 * The default size for the pipe submap, "maxpipekva", is min(1/64 of
+	 * the kernel virtual address space, max(1/64 of the physical memory,
+	 * 512KB)). See sys_pipe.c for more details.
+	 */
+	maxpipekva = ctob(physmem / 64);
+	TUNABLE_LONG_FETCH("kern.ipc.maxpipekva", &maxpipekva);
+	if (maxpipekva < 512 * 1024)
+		maxpipekva = 512 * 1024;
+	if (maxpipekva > kernel_map_size / 64)
+		maxpipekva = kernel_map_size / 64;
 	pipe_map = kmem_suballoc(kernel_map, &minaddr, &maxaddr, maxpipekva,
 	    FALSE);
 
 
--------------060309000103040501070209--
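
The clamp described in the patch comment, min(1/64 of the kernel virtual address space, max(1/64 of physical memory, 512KB)), can be tried outside the kernel with a small userland sketch. This is only an illustration under stated assumptions: default_maxpipekva() and PIPE_KVA_FLOOR are hypothetical names that do not exist in FreeBSD, the page size and kernel map size are passed in as plain parameters instead of coming from ctob() and kernel_map, and the kern.ipc.maxpipekva tunable override is left out.

```c
#include <stdint.h>

/* 512KB lower bound taken from the comment in the patch. */
#define PIPE_KVA_FLOOR	(512ULL * 1024ULL)

/*
 * Hypothetical userland helper (not a FreeBSD function): compute the
 * default pipe KVA limit in bytes.  The kernel code additionally lets
 * the kern.ipc.maxpipekva tunable replace the initial value before the
 * two clamps are applied; that step is omitted here.
 */
static uint64_t
default_maxpipekva(uint64_t physmem_pages, uint64_t page_size,
    uint64_t kernel_map_size)
{
	/* Start from 1/64 of physical memory, converted to bytes. */
	uint64_t limit = (physmem_pages / 64) * page_size;

	/* Never go below the 512KB floor. */
	if (limit < PIPE_KVA_FLOOR)
		limit = PIPE_KVA_FLOOR;

	/* Never exceed 1/64 of the kernel map. */
	if (limit > kernel_map_size / 64)
		limit = kernel_map_size / 64;

	return (limit);
}
```

With 4GB of RAM in 4KB pages and a 256MB kernel map, for example, the physical-memory term is 64MB but the kernel-map cap brings the result down to 4MB, which is why the size of the real kernel map rather than the nominal address range matters on platforms with a small KVA.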