From: Alan Cox <alc@rice.edu>
Date: Thu, 31 Jan 2013 12:30:32 -0600
To: Andriy Gapon
Cc: alc@FreeBSD.org, Alan Cox, freebsd-arch@FreeBSD.org
Subject: Re: kva size on amd64
Message-ID: <510AB848.3010806@rice.edu>
References: <507E7E59.8060201@FreeBSD.org> <51098743.2050603@FreeBSD.org> <510A2C09.6030709@FreeBSD.org>
In-Reply-To: <510A2C09.6030709@FreeBSD.org>

On 01/31/2013 02:32, Andriy Gapon wrote:
> on 31/01/2013 10:10 Alan Cox said the following:
>> In short, it will waste a non-trivial amount of physical memory. Unlike
>> user virtual address spaces, page table pages are preallocated for the
>> kernel virtual address space. More precisely, they are preallocated for
>> the reserved (or defined) regions of the kernel map, i.e., every range of
>> addresses that has a corresponding vm_map_entry. The kmem map is one such
>> reserved region. So, if you always set your kmem map to its maximum
>> possible size of ~300GB, then you are preallocating about 600MB of
>> physical memory for page table pages that will never be used (except on
>> machines with 300+ GB of DRAM).
>
> Alan,
>
> thank you very much for this information!
>
> Would it make sense then to do either of the following:
> - add some (non-trivial) code to auto-grow kmem map upon kva shortage
> - set default vm_kmem_size to min(2 * mem_size, vm_kmem_size_max)
> ?
>
> Perhaps something else?..
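(As a cross-check of the numbers quoted above, assuming 4KB pages and
8-byte page table entries, which is the amd64 layout that the
preallocation is sized against:

    300 GB of kmem map / 4 KB per page = 78,643,200 leaf PTEs
    78,643,200 PTEs * 8 bytes each     = 600 MB of page table pages

plus a comparatively negligible amount for the upper levels of the page
table hierarchy.)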
Try developing a different allocation strategy for the kmem_map.
First-fit is clearly not working well for the ZFS ARC because of
fragmentation. For example, instead of further enlarging the kmem_map,
try splitting it into multiple submaps of the same total size:
kmem_map1, kmem_map2, etc. Then, use these in a manner akin to the "old"
and "new" spaces of a copying garbage collector, or the storage segments
of a log-structured file system. However, actual copying from an "old"
space to a "new" space may not be necessary: by the time the "new" space
that you are currently allocating from fills up, or becomes so
fragmented that you can't satisfy an allocation, enough contiguous free
space has likely been created in an "old" space. (A rough sketch of this
scheme appears at the end of this message.)

I'll hypothesize that just a couple of kmem_map submaps, each 0.625
times the size of physical memory, would suffice. The bottom line is
that the total kmem virtual address space should be less than 2x
physical memory. In fact, maybe the system starts off with just a single
kmem_map and creates additional kmem_maps only on demand. As someone who
doesn't use ZFS, that would actually save me physical memory that is
currently being wasted on unnecessary preallocated page table pages for
my kmem_map. This begins to sound like option (1) that you propose
above. It might also help to keep physical memory fragmentation in
check.

> BTW, it seems that in OpenSolaris they do not limit kva size, but probably they
> allocate kernel page tables in some different way (on demand perhaps).
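P.S. To make the submap scheme above a bit more concrete, here is a
rough sketch. This is hypothetical code, not anything that exists in the
tree: kmem_space[], kmem_active, kmem_space_init(), and
kmem_space_alloc() are invented names for illustration, while
kmem_suballoc() and kmem_malloc() are the existing interfaces that such
a scheme could build on. Locking and the free path are omitted.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/malloc.h>

#include <vm/vm.h>
#include <vm/vm_extern.h>
#include <vm/vm_kern.h>
#include <vm/vm_map.h>

#define	KMEM_NSPACES	2	/* "old" and "new" */

static vm_map_t	kmem_space[KMEM_NSPACES];
static int	kmem_active;	/* index of the current "new" space */

/*
 * Carve each space out of the kernel map at boot, sized to, say,
 * 0.625 of physical memory.  (Alternatively, create only
 * kmem_space[0] here and the others lazily, on first allocation
 * failure, to avoid preallocating their page table pages.)
 */
static void
kmem_space_init(vm_size_t size)
{
	vm_offset_t minaddr, maxaddr;
	int i;

	for (i = 0; i < KMEM_NSPACES; i++)
		kmem_space[i] = kmem_suballoc(kernel_map, &minaddr,
		    &maxaddr, size, FALSE);
}

/*
 * Allocate from the "new" space; if it is too full or too fragmented
 * to satisfy the request, rotate to the next space, which has likely
 * been drained of enough old allocations by now.  M_NOWAIT is used so
 * that a failure falls through to the next space instead of sleeping.
 */
static vm_offset_t
kmem_space_alloc(vm_size_t size)
{
	vm_offset_t addr;
	int i;

	for (i = 0; i < KMEM_NSPACES; i++) {
		addr = kmem_malloc(kmem_space[kmem_active], size,
		    M_NOWAIT);
		if (addr != 0)
			return (addr);
		kmem_active = (kmem_active + 1) % KMEM_NSPACES;
	}
	return (0);
}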