From owner-freebsd-arch@FreeBSD.ORG Wed Jan 30 20:49:19 2013
From: Andriy Gapon <avg@FreeBSD.org>
Date: Wed, 30 Jan 2013 22:49:07 +0200
To: freebsd-arch@FreeBSD.org
Subject: Re: kva size on amd64
Message-ID: <51098743.2050603@FreeBSD.org>
In-Reply-To: <507E7E59.8060201@FreeBSD.org>

on 17/10/2012 12:46 Andriy Gapon said the following:
>
> What are the main benefits, if any, of limiting KVA space size - or in fact
> tying it to physical memory size - on amd64?
> This question is perhaps relevant to other platforms with "unlimited kva" too.

I actually already have a patch that auto-sets kmem_size to kmem_size_max on amd64.
My primary motivation is that from time to time I still see reports about a
too-small kmem_map on untuned amd64 systems. This is really ridiculous
regardless of whether ZFS is in use or not.

Another motivation is that I see no reason at all to artificially limit KVA.
The limit creates no benefit, increases fragility and reduces flexibility.

-- 
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG Wed Jan 30 20:58:35 2013
From: Andriy Gapon <avg@FreeBSD.org>
Date: Wed, 30 Jan 2013 22:58:31 +0200
Cc: freebsd-arch@FreeBSD.org
Message-ID: <51098977.4000603@FreeBSD.org>
Subject: axe vm.max_wired [Was: Allow small amount of memory be mlock()'ed by unprivileged process?]
In-Reply-To: <4FC9F94B.8060708@FreeBSD.org>

on 02/06/2012 14:30 Andriy Gapon said the following:
> o There is also the vm.max_wired sysctl (with no equivalent tunable), which
> specifies the number of _pages_ that can be wired system-wide (by both the
> kernel and userland). But note that the limit applies only to userland
> requests; the kernel is allowed to wire new pages even when the limit is
> exceeded. By default the limit is set to 1/3 of available pages.

I would like to propose axing the vm.max_wired limit.
It is not good when too many pages are wired, but...

This limit is quite arbitrary (why 1/3?).
It's no good for ZFS systems, where e.g. 90% of memory can normally be wired
by ZFS in the kernel.

So this limit should either be axed or perhaps replaced with a much higher
limit, e.g. v_page_count - 2 * v_free_target or some such number "close" to
v_page_count.
-- 
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 31 08:10:14 2013
From: Alan Cox <alan.l.cox@gmail.com>
Reply-To: alc@freebsd.org
Date: Thu, 31 Jan 2013 02:10:13 -0600
To: Andriy Gapon
Cc: freebsd-arch@freebsd.org
Subject: Re: kva size on amd64
In-Reply-To: <51098743.2050603@FreeBSD.org>
On Wed, Jan 30, 2013 at 2:49 PM, Andriy Gapon wrote:
> What are the main benefits, if any, of limiting KVA space size - or in fact
> tying it to physical memory size - on amd64?
> [...]
> Another motivation is that I really see no reason at all to artificially limit
> KVA. This creates no benefits, increases fragility and reduces flexibility.

In short, it will waste a non-trivial amount of physical memory. Unlike user
virtual address spaces, page table pages are preallocated for the kernel
virtual address space. More precisely, they are preallocated for the reserved
(or defined) regions of the kernel map, i.e., every range of addresses that
has a corresponding vm_map_entry. The kmem map is one such reserved region.
So, if you always set your kmem map to its maximum possible size of ~300GB,
then you are preallocating about 600MB of physical memory for page table pages
that will never be used (except on machines with 300+ GB of DRAM).
From owner-freebsd-arch@FreeBSD.ORG Thu Jan 31 08:32:14 2013
From: Andriy Gapon <avg@FreeBSD.org>
Date: Thu, 31 Jan 2013 10:32:09 +0200
To: alc@FreeBSD.org
Cc: Alan Cox, freebsd-arch@FreeBSD.org
Subject: Re: kva size on amd64
Message-ID: <510A2C09.6030709@FreeBSD.org>

on 31/01/2013 10:10 Alan Cox said the following:
> In short, it will waste a non-trivial amount of physical memory. Unlike user
> virtual address spaces, page table pages are preallocated for the kernel
> virtual address space. More precisely, they are preallocated for the reserved
> (or defined) regions of the kernel map, i.e., every range of addresses that
> has a corresponding vm_map_entry.
> The kmem map is one such reserved region. So, if
> you always set your kmem map to its maximum possible size of ~300GB, then you
> are preallocating about 600MB of physical memory for page table pages that
> will never be used (except on machines with 300+ GB of DRAM).

Alan,

thank you very much for this information!

Would it make sense then to do either of the following:
- add some (non-trivial) code to auto-grow the kmem map upon KVA shortage
- set the default vm_kmem_size to min(2 * mem_size, vm_kmem_size_max)
?

Perhaps something else?..

BTW, it seems that OpenSolaris does not limit KVA size, but they probably
allocate kernel page tables in some different way (on demand, perhaps).

-- 
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 31 09:18:58 2013
From: Konstantin Belousov <kostikbel@gmail.com>
Date: Thu, 31 Jan 2013 11:18:53 +0200
To: Andriy Gapon
Cc: freebsd-arch@FreeBSD.org
Subject: Re: axe vm.max_wired [Was: Allow small amount of memory be mlock()'ed by unprivileged process?]
Message-ID: <20130131091853.GI2522@kib.kiev.ua>
In-Reply-To: <51098977.4000603@FreeBSD.org>

On Wed, Jan 30, 2013 at 10:58:31PM +0200, Andriy Gapon wrote:
> on 02/06/2012 14:30 Andriy Gapon said the following:
> > o There is also vm.max_wired sysctl (with no equivalent tunable), which
> > specifies number of _pages_ that can be wired system wide (by both kernel
> > and userland). But note that the limit applies only to userland requests,
> > the kernel is allowed to wire new pages even when the limit is exceeded.
> > By default the limit is set to 1/3 of available pages.
>
> I would like to propose to axe vm.max_wired limit.
> It is not good when too many pages are wired, but...
>
> This limit is quite arbitrary (why 1/3).
> It's no good for ZFS systems where e.g. 90% of memory can be normally wired
> by ZFS in kernel.
>
> So this limit should be either axed or perhaps replaced with some much
> higher limit like e.g. v_page_count - 2 * v_free_target or some such number
> "close" to v_page_count.
>

I dislike your proposal.

The limit is useful to prevent the system from entering live-lock. ZFS-using
machines should be tuned. Or, finally, the ZFS caches should communicate the
fact that the pages they use are for caches and provide an easy way for the
VM to request a flush. That would be a big project indeed.

E.g., could ZFS give the VM the impression that ZFS-cached pages are cached?

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 31 11:24:31 2013
From: Dag-Erling Smørgrav <des@des.no>
Date: Thu, 31 Jan 2013 12:24:21 +0100
To: Konstantin Belousov
Cc: Andriy Gapon, freebsd-arch@FreeBSD.org
Subject: Re: axe vm.max_wired
Message-ID: <86boc5wq6y.fsf@ds4.des.no>
In-Reply-To: <20130131091853.GI2522@kib.kiev.ua>

Konstantin Belousov writes:
> Andriy Gapon writes:
> > I would like to propose to axe vm.max_wired limit.
> The limit is useful to prevent the system from entering live-lock.
> ZFS-using machines should be tuned.

ZFS shouldn't be allowed to wire arbitrary amounts of memory. It is nearly
impossible to handle passwords and encryption keys securely on ZFS systems,
because there is no wired memory left for applications.
DES

-- 
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 31 18:30:33 2013
From: Alan Cox <alc@rice.edu>
Date: Thu, 31 Jan 2013 12:30:32 -0600
To: Andriy Gapon
Cc: alc@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: Re: kva size on amd64
Message-ID: <510AB848.3010806@rice.edu>
In-Reply-To: <510A2C09.6030709@FreeBSD.org>
On 01/31/2013 02:32, Andriy Gapon wrote:
> on 31/01/2013 10:10 Alan Cox said the following:
> [...]
> Would it make sense then to do either of the following:
> - add some (non-trivial) code to auto-grow kmem map upon kva shortage
> - set default vm_kmem_size to min(2 * mem_size, vm_kmem_size_max)
> ?
>
> Perhaps something else?..

Try developing a different allocation strategy for the kmem_map. First-fit is
clearly not working well for the ZFS ARC, because of fragmentation. For
example, instead of further enlarging the kmem_map, try splitting it into
multiple submaps of the same total size, kmem_map1, kmem_map2, etc. Then,
utilize these akin to the "old" and "new" spaces of a copying garbage
collector or storage segments in a log-structured file system. However,
actual copying from an "old" space to a "new" space may not be necessary. By
the time that the "new" space from which you are currently allocating fills
up or becomes sufficiently fragmented that you can't satisfy an allocation,
you've likely created enough contiguous space in an "old" space.
I'll hypothesize that just a couple of kmem_map submaps that are .625 of
physical memory size would suffice. The bottom line is that the total virtual
address space should be less than 2x physical memory.

In fact, maybe the system starts off with just a single kmem_map, and you only
create additional kmem_maps on demand. As someone who doesn't use ZFS, that
would actually save me physical memory that is currently being wasted on
unnecessary preallocated page table pages for my kmem_map. This begins to
sound like option (1) that you propose above.

This might also help to keep physical memory fragmentation in check.

From owner-freebsd-arch@FreeBSD.ORG Fri Feb 1 08:23:51 2013
From: Andriy Gapon <avg@FreeBSD.org>
Date: Fri, 01 Feb 2013 10:23:46 +0200
To: Konstantin Belousov
Cc: freebsd-arch@FreeBSD.org
Subject: Re: axe vm.max_wired
Message-ID: <510B7B92.4030804@FreeBSD.org>
In-Reply-To: <20130131091853.GI2522@kib.kiev.ua>

on 31/01/2013 11:18 Konstantin Belousov said the following:
> On Wed, Jan 30, 2013 at 10:58:31PM +0200, Andriy Gapon wrote:
> > [...]
> > So this limit should be either axed or perhaps replaced with some much
> > higher limit like e.g. v_page_count - 2 * v_free_target or some such
> > number "close" to v_page_count.
>
> I dislike your proposal.
>
> The limit is useful to prevent the system from entering live-lock.

Well, I definitely agree that we should prevent all of memory from becoming
wired. And I myself don't like fully axing vm.max_wired :-)
But I do not fully agree with your logic here.
Completely prohibiting any page wiring in userland would achieve the goal too,
but that doesn't mean it would be useful.

> ZFS-using machines should be tuned.

I would like them to be auto-tuned.

> Or finally the ZFS caches should
> communicate the fact that the pages used are for caches and provide
> easy way for the VM to request flush. This would be big project indeed.
>
> E.g., could ZFS make an impression that zfs-cached pages are cached, to VM ?

I would love to have the ZFS ARC implemented differently. But I do not expect
that to happen soon. Regarding your question - I do not have an answer.
Perhaps let's discuss how that could be done (while preserving the
useful/advanced features of the ARC)...

So, meanwhile, I object to your objection :-)
You didn't explain why vm.max_wired should be 1/3 of v_page_count by default.
You didn't explain how a situation where, say, 80% of pages are wired by the
kernel is radically better than one where 80% of pages are wired by the kernel
and 1% are wired by userland.

So, I still think that vm.max_wired as it is used now is too arbitrary and too
indiscriminate to be useful. There are other tools to limit page wiring by
userland, e.g. the memlocked limit. But, as I said in the original email, I
can agree that vm.max_wired is useful if it is set to something more
reasonable by default. IMO, it should not be a fixed percentage of available
memory; it should be derived from other VM thresholds related to paging.
-- 
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG Fri Feb 1 09:47:31 2013
From: Andriy Gapon <avg@FreeBSD.org>
Date: Fri, 01 Feb 2013 11:47:23 +0200
To: Alan Cox
Cc: alc@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: Re: kva size on amd64
Message-ID: <510B8F2B.5070609@FreeBSD.org>
In-Reply-To: <510AB848.3010806@rice.edu>

on 31/01/2013 20:30 Alan Cox said the following:
> Try developing a different allocation strategy for the kmem_map.
> First-fit is clearly not working well for the ZFS ARC, because of
> fragmentation.
> For example, instead of further enlarging the kmem_map,
> try splitting it into multiple submaps of the same total size,
> kmem_map1, kmem_map2, etc. [...]
>
> This might also help to keep physical memory fragmentation in check.

Alan,

very interesting suggestions, thank you!

Of course, this is quite a bit more work than just jacking up some limit :-)
So, it could be a while before any code materializes.

Actually, I have been obsessed for quite some time with the idea of confining
ZFS to its own submap. But ZFS does its allocations through malloc(9) and
uma(9) (depending on configuration). It seemed like a bit of work to provide
support for per-zone or per-tag submaps in uma and malloc. What is your
opinion of this approach?

P.S. BTW, do I understand correctly that the reservation of kernel page tables
happens through vm_map_insert -> pmap_growkernel?
-- 
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG Fri Feb 1 09:57:41 2013
From: Konstantin Belousov <kostikbel@gmail.com>
Date: Fri, 1 Feb 2013 11:57:35 +0200
To: Andriy Gapon
Cc: alc@FreeBSD.org, freebsd-arch@FreeBSD.org, Alan Cox
Subject: Re: kva size on amd64
Message-ID: <20130201095735.GM2522@kib.kiev.ua>
In-Reply-To: <510B8F2B.5070609@FreeBSD.org>
X-List-Received-Date: Fri, 01 Feb 2013 09:57:41 -0000

On Fri, Feb 01, 2013 at 11:47:23AM +0200, Andriy Gapon wrote:
> on 31/01/2013 20:30 Alan Cox said the following:
> > Try developing a different allocation strategy for the kmem_map.
> > First-fit is clearly not working well for the ZFS ARC, because of
> > fragmentation. For example, instead of further enlarging the kmem_map,
> > try splitting it into multiple submaps of the same total size,
> > kmem_map1, kmem_map2, etc. Then, utilize these akin to the "old" and
> > "new" spaces of a copying garbage collector or storage segments in a
> > log-structured file system. However, actual copying from an "old" space
> > to a "new" space may not be necessary. By the time that the "new" space
> > from which you are currently allocating fills up or becomes sufficiently
> > fragmented that you can't satisfy an allocation, you've likely created
> > enough contiguous space in an "old" space.
> >
> > I'll hypothesize that just a couple kmem_map submaps that are .625 of
> > physical memory size would suffice. The bottom line is that the total
> > virtual address space should be less than 2x physical memory.
> >
> > In fact, maybe the system starts off with just a single kmem_map, and
> > you only create additional kmem_maps on demand. As someone who doesn't
> > use ZFS that would actually save me physical memory that is currently
> > being wasted on unnecessary preallocated page table pages for my
> > kmem_map. This begins to sound like option (1) that you propose above.
> >
> > This might also help to keep physical memory fragmentation in check.
>
> Alan,
>
> very interesting suggestions, thank you!
>
> Of course, this is quite a bit more work than just jacking up some limit :-)
> So, it could be a while before any code materializes.
>
> Actually, I have been obsessed for quite some time with the idea of
> confining ZFS to its own submap. But ZFS does its allocations through
> malloc(9) and uma(9) (depending on configuration). It seemed like a bit
> of work to provide support for per-zone or per-tag submaps in uma and
> malloc.
> What is your opinion of this approach?

Definitely not being Alan, but I think that the rework of the ZFS memory
management should remove the use of uma or kmem_alloc() altogether. From
what I heard, in part from you, there is no reason to keep the filesystem
caches mapped full time.

I hope to commit shortly a facility that would allow ZFS to easily manage
copying for i/o from the unmapped set of pages. The checksumming you
mentioned would require some more work, but this does not look
insurmountable. Having ZFS use raw vm_page_t for caching would also
remove the pressure on KVA.

>
> P.S.
> BTW, do I understand correctly that the reservation of kernel page tables
> happens through vm_map_insert -> pmap_growkernel ?

Yes. E.g. kmem_suballoc->vm_map_find->vm_map_insert->pmap_growkernel.
From owner-freebsd-arch@FreeBSD.ORG Fri Feb 1 10:52:54 2013
Message-ID: <510B9E7A.1070709@FreeBSD.org>
Date: Fri, 01 Feb 2013 12:52:42 +0200
From: Andriy Gapon
To: Konstantin Belousov
Subject: Re: kva size on amd64
References: <507E7E59.8060201@FreeBSD.org> <51098743.2050603@FreeBSD.org> <510A2C09.6030709@FreeBSD.org> <510AB848.3010806@rice.edu> <510B8F2B.5070609@FreeBSD.org> <20130201095735.GM2522@kib.kiev.ua>
In-Reply-To: <20130201095735.GM2522@kib.kiev.ua>
Cc: alc@FreeBSD.org, freebsd-arch@FreeBSD.org, Alan Cox, Alan Cox

on 01/02/2013 11:57 Konstantin Belousov said the following:
> On Fri, Feb 01, 2013 at 11:47:23AM +0200, Andriy Gapon wrote:
> I think that the rework of the ZFS memory management should remove the
> use of uma or kmem_alloc() altogether. From what I heard, in part from
> you, there is no reason to keep the filesystem caches mapped full time.
>
> I hope to commit shortly a facility that would allow ZFS to easily
> manage copying for i/o from the unmapped set of pages. The checksumming
> you mentioned would require some more work, but this does not look
> insurmountable. Having ZFS use raw vm_page_t for caching would also
> remove the pressure on KVA.

Yes, this would be perfect.

I think that perhaps we also need some helper API to manage groups of
pages. E.g. right now ZFS can malloc or uma_zalloc a 32KB buffer and it
would have a single handle (a pointer to the mapped pages). This is
convenient. So it would be useful to have some representation for e.g.
N non-contiguous unmapped physical pages that logically represent M KB
of some contiguous data.

An opposite issue is e.g. packing 4 (or is it three?) unrelated 512-byte
blocks into a single page, as is possible with uma. So perhaps some
"unmapped uma"?
Another, purely ZFS issue is that ZFS code freely accesses buffers with
metadata. Adding mapping+unmapping code around all such accesses could be
cumbersome. All in all, this is not a quick project, IMO.

>> P.S.
>> BTW, do I understand correctly that the reservation of kernel page tables
>> happens through vm_map_insert -> pmap_growkernel ?
>
> Yes. E.g. kmem_suballoc->vm_map_find->vm_map_insert->pmap_growkernel.

Thank you!
--
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG Sat Feb 2 16:25:18 2013
Date: Sat, 2 Feb 2013 18:25:09 +0200
From: Konstantin Belousov
To: Andriy Gapon
Subject: Re: axe vm.max_wired
Message-ID: <20130202162509.GZ2522@kib.kiev.ua>
References: <20120517055425.GA802@infradead.org> <4FC762DD.90101@FreeBSD.org> <4FC81D9C.2080801@FreeBSD.org> <4FC8E29F.2010806@shatow.net> <4FC95A10.7000806@freebsd.org> <4FC9F94B.8060708@FreeBSD.org> <51098977.4000603@FreeBSD.org> <20130131091853.GI2522@kib.kiev.ua> <510B7B92.4030804@FreeBSD.org>
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature";
boundary="1HZcmCvxmsp4ai32"
In-Reply-To: <510B7B92.4030804@FreeBSD.org>
Cc: freebsd-arch@FreeBSD.org

On Fri, Feb 01, 2013 at 10:23:46AM +0200, Andriy Gapon wrote:
> on 31/01/2013 11:18 Konstantin Belousov said the following:
> > On Wed, Jan 30, 2013 at 10:58:31PM +0200, Andriy Gapon wrote:
> >> on 02/06/2012 14:30 Andriy Gapon said the following:
> >>> o There is also the vm.max_wired sysctl (with no equivalent tunable),
> >>> which specifies the number of _pages_ that can be wired system wide
> >>> (by both kernel and userland). But note that the limit applies only
> >>> to userland requests; the kernel is allowed to wire new pages even
> >>> when the limit is exceeded. By default the limit is set to 1/3 of
> >>> available pages.
> >>
> >> I would like to propose to axe the vm.max_wired limit.
> >> It is not good when too many pages are wired, but...
> >>
> >> This limit is quite arbitrary (why 1/3?).
> >> It's no good for ZFS systems where e.g. 90% of memory can be normally
> >> wired by ZFS in kernel.
> >>
> >> So this limit should be either axed or perhaps replaced with some much
> >> higher limit, e.g. v_page_count - 2 * v_free_target or some such
> >> number "close" to v_page_count.
> >>
> >
> > I dislike your proposal.
> >
> > The limit is useful to prevent the system from entering live-lock.
>
> Well, I definitely agree that we should prevent all of memory from
> becoming wired. And I myself don't like full axing of vm.max_wired :-)
>
> But I do not fully agree with your logic here. Completely prohibiting
> any page wiring in userland would achieve the goal too, but that doesn't
> mean that that would be useful.
>
> > ZFS-using machines should be tuned.
>
> I would like them to be auto-tuned.
>
> > Or finally the ZFS caches should communicate the fact that the pages
> > used are for caches and provide an easy way for the VM to request a
> > flush. This would be a big project indeed.
> >
> > E.g., could ZFS make an impression that zfs-cached pages are cached, to VM ?
>
> I would love to have ZFS ARC implemented differently.

ZFS integration with the VM is, to say it mildly, not good. The fact
that the ZFS cache (ARC ?) presents the cached pages as wired makes the
VM almost useless for a ZFS machine. Your displeasure and tweaks should
be directed at ZFS integration, and not at unbalancing the current
tuning, which is not that bad for ZFS-less boxes.

> But I do not expect that to happen soon.
> Regarding your question - I do not have an answer. Perhaps let's discuss
> how that could be done (while preserving useful/advanced features of ARC)...
>
> So, meanwhile, I object to your objection :-)
> You didn't explain why vm.max_wired should be 1/3 of v_page_count by default.
> You didn't explain how a situation where, say, 80% of pages are wired by
> kernel is radically better than a situation where 80% of pages are wired
> by kernel and 1% are wired by userland.
>
> So, I still think that vm.max_wired as it is used now is too arbitrary
> and too indiscriminate to be useful.

It is sized well to the default size of the buffer map, which takes
10% of the physical RAM of the machine.
Since buffers wire the pages, be it VMIO or malloc buffers, this leaves
20% for other things, like mbufs, page tables and user wires.

>
> There are other tools to limit page wiring by userland, e.g. the
> memlocked limit.

The memlock limit is per-process. It is completely useless as a safety
measure.

>
> But, as I've said in the original email, I can agree with vm.max_wired
> usefulness if it is set to something more reasonable by default.
> IMO, it should not be a fixed percentage of available memory, it should
> be derived from other VM thresholds related to paging.

Might be. Please provide a suggestion or better, a change.

From owner-freebsd-arch@FreeBSD.ORG Sat Feb 2 16:33:27 2013 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 1C9F34C5; Sat, 2 Feb 2013 16:33:27 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 7D5F6A29; Sat, 2 Feb 2013
16:33:26 +0000 (UTC)
Date: Sat, 2 Feb 2013 18:33:22 +0200
From: Konstantin Belousov
To: current@freebsd.org, arch@freebsd.org
Subject: Physbio changes final call for tests and reviews
Message-ID: <20130202163322.GA2522@kib.kiev.ua>
Cc: powerpc@freebsd.org, mips@freebsd.org, jeff@freebsd.org, ia64@freebsd.org, sparc64@freebsd.org, arm@freebsd.org

Hi,
I finished the last (insignificant) missed bits in Jeff's physbio work.
Now I am asking for the last round of testing and review, esp. for the
!x86 architectures. Another testing focus is the SCSI HBAs and RAID
controllers whose drivers are changed by the patchset. Please do test
this before the patchset is committed into HEAD!
The plan is to commit the patch about two weeks from now. The patch is
required for finalizing the unmapped I/O work for UFS that I did in
parallel, which I hope to finish shortly after the commit.

Patch is available at http://people.freebsd.org/~kib/misc/physbio.5.diff

Thank you in advance.

From owner-freebsd-arch@FreeBSD.ORG Sat Feb 2 18:34:40 2013 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0B0E4DBB; Sat, 2 Feb 2013 18:34:40 +0000 (UTC) (envelope-from alc@rice.edu) Received: from mh3.mail.rice.edu (mh3.mail.rice.edu [128.42.199.10]) by mx1.freebsd.org (Postfix) with ESMTP id C7C1AF7E; Sat, 2 Feb 2013 18:34:39 +0000 (UTC) Received: from mh3.mail.rice.edu (localhost.localdomain [127.0.0.1]) by mh3.mail.rice.edu (Postfix) with ESMTP id 9628740199; Sat, 2 Feb 2013 12:34:33 -0600 (CST) Received: from mh3.mail.rice.edu (localhost.localdomain [127.0.0.1]) by mh3.mail.rice.edu (Postfix) with ESMTP id 9471D40183; Sat, 2 Feb 2013 12:34:33 -0600
(CST)
Message-ID: <510D5C37.6000507@rice.edu>
Date: Sat, 02 Feb 2013 12:34:31 -0600
From: Alan Cox
To: Andriy Gapon
Subject: Re: kva size on amd64
References: <507E7E59.8060201@FreeBSD.org> <51098743.2050603@FreeBSD.org> <510A2C09.6030709@FreeBSD.org> <510AB848.3010806@rice.edu> <510B8F2B.5070609@FreeBSD.org>
In-Reply-To: <510B8F2B.5070609@FreeBSD.org>
Cc: alc@FreeBSD.org, Alan Cox, freebsd-arch@FreeBSD.org

On 02/01/2013 03:47, Andriy Gapon wrote:
> on 31/01/2013 20:30 Alan Cox said the following:
>> Try developing a different allocation strategy for the kmem_map.
>> First-fit is clearly not working well for the ZFS ARC, because of
>> fragmentation. For example, instead of further enlarging the kmem_map,
>> try splitting it into multiple submaps of the same total size,
>> kmem_map1, kmem_map2, etc. Then, utilize these akin to the "old" and
>> "new" spaces of a copying garbage collector or storage segments in a
>> log-structured file system.
>> However, actual copying from an "old" space to a "new" space may not
>> be necessary. By the time that the "new" space from which you are
>> currently allocating fills up or becomes sufficiently fragmented that
>> you can't satisfy an allocation, you've likely created enough
>> contiguous space in an "old" space.
>>
>> I'll hypothesize that just a couple kmem_map submaps that are .625 of
>> physical memory size would suffice. The bottom line is that the total
>> virtual address space should be less than 2x physical memory.
>>
>> In fact, maybe the system starts off with just a single kmem_map, and
>> you only create additional kmem_maps on demand. As someone who doesn't
>> use ZFS that would actually save me physical memory that is currently
>> being wasted on unnecessary preallocated page table pages for my
>> kmem_map. This begins to sound like option (1) that you propose above.
>>
>> This might also help to keep physical memory fragmentation in check.
>
> Alan,
>
> very interesting suggestions, thank you!
>
> Of course, this is quite a bit more work than just jacking up some limit :-)
> So, it could be a while before any code materializes.
>
> Actually, I have been obsessed for quite some time with the idea of
> confining ZFS to its own submap. But ZFS does its allocations through
> malloc(9) and uma(9) (depending on configuration). It seemed like a bit
> of work to provide support for per-zone or per-tag submaps in uma and
> malloc.
> What is your opinion of this approach?

I'm skeptical that it would accomplish anything. Specifically, I don't
think that it would have any impact on the fragmentation problem that we
have with ZFS. On amd64, with its direct map, all small allocations are
implemented by uma_small_alloc() and accessed through the direct map,
rather than coming from the kmem map. Outside of ZFS, large, multipage
allocations from the kmem map aren't that common. So, for all practical
purposes, ZFS has the kmem map to itself.
While I'm here, I'll offer some other food for thought. In HEAD, we
have a new-ish function, vm_page_alloc_contig(), that can allocate
contiguous, unmapped physical pages either to an arbitrary vm object or
VM_ALLOC_NOOBJ, just like vm_page_alloc(). Moreover, just like
vm_page_alloc(), it honors the VM_ALLOC_{NORMAL,SYSTEM,INTERRUPT}
thresholds and wakes the page daemon when appropriate. Using this
function, you could rewrite the multipage allocation code to first
attempt allocation through vm_page_alloc_contig() and then fall back to
the kmem map only if vm_page_alloc_contig() fails.

> P.S.
> BTW, do I understand correctly that the reservation of kernel page tables
> happens through vm_map_insert -> pmap_growkernel ?
>

I believe kib@ already answered this, but, yes, that is correct.

From owner-freebsd-arch@FreeBSD.ORG Sat Feb 2 21:47:17 2013
Date: Sat, 2 Feb 2013 22:47:09 +0100
From: Marius Strobl
To: Konstantin Belousov
Subject: Re: Physbio changes final call for tests and reviews
Message-ID: <20130202214709.GA99418@alchemy.franken.de>
References: <20130202163322.GA2522@kib.kiev.ua>
In-Reply-To:
<20130202163322.GA2522@kib.kiev.ua>
Cc: powerpc@freebsd.org, mips@freebsd.org, current@freebsd.org, jeff@freebsd.org, ia64@freebsd.org, arch@freebsd.org, sparc64@freebsd.org, arm@freebsd.org

On Sat, Feb 02, 2013 at 06:33:22PM +0200, Konstantin Belousov wrote:
> Hi,
> I finished the last (insignificant) missed bits in Jeff's physbio
> work. Now I am asking for the last round of testing and review, esp.
> for the !x86 architectures. Another testing focus is the SCSI HBAs and
> RAID controllers whose drivers are changed by the patchset. Please do
> test this before the patchset is committed into HEAD!
>
> The plan is to commit the patch about two weeks from now. The patch is
> required for finalizing the unmapped I/O work for UFS that I did in
> parallel, which I hope to finish shortly after the commit.
>
> Patch is available at http://people.freebsd.org/~kib/misc/physbio.5.diff
>

First tests on sparc64 with ata(4), mpt(4) and sym(4) look good (to be
sure, I still need to test with a machine using a streaming buffer in
addition to the IOMMU, though). However, by accident I noticed that your
patch (i.e. stock head is fine) somehow breaks smartd of smartmontools
with ata(4):

root@b1k2:/root # smartd
ata3: timeout waiting for write DRQ

The machine just hangs at this point (it's also strange that the above
message is from the PIO rather than from the DMA path).

One note: mjacob@ probably will be annoyed if you don't wrap the changes
to isp(4) in __FreeBSD_version so that the same source still compiles on
older versions.

Marius