From owner-freebsd-arch@FreeBSD.ORG Wed Jan 30 20:49:19 2013
From: Andriy Gapon <avg@FreeBSD.org>
Date: Wed, 30 Jan 2013 22:49:07 +0200
To: freebsd-arch@FreeBSD.org
Subject: Re: kva size on amd64
Message-ID: <51098743.2050603@FreeBSD.org>
In-Reply-To: <507E7E59.8060201@FreeBSD.org>

on 17/10/2012 12:46 Andriy Gapon said the following:
>
> What are the main benefits, if any, of limiting KVA space size - or in fact
> tying it to physical memory size - on amd64?
> This question is perhaps relevant to other platforms with "unlimited kva" too.

I actually already have a patch that auto-sets kmem_size to kmem_size_max on amd64.
My primary motivation is that from time to time I still see reports about a
too-small kmem_map on untuned amd64 systems. This is really ridiculous
regardless of whether ZFS is in use or not.

Another motivation is that I see no reason at all to artificially limit KVA.
The limit creates no benefit, increases fragility and reduces flexibility.

-- 
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG Wed Jan 30 20:58:35 2013
From: Andriy Gapon <avg@FreeBSD.org>
Date: Wed, 30 Jan 2013 22:58:31 +0200
Cc: freebsd-arch@FreeBSD.org
Message-ID: <51098977.4000603@FreeBSD.org>
Subject: axe vm.max_wired [Was: Allow small amount of memory be mlock()'ed by unprivileged process?]
In-Reply-To: <4FC9F94B.8060708@FreeBSD.org>

on 02/06/2012 14:30 Andriy Gapon said the following:
> o There is also the vm.max_wired sysctl (with no equivalent tunable), which
> specifies the number of _pages_ that can be wired system-wide (by both the
> kernel and userland). But note that the limit applies only to userland
> requests; the kernel is allowed to wire new pages even when the limit is
> exceeded. By default the limit is set to 1/3 of available pages.

I would like to propose axing the vm.max_wired limit.
It is not good when too many pages are wired, but...

This limit is quite arbitrary (why 1/3?).
It's no good for ZFS systems, where e.g. 90% of memory can normally be wired
by ZFS in the kernel.

So this limit should either be axed or perhaps replaced with a much higher
limit, e.g. v_page_count - 2 * v_free_target or some such number "close" to
v_page_count.
-- 
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 31 08:10:14 2013
From: Alan Cox <alan.l.cox@gmail.com>
Reply-To: alc@freebsd.org
Date: Thu, 31 Jan 2013 02:10:13 -0600
To: Andriy Gapon
Cc: freebsd-arch@freebsd.org
Subject: Re: kva size on amd64
In-Reply-To: <51098743.2050603@FreeBSD.org>
On Wed, Jan 30, 2013 at 2:49 PM, Andriy Gapon wrote:
> What are the main benefits, if any, of limiting KVA space size - or in fact
> tying it to physical memory size - on amd64?
> [...]
> Another motivation is that I really see no reason at all to artificially limit
> KVA. This creates no benefits, increases fragility and reduces flexibility.

In short, it will waste a non-trivial amount of physical memory. Unlike user
virtual address spaces, page table pages are preallocated for the kernel
virtual address space. More precisely, they are preallocated for the reserved
(or defined) regions of the kernel map, i.e., every range of addresses that
has a corresponding vm_map_entry. The kmem map is one such reserved region.
So, if you always set your kmem map to its maximum possible size of ~300GB,
then you are preallocating about 600MB of physical memory for page table pages
that will never be used (except on machines with 300+ GB of DRAM).
From owner-freebsd-arch@FreeBSD.ORG Thu Jan 31 08:32:14 2013
From: Andriy Gapon <avg@FreeBSD.org>
Date: Thu, 31 Jan 2013 10:32:09 +0200
To: alc@FreeBSD.org
Cc: Alan Cox, freebsd-arch@FreeBSD.org
Subject: Re: kva size on amd64
Message-ID: <510A2C09.6030709@FreeBSD.org>

on 31/01/2013 10:10 Alan Cox said the following:
> In short, it will waste a non-trivial amount of physical memory. Unlike user
> virtual address spaces, page table pages are preallocated for the kernel
> virtual address space. More precisely, they are preallocated for the reserved
> (or defined) regions of the kernel map, i.e., every range of addresses that
> has a corresponding vm_map_entry.
> The kmem map is one such reserved region. So, if
> you always set your kmem map to its maximum possible size of ~300GB, then you
> are preallocating about 600MB of physical memory for page table pages that
> will never be used (except on machines with 300+ GB of DRAM).

Alan,

thank you very much for this information!

Would it make sense then to do either of the following:
- add some (non-trivial) code to auto-grow the kmem map upon KVA shortage
- set the default vm_kmem_size to min(2 * mem_size, vm_kmem_size_max)
?

Perhaps something else?..

BTW, it seems that OpenSolaris does not limit KVA size, but they probably
allocate kernel page tables in some different way (on demand, perhaps).

-- 
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 31 09:18:58 2013
From: Konstantin Belousov <kostikbel@gmail.com>
Date: Thu, 31 Jan 2013 11:18:53 +0200
To: Andriy Gapon
Cc: freebsd-arch@FreeBSD.org
Subject: Re: axe vm.max_wired [Was: Allow small amount of memory be mlock()'ed by unprivileged process?]
Message-ID: <20130131091853.GI2522@kib.kiev.ua>
In-Reply-To: <51098977.4000603@FreeBSD.org>

On Wed, Jan 30, 2013 at 10:58:31PM +0200, Andriy Gapon wrote:
> on 02/06/2012 14:30 Andriy Gapon said the following:
> > o There is also vm.max_wired sysctl (with no equivalent tunable), which
> > specifies number of _pages_ that can be wired system wide (by both kernel
> > and userland). But note that the limit applies only to userland requests,
> > the kernel is allowed to wire new pages even when the limit is exceeded.
> > By default the limit is set to 1/3 of available pages.
>
> I would like to propose to axe vm.max_wired limit.
> It is not good when too many pages are wired, but...
>
> This limit is quite arbitrary (why 1/3).
> It's no good for ZFS systems where e.g. 90% of memory can be normally wired
> by ZFS in kernel.
>
> So this limit should be either axed or perhaps replaced with some much
> higher limit like e.g. v_page_count - 2 * v_free_target or some such number
> "close" to v_page_count.
>

I dislike your proposal.

The limit is useful to prevent the system from entering live-lock. ZFS-using
machines should be tuned. Or, finally, the ZFS caches should communicate the
fact that the pages they use are for caches and provide an easy way for the
VM to request a flush. That would be a big project indeed.

E.g., could ZFS give the VM the impression that ZFS-cached pages are cached?

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 31 11:24:31 2013
From: Dag-Erling Smørgrav <des@des.no>
Date: Thu, 31 Jan 2013 12:24:21 +0100
To: Konstantin Belousov
Cc: Andriy Gapon, freebsd-arch@FreeBSD.org
Subject: Re: axe vm.max_wired
Message-ID: <86boc5wq6y.fsf@ds4.des.no>
In-Reply-To: <20130131091853.GI2522@kib.kiev.ua>

Konstantin Belousov writes:
> Andriy Gapon writes:
> > I would like to propose to axe vm.max_wired limit.
> The limit is useful to prevent the system from entering live-lock.
> ZFS-using machines should be tuned.

ZFS shouldn't be allowed to wire arbitrary amounts of memory. It is nearly
impossible to handle passwords and encryption keys securely on ZFS systems,
because there is no wired memory left for applications.
DES

-- 
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 31 18:30:33 2013
From: Alan Cox <alc@rice.edu>
Date: Thu, 31 Jan 2013 12:30:32 -0600
To: Andriy Gapon
Cc: alc@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: Re: kva size on amd64
Message-ID: <510AB848.3010806@rice.edu>
In-Reply-To: <510A2C09.6030709@FreeBSD.org>
On 01/31/2013 02:32, Andriy Gapon wrote:
> on 31/01/2013 10:10 Alan Cox said the following:
> [...]
> Would it make sense then to do either of the following:
> - add some (non-trivial) code to auto-grow kmem map upon kva shortage
> - set default vm_kmem_size to min(2 * mem_size, vm_kmem_size_max)
> ?
>
> Perhaps something else?..

Try developing a different allocation strategy for the kmem_map. First-fit is
clearly not working well for the ZFS ARC, because of fragmentation. For
example, instead of further enlarging the kmem_map, try splitting it into
multiple submaps of the same total size, kmem_map1, kmem_map2, etc. Then,
utilize these akin to the "old" and "new" spaces of a copying garbage
collector or storage segments in a log-structured file system. However,
actual copying from an "old" space to a "new" space may not be necessary. By
the time that the "new" space from which you are currently allocating fills
up or becomes sufficiently fragmented that you can't satisfy an allocation,
you've likely created enough contiguous space in an "old" space.
I'll hypothesize that just a couple of kmem_map submaps that are .625 of
physical memory size would suffice. The bottom line is that the total virtual
address space should be less than 2x physical memory.

In fact, maybe the system starts off with just a single kmem_map, and you only
create additional kmem_maps on demand. As someone who doesn't use ZFS, that
would actually save me physical memory that is currently being wasted on
unnecessary preallocated page table pages for my kmem_map. This begins to
sound like option (1) that you propose above.

This might also help to keep physical memory fragmentation in check.

From owner-freebsd-arch@FreeBSD.ORG Fri Feb 1 08:23:51 2013
From: Andriy Gapon <avg@FreeBSD.org>
Date: Fri, 01 Feb 2013 10:23:46 +0200
To: Konstantin Belousov
Cc: freebsd-arch@FreeBSD.org
Subject: Re: axe vm.max_wired
Message-ID: <510B7B92.4030804@FreeBSD.org>
In-Reply-To: <20130131091853.GI2522@kib.kiev.ua>

on 31/01/2013 11:18 Konstantin Belousov said the following:
> On Wed, Jan 30, 2013 at 10:58:31PM +0200, Andriy Gapon wrote:
> > [...]
> > So this limit should be either axed or perhaps replaced with some much
> > higher limit like e.g. v_page_count - 2 * v_free_target or some such
> > number "close" to v_page_count.
>
> I dislike your proposal.
>
> The limit is useful to prevent the system from entering live-lock.

Well, I definitely agree that we should prevent all of memory from becoming
wired. And I myself don't like fully axing vm.max_wired :-)
But I do not fully agree with your logic here.
Completely prohibiting any page wiring in userland would achieve the goal too,
but that doesn't mean it would be useful.

> ZFS-using machines should be tuned.

I would like them to be auto-tuned.

> Or finally the ZFS caches should
> communicate the fact that the pages used are for caches and provide
> easy way for the VM to request flush. This would be big project indeed.
>
> E.g., could ZFS make an impression that zfs-cached pages are cached, to VM ?

I would love to have the ZFS ARC implemented differently. But I do not expect
that to happen soon. Regarding your question - I do not have an answer.
Perhaps let's discuss how that could be done (while preserving the
useful/advanced features of the ARC)...

So, meanwhile, I object to your objection :-)
You didn't explain why vm.max_wired should be 1/3 of v_page_count by default.
You didn't explain how a situation where, say, 80% of pages are wired by the
kernel is radically better than one where 80% of pages are wired by the kernel
and 1% are wired by userland.

So, I still think that vm.max_wired as it is used now is too arbitrary and too
indiscriminate to be useful. There are other tools to limit page wiring by
userland, e.g. the memlocked limit. But, as I said in the original email, I
can agree that vm.max_wired is useful if it is set to something more
reasonable by default. IMO, it should not be a fixed percentage of available
memory; it should be derived from other VM thresholds related to paging.
-- 
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG Fri Feb 1 09:47:31 2013
From: Andriy Gapon <avg@FreeBSD.org>
Date: Fri, 01 Feb 2013 11:47:23 +0200
To: Alan Cox
Cc: alc@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject: Re: kva size on amd64
Message-ID: <510B8F2B.5070609@FreeBSD.org>
In-Reply-To: <510AB848.3010806@rice.edu>

on 31/01/2013 20:30 Alan Cox said the following:
> Try developing a different allocation strategy for the kmem_map.
> First-fit is clearly not working well for the ZFS ARC, because of
> fragmentation.
> For example, instead of further enlarging the kmem_map,
> try splitting it into multiple submaps of the same total size,
> kmem_map1, kmem_map2, etc. [...]
>
> This might also help to keep physical memory fragmentation in check.

Alan,

very interesting suggestions, thank you!

Of course, this is quite a bit more work than just jacking up some limit :-)
So, it could be a while before any code materializes.

Actually, I have been obsessed for quite some time with the idea of confining
ZFS to its own submap. But ZFS does its allocations through malloc(9) and
uma(9) (depending on configuration). It seemed like a bit of work to provide
support for per-zone or per-tag submaps in uma and malloc. What is your
opinion of this approach?

P.S. BTW, do I understand correctly that the reservation of kernel page tables
happens through vm_map_insert -> pmap_growkernel?
-- 
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG Fri Feb 1 09:57:41 2013
From: Konstantin Belousov <kostikbel@gmail.com>
Date: Fri, 1 Feb 2013 11:57:35 +0200
To: Andriy Gapon
Cc: alc@FreeBSD.org, freebsd-arch@FreeBSD.org, Alan Cox
Subject: Re: kva size on amd64
Message-ID: <20130201095735.GM2522@kib.kiev.ua>
In-Reply-To: <510B8F2B.5070609@FreeBSD.org>
X-List-Received-Date: Fri, 01 Feb 2013 09:57:41 -0000

On Fri, Feb 01, 2013 at 11:47:23AM +0200, Andriy Gapon wrote:
> on 31/01/2013 20:30 Alan Cox said the following:
> > Try developing a different allocation strategy for the kmem_map.
> > First-fit is clearly not working well for the ZFS ARC, because of
> > fragmentation. For example, instead of further enlarging the kmem_map,
> > try splitting it into multiple submaps of the same total size,
> > kmem_map1, kmem_map2, etc. Then, utilize these akin to the "old" and
> > "new" spaces of a copying garbage collector or storage segments in a
> > log-structured file system. However, actual copying from an "old" space
> > to a "new" space may not be necessary. By the time that the "new" space
> > from which you are currently allocating fills up or becomes sufficiently
> > fragmented that you can't satisfy an allocation, you've likely created
> > enough contiguous space in an "old" space.
> >
> > I'll hypothesize that just a couple kmem_map submaps that are .625 of
> > physical memory size would suffice. The bottom line is that the total
> > virtual address space should be less than 2x physical memory.
> >
> > In fact, maybe the system starts off with just a single kmem_map, and
> > you only create additional kmem_maps on demand. As someone who doesn't
> > use ZFS that would actually save me physical memory that is currently
> > being wasted on unnecessary preallocated page table pages for my
> > kmem_map. This begins to sound like option (1) that you propose above.
> >
> > This might also help to keep physical memory fragmentation in check.
>
> Alan,
>
> very interesting suggestions, thank you!
>
> Of course, this is quite a bit more work than just jacking up some limit :-)
> So, it could be a while before any code materializes.
>
> Actually, I have been obsessed for quite some time with the idea of
> confining ZFS to its own submap. But ZFS does its allocations through
> malloc(9) and uma(9) (depending on configuration). It seemed like a bit
> of work to provide support for per-zone or per-tag submaps in uma and
> malloc.
> What is your opinion of this approach?

Definitely not being Alan, but I think that the rework of the ZFS memory
management should remove the use of uma or kmem_alloc() altogether. From
what I heard, in part from you, there is no reason to keep the filesystem
caches mapped full time.

I hope to commit shortly a facility that would allow ZFS to easily manage
copying for i/o from the unmapped set of pages. The checksumming you
mentioned would require some more work, but this does not look
insurmountable. Having ZFS use raw vm_page_t for caching would also
remove the pressure on KVA.

>
> P.S.
> BTW, do I understand correctly that the reservation of kernel page tables
> happens through vm_map_insert -> pmap_growkernel ?

Yes. E.g. kmem_suballoc->vm_map_find->vm_map_insert->pmap_growkernel.
From owner-freebsd-arch@FreeBSD.ORG Fri Feb 1 10:52:54 2013
Message-ID: <510B9E7A.1070709@FreeBSD.org>
Date: Fri, 01 Feb 2013 12:52:42 +0200
From: Andriy Gapon
To: Konstantin Belousov
Subject: Re: kva size on amd64
References: <507E7E59.8060201@FreeBSD.org> <51098743.2050603@FreeBSD.org> <510A2C09.6030709@FreeBSD.org> <510AB848.3010806@rice.edu> <510B8F2B.5070609@FreeBSD.org> <20130201095735.GM2522@kib.kiev.ua>
In-Reply-To: <20130201095735.GM2522@kib.kiev.ua>
Cc: alc@FreeBSD.org, freebsd-arch@FreeBSD.org, Alan Cox, Alan Cox

on 01/02/2013 11:57 Konstantin Belousov said the following:
> On Fri, Feb 01, 2013 at 11:47:23AM +0200, Andriy Gapon wrote:
> I think that the rework of the ZFS memory management should remove the
> use of uma or kmem_alloc() altogether. From what I heard, in part from
> you, there is no reason to keep the filesystem caches mapped full time.
>
> I hope to commit shortly a facility that would allow ZFS to easily
> manage copying for i/o from the unmapped set of pages. The checksumming
> you mentioned would require some more work, but this does not look
> insurmountable. Having ZFS use raw vm_page_t for caching would also
> remove the pressure on KVA.

Yes, this would be perfect.

I think that perhaps we also need some helper API to manage groups of
pages. E.g. right now ZFS can malloc or uma_zalloc a 32KB buffer and it
would have a single handle (a pointer to the mapped pages). This is
convenient. So it would be useful to have some representation for e.g.
N non-contiguous unmapped physical pages that logically represent M KB
of some contiguous data.

An opposite issue is e.g. packing 4 (or is it three?) unrelated 512-byte
blocks into a single page, as is possible with uma. So perhaps some
"unmapped uma"?
Another, purely ZFS issue is that ZFS code freely accesses buffers with
metadata. Adding mapping+unmapping code around all such accesses could be
cumbersome. All in all, this is not a quick project, IMO.

>> P.S.
>> BTW, do I understand correctly that the reservation of kernel page tables
>> happens through vm_map_insert -> pmap_growkernel ?
>
> Yes. E.g. kmem_suballoc->vm_map_find->vm_map_insert->pmap_growkernel.

Thank you!
--
Andriy Gapon

From owner-freebsd-arch@FreeBSD.ORG Sat Feb 2 16:25:18 2013
Date: Sat, 2 Feb 2013 18:25:09 +0200
From: Konstantin Belousov
To: Andriy Gapon
Subject: Re: axe vm.max_wired
Message-ID: <20130202162509.GZ2522@kib.kiev.ua>
References: <20120517055425.GA802@infradead.org> <4FC762DD.90101@FreeBSD.org> <4FC81D9C.2080801@FreeBSD.org> <4FC8E29F.2010806@shatow.net> <4FC95A10.7000806@freebsd.org> <4FC9F94B.8060708@FreeBSD.org> <51098977.4000603@FreeBSD.org> <20130131091853.GI2522@kib.kiev.ua> <510B7B92.4030804@FreeBSD.org>
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature";
boundary="1HZcmCvxmsp4ai32"
In-Reply-To: <510B7B92.4030804@FreeBSD.org>
Cc: freebsd-arch@FreeBSD.org

On Fri, Feb 01, 2013 at 10:23:46AM +0200, Andriy Gapon wrote:
> on 31/01/2013 11:18 Konstantin Belousov said the following:
> > On Wed, Jan 30, 2013 at 10:58:31PM +0200, Andriy Gapon wrote:
> >> on 02/06/2012 14:30 Andriy Gapon said the following:
> >>> o There is also the vm.max_wired sysctl (with no equivalent tunable),
> >>> which specifies the number of _pages_ that can be wired system wide
> >>> (by both kernel and userland). But note that the limit applies only
> >>> to userland requests; the kernel is allowed to wire new pages even
> >>> when the limit is exceeded. By default the limit is set to 1/3 of
> >>> available pages.
> >>
> >> I would like to propose to axe the vm.max_wired limit.
> >> It is not good when too many pages are wired, but...
> >>
> >> This limit is quite arbitrary (why 1/3?).
> >> It's no good for ZFS systems where e.g. 90% of memory can be normally
> >> wired by ZFS in kernel.
> >>
> >> So this limit should be either axed or perhaps replaced with some much
> >> higher limit, e.g. v_page_count - 2 * v_free_target or some such
> >> number "close" to v_page_count.
> >>
> >
> > I dislike your proposal.
> >
> > The limit is useful to prevent the system from entering live-lock.
>
> Well, I definitely agree that we should prevent all of memory from
> becoming wired. And I myself don't like full axing of vm.max_wired :-)
>
> But I do not fully agree with your logic here. Completely prohibiting
> any page wiring in userland would achieve the goal too, but that doesn't
> mean that that would be useful.
>
> > ZFS-using machines should be tuned.
>
> I would like them to be auto-tuned.
>
> > Or finally the ZFS caches should communicate the fact that the pages
> > used are for caches and provide an easy way for the VM to request a
> > flush. This would be a big project indeed.
> >
> > E.g., could ZFS make an impression that zfs-cached pages are cached, to VM ?
>
> I would love to have ZFS ARC implemented differently.

ZFS integration with the VM is, to say it mildly, not good. The fact
that the ZFS cache (ARC ?) presents the cached pages as wired makes the
VM almost useless for a ZFS machine. Your displeasure and tweaks should
be directed at ZFS integration, and not at unbalancing the current
tuning, which is not that bad for ZFS-less boxes.

> But I do not expect that to happen soon.
> Regarding your question - I do not have an answer. Perhaps let's discuss
> how that could be done (while preserving useful/advanced features of ARC)...
>
> So, meanwhile, I object to your objection :-)
> You didn't explain why vm.max_wired should be 1/3 of v_page_count by default.
> You didn't explain how a situation where, say, 80% of pages are wired by
> kernel is radically better than a situation where 80% of pages are wired
> by kernel and 1% are wired by userland.
>
> So, I still think that vm.max_wired as it is used now is too arbitrary
> and too indiscriminate to be useful.

It is sized well to the default size of the buffer map, which takes
10% of the physical RAM of the machine.
Since buffers wire the pages, be it VMIO or malloc buffers, this leaves
20% for other things, like mbufs, page tables and user wires.

>
> There are other tools to limit page wiring by userland, e.g. the
> memlocked limit.

The memlock limit is per-process. It is completely useless as a safety
measure.

>
> But, as I've said in the original email, I can agree with vm.max_wired
> usefulness if it is set to something more reasonable by default.
> IMO, it should not be a fixed percentage of available memory, it should
> be derived from other VM thresholds related to paging.

Might be. Please provide a suggestion or better, a change.

From owner-freebsd-arch@FreeBSD.ORG Sat Feb 2 16:33:27 2013 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 1C9F34C5; Sat, 2 Feb 2013 16:33:27 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 7D5F6A29; Sat, 2 Feb 2013
16:33:26 +0000 (UTC)
Date: Sat, 2 Feb 2013 18:33:22 +0200
From: Konstantin Belousov
To: current@freebsd.org, arch@freebsd.org
Subject: Physbio changes final call for tests and reviews
Message-ID: <20130202163322.GA2522@kib.kiev.ua>
Cc: powerpc@freebsd.org, mips@freebsd.org, jeff@freebsd.org, ia64@freebsd.org, sparc64@freebsd.org, arm@freebsd.org

Hi,
I finished the last (insignificant) missed bits in Jeff's physbio work.
Now I am asking for the last round of testing and review, esp. for the
!x86 architectures. Another testing focus is the SCSI HBAs and RAID
controllers whose drivers are changed by the patchset. Please do test
this before the patchset is committed into HEAD!
The plan is to commit the patch about two weeks from now. The patch is
required for finalizing the unmapped I/O work for UFS that I did in
parallel, which I hope to finish shortly after the commit.

Patch is available at http://people.freebsd.org/~kib/misc/physbio.5.diff

Thank you in advance.

From owner-freebsd-arch@FreeBSD.ORG Sat Feb 2 18:34:40 2013 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 0B0E4DBB; Sat, 2 Feb 2013 18:34:40 +0000 (UTC) (envelope-from alc@rice.edu) Received: from mh3.mail.rice.edu (mh3.mail.rice.edu [128.42.199.10]) by mx1.freebsd.org (Postfix) with ESMTP id C7C1AF7E; Sat, 2 Feb 2013 18:34:39 +0000 (UTC) Received: from mh3.mail.rice.edu (localhost.localdomain [127.0.0.1]) by mh3.mail.rice.edu (Postfix) with ESMTP id 9628740199; Sat, 2 Feb 2013 12:34:33 -0600 (CST) Received: from mh3.mail.rice.edu (localhost.localdomain [127.0.0.1]) by mh3.mail.rice.edu (Postfix) with ESMTP id 9471D40183; Sat, 2 Feb 2013 12:34:33 -0600
(CST)
Message-ID: <510D5C37.6000507@rice.edu>
Date: Sat, 02 Feb 2013 12:34:31 -0600
From: Alan Cox
To: Andriy Gapon
Subject: Re: kva size on amd64
References: <507E7E59.8060201@FreeBSD.org> <51098743.2050603@FreeBSD.org> <510A2C09.6030709@FreeBSD.org> <510AB848.3010806@rice.edu> <510B8F2B.5070609@FreeBSD.org>
In-Reply-To: <510B8F2B.5070609@FreeBSD.org>
Cc: alc@FreeBSD.org, Alan Cox, freebsd-arch@FreeBSD.org

On 02/01/2013 03:47, Andriy Gapon wrote:
> on 31/01/2013 20:30 Alan Cox said the following:
>> Try developing a different allocation strategy for the kmem_map.
>> First-fit is clearly not working well for the ZFS ARC, because of
>> fragmentation. For example, instead of further enlarging the kmem_map,
>> try splitting it into multiple submaps of the same total size,
>> kmem_map1, kmem_map2, etc. Then, utilize these akin to the "old" and
>> "new" spaces of a copying garbage collector or storage segments in a
>> log-structured file system.
>> However, actual copying from an "old" space to a "new" space may not
>> be necessary. By the time that the "new" space from which you are
>> currently allocating fills up or becomes sufficiently fragmented that
>> you can't satisfy an allocation, you've likely created enough
>> contiguous space in an "old" space.
>>
>> I'll hypothesize that just a couple kmem_map submaps that are .625 of
>> physical memory size would suffice. The bottom line is that the total
>> virtual address space should be less than 2x physical memory.
>>
>> In fact, maybe the system starts off with just a single kmem_map, and
>> you only create additional kmem_maps on demand. As someone who doesn't
>> use ZFS that would actually save me physical memory that is currently
>> being wasted on unnecessary preallocated page table pages for my
>> kmem_map. This begins to sound like option (1) that you propose above.
>>
>> This might also help to keep physical memory fragmentation in check.
>
> Alan,
>
> very interesting suggestions, thank you!
>
> Of course, this is quite a bit more work than just jacking up some limit :-)
> So, it could be a while before any code materializes.
>
> Actually, I have been obsessed for quite some time with the idea of
> confining ZFS to its own submap. But ZFS does its allocations through
> malloc(9) and uma(9) (depending on configuration). It seemed like a bit
> of work to provide support for per-zone or per-tag submaps in uma and
> malloc.
> What is your opinion of this approach?

I'm skeptical that it would accomplish anything. Specifically, I don't
think that it would have any impact on the fragmentation problem that we
have with ZFS. On amd64, with its direct map, all small allocations are
implemented by uma_small_alloc() and accessed through the direct map,
rather than coming from the kmem map. Outside of ZFS, large, multipage
allocations from the kmem map aren't that common. So, for all practical
purposes, ZFS has the kmem map to itself.
While I'm here, I'll offer some other food for thought. In HEAD, we
have a new-ish function, vm_page_alloc_contig(), that can allocate
contiguous, unmapped physical pages either to an arbitrary vm object or
VM_ALLOC_NOOBJ, just like vm_page_alloc(). Moreover, just like
vm_page_alloc(), it honors the VM_ALLOC_{NORMAL,SYSTEM,INTERRUPT}
thresholds and wakes the page daemon when appropriate. Using this
function, you could rewrite the multipage allocation code to first
attempt allocation through vm_page_alloc_contig() and then fall back to
the kmem map only if vm_page_alloc_contig() fails.

> P.S.
> BTW, do I understand correctly that the reservation of kernel page tables
> happens through vm_map_insert -> pmap_growkernel ?
>

I believe kib@ already answered this, but, yes, that is correct.

From owner-freebsd-arch@FreeBSD.ORG Sat Feb 2 21:47:17 2013
Date: Sat, 2 Feb 2013 22:47:09 +0100
From: Marius Strobl
To: Konstantin Belousov
Subject: Re: Physbio changes final call for tests and reviews
Message-ID: <20130202214709.GA99418@alchemy.franken.de>
References: <20130202163322.GA2522@kib.kiev.ua>
In-Reply-To:
<20130202163322.GA2522@kib.kiev.ua>
Cc: powerpc@freebsd.org, mips@freebsd.org, current@freebsd.org, jeff@freebsd.org, ia64@freebsd.org, arch@freebsd.org, sparc64@freebsd.org, arm@freebsd.org

On Sat, Feb 02, 2013 at 06:33:22PM +0200, Konstantin Belousov wrote:
> Hi,
> I finished the last (insignificant) missed bits in Jeff's physbio
> work. Now I am asking for the last round of testing and review, esp.
> for the !x86 architectures. Another testing focus is the SCSI HBAs and
> RAID controllers whose drivers are changed by the patchset. Please do
> test this before the patchset is committed into HEAD!
>
> The plan is to commit the patch about two weeks from now. The patch is
> required for finalizing the unmapped I/O work for UFS that I did in
> parallel, which I hope to finish shortly after the commit.
>
> Patch is available at http://people.freebsd.org/~kib/misc/physbio.5.diff
>

First tests on sparc64 with ata(4), mpt(4) and sym(4) look good (to be
sure, I still need to test with a machine using a streaming buffer in
addition to the IOMMU, though). However, by accident I noticed that your
patch (i.e. stock head is fine) somehow breaks smartd of smartmontools
with ata(4):

root@b1k2:/root # smartd
ata3: timeout waiting for write DRQ

The machine just hangs at this point (it's also strange that the above
message is from the PIO rather than from the DMA path).

One note: mjacob@ probably will be annoyed if you don't wrap the changes
to isp(4) in __FreeBSD_version so that the same source still compiles on
older versions.

Marius