From: Andriy Gapon <avg@freebsd.org>
Date: Mon, 26 Jul 2010 19:07:20 +0300
To: freebsd-arch@freebsd.org
Subject: amd64: change VM_KMEM_SIZE_SCALE to 1?

Does anyone know of any reason why VM_KMEM_SIZE_SCALE on amd64 should not
be set to 1?  I mean things potentially breaking, or some unpleasant
surprise for an administrator/user...

-- 
Andriy Gapon


From: Matthew Fleming <mdf356@gmail.com>
Date: Mon, 26 Jul 2010 10:04:58 -0700
To: Andriy Gapon
Cc: freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?
On Mon, Jul 26, 2010 at 9:07 AM, Andriy Gapon wrote:
>
> Does anyone know of any reason why VM_KMEM_SIZE_SCALE on amd64 should not
> be set to 1?  I mean things potentially breaking, or some unpleasant
> surprise for an administrator/user...

As I understand it, it's merely a resource usage issue.  amd64 needs page
table entries for the expected virtual address space, so allowing more than
e.g. 1/3 of physical memory means needing more PTEs.  But the memory
overhead isn't all that large, IIRC: each 4kB page of physical memory
devoted to PTEs maps 512 4kB pages of virtual address space, or 2MB, so
e.g. it takes about 4MB reserved as PTE pages to map 2GB of kernel virtual
address space.

Having cut my OS teeth on AIX/PowerPC, where virtual address space is free
and has no relation to the size of the hardware page table, the FreeBSD
architecture limiting the size of the kernel virtual space seemed weird to
me.  However, since FreeBSD also does not page kernel data to disk, there's
a good reason to limit the size of the kernel's virtual space, since that
also limits the kernel's physical space.

In other words, setting it to 1 could lead to the system running out of
memory without kernel malloc requests ever failing.  I'm not entirely sure
this is a new problem, since one could also chew through physical memory
with sub-page uma allocations on amd64.

Corrections to the above gratefully accepted.  This is just my current
understanding of it.

Thanks,
matthew
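(As a quick check of the arithmetic above, here is a small standalone
program - illustrative only, not from the tree.  The constants are the
standard amd64 ones: 4kB pages and 8-byte PTEs, so one PTE page maps 2MB.)

#include <stdio.h>

/*
 * Compute how much physical memory goes to leaf page-table pages when
 * mapping a given amount of KVA with 4kB pages.  Each 4kB PTE page
 * holds 512 8-byte PTEs and therefore maps 512 * 4kB = 2MB.
 */
int
main(void)
{
	unsigned long kva = 2UL << 30;			/* 2GB of kernel VA */
	unsigned long ptes_per_page = 4096 / 8;		/* 512 */
	unsigned long bytes_per_pt = ptes_per_page * 4096; /* 2MB */
	unsigned long pt_pages = kva / bytes_per_pt;	/* 1024 */

	/* Prints: mapping 2048 MB of KVA takes 1024 PTE pages (4 MB). */
	printf("mapping %lu MB of KVA takes %lu PTE pages (%lu MB)\n",
	    kva >> 20, pt_pages, (pt_pages * 4096) >> 20);
	return (0);
}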
From: Andriy Gapon <avg@freebsd.org>
Date: Mon, 26 Jul 2010 21:19:22 +0300
To: Matthew Fleming, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

on 26/07/2010 20:04 Matthew Fleming said the following:
> As I understand it, it's merely a resource usage issue.  amd64 needs page
> table entries for the expected virtual address space, so allowing more
> than e.g. 1/3 of physical memory means needing more PTEs.  But the memory
> overhead isn't all that large, IIRC: each 4kB page of physical memory
> devoted to PTEs maps 512 4kB pages of virtual address space, or 2MB, so
> e.g. it takes about 4MB reserved as PTE pages to map 2GB of kernel
> virtual address space.

My understanding is that paging entries are only allocated when an actual
(physical) memory allocation is done.  But I am not sure.

> Having cut my OS teeth on AIX/PowerPC, where virtual address space is
> free and has no relation to the size of the hardware page table, the
> FreeBSD architecture limiting the size of the kernel virtual space seemed
> weird to me.  However, since FreeBSD also does not page kernel data to
> disk, there's a good reason to limit the size of the kernel's virtual
> space, since that also limits the kernel's physical space.
>
> In other words, setting it to 1 could lead to the system running out of
> memory without kernel malloc requests ever failing.  I'm not entirely
> sure this is a new problem, since one could also chew through physical
> memory with sub-page uma allocations on amd64.

Well, personally I would prefer the kernel eating a lot of memory over
getting a "kmem_map too small" panic.  Unexpectedly large memory usage by
the kernel can be detected and diagnosed, and then proper limits and
(auto-)tuning could be put in place.  A panic at some random allocation is
not that helpful.
Besides, presently there are more and more workloads that require a lot of
kernel memory - e.g. ZFS is gaining popularity.

Hence the question/suggestion.

Of course, things can be tuned by hand, but I think that
VM_KMEM_SIZE_SCALE=1 would be a more reasonable default than the current
value.
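(For readers unfamiliar with the knob: below is a rough sketch of how
VM_KMEM_SIZE_SCALE feeds the sizing, modelled loosely on the kmeminit()
logic in sys/kern/kern_malloc.c of that era.  It is an assumption-laden
illustration, not the actual code; the real function also folds in
VM_KMEM_SIZE_MIN/MAX, the vm.kmem_size tunables, and a clamp against the
kernel map size.)

/*
 * Illustrative only: the default kmem_map size is physical memory
 * divided by VM_KMEM_SIZE_SCALE.  With the then-current amd64 default
 * of 3 this caps kmem at ~1/3 of RAM; a scale of 1 would allow the
 * kmem_map to grow to roughly all of RAM.
 */
static unsigned long
kmem_size_estimate(unsigned long physical_pages, unsigned long page_size,
    unsigned long scale)
{
	unsigned long sz;

	sz = (physical_pages / scale) * page_size;
	/* ... clamping against tunables and the kernel map omitted ... */
	return (sz);
}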
-- 
Andriy Gapon


From: Alan Cox <alan.l.cox@gmail.com>
Reply-To: alc@freebsd.org
Date: Mon, 26 Jul 2010 13:48:48 -0500
To: Matthew Fleming
Cc: Andriy Gapon, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

On Mon, Jul 26, 2010 at 12:04 PM, Matthew Fleming wrote:
> As I understand it, it's merely a resource usage issue.  amd64 needs page
> table entries for the expected virtual address space, so allowing more
> than e.g. 1/3 of physical memory means needing more PTEs.  But the memory
> overhead isn't all that large, IIRC: each 4kB page of physical memory
> devoted to PTEs maps 512 4kB pages of virtual address space, or 2MB, so
> e.g. it takes about 4MB reserved as PTE pages to map 2GB of kernel
> virtual address space.
>
> Having cut my OS teeth on AIX/PowerPC, where virtual address space is
> free and has no relation to the size of the hardware page table, the
> FreeBSD architecture limiting the size of the kernel virtual space seemed
> weird to me.  However, since FreeBSD also does not page kernel data to
> disk, there's a good reason to limit the size of the kernel's virtual
> space, since that also limits the kernel's physical space.

This last answer is the one that I would give as well.  As you say, the
page table memory isn't that significant.
> In other words, setting it to 1 could lead to the system running out of
> memory without kernel malloc requests ever failing.  I'm not entirely
> sure this is a new problem, since one could also chew through physical
> memory with sub-page uma allocations on amd64.

Yes, on both counts.  However, many of the things that we might allocate
with uma_small_alloc() have caps, e.g., vnode structures, mitigating the
risk somewhat.

Alan


From: Peter Wemm <peter@wemm.org>
Date: Mon, 26 Jul 2010 12:29:05 -0700
To: Andriy Gapon
Cc: Matthew Fleming, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

On Mon, Jul 26, 2010 at 11:19 AM, Andriy Gapon wrote:
>>> Does anyone know of any reason why VM_KMEM_SIZE_SCALE on amd64 should
>>> not be set to 1?  I mean things potentially breaking, or some
>>> unpleasant surprise for an administrator/user...

The amd64 kernel has a fixed-size limit on kva, inside which the kmem_map
must fit.  Most consumers of malloc() use the free direct map region, but
there are some notable abusers of malloc (zfs being the prime offender)
that prevent the use of the free direct map region for their allocations.

I'm not familiar with how VM_KMEM_SIZE_SCALE's calculations work, but I
think it would be a crying shame to waste a huge chunk of finite kva space
on systems that aren't handicapped by ZFS's abuse of malloc().  We've run
out of kva space on amd64 in the past.

To recap: the amd64 kernel has a place to do temporary mappings.  This
space is finite - 6G on newer systems, 2G on older ones.  It is most often
used to remap discontiguous pages into virtually contiguous address space.
The kernel also sets up a 1:1 virtual<->physical map region so it can get
to any page on the system without requiring a kva mapping.

If it's clear that changing VM_KMEM_SIZE_SCALE makes sense for the common
case then that's different.  Of course, with machines with 128G / 256G of
physical ram either already here or just around the corner, it's time to
start thinking hard about physical-ram-based scaling calculations again.
That hard limit of 512G of physical ram doesn't seem so distant anymore..
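(To illustrate the 1:1 region Peter mentions: on amd64 the direct map
gives the kernel a permanent virtual alias for every physical page, so
single-page access costs no KVA at all.  A hypothetical in-kernel
fragment, assuming only the stock PHYS_TO_DMAP() and VM_PAGE_TO_PHYS()
macros; it is a sketch, not code from the tree.)

#include <sys/param.h>
#include <vm/vm.h>
#include <vm/vm_page.h>
#include <machine/vmparam.h>

static void *
page_va_via_dmap(vm_page_t m)
{
	/*
	 * No KVA allocation and no pmap_enter(): the direct map already
	 * maps this page.  What the direct map cannot provide is a
	 * multi-page *virtually contiguous* buffer for physically
	 * scattered pages - those allocations (ZFS's malloc() use being
	 * the example above) must carve space out of the kmem_map.
	 */
	return ((void *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m)));
}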
-- 
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell


From: Alan Cox <alan.l.cox@gmail.com>
Reply-To: alc@freebsd.org
Date: Mon, 26 Jul 2010 14:30:59 -0500
To: Andriy Gapon
Cc: Matthew Fleming, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

On Mon, Jul 26, 2010 at 1:19 PM, Andriy Gapon wrote:
> on 26/07/2010 20:04 Matthew Fleming said the following:
> > As I understand it, it's merely a resource usage issue.  amd64 needs
> > page table entries for the expected virtual address space, so allowing
> > more than e.g. 1/3 of physical memory means needing more PTEs.  But the
> > memory overhead isn't all that large, IIRC: each 4kB page of physical
> > memory devoted to PTEs maps 512 4kB pages of virtual address space, or
> > 2MB, so e.g. it takes about 4MB reserved as PTE pages to map 2GB of
> > kernel virtual address space.
>
> My understanding is that paging entries are only allocated when an actual
> (physical) memory allocation is done.  But I am not sure.
> > Having cut my OS teeth on AIX/PowerPC, where virtual address space is
> > free and has no relation to the size of the hardware page table, the
> > FreeBSD architecture limiting the size of the kernel virtual space
> > seemed weird to me.  However, since FreeBSD also does not page kernel
> > data to disk, there's a good reason to limit the size of the kernel's
> > virtual space, since that also limits the kernel's physical space.
> >
> > In other words, setting it to 1 could lead to the system running out
> > of memory without kernel malloc requests ever failing.  I'm not
> > entirely sure this is a new problem, since one could also chew through
> > physical memory with sub-page uma allocations on amd64.
>
> Well, personally I would prefer the kernel eating a lot of memory over
> getting a "kmem_map too small" panic.  Unexpectedly large memory usage by
> the kernel can be detected and diagnosed, and then proper limits and
> (auto-)tuning could be put in place.  A panic at some random allocation
> is not that helpful.
> Besides, presently there are more and more workloads that require a lot
> of kernel memory - e.g. ZFS is gaining popularity.

Like what exactly?  Since I increased the size of the kernel address space
for amd64 to 512GB, and thus the size of the kernel heap was no longer
limited by the virtual address space size, but only by the auto-tuning
based upon physical memory size, I am not aware of any "kmem_map too small"
panics that are not ZFS/ARC related.

> Hence the question/suggestion.
>
> Of course, things can be tuned by hand, but I think that
> VM_KMEM_SIZE_SCALE=1 would be a more reasonable default than the current
> value.

Even this would not eliminate the ZFS/ARC panics.  I have heard that some
people must configure the kmem_map to 1.5 times a machine's physical memory
size to avoid panics.  The reason is that, unlike the traditional FreeBSD
way of caching file data, the ZFS/ARC wants to have every page of cached
data *mapped* (and wired) in the kernel address space.  Over time, the
available, unused space in the kmem_map becomes fragmented, and even though
the ARC thinks that it has not reached its size limit, kmem_malloc() cannot
find contiguous space to satisfy the allocation request.  To see this
described in great detail, do a web search for an e-mail by Ben Kelly with
the subject "[patch] zfs kmem fragmentation".

As far as eliminating or reducing the manual tuning that many ZFS users do,
I would love to see someone tackle the overly conservative hard limit that
we place on the number of vnode structures.  The current hard limit was put
in place when we had just introduced mutexes into many structures and a
mutex was much larger than it is today.
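(The fragmentation failure mode Alan describes is easy to reproduce in
miniature.  The toy program below - purely illustrative, with no relation
to the kernel code - models a kmem_map as a page bitmap with a first-fit
allocator: after interleaved allocations and a cache trim, most of the
map is free, yet a larger request still fails for lack of a contiguous
run.)

#include <stdio.h>
#include <string.h>

#define NPAGES 1024

static char map[NPAGES];	/* 0 = free, 1 = allocated */

/* First-fit allocator over the page bitmap; returns start page or -1. */
static int
alloc_pages(int n)
{
	int i, run = 0;

	for (i = 0; i < NPAGES; i++) {
		run = (map[i] == 0) ? run + 1 : 0;
		if (run == n) {
			memset(&map[i - n + 1], 1, n);
			return (i - n + 1);
		}
	}
	return (-1);
}

int
main(void)
{
	int big[NPAGES], i, nb = 0, freepg = 0;

	/* Interleave 1-page and 4-page allocations until the map fills. */
	for (;;) {
		if (alloc_pages(1) < 0)
			break;
		if ((big[nb] = alloc_pages(4)) < 0)
			break;
		nb++;
	}
	/* Free every 4-page chunk, as if a cache were trimmed. */
	for (i = 0; i < nb; i++)
		memset(&map[big[i]], 0, 4);
	for (i = 0; i < NPAGES; i++)
		freepg += (map[i] == 0);
	/* ~80% free, but the largest free run is 4 pages: 8 fails. */
	printf("free pages: %d/%d, 8-page allocation %s\n", freepg, NPAGES,
	    alloc_pages(8) < 0 ? "FAILS" : "succeeds");
	return (0);
}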
Alan


From: Alan Cox <alan.l.cox@gmail.com>
Reply-To: alc@freebsd.org
Date: Mon, 26 Jul 2010 14:35:13 -0500
To: Peter Wemm
Cc: Matthew Fleming, Andriy Gapon, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

Peter,

In FreeBSD >= 7.3, the kernel address space limit is no longer 6GB.  It is
now 512GB.
Alan


From: Andriy Gapon <avg@freebsd.org>
Date: Mon, 26 Jul 2010 22:43:17 +0300
To: alc@freebsd.org
Cc: Matthew Fleming, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

on 26/07/2010 22:30 Alan Cox said the following:
> On Mon, Jul 26, 2010 at 1:19 PM, Andriy Gapon wrote:
>> Well, personally I would prefer the kernel eating a lot of memory over
>> getting a "kmem_map too small" panic.
>> Unexpectedly large memory usage by the kernel can be detected and
>> diagnosed, and then proper limits and (auto-)tuning could be put in
>> place.  A panic at some random allocation is not that helpful.
>> Besides, presently there are more and more workloads that require a lot
>> of kernel memory - e.g. ZFS is gaining popularity.
>
> Like what exactly?  Since I increased the size of the kernel address
> space for amd64 to 512GB, and thus the size of the kernel heap was no
> longer limited by the virtual address space size, but only by the
> auto-tuning based upon physical memory size, I am not aware of any
> "kmem_map too small" panics that are not ZFS/ARC related.

Well, I meant exactly these.

>> Hence the question/suggestion.
>>
>> Of course, things can be tuned by hand, but I think that
>> VM_KMEM_SIZE_SCALE=1 would be a more reasonable default than the
>> current value.
>
> Even this would not eliminate the ZFS/ARC panics.  I have heard that some
> people must configure the kmem_map to 1.5 times a machine's physical
> memory size to avoid panics.  The reason is that, unlike the traditional
> FreeBSD way of caching file data, the ZFS/ARC wants to have every page of
> cached data *mapped* (and wired) in the kernel address space.  Over time,
> the available, unused space in the kmem_map becomes fragmented, and even
> though the ARC thinks that it has not reached its size limit,
> kmem_malloc() cannot find contiguous space to satisfy the allocation
> request.  To see this described in great detail, do a web search for an
> e-mail by Ben Kelly with the subject "[patch] zfs kmem fragmentation".

Yes, I am aware of the fragmentation issue.  But I haven't hit that panic
myself since setting vm.kmem_size_scale="1" in loader.conf.  Of course,
what I propose would not fix the fragmentation issue.  But... it's
something that ZFS users (especially serious ZFS users like file servers)
would want to do anyway, and it won't cause any harm for others.

> As far as eliminating or reducing the manual tuning that many ZFS users
> do, I would love to see someone tackle the overly conservative hard limit
> that we place on the number of vnode structures.  The current hard limit
> was put in place when we had just introduced mutexes into many structures
> and a mutex was much larger than it is today.

I agree.  But that's a slightly different topic.
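(For reference, the hand tuning being discussed looks like this in
/boot/loader.conf; the first line is the setting Andriy mentions, and the
commented alternatives are other knobs commonly suggested in ZFS tuning
guides of the time - shown as illustration, with made-up values.)

# /boot/loader.conf
vm.kmem_size_scale="1"		# let kmem grow to ~all of physical RAM
#vm.kmem_size="4096M"		# or: pin the kmem size outright
#vfs.zfs.arc_max="2048M"	# and/or: cap the ARC itself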
-- 
Andriy Gapon


From: Scott Long <scottl@samsco.org>
Date: Mon, 26 Jul 2010 13:42:47 -0600
To: alc@freebsd.org
Cc: Matthew Fleming, Andriy Gapon, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

On Jul 26, 2010, at 1:35 PM, Alan Cox wrote:
> Peter,
>
> In FreeBSD >= 7.3, the kernel address space limit is no longer 6GB.  It
> is now 512GB.
>

Ok, I mistakenly thought that it was still 2GB/6GB as well.  So to be
clear, KVA maxes out at ?  and kmem maxes out at ?

Scott


From: Andriy Gapon <avg@freebsd.org>
Date: Mon, 26 Jul 2010 23:55:34 +0300
To: Scott Long
Cc: alc@freebsd.org, Matthew Fleming, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?
on 26/07/2010 22:42 Scott Long said the following:
> Ok, I mistakenly thought that it was still 2GB/6GB as well.  So to be
> clear, KVA maxes out at ?

As Alan said - 512GB.

> and kmem maxes out at ?

There is a formula with a bunch of tunables, but normally it's 1/3 of
available physical memory.  Unless I am mistaken.

-- 
Andriy Gapon


From: Peter Jeremy <peterjeremy@acm.org>
Date: Tue, 27 Jul 2010 07:06:19 +1000
To: Peter Wemm
Cc: Matthew Fleming, Andriy Gapon, freebsd-arch@freebsd.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

On 2010-Jul-26 12:29:05 -0700, Peter Wemm wrote:
>That hard limit of 512G of physical ram doesn't seem so distant anymore..

You can put 512GB of 16GB DIMMs into one of these:
http://www.supermicro.com/a_images/products/Aplus/MB/H8QGi-F_spec.jpg

And, I don't have the link, but at least one of Dell's higher-end boxes
allows you to select 1TB RAM in the configurator.
-- 
Peter Jeremy


From: John Baldwin <jhb@freebsd.org>
Date: Tue, 27 Jul 2010 09:35:52 -0400
To: freebsd-arch@freebsd.org, alc@freebsd.org
Cc: Matthew Fleming, Andriy Gapon
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

On Monday, July 26, 2010 3:30:59 pm Alan Cox wrote:
> As far as eliminating or reducing the manual tuning that many ZFS users
> do, I would love to see someone tackle the overly conservative hard limit
> that we place on the number of vnode structures.  The current hard limit
> was put in place when we had just introduced mutexes into many structures
> and a mutex was much larger than it is today.

I have a strawman of that (relative to 7).  It simply adjusts the hardcoded
maximum to instead be a function of the amount of physical memory.

Index: vfs_subr.c
===================================================================
--- vfs_subr.c	(revision 210934)
+++ vfs_subr.c	(working copy)
@@ -288,6 +288,7 @@
 static void
 vntblinit(void *dummy __unused)
 {
+	int vnodes;
 
 	/*
 	 * Desiredvnodes is a function of the physical memory size and
@@ -299,10 +300,19 @@
 	desiredvnodes = min(maxproc + cnt.v_page_count / 4, 2 * vm_kmem_size /
 	    (5 * (sizeof(struct vm_object) + sizeof(struct vnode))));
 	if (desiredvnodes > MAXVNODES_MAX) {
+
+		/*
+		 * If there is a lot of physical memory, allow the cap
+		 * on vnodes to expand to using a little under 1% of
+		 * available RAM.
+		 */
+		vnodes = max(MAXVNODES_MAX, cnt.v_page_count * (PAGE_SIZE /
+		    128) / (sizeof(struct vm_object) + sizeof(struct vnode)));
+		KASSERT(vnodes < desiredvnodes, ("capped vnodes too big"));
 		if (bootverbose)
 			printf("Reducing kern.maxvnodes %d -> %d\n",
-			    desiredvnodes, MAXVNODES_MAX);
-		desiredvnodes = MAXVNODES_MAX;
+			    desiredvnodes, vnodes);
+		desiredvnodes = vnodes;
 	}
 	wantfreevnodes = desiredvnodes / 4;
 	mtx_init(&mntid_mtx, "mntid", NULL, MTX_DEF);

-- 
John Baldwin


From: Bruce Evans <brde@optusnet.com.au>
Date: Wed, 28 Jul 2010 02:45:31 +1000 (EST)
To: alc@FreeBSD.org
Cc: Matthew Fleming, Andriy Gapon, freebsd-arch@FreeBSD.org
Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1?

On Mon, 26 Jul 2010, Alan Cox wrote:
> On Mon, Jul 26, 2010 at 1:19 PM, Andriy Gapon wrote:
>> on 26/07/2010 20:04 Matthew Fleming said the following:
>>>> Does anyone know of any reason why VM_KMEM_SIZE_SCALE on amd64 should
>>>> not be set to 1?  I mean things potentially breaking, or some
>>>> unpleasant surprise for an administrator/user...

Shouldn't it be a fraction (of about 1/(2**32)) so that you can map things
sparsely into about 2**64 bytes of KVA?  Actually mapping 2**64 bytes of
KVA would take too many resources, but does it take too many resources to
reserve that amount and to be prepared to actually map lots more than now?

>>> As I understand it, it's merely a resource usage issue.  amd64 needs
>>> page table entries for the expected virtual address space, so allowing
>>> more than e.g. 1/3 of physical memory means needing more PTEs.  But the
>>> memory overhead isn't all that large, IIRC: each 4kB page of physical
>>> memory devoted to PTEs maps 512 4kB pages of virtual address space, or
>>> 2MB, so e.g. it takes about 4MB reserved as PTE pages to map 2GB of
>>> kernel virtual address space.

That's not small, but isn't it 1024 times less due to 4MB pages in the
kernel?  But I guess 4MB pages are no good for sparse mappings.

>> ...
>> Well, personally I would prefer the kernel eating a lot of memory over
>> getting a "kmem_map too small" panic.  Unexpectedly large memory usage
>> by the kernel can be detected and diagnosed, and then proper limits and
>> (auto-)tuning could be put in place.
>> A panic at some random allocation is not that helpful.
>> Besides, presently there are more and more workloads that require a lot
>> of kernel memory - e.g. ZFS is gaining popularity.
>
> Like what exactly?  Since I increased the size of the kernel address
> space for amd64 to 512GB, and thus the size of the kernel heap was no
> longer limited by the virtual address space size, but only by the
> auto-tuning based upon physical memory size, I am not aware of any
> "kmem_map too small" panics that are not ZFS/ARC related.
>
>> Hence the question/suggestion.
>>
>> Of course, things can be tuned by hand, but I think that
>> VM_KMEM_SIZE_SCALE=1 would be a more reasonable default than the
>> current value.
>
> Even this would not eliminate the ZFS/ARC panics.  I have heard that some
> people must configure the kmem_map to 1.5 times a machine's physical
> memory size to avoid panics.

2**32 times larger would avoid this even better (up to 4GB physical
memory) :-).  With 512GB virtual and 4GB physical, 128 times larger
(VM_KMEM_SIZE_SCALE=1/128.0) is almost possible, and 32 times larger seems
practical (leave 3/4 for other things).

However, it seems wrong to scale by physical memory at all.  If you are
prepared to map 512GB, why not allow a significant fraction of that (say
1/4) to be used for kmem?  The only problem that I see is that there will
be more rounds of physical memory and disk sizes increasing faster than
virtual memory limits; on every round, algorithms based on sparse mappings
break.

> The reason is that, unlike the traditional FreeBSD way of caching file
> data, the ZFS/ARC wants to have every page of cached data *mapped* (and
> wired) in the kernel address space.

Traditional BSD (Net/2 at least, and perhaps even FreeBSD-1) mapped and
wired every page of cached data (all ~2MB of it) sparsely into the buffer
map part of the kernel address space (all ~16MB or 32MB of it in 386BSD or
FreeBSD-early, but 256MB in FreeBSD-1.1.5).  I like the simplicity of
this.  It would have worked perfectly in FreeBSD-1.1.5, since physical
memory and disk sizes were still much smaller than the i386 address space.
It would work adequately even now (since nbuf now only needs to be large
enough to limit thrashing of VMIO mappings).

> Over time, the available, unused space in the kmem_map becomes
> fragmented, and even though the ARC thinks that it has not reached its
> size limit, kmem_malloc() cannot find contiguous space to satisfy the
> allocation request.  To see this described in great detail, do a web
> search for an e-mail by Ben Kelly with the subject "[patch] zfs kmem
> fragmentation".

This is exactly what happened several times with the buffer map(s) in
FreeBSD-1, -2, and -[3-4?], except with memory sizes scaled by 3, then 2,
then 1 decimal orders of magnitude.  In FreeBSD-1, plain malloc() was used
for buffers, and kmem_map was far too small (16MB) for this to work well.
In FreeBSD-[2-current], a much more complicated method is used to allocate
buffers (and to map VMIO pages into buffers).  This is essentially a
private version of malloc() with lots of specialization for buffers, and a
separate map so that it doesn't have to fight with other users of
malloc().  Despite its specialization, this still had problems with
fragmentation.  It wasn't until FreeBSD-4 that the specialization became
complicated enough to mostly avoid these problems.
Bruce


From: Robert Watson <rwatson@FreeBSD.org>
Date: Wed, 28 Jul 2010 23:45:52 +0100 (BST)
To: freebsd-net@FreeBSD.org
Cc: freebsd-arch@FreeBSD.org
Subject: Future of netnatm: volunteer wanted -- and/or -- removal notice

Dear all:

When the new link layer framework was introduced in 8.0, one of our ATM
stacks, netnatm, was left behind.  As a result, it neither compiles nor
runs in 8.x and 9.x.  This e-mail serves two purposes:

(1) To solicit a volunteer who can work on the netnatm stack in 9.x, with
    potential merge to 8.x, to get it back to functionality before 9.0
    ships.  This is the preferred course of action.

(2) To serve as notice that, if we can't find a volunteer to do this, we
    will remove netnatm and associated parts from the tree in 9.0, since
    they will have gone one major version neither compiling nor running.
    This is the fallback plan.

I'm in no great rush to remove netnatm, having spent quite a bit of time
making it work in our MPSAFE world order a couple of years ago.  However,
the code is bitrotting and requires urgent attention if it's going to work
again easily (the stack is changing around it, and because netnatm doesn't
build, it will get only cursory and likely incorrect updates).  I'm happy
to help funnel changes into the tree from non-committers, as well as
answer questions about the network stack, but I have no hardware
facilities for debugging or testing netnatm changes myself, nor,
unfortunately, the time to work on the code.

In order to provide further motivation for potentially interested parties,
here's the proposed six-month removal schedule:

  28 July 2010    - Notice of proposed removal
  28 October 2010 - Repeat of the notice of proposed removal
  28 January 2011 - Proposed removal date

This schedule may be updated as the 9.0 release schedule becomes more
clear, or if there are obvious signs of improvement and just a couple more
months would get it fixed :-).  And, if worst comes to worst and we can't
find a volunteer, the code will live on in the source repository history
if there's a desire to rejuvenate it in the future.
Thanks,

Robert

Robert N M Watson
Computer Laboratory
University of Cambridge


From: Tijl Coosemans <tijl@coosemans.org>
Date: Thu, 29 Jul 2010 17:18:03 +0200
To: freebsd-arch@freebsd.org
Cc: pluknet
Subject: Support for cc -m32

Hi,

I've put the initial version of some patches online to support cross
compilation of 32-bit binaries on amd64.  It's modelled after how NetBSD
does this.

With these patches, something like "cc -m32 -o test test.c -pthread -lm"
generates a program that runs on FreeBSD/i386.

http://people.freebsd.org/~tijl/cc-m32-1.diff
http://people.freebsd.org/~tijl/cc-m32-2.diff
http://people.freebsd.org/~tijl/cc-m32-3.diff

*cc-m32-1.diff* : Let ld and cc find 32-bit libraries.

*cc-m32-2.diff* : Install i386 headers on amd64.

With this patch, headers for a particular $arch are always installed under
/usr/include/$arch, and /usr/include/machine becomes a symlink.  A question
I have here is how best to clean up the old machine directory.  The patch
currently uses 'rm -rf'.

Another problem I encountered was that during the build of usr.bin/kdump,
all headers are searched for definitions of ioctl requests, and a C source
code file is generated that includes all those headers.  This fails when
both i386 and amd64 headers are installed, because they can't both be
included at the same time.  For now the patch simply blacklists
/usr/include/i386, but actually all $arch should be excluded.  The ioctl
requests can still be found through the machine symlink.  If someone has a
better idea...

*cc-m32-3.diff* : Modify amd64 headers to include i386 headers when
__i386__ is defined.

This patch modifies the amd64 headers to follow this format:

#ifndef _AMD64_HEADER_H
#define _AMD64_HEADER_H
#ifdef __i386__
#include <i386/header.h>
#else
...
#endif /* __i386__ */
#endif /* !_AMD64_HEADER_H */

This way, including <machine/header.h> works for -m32.
There are a few i386 headers which don't exist for amd64:

apm_segments.h bootinfo.h cserial.h elan_mmcr.h if_wl_wavelan.h
ioctl_bt848.h ioctl_meteor.h npx.h pcaudioio.h pcb_ext.h perfmon.h
privatespace.h smapi.h speaker.h vm86.h xbox.h

Theoretically, a dummy amd64 header should be created for each of them
that just includes the i386 header.  The patch does this for npx.h.  The
other headers seem to be really i386-specific or even outdated.  If it
were ever necessary to cross-compile code that uses them, it would be easy
to modify that code to directly include the i386 header.

Feel free to test the patches and to comment on any part of them.
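(The npx.h shim mentioned above is presumably along these lines; this is a
guess based on the description, not an excerpt from cc-m32-3.diff, and the
guard name is invented for the illustration.)

/* sys/amd64/include/npx.h -- dummy wrapper so that code including
 * <machine/npx.h> keeps working with -m32. */
#ifndef _MACHINE_NPX_H_
#define _MACHINE_NPX_H_

#include <i386/npx.h>

#endif /* !_MACHINE_NPX_H_ */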
http://people.freebsd.org/~mdf/bsd-memguard.diff The gist of the new implementation is to reserve a lot of KVA for memguard(9) to use, and then to avoid re-using KVA as long as possible. Rather than keep the physical pages around, though, on free(9) the pages are returned to the system. The KVA is allocated using vm_map_findspace() from a current pointer into the memguard_map, which is incremented until the end of the map is encountered, at which time it wraps. This is a "free" way to avoid re-use of KVA as long as possible; any other scheme requires more than O(1) data to track what has been used. I've limited the KVA to 2x ram size, and also limited the physical memory that memguard(9) can take to vm_memguard_divisor fraction of physical memory (instead of limiting both KVA and physical to vm_memguard_divisor as the original code did). This patch also allows for tweaking which malloc type is guarded at run time, will randomly guard allocations of any type if requested, has a knob to always guard allocations of PAGE_SIZE or larger since it won't waste any memory, will optionally add guard pages of unmapped KVA at the beginning and end of the allocation to catch overruns more easily, and also can impose minimum allocation sizes on guarded memory so that the page promotions don't waste too much space. Assuming alc@ is happy with the VM changes and no one has any further suggestions, I'd like to commit this some time next week. I'd also like to MFC to stable/8 and stable/7 since this patch doesn't introduce any KBI/ABI/KPI/API changes. Apart from the general desire to have production systems run as fast as possible, I'd really like more tools like memguard(9) to be always-on, to help catch bugs the first time instead of requiring multiple recreates. Thanks, matthew From owner-freebsd-arch@FreeBSD.ORG Thu Jul 29 22:27:18 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B07C91065676 for ; Thu, 29 Jul 2010 22:27:18 +0000 (UTC) (envelope-from nwhitehorn@freebsd.org) Received: from mail.icecube.wisc.edu (trout.icecube.wisc.edu [128.104.255.119]) by mx1.freebsd.org (Postfix) with ESMTP id 86CB88FC1F for ; Thu, 29 Jul 2010 22:27:18 +0000 (UTC) Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.icecube.wisc.edu (Postfix) with ESMTP id ABC1D582C7; Thu, 29 Jul 2010 17:27:17 -0500 (CDT) X-Virus-Scanned: amavisd-new at icecube.wisc.edu Received: from mail.icecube.wisc.edu ([127.0.0.1]) by localhost (trout.icecube.wisc.edu [127.0.0.1]) (amavisd-new, port 10030) with ESMTP id r3HHItGN-leA; Thu, 29 Jul 2010 17:27:17 -0500 (CDT) Received: from wanderer.tachypleus.net (adsl-75-50-88-235.dsl.mdsnwi.sbcglobal.net [75.50.88.235]) by mail.icecube.wisc.edu (Postfix) with ESMTP id 2F1DE582C4; Thu, 29 Jul 2010 17:27:17 -0500 (CDT) Message-ID: <4C520044.5020002@freebsd.org> Date: Fri, 30 Jul 2010 00:27:16 +0200 From: Nathan Whitehorn User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.11) Gecko/20100727 Thunderbird/3.0.6 MIME-Version: 1.0 To: Tijl Coosemans References: <201007291718.12687.tijl@coosemans.org> In-Reply-To: <201007291718.12687.tijl@coosemans.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: pluknet , freebsd-arch@freebsd.org Subject: Re: Support for cc -m32 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Jul 2010 22:27:18 -0000 On 07/29/10 17:18, Tijl Coosemans wrote: > Hi, > > I've put the initial version of some patches online to support cross > compilation of 32 bit binaries on amd64. It's modelled after how NetBSD > does this. > > With these patches something like "cc -m32 -o test test.c -pthread -lm" > generates a program that runs on FreeBSD/i386. > > http://people.freebsd.org/~tijl/cc-m32-1.diff > http://people.freebsd.org/~tijl/cc-m32-2.diff > http://people.freebsd.org/~tijl/cc-m32-3.diff > > *cc-m32-1.diff* : Let ld and cc find 32 bit libraries. > > *cc-m32-2.diff* : Install i386 headers on amd64. > Why not use the GCC multilib code for what patch 1 does? There is already code in cc_tools/Makefile to handle this for powerpc64 (where cc -m32 already works). -Nathan From owner-freebsd-arch@FreeBSD.ORG Fri Jul 30 08:53:30 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A4C61106567C for ; Fri, 30 Jul 2010 08:53:30 +0000 (UTC) (envelope-from tijl@coosemans.org) Received: from mailrelay004.isp.belgacom.be (mailrelay004.isp.belgacom.be [195.238.6.170]) by mx1.freebsd.org (Postfix) with ESMTP id 16E128FC19 for ; Fri, 30 Jul 2010 08:53:29 +0000 (UTC) X-Belgacom-Dynamic: yes X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvsEAAYwUkxbscBD/2dsb2JhbACgF3K+aYU5BA Received: from 67.192-177-91.adsl-dyn.isp.belgacom.be (HELO kalimero.tijl.coosemans.org) ([91.177.192.67]) by relay.skynet.be with ESMTP; 30 Jul 2010 10:53:28 +0200 Received: from kalimero.tijl.coosemans.org (kalimero.tijl.coosemans.org [127.0.0.1]) by kalimero.tijl.coosemans.org (8.14.4/8.14.4) with ESMTP id o6U8rRvK002422; Fri, 30 Jul 2010 10:53:28 +0200 (CEST) (envelope-from tijl@coosemans.org) From: Tijl Coosemans To: Nathan Whitehorn Date: Fri, 30 Jul 2010 10:53:17 +0200 User-Agent: KMail/1.13.5 (FreeBSD/8.1-PRERELEASE; KDE/4.4.5; i386; ; ) References: <201007291718.12687.tijl@coosemans.org> <4C520044.5020002@freebsd.org> In-Reply-To: <4C520044.5020002@freebsd.org> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="nextPart2641494.0OStBKXvGs"; protocol="application/pgp-signature"; micalg=pgp-sha256 Content-Transfer-Encoding: 7bit Message-Id: <201007301053.27407.tijl@coosemans.org> Cc: pluknet , freebsd-arch@freebsd.org Subject: Re: Support for cc -m32 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jul 2010 08:53:30 -0000 --nextPart2641494.0OStBKXvGs Content-Type: Text/Plain; charset="us-ascii" Content-Transfer-Encoding: 7bit On Friday 30 July 2010 00:27:16 Nathan Whitehorn wrote: > On 07/29/10 17:18, Tijl Coosemans wrote: >> I've put the initial version of some patches online to support cross >> compilation of 32 bit binaries on amd64. It's modelled after how NetBSD >> does this. >> >> With these patches something like "cc -m32 -o test test.c -pthread -lm" >> generates a program that runs on FreeBSD/i386. >> >> http://people.freebsd.org/~tijl/cc-m32-1.diff >> http://people.freebsd.org/~tijl/cc-m32-2.diff >> http://people.freebsd.org/~tijl/cc-m32-3.diff >> >> *cc-m32-1.diff* : Let ld and cc find 32 bit libraries. > > Why not use the GCC multilib code for what patch 1 does? 
There is > already code in cc_tools/Makefile to handle this for powerpc64 (where > cc -m32 already works). Thanks, it's indeed better to specify this per architecture so I've updated the patch. It changes the output of -print-search-dirs though. With the previous patch "cc -m32 -print-search-dirs" printed: install: /usr/libexec/ programs: =/usr/bin/:/usr/bin/:/usr/libexec/:/usr/libexec/:/usr/libexec/ libraries: =/usr/lib32/:/usr/lib32/ And now it prints: install: /usr/libexec/ programs: =/usr/bin/:/usr/bin/:/usr/libexec/:/usr/libexec/:/usr/libexec/ libraries: =/usr/lib/32/:/usr/lib/../lib32/:/usr/lib/:/usr/lib/ That works, but it's not entirely correct. --nextPart2641494.0OStBKXvGs Content-Type: application/pgp-signature; name=signature.asc Content-Description: This is a digitally signed message part. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.15 (FreeBSD) iF4EABEIAAYFAkxSkwcACgkQfoCS2CCgtisPOAD/QduCN05QUX07YjqhZfH3FTKc tCUmX/svoR98579BkDIA+wbjmTP5n5LnT7E3B6JktpYe9ByYjB8nL2rhBwH4s5SY =SAB4 -----END PGP SIGNATURE----- --nextPart2641494.0OStBKXvGs-- From owner-freebsd-arch@FreeBSD.ORG Fri Jul 30 12:11:17 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C3443106566B for ; Fri, 30 Jul 2010 12:11:17 +0000 (UTC) (envelope-from nwhitehorn@freebsd.org) Received: from mail.icecube.wisc.edu (trout.icecube.wisc.edu [128.104.255.119]) by mx1.freebsd.org (Postfix) with ESMTP id 86B618FC08 for ; Fri, 30 Jul 2010 12:11:17 +0000 (UTC) Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.icecube.wisc.edu (Postfix) with ESMTP id B6EA6582C7; Fri, 30 Jul 2010 07:11:16 -0500 (CDT) X-Virus-Scanned: amavisd-new at icecube.wisc.edu Received: from mail.icecube.wisc.edu ([127.0.0.1]) by localhost (trout.icecube.wisc.edu [127.0.0.1]) (amavisd-new, port 10030) with ESMTP id ppjpAzczgcxs; Fri, 30 Jul 2010 07:11:16 -0500 (CDT) Received: from wanderer.tachypleus.net (adsl-75-50-88-235.dsl.mdsnwi.sbcglobal.net [75.50.88.235]) by mail.icecube.wisc.edu (Postfix) with ESMTP id 297DD582C2; Fri, 30 Jul 2010 07:11:16 -0500 (CDT) Message-ID: <4C52C163.9010601@freebsd.org> Date: Fri, 30 Jul 2010 14:11:15 +0200 From: Nathan Whitehorn User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.1.11) Gecko/20100727 Thunderbird/3.0.6 MIME-Version: 1.0 To: Tijl Coosemans References: <201007291718.12687.tijl@coosemans.org> <4C520044.5020002@freebsd.org> <201007301053.27407.tijl@coosemans.org> In-Reply-To: <201007301053.27407.tijl@coosemans.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: pluknet , freebsd-arch@freebsd.org Subject: Re: Support for cc -m32 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jul 2010 12:11:17 -0000 On 07/30/10 10:53, Tijl Coosemans wrote: > On Friday 30 July 2010 00:27:16 Nathan Whitehorn wrote: > >> On 07/29/10 17:18, Tijl Coosemans wrote: >> >>> I've put the initial version of some patches online to support cross >>> compilation of 32 bit binaries on amd64. It's modelled after how NetBSD >>> does this. >>> >>> With these patches something like "cc -m32 -o test test.c -pthread -lm" >>> generates a program that runs on FreeBSD/i386. 
>>> >>> http://people.freebsd.org/~tijl/cc-m32-1.diff >>> http://people.freebsd.org/~tijl/cc-m32-2.diff >>> http://people.freebsd.org/~tijl/cc-m32-3.diff >>> >>> *cc-m32-1.diff* : Let ld and cc find 32 bit libraries. >>> >> Why not use the GCC multilib code for what patch 1 does? There is >> already code in cc_tools/Makefile to handle this for powerpc64 (where >> cc -m32 already works). >> > Thanks, it's indeed better to specify this per architecture so I've > updated the patch. It changes the output of -print-search-dirs though. > > With the previous patch "cc -m32 -print-search-dirs" printed: > > install: /usr/libexec/ > programs: =/usr/bin/:/usr/bin/:/usr/libexec/:/usr/libexec/:/usr/libexec/ > libraries: =/usr/lib32/:/usr/lib32/ > > And now it prints: > > install: /usr/libexec/ > programs: =/usr/bin/:/usr/bin/:/usr/libexec/:/usr/libexec/:/usr/libexec/ > libraries: =/usr/lib/32/:/usr/lib/../lib32/:/usr/lib/:/usr/lib/ > > That works, but it's not entirely correct. > That's just an artifact of the way multilib works, I'm afraid. Is there a reason it could be harmful? -Nathan From owner-freebsd-arch@FreeBSD.ORG Fri Jul 30 19:20:17 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 530A0106566B; Fri, 30 Jul 2010 19:20:17 +0000 (UTC) (envelope-from alc@cs.rice.edu) Received: from mail.cs.rice.edu (mail.cs.rice.edu [128.42.1.31]) by mx1.freebsd.org (Postfix) with ESMTP id 28D0A8FC0A; Fri, 30 Jul 2010 19:20:17 +0000 (UTC) Received: from mail.cs.rice.edu (localhost.localdomain [127.0.0.1]) by mail.cs.rice.edu (Postfix) with ESMTP id A7F142C2ACE; Fri, 30 Jul 2010 13:50:09 -0500 (CDT) X-Virus-Scanned: by amavis-2.4.0 at mail.cs.rice.edu Received: from mail.cs.rice.edu ([127.0.0.1]) by mail.cs.rice.edu (mail.cs.rice.edu [127.0.0.1]) (amavisd-new, port 10024) with LMTP id RJ6E4GYb6DLF; Fri, 30 Jul 2010 13:50:02 -0500 (CDT) Received: from adsl-216-63-78-18.dsl.hstntx.swbell.net (adsl-216-63-78-18.dsl.hstntx.swbell.net [216.63.78.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.cs.rice.edu (Postfix) with ESMTP id 9F7332C2B32; Fri, 30 Jul 2010 13:50:00 -0500 (CDT) Message-ID: <4C531ED7.9010601@cs.rice.edu> Date: Fri, 30 Jul 2010 13:49:59 -0500 From: Alan Cox User-Agent: Thunderbird 2.0.0.24 (X11/20100501) MIME-Version: 1.0 To: John Baldwin References: <4C4DB2B8.9080404@freebsd.org> <4C4DD1AA.3050906@freebsd.org> <201007270935.52082.jhb@freebsd.org> In-Reply-To: <201007270935.52082.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: alc@freebsd.org, Matthew Fleming , Andriy Gapon , freebsd-arch@freebsd.org Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jul 2010 19:20:17 -0000 John Baldwin wrote: > On Monday, July 26, 2010 3:30:59 pm Alan Cox wrote: > >> As far as eliminating or reducing the manual tuning that many ZFS users do, >> I would love to see someone tackle the overly conservative hard limit that >> we place on the number of vnode structures. The current hard limit was put >> in place when we had just introduced mutexes into many structures and more a >> mutex was much larger than it is today. 
>> > > I took a look at the history of the "desiredvnodes" computation. Prior to r115266, in May of 2003, the computation was based on physical memory and there was no MAXVNODES_MAX limit. It was simply:

desiredvnodes = maxproc + cnt.v_page_count / 4;

r115266 introduced the min() that also took into account the virtual address space limit on the heap. As I recall, it was to stop "kmem_map too small" panics. In fact, I was asked to make this change by re@. Finally, in August 2004, r133038 introduced MAXVNODES_MAX. The commit message doesn't say, but I think the motivation was again to stop "kmem_map too small" panics. In effect, the virtual address space limit introduced by r115266 wasn't working. Enough history, here are some data points for the "desiredvnodes" computation on amd64 and i386 above and below the point where MAXVNODES_MAX has an effect. "phys" is the number of vnodes that would be allowed based upon physical memory size, and "virt" is the number of vnodes that would be allowed based upon virtual memory size.

amd64:
  2GB    phys: 132668   virt: 397057
  1.5GB  phys: 100862   virt: 297228
  1GB    phys:  69056   virt: 197398
  512MB  phys:  35106   virt:  97569

i386:
  2GB    phys: 134106   virt: 328965
  1.5GB  phys: 101916   virt: 328965
  1GB    phys:  69725   virt: 328965
  512MB  phys:  35576   virt: 168875

For both architectures, the "phys" limit is the limiting factor until we reach about 1.5GB of physical memory. MAXVNODES_MAX is only a factor on machines with more than 1.5GB of RAM. So, whatever change we might make to MAXVNODES_MAX shouldn't affect the small embedded systems that are running FreeBSD. Even though "virt" is never a factor on amd64, it's worth noticing that in both absolute and relative terms "virt" grows faster than "phys". On i386, "virt" starts out larger than on amd64 because a vnode and a vm_object are smaller relative to vm_kmem_size, but "virt" reaches its maximum by 1GB of RAM because vm_kmem_size has already reached its maximum by then. Nonetheless, even on i386, "virt" is never a factor. (For what it's worth, if I extrapolate, an i386/PAE machine with greater than 5GB of RAM will have a larger "phys" than "virt".) > I have a strawman of that (relative to 7). It simply adjusts the hardcoded > maximum to instead be a function of the amount of physical memory. > > Unless I'm misreading this patch, it would allow "desiredvnodes" to grow (slowly) on i386/PAE starting at 5GB of RAM until we reach the (too high) "virt" limit of about 329,000. Yes? For example, an 8GB i386/PAE machine would have 60% more vnodes than was allowed by MAXVNODE_MAX, and it would not stop there. I think that we should be concerned about that, because MAXVNODE_MAX came about because the "virt" limit wasn't working. As the numbers above show, we could more than halve the growth rate for "virt" and it would have no effect on either amd64 or i386 machines with up to 1.5GB of RAM. They would have just as many vnodes. Then, with that slower growth rate, we could simply eliminate MAXVNODES_MAX (or at least configure it to some absurdly large value), thereby relieving the fixed cap on amd64, where it isn't needed. With that in mind, the following patch slows the growth of "virt" from 2/5 of vm_kmem_size to 1/7. This has no effect on amd64. However, on i386, it allows desiredvnodes to grow slowly for machines with 1.5GB to about 2.5GB of RAM, ultimately exceeding the old desiredvnodes cap by about 17%.
Once we exceed the old cap, we increase desiredvnodes at a marginal rate that is almost the same as your patch, about 1% of physical memory. It's just computed differently. Using 1/8 instead of 1/7, amd64 machines with less than about 1.5GB lose about 7% of their vnodes, but they catch up and pass the old limit by 1.625GB. Perhaps, more importantly, i386 machines only exceed the old cap by 3%. Thoughts?

Index: kern/vfs_subr.c
===================================================================
--- kern/vfs_subr.c	(revision 210504)
+++ kern/vfs_subr.c	(working copy)
@@ -284,21 +284,29 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLA
  * Initialize the vnode management data structures.
  */
 #ifndef	MAXVNODES_MAX
-#define	MAXVNODES_MAX	100000
+#define	MAXVNODES_MAX	8388608	/* Reevaluate when physmem exceeds 512GB. */
 #endif
 static void
 vntblinit(void *dummy __unused)
 {
+	int physvnodes, virtvnodes;
 
 	/*
-	 * Desiredvnodes is a function of the physical memory size and
-	 * the kernel's heap size.  Specifically, desiredvnodes scales
-	 * in proportion to the physical memory size until two fifths
-	 * of the kernel's heap size is consumed by vnodes and vm
-	 * objects.
+	 * Desiredvnodes is a function of the physical memory size and the
+	 * kernel's heap size.  Generally speaking, it scales with the
+	 * physical memory size.  The ratio of desiredvnodes to physical pages
+	 * is one to four until desiredvnodes exceeds 96K.  Thereafter, the
+	 * marginal ratio of desiredvnodes to physical pages is one to sixteen.
+	 * However, desiredvnodes is limited by the kernel's heap size.  The
+	 * memory required by desiredvnodes vnodes and vm objects may not
+	 * exceed one seventh of the kernel's heap size.
	 */
-	desiredvnodes = min(maxproc + cnt.v_page_count / 4, 2 * vm_kmem_size /
-	    (5 * (sizeof(struct vm_object) + sizeof(struct vnode))));
+	physvnodes = maxproc + cnt.v_page_count / 16 + 3 * min(393216,
+	    cnt.v_page_count) / 16;
+	virtvnodes = vm_kmem_size / (7 * (sizeof(struct vm_object) +
+	    sizeof(struct vnode)));
+	printf("physvnodes = %d\nvirtvnodes = %d\n", physvnodes, virtvnodes);
+	desiredvnodes = min(physvnodes, virtvnodes);
 	if (desiredvnodes > MAXVNODES_MAX) {
 		if (bootverbose)
 			printf("Reducing kern.maxvnodes %d -> %d\n",

> Index: vfs_subr.c
> ===================================================================
> --- vfs_subr.c	(revision 210934)
> +++ vfs_subr.c	(working copy)
> @@ -288,6 +288,7 @@
>  static void
>  vntblinit(void *dummy __unused)
>  {
> +	int vnodes;
>
>  	/*
>  	 * Desiredvnodes is a function of the physical memory size and
> @@ -299,10 +300,19 @@
>  	desiredvnodes = min(maxproc + cnt.v_page_count / 4, 2 * vm_kmem_size /
>  	    (5 * (sizeof(struct vm_object) + sizeof(struct vnode))));
>  	if (desiredvnodes > MAXVNODES_MAX) {
> +
> +		/*
> +		 * If there is a lot of physical memory, allow the cap
> +		 * on vnodes to expand to using a little under 1% of
> +		 * available RAM.
> +		 */
> +		vnodes = max(MAXVNODES_MAX, cnt.v_page_count * (PAGE_SIZE /
> +		    128) / (sizeof(struct vm_object) + sizeof(struct vnode)));
> +		KASSERT(vnodes < desiredvnodes, ("capped vnodes too big"));
>  		if (bootverbose)
>  			printf("Reducing kern.maxvnodes %d -> %d\n",
> -			    desiredvnodes, MAXVNODES_MAX);
> -		desiredvnodes = MAXVNODES_MAX;
> +			    desiredvnodes, vnodes);
> +		desiredvnodes = vnodes;
>  	}
>  	wantfreevnodes = desiredvnodes / 4;
>  	mtx_init(&mntid_mtx, "mntid", NULL, MTX_DEF);
>
>
From owner-freebsd-arch@FreeBSD.ORG Fri Jul 30 21:19:08 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 583211065677; Fri, 30 Jul 2010 21:19:08 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 271C68FC0A; Fri, 30 Jul 2010 21:19:08 +0000 (UTC) Received: from bigwig.baldwin.cx (66.111.2.69.static.nyinternet.net [66.111.2.69]) by cyrus.watson.org (Postfix) with ESMTPSA id 2876946B7F; Fri, 30 Jul 2010 17:19:07 -0400 (EDT) Received: from jhbbsd.localnet (smtp.hudson-trading.com [209.249.190.9]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 250A18A03C; Fri, 30 Jul 2010 17:19:06 -0400 (EDT) From: John Baldwin To: Alan Cox Date: Fri, 30 Jul 2010 16:14:40 -0400 User-Agent: KMail/1.13.5 (FreeBSD/7.3-CBSD-20100217; KDE/4.4.5; amd64; ; ) References: <4C4DB2B8.9080404@freebsd.org> <201007270935.52082.jhb@freebsd.org> <4C531ED7.9010601@cs.rice.edu> In-Reply-To: <4C531ED7.9010601@cs.rice.edu> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201007301614.40768.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (bigwig.baldwin.cx); Fri, 30 Jul 2010 17:19:06 -0400 (EDT) X-Virus-Scanned: clamav-milter 0.95.1 at bigwig.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=4.2 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bigwig.baldwin.cx Cc: alc@freebsd.org, Matthew Fleming , Andriy Gapon , freebsd-arch@freebsd.org Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Jul 2010 21:19:08 -0000 On Friday, July 30, 2010 2:49:59 pm Alan Cox wrote: > John Baldwin wrote: > > I have a strawman of that (relative to 7). It simply adjusts the hardcoded > > maximum to instead be a function of the amount of physical memory. > > > > > > Unless I'm misreading this patch, it would allow "desiredvnodes" to grow > (slowly) on i386/PAE starting at 5GB of RAM until we reach the (too > high) "virt" limit of about 329,000. Yes? For example, an 8GB i386/PAE > machine would have 60% more vnodes than was allowed by MAXVNODE_MAX, and > it would not stop there. I think that we should be concerned about > that, because MAXVNODE_MAX came about because the "virt" limit wasn't > working. Agreed. > As the numbers above show, we could more than halve the growth rate for > "virt" and it would have no effect on either amd64 or i386 machines with > up to 1.5GB of RAM. They would have just as many vnodes.
Then, with > that slower growth rate, we could simply eliminate MAXVNODES_MAX (or at > least configure it to some absurdly large value), thereby relieving the > fixed cap on amd64, where it isn't needed. > > With that in mind, the following patch slows the growth of "virt" from > 2/5 of vm_kmem_size to 1/7. This has no effect on amd64. However, on > i386. it allows desiredvnodes to grow slowly for machines with 1.5GB to > about 2.5GB of RAM, ultimately exceeding the old desiredvnodes cap by > about 17%. Once we exceed the old cap, we increase desiredvnodes at a > marginal rate that is almost the same as your patch, about 1% of > physical memory. It's just computed differently. > > Using 1/8 instead of 1/7, amd64 machines with less than about 1.5GB lose > about 7% of their vnodes, but they catch up and pass the old limit by > 1.625GB. Perhaps, more importantly, i386 machines only exceed the old > cap by 3%. > > Thoughts? I think this is much better. My strawman was rather hackish in that it was layering a hack on top of the existing calculations. I prefer your approach. I do not think penalizing amd64 machines with less than 1.5GB is a big worry as most x86 machines with a small amount of memory are probably running as i386 anyway. Given that, I would probably lean towards 1/8 instead of 1/7, but I would be happy with either one. > Index: kern/vfs_subr.c > =================================================================== > --- kern/vfs_subr.c (revision 210504) > +++ kern/vfs_subr.c (working copy) > @@ -284,21 +284,29 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLA > * Initialize the vnode management data structures. > */ > #ifndef MAXVNODES_MAX > -#define MAXVNODES_MAX 100000 > +#define MAXVNODES_MAX 8388608 /* Reevaluate when physmem > exceeds 512GB. */ > #endif How is this value computed? I would prefer something like: '512 * 1024 * 1024 * 1024 / (sizeof(struct vnode) + sizeof(struct vm_object) / N' if that is how it is computed. A brief note about the magic number of 393216 would also be nice to have (and if it could be a constant with a similar formula value that would be nice, too.). > static void > vntblinit(void *dummy __unused) > { > + int physvnodes, virtvnodes; > > /* > - * Desiredvnodes is a function of the physical memory size and > - * the kernel's heap size. Specifically, desiredvnodes scales > - * in proportion to the physical memory size until two fifths > - * of the kernel's heap size is consumed by vnodes and vm > - * objects. > + * Desiredvnodes is a function of the physical memory size and the > + * kernel's heap size. Generally speaking, it scales with the > + * physical memory size. The ratio of desiredvnodes to physical > pages > + * is one to four until desiredvnodes exceeds 96K. Thereafter, the > + * marginal ratio of desiredvnodes to physical pages is one to > sixteen. > + * However, desiredvnodes is limited by the kernel's heap size. The > + * memory required by desiredvnodes vnodes and vm objects may not > + * exceed one seventh of the kernel's heap size. 
> */ > - desiredvnodes = min(maxproc + cnt.v_page_count / 4, 2 * > vm_kmem_size / > - (5 * (sizeof(struct vm_object) + sizeof(struct vnode)))); > + physvnodes = maxproc + cnt.v_page_count / 16 + 3 * min(393216, > + cnt.v_page_count) / 16; > + virtvnodes = vm_kmem_size / (7 * (sizeof(struct vm_object) + > + sizeof(struct vnode))); > + printf("physvnodes = %d\nvirtvnodes = %d\n", physvnodes, > virtvnodes); > + desiredvnodes = min(physvnodes, virtvnodes); > if (desiredvnodes > MAXVNODES_MAX) { > if (bootverbose) > printf("Reducing kern.maxvnodes %d -> %d\n", > > -- John Baldwin From owner-freebsd-arch@FreeBSD.ORG Sat Jul 31 05:36:32 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 335271065676 for ; Sat, 31 Jul 2010 05:36:32 +0000 (UTC) (envelope-from peterjeremy@acm.org) Received: from mail11.syd.optusnet.com.au (mail11.syd.optusnet.com.au [211.29.132.192]) by mx1.freebsd.org (Postfix) with ESMTP id B879F8FC18 for ; Sat, 31 Jul 2010 05:36:31 +0000 (UTC) Received: from server.vk2pj.dyndns.org (c211-30-160-13.belrs4.nsw.optusnet.com.au [211.30.160.13]) by mail11.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id o6V5aOWb018725 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 31 Jul 2010 15:36:26 +1000 X-Bogosity: Ham, spamicity=0.000000 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.4/8.14.4) with ESMTP id o6V5aNqc027887; Sat, 31 Jul 2010 15:36:23 +1000 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.4/8.14.4/Submit) id o6V5aLrS027886; Sat, 31 Jul 2010 15:36:21 +1000 (EST) (envelope-from peter) Date: Sat, 31 Jul 2010 15:36:21 +1000 From: Peter Jeremy To: Tijl Coosemans Message-ID: <20100731053621.GA27772@server.vk2pj.dyndns.org> References: <201007291718.12687.tijl@coosemans.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ReaqsoxgOBHFXBhH" Content-Disposition: inline In-Reply-To: <201007291718.12687.tijl@coosemans.org> X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.20 (2009-06-14) Cc: pluknet , freebsd-arch@freebsd.org Subject: Re: Support for cc -m32 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 Jul 2010 05:36:32 -0000 --ReaqsoxgOBHFXBhH Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2010-Jul-29 17:18:03 +0200, Tijl Coosemans wrote: >I've put the initial version of some patches online to support cross >compilation of 32 bit binaries on amd64. It's modelled after how NetBSD >does this. I presume you are aware of gnu/112215 (and maybe others). >With these patches something like "cc -m32 -o test test.c -pthread -lm" >generates a program that runs on FreeBSD/i386. That's an improvement on my patches (in 112215) - they resulted in the i386 binaries having references to /libexec/ld-elf32.so.1 and /usr/lib32/*.so - so they would run in i386 compatibility mode on amd64 but not on native i386. >Feel free to test the patches and to comment on any part of them. I hope to get some time to do this in a few days. 
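
For when I do, a trivial test.c along the lines of the compile command quoted above (just a sketch, nothing official; any program that touches both libm and libpthread would do):

#include <math.h>
#include <pthread.h>
#include <stdio.h>

static void *
worker(void *arg)
{
	/* sizeof(long) should print 4 when built with -m32. */
	printf("sizeof(long) = %zu, sqrt(2) = %f\n",
	    sizeof(long), sqrt(2.0));
	return (NULL);
}

int
main(void)
{
	pthread_t t;

	if (pthread_create(&t, NULL, worker, NULL) != 0)
		return (1);
	return (pthread_join(t, NULL));
}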
--=20 Peter Jeremy --ReaqsoxgOBHFXBhH Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.15 (FreeBSD) iEYEARECAAYFAkxTtlUACgkQ/opHv/APuIfUPwCgq6fgWy1GnMAcCZzFSW/CqvoR 5zMAn0JcWX8kNLllX+WA9oQsijaUanNP =r5tm -----END PGP SIGNATURE----- --ReaqsoxgOBHFXBhH-- From owner-freebsd-arch@FreeBSD.ORG Sat Jul 31 21:39:50 2010 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 721D21065670; Sat, 31 Jul 2010 21:39:50 +0000 (UTC) (envelope-from alc@cs.rice.edu) Received: from mail.cs.rice.edu (mail.cs.rice.edu [128.42.1.31]) by mx1.freebsd.org (Postfix) with ESMTP id 3BE2C8FC1E; Sat, 31 Jul 2010 21:39:49 +0000 (UTC) Received: from mail.cs.rice.edu (localhost.localdomain [127.0.0.1]) by mail.cs.rice.edu (Postfix) with ESMTP id 825672C2B32; Sat, 31 Jul 2010 16:39:49 -0500 (CDT) X-Virus-Scanned: by amavis-2.4.0 at mail.cs.rice.edu Received: from mail.cs.rice.edu ([127.0.0.1]) by mail.cs.rice.edu (mail.cs.rice.edu [127.0.0.1]) (amavisd-new, port 10024) with LMTP id bt9g5GN86489; Sat, 31 Jul 2010 16:39:41 -0500 (CDT) Received: from adsl-216-63-78-18.dsl.hstntx.swbell.net (adsl-216-63-78-18.dsl.hstntx.swbell.net [216.63.78.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.cs.rice.edu (Postfix) with ESMTP id 579342C2ACA; Sat, 31 Jul 2010 16:39:41 -0500 (CDT) Message-ID: <4C54981B.9080209@cs.rice.edu> Date: Sat, 31 Jul 2010 16:39:39 -0500 From: Alan Cox User-Agent: Thunderbird 2.0.0.24 (X11/20100501) MIME-Version: 1.0 To: John Baldwin References: <4C4DB2B8.9080404@freebsd.org> <201007270935.52082.jhb@freebsd.org> <4C531ED7.9010601@cs.rice.edu> <201007301614.40768.jhb@freebsd.org> In-Reply-To: <201007301614.40768.jhb@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: alc@freebsd.org, freebsd-arch@freebsd.org Subject: Re: amd64: change VM_KMEM_SIZE_SCALE to 1? X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 31 Jul 2010 21:39:50 -0000 John Baldwin wrote: > On Friday, July 30, 2010 2:49:59 pm Alan Cox wrote: > >> John Baldwin wrote: >> >>> I have a strawman of that (relative to 7). It simply adjusts the hardcoded >>> maximum to instead be a function of the amount of physical memory. >>> >>> >>> >> Unless I'm misreading this patch, it would allow "desiredvnodes" to grow >> (slowly) on i386/PAE starting at 5GB of RAM until we reach the (too >> high) "virt" limit of about 329,000. Yes? For example, an 8GB i386/PAE >> machine would have 60% more vnodes than was allowed by MAXVNODE_MAX, and >> it would not stop there. I think that we should be concerned about >> that, because MAXVNODE_MAX came about because the "virt" limit wasn't >> working. >> > > Agreed. > > >> As the numbers above show, we could more than halve the growth rate for >> "virt" and it would have no effect on either amd64 or i386 machines with >> up to 1.5GB of RAM. They would have just as many vnodes. Then, with >> that slower growth rate, we could simply eliminate MAXVNODES_MAX (or at >> least configure it to some absurdly large value), thereby relieving the >> fixed cap on amd64, where it isn't needed. 
>> >> With that in mind, the following patch slows the growth of "virt" from >> 2/5 of vm_kmem_size to 1/7. This has no effect on amd64. However, on >> i386, it allows desiredvnodes to grow slowly for machines with 1.5GB to >> about 2.5GB of RAM, ultimately exceeding the old desiredvnodes cap by >> about 17%. Once we exceed the old cap, we increase desiredvnodes at a >> marginal rate that is almost the same as your patch, about 1% of >> physical memory. It's just computed differently. >> >> Using 1/8 instead of 1/7, amd64 machines with less than about 1.5GB lose >> about 7% of their vnodes, but they catch up and pass the old limit by >> 1.625GB. Perhaps, more importantly, i386 machines only exceed the old >> cap by 3%. >> >> Thoughts? >> > > I think this is much better. My strawman was rather hackish in that it was > layering a hack on top of the existing calculations. I prefer your approach. > I do not think penalizing amd64 machines with less than 1.5GB is a big worry > as most x86 machines with a small amount of memory are probably running as > i386 anyway. Given that, I would probably lean towards 1/8 instead of 1/7, > but I would be happy with either one. > > I've looked a bit at an i386/PAE system with 8GB. I don't think that a default configuration, e.g., no changes to the mbuf limits, is at risk with 1/7.

>> Index: kern/vfs_subr.c
>> ===================================================================
>> --- kern/vfs_subr.c (revision 210504)
>> +++ kern/vfs_subr.c (working copy)
>> @@ -284,21 +284,29 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLA
>>  * Initialize the vnode management data structures.
>>  */
>> #ifndef MAXVNODES_MAX
>> -#define MAXVNODES_MAX 100000
>> +#define MAXVNODES_MAX 8388608 /* Reevaluate when physmem
>> exceeds 512GB. */
>> #endif
>>
>
> How is this value computed? I would prefer something like: > > '512 * 1024 * 1024 * 1024 / (sizeof(struct vnode) + sizeof(struct vm_object) / N' > > if that is how it is computed. A brief note about the magic number of 393216 > would also be nice to have (and if it could be a constant with a similar > formula value that would be nice, too.). > >

I've tried to explain this computation below.

Index: kern/vfs_subr.c
===================================================================
--- kern/vfs_subr.c	(revision 210702)
+++ kern/vfs_subr.c	(working copy)
@@ -282,23 +282,34 @@ SYSCTL_INT(_debug, OID_AUTO, vnlru_nowhere, CTLFLA
 /*
  * Initialize the vnode management data structures.
+ *
+ * Reevaluate the following cap on the number of vnodes after the physical
+ * memory size exceeds 512GB.  In the limit, as the physical memory size
+ * grows, the ratio of physical pages to vnodes approaches sixteen to one.
  */
 #ifndef	MAXVNODES_MAX
-#define	MAXVNODES_MAX	100000
+#define	MAXVNODES_MAX	(512 * (1024 * 1024 * 1024 / PAGE_SIZE / 16))
 #endif
 static void
 vntblinit(void *dummy __unused)
 {
+	int physvnodes, virtvnodes;
 
 	/*
-	 * Desiredvnodes is a function of the physical memory size and
-	 * the kernel's heap size.  Specifically, desiredvnodes scales
-	 * in proportion to the physical memory size until two fifths
-	 * of the kernel's heap size is consumed by vnodes and vm
-	 * objects.
+	 * Desiredvnodes is a function of the physical memory size and the
+	 * kernel's heap size.  Generally speaking, it scales with the
+	 * physical memory size.  The ratio of desiredvnodes to physical pages
+	 * is one to four until desiredvnodes exceeds 98,304.  Thereafter, the
+	 * marginal ratio of desiredvnodes to physical pages is one to
+	 * sixteen.  However, desiredvnodes is limited by the kernel's heap
+	 * size.  The memory required by desiredvnodes vnodes and vm objects
+	 * may not exceed one seventh of the kernel's heap size.
	 */
-	desiredvnodes = min(maxproc + cnt.v_page_count / 4, 2 * vm_kmem_size /
-	    (5 * (sizeof(struct vm_object) + sizeof(struct vnode))));
+	physvnodes = maxproc + cnt.v_page_count / 16 + 3 * min(98304 * 4,
+	    cnt.v_page_count) / 16;
+	virtvnodes = vm_kmem_size / (7 * (sizeof(struct vm_object) +
+	    sizeof(struct vnode)));
+	desiredvnodes = min(physvnodes, virtvnodes);
 	if (desiredvnodes > MAXVNODES_MAX) {
 		if (bootverbose)
 			printf("Reducing kern.maxvnodes %d -> %d\n",