From: Nathan Whitehorn <nwhitehorn@freebsd.org>
Date: Thu, 01 Dec 2011 08:43:56 -0600
To: alc@freebsd.org
Cc: Kostik Belousov, Andreas Tobler, Alan Cox, FreeBSD Arch
Message-id: <4ED792AC.4000501@freebsd.org>
Subject: Re: powerpc64 malloc limit?

On 11/30/11 15:50, Alan Cox wrote:
> On Wed, Nov 30, 2011 at 12:12 PM, Peter Wemm wrote:
>
>> On Wed, Nov 30, 2011 at 9:44 AM, Andreas Tobler wrote:
>>> On 30.11.11 18:09, Kostik Belousov wrote:
>>>> On Wed, Nov 30, 2011 at 05:53:04PM +0100, Andreas Tobler wrote:
>>>>> On 30.11.11 17:22, Kostik Belousov wrote:
>>>>>> On Wed, Nov 30, 2011 at 06:24:41AM +0100, Andreas Tobler wrote:
>>>>>>> All,
>>>>>>>
>>>>>>> While working on gcc, I found a very strange situation that
>>>>>>> renders my powerpc64 machine unusable. The test case below tries
>>>>>>> to allocate as much memory as 'wanted'. The same test case on
>>>>>>> amd64 returns without trying the allocation, because the size is
>>>>>>> far too big.
>>>>>>>
>>>>>>> I couldn't find the reason so far; that's why I'm here.
>>>>>>>
>>>>>>> As Nathan pointed out, VM_MAXUSER_ADDRESS is biggest on powerpc64:
>>>>>>> #define VM_MAXUSER_ADDRESS (0x7ffffffffffff000UL)
>>>>>>>
>>>>>>> So I'd expect the system to return an allocation error when a
>>>>>>> user tries to allocate too much memory, rather than actually
>>>>>>> attempting it and becoming unusable. In other words, I'd expect
>>>>>>> the same situation on powerpc64 as I see on amd64.
>>>>>>>
>>>>>>> Can anybody explain to me why I do not have a working limit on
>>>>>>> powerpc64?
>>>>>>>
>>>>>>> The machine itself has 7GB RAM and 12GB swap. The amd64 I
>>>>>>> compared against has around 4GB/4GB RAM/swap.
>>>>>>>
>>>>>>> TIA,
>>>>>>> Andreas
>>>>>>>
>>>>>>> #include <stdlib.h>
>>>>>>> #include <stdio.h>
>>>>>>>
>>>>>>> int main()
>>>>>>> {
>>>>>>>     void *p;
>>>>>>>
>>>>>>>     p = (void *)malloc(1152921504606846968ULL);
>>>>>>>     if (p != NULL)
>>>>>>>         printf("p = %p\n", p);
>>>>>>>
>>>>>>>     printf("p = %p\n", p);
>>>>>>>     return (0);
>>>>>>> }
>>>>>>
>>>>>> First, you should provide details of what constitutes 'the
>>>>>> unusable machine situation' on powerpc.
>>>>>
>>>>> I cannot log in anymore; everything is stuck except the core
>>>>> control mechanisms, for example the fan controller.
>>>>>
>>>>> Top reports 'ugly' figures; below is from an earlier try:
>>>>>
>>>>> last pid:  6790;  load averages: 0.78, 0.84, 0.86  up 0+00:34:52  22:42:29
>>>>> 47 processes: 1 running, 46 sleeping
>>>>> CPU:  0.0% user,  0.0% nice, 15.4% system, 11.8% interrupt, 72.8% idle
>>>>> Mem: 5912M Active, 570M Inact, 280M Wired, 26M Cache, 104M Buf, 352K Free
>>>>> Swap: 12G Total, 9904M Used, 2383M Free, 80% Inuse, 178M Out
>>>>>
>>>>>   PID USERNAME  THR PRI NICE        SIZE   RES STATE  C   TIME   WCPU COMMAND
>>>>>  6768 andreast    1  52    0 1073741824G 6479M pfault 1   0:58 18.90% 31370.
>>>>>
>>>>> And after my mem and swap are full, I see
>>>>> swap_pager_getswapspace(16) failed.
>>>>>
>>>>> In this state I can only power-cycle the machine.
>>>>>
>>>>>> That said, on amd64 the user map is between 0 and 0x7fffffffffff,
>>>>>> which is obviously less than the requested allocation size of
>>>>>> roughly 0x1000000000000000. If you look at the kdump output on
>>>>>> amd64, you will see that malloc() tries to mmap() the area, fails,
>>>>>> and retries with obreak(). The default virtual memory limit is
>>>>>> unlimited, so my best guess is that on amd64 vm_map_findspace()
>>>>>> returns immediately.
>>>>>>
>>>>>> On powerpc64, I see no reason why the vm_map_entry cannot be
>>>>>> allocated, but please note that the vm object and pages shall only
>>>>>> be allocated on demand. So I am curious how and where your machine
>>>>>> breaks.
>>>>>
>>>>> I would expect that the 'system' does not allow me to allocate
>>>>> that much RAM.
>>>>
>>>> Is the issue with the machine going into limbo reproducible with
>>>> the code you posted?
>>>
>>> If I understand you correctly, yes. I can launch the test case and
>>> the machine is immediately unusable. That means I can neither kill
>>> the process nor log in. Also, top does not show anything useful.
>>>
>>> The original test case where I discovered this behavior behaves a
>>> bit differently:
>>> http://gcc.gnu.org/viewcvs/trunk/libstdc%2B%2B-v3/testsuite/23_containers/vector/bool/modifiers/insert/31370.cc?revision=169421&view=markup
>>> Here I can follow how the RAM and swap are eaten up. Top reports the
>>> figures. Once everything is 'full', the swap pager errors start to
>>> appear on the console.
>>>
>>>> Or do you need to actually touch the pages in the allocated region?
>>>
>>> If I have to, how would I do that?
>>>
>>>> If the latter (and I do expect that), then how many pages do you
>>>> need to touch before the machine breaks? Is it a single access that
>>>> causes the havoc, or do you need to touch an amount approximately
>>>> equal to RAM+swap?
>>>
>>> Andreas
>>
>> ia64 had some vaguely related excitement earlier in its life. If you
>> created a 1TB sparse file and mmap'ed it over and over, tens, maybe
>> hundreds of thousands of times, certain VM internal state got way out
>> of hand. mmap'ing was fine, but unmapping took 36 hours of CPU
>> runtime when I killed the process. It got so far out of hand because
>> of the way ia64 handled just-in-time mappings on VHPT misses.
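To answer the question above about touching the pages: it just means
storing at least one byte into each page of the allocation, so the
kernel actually has to back it with RAM or swap. A minimal sketch; the
size here is a deliberately smaller, made-up value so the program
doesn't wedge the machine the way the full-size test does:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	size_t size = (size_t)1 << 32;		/* 4GB, for illustration only */
	long pagesize = sysconf(_SC_PAGESIZE);
	char *p, *q;

	p = malloc(size);
	if (p == NULL) {
		printf("malloc failed\n");
		return (1);
	}
	for (q = p; q < p + size; q += pagesize)
		*q = 1;				/* fault in one page per store */
	printf("touched %zu pages\n", size / (size_t)pagesize);
	return (0);
}

Each store faults in one page; against the full 2^60-byte request, this
kind of access pattern is what would gradually eat RAM+swap the way the
top output above shows.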
> There is a fundamental scalability problem with the powerpc64/aim
> pmap. See revision 212360. In a nutshell, unlike amd64, ia64, and most
> other pmap implementations, the powerpc64/aim pmap implementation
> doesn't link together all of the pv entries belonging to a pmap into a
> list. So, the powerpc64/aim implementations of range operations, like
> pmap_remove(), don't handle large, sparsely populated ranges as
> efficiently as ia64 does. Moreover, powerpc64/aim can't effectively
> implement pmap_remove_pages(), and so it doesn't even try.

This is really irritating to fix. The pmap layer is really designed
around a tree-based page table layout, which PowerPC/AIM does not have.
One fix for at least some of the issues might be to also link the PVO
structures into the pmap; some operations would still not be efficient,
since it would be a flat list, but it's at least not that complicated.
A rough sketch of that linkage is in the postscript below.
-Nathan
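P.S. For the archive, a rough sketch of what "link the PVO structures
into the pmap" could look like. These are illustrative declarations
only, not the actual sys/powerpc structures; the pvo_plink and pmap_pvo
names are made up for this sketch:

#include <sys/queue.h>

struct pvo_entry {
	LIST_ENTRY(pvo_entry)	pvo_vlink;	/* per-page (reverse) list */
	LIST_ENTRY(pvo_entry)	pvo_plink;	/* proposed per-pmap list */
	/* ... virtual address, physical page, protection, etc. ... */
};

struct pmap {
	LIST_HEAD(, pvo_entry)	pmap_pvo;	/* every mapping in this pmap */
	/* ... */
};

With a per-pmap list like this, pmap_remove_pages() could become a
linear walk over the mappings that actually exist, instead of a scan of
a huge, sparse address range. It would still be O(n) in the number of
mappings -- the flat-list caveat above -- but it is simple.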