From owner-freebsd-arm@FreeBSD.ORG  Mon Oct 19 15:34:03 2009
Return-Path: <owner-freebsd-arm@FreeBSD.ORG>
Delivered-To: freebsd-arm@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2D5D11065670;
	Mon, 19 Oct 2009 15:34:03 +0000 (UTC)
	(envelope-from raj@semihalf.com)
Received: from smtp.semihalf.com (smtp.semihalf.com [213.17.239.109])
	by mx1.freebsd.org (Postfix) with ESMTP id 941438FC0C;
	Mon, 19 Oct 2009 15:34:02 +0000 (UTC)
Received: from localhost (unknown [213.17.239.109])
	by smtp.semihalf.com (Postfix) with ESMTP id 16EE3C3BA8;
	Mon, 19 Oct 2009 17:28:34 +0200 (CEST)
X-Virus-Scanned: by amavisd-new at semihalf.com
Received: from smtp.semihalf.com ([213.17.239.109])
	by localhost (smtp.semihalf.com [213.17.239.109]) (amavisd-new,
	port 10024)
	with ESMTP id kfyITc-hCZhX; Mon, 19 Oct 2009 17:28:32 +0200 (CEST)
Received: from [10.0.0.34] (cardhu.semihalf.com [213.17.239.108])
	by smtp.semihalf.com (Postfix) with ESMTPSA id D924FC3B9C;
	Mon, 19 Oct 2009 17:28:32 +0200 (CEST)
Mime-Version: 1.0 (Apple Message framework v1076)
Content-Type: text/plain; charset=us-ascii; format=flowed; delsp=yes
From: Rafal Jaworowski <raj@semihalf.com>
In-Reply-To: <4ADB38FA.2080604@freebsd.org>
Date: Mon, 19 Oct 2009 17:33:59 +0200
Content-Transfer-Encoding: 7bit
Message-Id: <05B19969-B238-4E3A-8326-624067F0362B@semihalf.com>
References: <200910081613.n98GDt7r053539@casselton.net>
	<4A95E6D9-7BA5-4D8A-99A1-6BC6A7EABC18@semihalf.com>
	<20091012153628.9196951f.stas@deglitch.com>
	<fd183dc60910120529h5c741449rc8ad20b29fecd2ba@mail.gmail.com>
	<4AD32D76.3090401@freebsd.org>
	<6C1CF2D3-A473-4A73-92CB-C45BEEABCE0E@semihalf.com>
	<4AD39C78.5050309@freebsd.org> <4ADB38FA.2080604@freebsd.org>
To: Nathan Whitehorn <nwhitehorn@freebsd.org>
X-Mailer: Apple Mail (2.1076)
Cc: Guillaume Ballet <gballet@gmail.com>,
	Mark Tinguely <tinguely@casselton.net>, freebsd-arm@freebsd.org,
	Stanislav Sedov <stas@deglitch.com>
Subject: Re: Adding members to struct cpu_functions
X-BeenThere: freebsd-arm@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Porting FreeBSD to the StrongARM Processor <freebsd-arm.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arm>,
	<mailto:freebsd-arm-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arm>
List-Post: <mailto:freebsd-arm@freebsd.org>
List-Help: <mailto:freebsd-arm-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arm>,
	<mailto:freebsd-arm-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 19 Oct 2009 15:34:03 -0000


On 2009-10-18, at 17:49, Nathan Whitehorn wrote:

>>>> One thing that might be worth looking at while thinking about  
>>>> this is how this is done on PowerPC. We have run-time selectable  
>>>> PMAP modules using KOBJ to handle CPUs with different MMU  
>>>> designs, as well as a platform module scheme, again using KOBJ,  
>>>> to pick the appropriate PMAP for the board as well as determine  
>>>> the physical memory layout and such things. One of the nice  
>>>> things about the approach is that it is easy to subclass if you  
>>>> have a new, marginally different, design, and it avoids #ifdef  
>>>> hell as well as letting you build a GENERIC kernel with support  
>>>> for multiple MMU designs and board types (the last less of a  
>>>> concern on ARM, though).
>>>
>>> What always concerned me was the performance cost this imposes,
>>> and it would be a really useful exercise to measure the actual
>>> impact of the KOBJ-ized pmap we have on PowerPC; with an
>>> often-called interface like pmap, the penalty might turn out not
>>> to be that small.
>> Using the KOBJ cache means that it is only marginally more
>> expensive than a standard function pointer call. There's a
>> 9-year-old note in the commit log for sys/sys/kobj.h that it takes
>> about 30% longer to call a function that does nothing via KOBJ
>> versus a direct call on a 300 MHz P2 (a 10 ns time difference). Given that
>> and that pmap methods do, in fact, do things besides get called and  
>> immediately return, I suspect non-KOBJ related execution time will  
>> dwarf any time loss from the indirection. I'll try to repeat the  
>> measurement in the next few days, however, since this is important  
>> to know.
>> -Nathan
> I just did the measurements on a 1.8 GHz PowerPC G5. There were four  
> tests, each repeated 1 million times. "Load and store" involves  
> incrementing a volatile int from 0 to 1e6 inline. "Direct calls"  
> involves a branch to a function that returns 0 and does nothing  
> else. "Function ptr" calls the same function via a pointer stored in  
> a register, and "KOBJ calls" calls it via KOBJ. Here are the results  
> (errors are +/- 0.5 ns for the function call measurements due to  
> compiler optimization jitter, and 0 for load and store, since that  
> takes a deterministic number of clock cycles):
>
> 32-bit kernel:
> Load and store:  26.1 ns
> Direct calls:     7.2 ns
> Function ptr:     8.4 ns
> KOBJ calls:      17.8 ns
>
> 64-bit kernel:
> Load and store:   9.2 ns
> Direct calls:     6.1 ns
> Function ptr:     8.3 ns
> KOBJ calls:      40.5 ns
>
> ABI changes make a large difference, as you can see. The cost of  
> calling via KOBJ is non-negligible, but small, especially compared  
> to the cost of doing anything involving memory. I don't know how  
> this changes with ARM calling conventions.
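
For anyone who wants to try something similar in the meantime, here is a rough userland sketch of the four tests described above. It is not Nathan's code and it does not use the real sys/kobj.h interface: the method table, the cache layout and every name in it (object_class, class_lookup, bench_*) are made up for illustration, loosely following the idea of a small per-class method cache indexed by a method id. It assumes GCC or Clang (for the noinline attribute and the empty asm statements that stop the calls from being optimized away) and a POSIX clock_gettime().

```c
#define _POSIX_C_SOURCE 199309L         /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

#define NOINLINE __attribute__((noinline))        /* GCC/Clang only */
#define HIDE(p)  __asm__ volatile("" : "+r"(p))   /* make p opaque to the optimizer */

/* Hypothetical stand-in for a KOBJ-style method table (not the real API). */
typedef int (*method_fn)(void);
struct method_desc  { unsigned id; };
struct method_entry { const struct method_desc *desc; method_fn fn; };
#define CACHE_SIZE 8                    /* must be a power of two */
struct object_class {
    const struct method_entry *methods; /* full table, searched on a cache miss */
    size_t nmethods;
    struct method_entry cache[CACHE_SIZE];
};

/* The function under test: returns 0 and does nothing else. */
static NOINLINE int noop(void)
{
    __asm__ volatile("" ::: "memory");  /* keeps the call from being elided */
    return 0;
}

static const struct method_desc  noop_desc = { 3 };
static const struct method_entry demo_methods[] = { { &noop_desc, noop } };
static struct object_class demo_class = { demo_methods, 1, { { NULL, NULL } } };

/* Cached lookup: one masked index and one compare on a hit. */
static method_fn class_lookup(struct object_class *cls,
                              const struct method_desc *desc)
{
    struct method_entry *ce = &cls->cache[desc->id & (CACHE_SIZE - 1)];
    if (ce->desc != desc) {             /* miss: linear search, then refill */
        size_t i;
        for (i = 0; i < cls->nmethods && cls->methods[i].desc != desc; i++)
            ;
        if (i == cls->nmethods)
            return NULL;                /* no such method in this class */
        *ce = cls->methods[i];
    }
    return ce->fn;
}

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

static volatile int counter;

double bench_load_store(long n)         /* increment a volatile int n times */
{
    double t0 = now_ns();
    counter = 0;
    for (long i = 0; i < n; i++)
        counter = counter + 1;
    return (now_ns() - t0) / (double)n;
}

double bench_direct(long n)             /* plain branch to noop() */
{
    double t0 = now_ns();
    for (long i = 0; i < n; i++)
        (void)noop();
    return (now_ns() - t0) / (double)n;
}

double bench_fnptr(long n)              /* same call through a pointer */
{
    method_fn fp = noop;
    HIDE(fp);                           /* so the call stays indirect */
    double t0 = now_ns();
    for (long i = 0; i < n; i++)
        (void)fp();
    return (now_ns() - t0) / (double)n;
}

double bench_kobj_like(long n)          /* lookup through the cache, then call */
{
    struct object_class *cls = &demo_class;
    HIDE(cls);
    method_fn warm = class_lookup(cls, &noop_desc);   /* fill the cache once */
    assert(warm == noop);
    (void)warm;
    double t0 = now_ns();
    for (long i = 0; i < n; i++)
        (void)class_lookup(cls, &noop_desc)();
    return (now_ns() - t0) / (double)n;
}

void bench_print(long n)                /* ns per iteration for each test */
{
    printf("Load and store: %6.1f ns\n", bench_load_store(n));
    printf("Direct calls:   %6.1f ns\n", bench_direct(n));
    printf("Function ptr:   %6.1f ns\n", bench_fnptr(n));
    printf("KOBJ-like:      %6.1f ns\n", bench_kobj_like(n));
}
```

Calling bench_print(1000000) from main() prints one per-iteration figure for each test. The loop overhead is included in every figure, and a userland toy like this will not reproduce in-kernel numbers, so it is only useful for comparing the four cases against each other on one machine.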

Very interesting, thanks! Could you elaborate on the test setup and
share the benchmark code so we can repeat this on other CPU variants,
such as Book-E PowerPC or ARM?

Rafal