Date:      Sun, 18 Oct 2009 10:49:14 -0500
From:      Nathan Whitehorn <nwhitehorn@freebsd.org>
To:        Rafal Jaworowski <raj@semihalf.com>
Cc:        Guillaume Ballet <gballet@gmail.com>, Mark Tinguely <tinguely@casselton.net>, freebsd-arm@freebsd.org, Stanislav Sedov <stas@deglitch.com>
Subject:   Re: Adding members to struct cpu_functions
Message-ID:  <4ADB38FA.2080604@freebsd.org>
In-Reply-To: <4AD39C78.5050309@freebsd.org>
References:  <200910081613.n98GDt7r053539@casselton.net> <4A95E6D9-7BA5-4D8A-99A1-6BC6A7EABC18@semihalf.com> <20091012153628.9196951f.stas@deglitch.com> <fd183dc60910120529h5c741449rc8ad20b29fecd2ba@mail.gmail.com> <4AD32D76.3090401@freebsd.org> <6C1CF2D3-A473-4A73-92CB-C45BEEABCE0E@semihalf.com> <4AD39C78.5050309@freebsd.org>

Nathan Whitehorn wrote:
> Rafal Jaworowski wrote:
>>
>> On 2009-10-12, at 15:21, Nathan Whitehorn wrote:
>>
>>>>>> I was wondering whether a separate pmap module for ARMv6-7 would not
>>>>>> be the best approach. After all v6-7 should be considered an entirely
>>>>>> new architecture variation, and we would avoid the very likely #ifdefs
>>>>>> hell in case of a single pmap.c.
>>>>>>
>>>>>>
>>>>> Yeah, I think that would be the best solution.  We could conditionally
>>>>> select the right pmap.c file based on the target CPU selected (just
>>>>> like we do for board variations for at91/marvell).
>>>>>
>>>>>
>>>>
>>>> pmap.c is a very large file that seems to change very often. I fear
>>>> having several versions is going to be difficult to maintain. Granted,
>>>> I haven't read the whole file line after line. Yet it seems to me its
>>>> content can be abstracted to rely on arch-specific functions that
>>>> would be found in cpufuncs instead of hardcoded macros. Is there
>>>> something fundamentally wrong with enhancing struct cpufunc in order
>>>> to let the portmeisters decide what the MMU and caching bits should
>>>> look like? This is a blocking issue for me, since it looks like the
>>>> omap has some problem with backward compatibility mode. Without fixing
>>>> up the TLBs in my initarm function, it doesn't work.
>>>>
>>>> Speaking of #ifdef hell, why not break cpufuncs.c into several
>>>> cpufuncs_<myarch>.c files? That would be a good way to start the
>>>> reorganization Mark has been talking about in his email.
>>>>
>>> One thing that might be worth looking at while thinking about this 
>>> is how this is done on PowerPC. We have run-time selectable PMAP 
>>> modules using KOBJ to handle CPUs with different MMU designs, as 
>>> well as a platform module scheme, again using KOBJ, to pick the 
>>> appropriate PMAP for the board as well as determine the physical 
>>> memory layout and such things. One of the nice things about the 
>>> approach is that it is easy to subclass if you have a new, 
>>> marginally different, design, and it avoids #ifdef hell as well as 
>>> letting you build a GENERIC kernel with support for multiple MMU 
>>> designs and board types (the last less of a concern on ARM, though).
>>
>> What always concerned me was the performance cost this imposes, and 
>> it would be a really useful exercise to measure the actual impact of 
>> the KOBJ-tized pmap we have in PowerPC; with an often-called 
>> interface like pmap, the penalty might turn out not to be that small.
> Using the KOBJ cache means that it is only marginally more expensive 
> than a standard function pointer call. There's a 9-year-old note in 
> the commit log for sys/sys/kobj.h that it takes about 30% longer to 
> call a function that does nothing via KOBJ versus a direct call on a 
> 300 MHz P2 (a 10 ns time difference). Given that and that pmap methods 
> do, in fact, do things besides get called and immediately return, I 
> suspect non-KOBJ related execution time will dwarf any time loss from 
> the indirection. I'll try to repeat the measurement in the next few 
> days, however, since this is important to know.
> -Nathan 
I just ran the measurements on a 1.8 GHz PowerPC G5. There were four 
tests, each repeated 1 million times. "Load and store" increments a 
volatile int from 0 to 1e6 inline. "Direct calls" branches to a function 
that returns 0 and does nothing else. "Function ptr" calls the same 
function via a pointer stored in a register, and "KOBJ calls" calls it 
via KOBJ. Here are the results (errors are +/- 0.5 ns for the function 
call measurements due to compiler optimization jitter, and 0 for load 
and store, since that takes a deterministic number of clock cycles):

32-bit kernel:
Load and store:  26.1 ns
Direct calls:     7.2 ns
Function ptr:     8.4 ns
KOBJ calls:      17.8 ns

64-bit kernel:
Load and store:   9.2 ns
Direct calls:     6.1 ns
Function ptr:     8.3 ns
KOBJ calls:      40.5 ns

ABI changes make a large difference, as you can see. The cost of calling 
via KOBJ is non-negligible, but small, especially compared to the cost 
of doing anything involving memory. I don't know how this changes with 
ARM calling conventions.
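
For anyone who wants to try a comparable measurement in userland, here 
is a minimal sketch of the four loops. It is not the code used for the 
numbers above. The toy_class/toy_lookup dispatch is a hypothetical 
stand-in for a cached method lookup, not the real KOBJ implementation, 
and every name in it is made up for illustration. It assumes GCC or 
Clang (for the noinline attribute and the empty asm) and a POSIX 
clock_gettime(). An aggressive optimizer may still collapse the 
function pointer case into a direct call, so treat the output as 
indicative only.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

typedef int (*method_fn)(void);

/* Hypothetical stand-ins for a method descriptor, a method table entry
 * and a class with a small lookup cache; NOT the real kobj structures. */
struct method_desc  { unsigned id; };
struct method_entry { const struct method_desc *desc; method_fn fn; };
struct toy_class {
    const struct method_entry *methods;   /* full table, scanned on a miss */
    size_t nmethods;
    const struct method_entry *cache[8];  /* direct-mapped, indexed by id */
};

/* The callee: returns 0 and does nothing else. noinline and the empty
 * asm (GCC/Clang extensions) keep the compiler from deleting the call. */
__attribute__((noinline)) int do_nothing(void)
{
    __asm__ volatile("" ::: "memory");
    return 0;
}

const struct method_desc noop_desc = { 3 };
const struct method_entry toy_methods[] = { { &noop_desc, do_nothing } };
struct toy_class toy_cls = { toy_methods, 1, { NULL } };

/* Look a method up by descriptor: try the cache slot first, fall back
 * to a linear scan of the table and refill the slot on a miss. */
method_fn toy_lookup(struct toy_class *cls, const struct method_desc *desc)
{
    size_t slot = desc->id & 7u;
    const struct method_entry *e = cls->cache[slot];

    if (e != NULL && e->desc == desc)
        return e->fn;
    for (size_t i = 0; i < cls->nmethods; i++) {
        if (cls->methods[i].desc == desc) {
            cls->cache[slot] = &cls->methods[i];
            return cls->methods[i].fn;
        }
    }
    return NULL;
}

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Each bench_* function returns the average time per iteration in ns. */
double bench_load_store(long n)
{
    volatile int counter = 0;
    double t0 = now_ns();
    for (long i = 0; i < n; i++)
        counter++;                 /* volatile forces a load and a store */
    return (now_ns() - t0) / (double)n;
}

double bench_direct(long n)
{
    double t0 = now_ns();
    for (long i = 0; i < n; i++)
        (void)do_nothing();
    return (now_ns() - t0) / (double)n;
}

double bench_fnptr(method_fn fp, long n)
{
    double t0 = now_ns();
    for (long i = 0; i < n; i++)
        (void)fp();
    return (now_ns() - t0) / (double)n;
}

double bench_dispatch(struct toy_class *cls, const struct method_desc *desc,
    long n)
{
    double t0 = now_ns();
    for (long i = 0; i < n; i++) {
        method_fn fn = toy_lookup(cls, desc);  /* lookup on every call */
        if (fn != NULL)
            (void)fn();
    }
    return (now_ns() - t0) / (double)n;
}

/* Run all four loops n times each and print the per-iteration averages. */
int run_benchmarks(long n)
{
    assert(n > 0);
    printf("Load and store: %.1f ns\n", bench_load_store(n));
    printf("Direct calls:   %.1f ns\n", bench_direct(n));
    printf("Function ptr:   %.1f ns\n", bench_fnptr(do_nothing, n));
    printf("Cached lookup:  %.1f ns\n",
        bench_dispatch(&toy_cls, &noop_desc, n));
    return 0;
}
```

Calling run_benchmarks(1000000) from main() mirrors the 1 million 
repetitions used above. The timer overhead is amortized over the whole 
loop, which is why each function times the loop rather than the 
individual calls.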
-Nathan


