From: Nathan Whitehorn
Date: Sun, 18 Oct 2009 10:49:14 -0500
To: Rafal Jaworowski
Cc: Guillaume Ballet, Mark Tinguely, freebsd-arm@freebsd.org, Stanislav Sedov
Subject: Re: Adding members to struct cpu_functions
Message-id: <4ADB38FA.2080604@freebsd.org>
In-reply-to: <4AD39C78.5050309@freebsd.org>
List-Id: Porting FreeBSD to the StrongARM Processor

Nathan Whitehorn wrote:
> Rafal Jaworowski wrote:
>>
>> On 2009-10-12, at 15:21, Nathan Whitehorn wrote:
>>
>>>>>> I was wondering whether a separate pmap module for ARMv6-7 would not
>>>>>> be the best approach. After all v6-7 should be considered an entirely
>>>>>> new architecture variation, and we would avoid the very likely #ifdefs
>>>>>> hell in case of a single pmap.c.
>>>>>>
>>>>> Yeah, I think that would be the best solution. We could conditionally
>>>>> select the right pmap.c file based on the target CPU selected (just
>>>>> like we do for board variations for at91/marvell).
>>>>>
>>>>
>>>> pmap.c is a very large file that seems to change very often. I fear
>>>> having several versions is going to be difficult to maintain. Granted,
>>>> I haven't read the whole file line after line. Yet it seems to me its
>>>> content can be abstracted to rely on arch-specific functions that
>>>> would be found in cpufuncs instead of hardcoded macros. Is there
>>>> something fundamentally wrong with enhancing struct cpufunc in order
>>>> to let the portmeisters decide what the MMU and caching bits should
>>>> look like? This is a blocking issue for me, since it looks like the
>>>> omap has some problem with backward compatibility mode. Without fixing
>>>> up the TLBs in my initarm function, it doesn't work.
>>>>
>>>> Speaking of #ifdef hell, why not breaking cpufuncs.c into several
>>>> cpufuncs_.c? That would be a good way to start that
>>>> reorganization Mark has been talking about in his email.
>>>>
>>> One thing that might be worth looking at while thinking about this
>>> is how this is done on PowerPC.
>>> We have run-time selectable PMAP
>>> modules using KOBJ to handle CPUs with different MMU designs, as
>>> well as a platform module scheme, again using KOBJ, to pick the
>>> appropriate PMAP for the board as well as determine the physical
>>> memory layout and such things. One of the nice things about the
>>> approach is that it is easy to subclass if you have a new,
>>> marginally different, design, and it avoids #ifdef hell as well as
>>> letting you build a GENERIC kernel with support for multiple MMU
>>> designs and board types (the last less of a concern on ARM, though).
>>
>> What always concerned me was the performance cost this imposes, and
>> it would be a really useful exercise to measure what is the actual
>> impact of KOBJ-tized pmap we have in PowerPC; with an often-called
>> interface like pmap it might occur the penalty is not that little..
> Using the KOBJ cache means that it is only marginally more expensive
> than a standard function pointer call. There's a 9-year-old note in
> the commit log for sys/sys/kobj.h that it takes about 30% longer to
> call a function that does nothing via KOBJ versus a direct call on a
> 300 MHz P2 (a 10 ns time difference). Given that and that pmap methods
> do, in fact, do things besides get called and immediately return, I
> suspect non-KOBJ related execution time will dwarf any time loss from
> the indirection. I'll try to repeat the measurement in the next few
> days, however, since this is important to know.
> -Nathan

I just did the measurements on a 1.8 GHz PowerPC G5. There were four
tests, each repeated 1 million times. "Load and store" involves
incrementing a volatile int from 0 to 1e6 inline. "Direct calls"
involves a branch to a function that returns 0 and does nothing else.
"Function ptr" calls the same function via a pointer stored in a
register, and "KOBJ calls" calls it via KOBJ.
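
For anyone who wants to try a comparable comparison, here is a minimal
userland sketch in C of the three call styles described above. It is
not the code used for these measurements and it does not use the real
KOBJ interface: the cached method lookup is a simplified stand-in, and
every name in it (noop, method_desc, ops, lookup, time_calls,
CACHE_SIZE) is hypothetical. It assumes GCC or Clang for the noinline
attribute and the empty asm statement.

```c
#include <stddef.h>
#include <time.h>

#define CACHE_SIZE 256  /* power of two, so id & (CACHE_SIZE - 1) is a valid index */

typedef int (*method_fn)(void);

/* Identifies a method; the descriptor's address is its identity. */
struct method_desc { unsigned id; };

/* Binds a descriptor to an implementation. */
struct method { const struct method_desc *desc; method_fn fn; };

/* Hypothetical dispatch state: full method table plus a lookup cache. */
struct ops {
    const struct method *cache[CACHE_SIZE];
    const struct method *table;
    size_t ntable;
};

/* Function under test: returns 0 and does nothing else. */
__attribute__((noinline)) int noop(void)
{
    __asm__ volatile("" ::: "memory");  /* discourage the compiler from eliding calls */
    return 0;
}

/* Slow path: linear search of the table, then fill the cache slot. */
static const struct method *lookup_slow(struct ops *ops, const struct method_desc *desc)
{
    for (size_t i = 0; i < ops->ntable; i++) {
        if (ops->table[i].desc == desc) {
            ops->cache[desc->id & (CACHE_SIZE - 1)] = &ops->table[i];
            return &ops->table[i];
        }
    }
    return NULL;
}

/* Fast path: one cache probe and one compare before the indirect call. */
static inline method_fn lookup(struct ops *ops, const struct method_desc *desc)
{
    const struct method *m = ops->cache[desc->id & (CACHE_SIZE - 1)];
    if (m == NULL || m->desc != desc)
        m = lookup_slow(ops, desc);
    return m != NULL ? m->fn : NULL;
}

static const struct method_desc noop_desc = { 7 };
static const struct method method_table[] = { { &noop_desc, noop } };
static struct ops test_ops = { .table = method_table, .ntable = 1 };

enum call_kind { CALL_DIRECT, CALL_POINTER, CALL_DISPATCH };

/* Makes n calls of the chosen kind, stores the number of calls made in
 * *calls_made, and returns the average CPU time per call in nanoseconds. */
double time_calls(enum call_kind kind, long n, long *calls_made)
{
    volatile method_fn fp = noop;  /* volatile: pointer is reloaded, call stays indirect */
    long made = 0;
    clock_t start = clock();

    switch (kind) {
    case CALL_DIRECT:
        for (long i = 0; i < n; i++) made += 1 + noop();
        break;
    case CALL_POINTER:
        for (long i = 0; i < n; i++) made += 1 + fp();
        break;
    case CALL_DISPATCH:
        for (long i = 0; i < n; i++) made += 1 + lookup(&test_ops, &noop_desc)();
        break;
    }

    clock_t end = clock();
    *calls_made = made;
    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)n;
}
```

Calling time_calls() once per call kind with n set to 1000000 gives
three per-call averages to compare. The resolution of clock() is
coarse, so larger n or several repetitions may be needed, and numbers
from a userland sketch like this should not be expected to match
in-kernel results.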

Here are the results (errors are +/- 0.5 ns for the function call
measurements due to compiler optimization jitter, and 0 for load and
store, since that takes a deterministic number of clock cycles):

32-bit kernel:
    Load and store:  26.1 ns
    Direct calls:     7.2 ns
    Function ptr:     8.4 ns
    KOBJ calls:      17.8 ns

64-bit kernel:
    Load and store:   9.2 ns
    Direct calls:     6.1 ns
    Function ptr:     8.3 ns
    KOBJ calls:      40.5 ns

ABI changes make a large difference, as you can see. The cost of
calling via KOBJ is non-negligible, but small, especially compared to
the cost of doing anything involving memory. I don't know how this
changes with ARM calling conventions.
-Nathan
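
To put the figures above in terms of clock cycles, here is a small C
sketch of the arithmetic. The helper name overhead_cycles is
hypothetical; the inputs are the timings reported above and the
1.8 GHz clock of the test machine.

```c
/* Extra cost of one call style over a baseline, converted to clock cycles.
 * Nanoseconds times GHz gives cycles, since 1 GHz is one cycle per nanosecond. */
double overhead_cycles(double call_ns, double baseline_ns, double clock_ghz)
{
    return (call_ns - baseline_ns) * clock_ghz;
}
```

With the reported numbers, overhead_cycles(17.8, 7.2, 1.8) comes to
about 19 cycles of KOBJ overhead over a direct call on the 32-bit
kernel, and overhead_cycles(40.5, 6.1, 1.8) to about 62 cycles on the
64-bit kernel, each subject to the stated +/- 0.5 ns measurement error.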