Date: Sun, 18 Oct 2009 10:49:14 -0500
From: Nathan Whitehorn <nwhitehorn@freebsd.org>
To: Rafal Jaworowski <raj@semihalf.com>
Cc: Guillaume Ballet <gballet@gmail.com>, Mark Tinguely <tinguely@casselton.net>, freebsd-arm@freebsd.org, Stanislav Sedov <stas@deglitch.com>
Subject: Re: Adding members to struct cpu_functions
Message-ID: <4ADB38FA.2080604@freebsd.org>
In-Reply-To: <4AD39C78.5050309@freebsd.org>
References: <200910081613.n98GDt7r053539@casselton.net> <4A95E6D9-7BA5-4D8A-99A1-6BC6A7EABC18@semihalf.com> <20091012153628.9196951f.stas@deglitch.com> <fd183dc60910120529h5c741449rc8ad20b29fecd2ba@mail.gmail.com> <4AD32D76.3090401@freebsd.org> <6C1CF2D3-A473-4A73-92CB-C45BEEABCE0E@semihalf.com> <4AD39C78.5050309@freebsd.org>
Nathan Whitehorn wrote:
> Rafal Jaworowski wrote:
>>
>> On 2009-10-12, at 15:21, Nathan Whitehorn wrote:
>>
>>>>>> I was wondering whether a separate pmap module for ARMv6-7 would
>>>>>> not be the best approach. After all, v6-7 should be considered an
>>>>>> entirely new architecture variation, and we would avoid the very
>>>>>> likely #ifdef hell in the case of a single pmap.c.
>>>>>>
>>>>> Yeah, I think that would be the best solution. We could
>>>>> conditionally select the right pmap.c file based on the target CPU
>>>>> (just like we do for board variations for at91/marvell).
>>>>>
>>>> pmap.c is a very large file that seems to change very often. I fear
>>>> having several versions is going to be difficult to maintain.
>>>> Granted, I haven't read the whole file line after line. Yet it seems
>>>> to me its content can be abstracted to rely on arch-specific
>>>> functions found in cpufuncs instead of hardcoded macros. Is there
>>>> something fundamentally wrong with enhancing struct cpufunc to let
>>>> the portmeisters decide what the MMU and caching bits should look
>>>> like? This is a blocking issue for me, since it looks like the omap
>>>> has some problem with backward compatibility mode. Without fixing
>>>> up the TLBs in my initarm function, it doesn't work.
>>>>
>>>> Speaking of #ifdef hell, why not break cpufuncs.c into several
>>>> cpufuncs_<myarch>.c files? That would be a good way to start the
>>>> reorganization Mark has been talking about in his email.
>>>>
>>> One thing that might be worth looking at while thinking about this
>>> is how it is done on PowerPC. We have run-time selectable pmap
>>> modules using KOBJ to handle CPUs with different MMU designs, as
>>> well as a platform module scheme, again using KOBJ, to pick the
>>> appropriate pmap for the board as well as determine the physical
>>> memory layout and such things.
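[To make the "enhance struct cpufunc" idea concrete, here is a minimal sketch of a per-CPU dispatch table with MMU/TLB members selected at runtime. All names and members are hypothetical illustrations, not the actual struct cpu_functions layout from sys/arm/include/cpufunc.h; the CP15 operations are stubbed out since they only exist in privileged ARM code.]

```c
/*
 * Hypothetical sketch of a cpufuncs-style dispatch table extended with
 * MMU/page-table hooks, so a port (e.g. an ARMv7 OMAP) can override
 * them at runtime instead of relying on hardcoded macros.  Names are
 * illustrative only, not the real FreeBSD struct cpu_functions.
 */
#include <stdint.h>

struct cpu_functions_sketch {
	const char *cf_name;
	void     (*cf_tlb_flushall)(void);	/* existing-style member */
	void     (*cf_dcache_wbinv_all)(void);	/* existing-style member */
	/* hypothetical new members for MMU/page-table handling: */
	void     (*cf_pte_sync)(uint32_t *ptep);
	uint32_t (*cf_pte_l2_proto)(void);	/* per-arch L2 PTE bits */
};

/* Stubs standing in for real CP15/cache-maintenance instructions. */
static void v7_tlb_flushall(void)  { /* would issue TLBIALL; DSB; ISB */ }
static void v7_dcache_wbinv_all(void) { /* would clean/inv by set/way */ }
static void v7_pte_sync(uint32_t *ptep) { (void)ptep; /* clean PTE line */ }
static uint32_t v7_pte_l2_proto(void) { return 0x2; /* small page, say */ }

static const struct cpu_functions_sketch armv7_cpufuncs = {
	.cf_name             = "armv7",
	.cf_tlb_flushall     = v7_tlb_flushall,
	.cf_dcache_wbinv_all = v7_dcache_wbinv_all,
	.cf_pte_sync         = v7_pte_sync,
	.cf_pte_l2_proto     = v7_pte_l2_proto,
};

/* Selected once, e.g. in initarm(), based on the detected CPU. */
static const struct cpu_functions_sketch *cpufuncs = &armv7_cpufuncs;

/* pmap code then calls through the table instead of a macro: */
static uint32_t
make_l2_pte(uint32_t pa)
{
	return (pa | cpufuncs->cf_pte_l2_proto());
}
```

The point of the indirection is that an ARMv6/v7 port installs its own table once at boot, and the shared pmap code never needs an #ifdef to pick the right PTE encoding or cache-maintenance sequence.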
>>> One of the nice things about the approach is that it is easy to
>>> subclass if you have a new, marginally different design, and it
>>> avoids #ifdef hell as well as letting you build a GENERIC kernel
>>> with support for multiple MMU designs and board types (the latter
>>> less of a concern on ARM, though).
>>
>> What always concerned me was the performance cost this imposes, and
>> it would be a really useful exercise to measure the actual impact of
>> the KOBJ-ized pmap we have on PowerPC; with an often-called
>> interface like pmap, it may turn out the penalty is not that small.
>
> Using the KOBJ cache means that it is only marginally more expensive
> than a standard function-pointer call. There's a 9-year-old note in
> the commit log for sys/sys/kobj.h that it takes about 30% longer to
> call a function that does nothing via KOBJ versus a direct call on a
> 300 MHz P2 (a 10 ns time difference). Given that, and that pmap
> methods do in fact do things besides get called and immediately
> return, I suspect non-KOBJ-related execution time will dwarf any time
> loss from the indirection. I'll try to repeat the measurement in the
> next few days, however, since this is important to know.
> -Nathan

I just did the measurements on a 1.8 GHz PowerPC G5. There were four
tests, each repeated 1 million times. "Load and store" involves
incrementing a volatile int from 0 to 1e6 inline. "Direct calls"
involves a branch to a function that returns 0 and does nothing else.
"Function ptr" calls the same function via a pointer stored in a
register, and "KOBJ calls" calls it via KOBJ.
Here are the results (errors are +/- 0.5 ns for the function call
measurements due to compiler optimization jitter, and 0 for load and
store, since that takes a deterministic number of clock cycles):

32-bit kernel:
  Load and store: 26.1 ns
  Direct calls:    7.2 ns
  Function ptr:    8.4 ns
  KOBJ calls:     17.8 ns

64-bit kernel:
  Load and store:  9.2 ns
  Direct calls:    6.1 ns
  Function ptr:    8.3 ns
  KOBJ calls:     40.5 ns

ABI changes make a large difference, as you can see. The cost of
calling via KOBJ is non-negligible, but small, especially compared to
the cost of doing anything involving memory. I don't know how this
changes with ARM calling conventions.

-Nathan
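[For readers unfamiliar with why a KOBJ call costs more than a bare function pointer: each call looks the method up in a small per-class cache before making the indirect call. The following is a toy model of that lookup, not the actual sys/kern/subr_kobj.c code; all names and the cache geometry here are illustrative.]

```c
/*
 * Toy model of KOBJ-style method dispatch: hash the method descriptor
 * into a per-class cache; a hit yields the function pointer at roughly
 * function-pointer cost plus a compare, a miss falls back to a slower
 * lookup (in real KOBJ, a search up the class's method tables) and
 * then populates the cache.  Purely illustrative.
 */
#include <stddef.h>

#define CACHE_SIZE 256			/* must be a power of two */

typedef int kobj_fn_t(void);

struct method_desc { unsigned md_id; };	/* unique id per method */
struct cache_entry { unsigned ce_id; kobj_fn_t *ce_fn; };

struct kobj_class_sketch {
	struct cache_entry kc_cache[CACHE_SIZE];
	/* real KOBJ also keeps a method table and parent classes here */
};

static int noop_impl(void) { return (0); }

/* Slow path: stands in for searching the class's method table. */
static kobj_fn_t *
slow_lookup(struct kobj_class_sketch *c, struct method_desc *m)
{
	(void)c; (void)m;
	return (noop_impl);
}

static kobj_fn_t *
kobj_lookup_sketch(struct kobj_class_sketch *c, struct method_desc *m)
{
	struct cache_entry *ce = &c->kc_cache[m->md_id & (CACHE_SIZE - 1)];

	if (ce->ce_id == m->md_id && ce->ce_fn != NULL)
		return (ce->ce_fn);	/* cache hit: cheap path */

	/* cache miss: do the slow lookup, then cache the result */
	ce->ce_id = m->md_id;
	ce->ce_fn = slow_lookup(c, m);
	return (ce->ce_fn);
}
```

The extra hash, compare, and possible miss handling are where the gap between the "Function ptr" and "KOBJ calls" numbers above comes from; the 64-bit figures suggest the wider ABI makes that fixed overhead more expensive still.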