Date: Mon, 8 Mar 2010 03:16:42 +0100
From: Bernd Walter <ticso@cicely7.cicely.de>
To: Mark Tinguely <tinguely@casselton.net>
Cc: freebsd-arm@freebsd.org
Subject: Re: Performance of SheevaPlug on 8-stable
Message-ID: <20100308021642.GQ11192@cicely7.cicely.de>
In-Reply-To: <20100308013105.GP11192@cicely7.cicely.de>
References: <FB81E027-0CCC-4DF6-A29F-88920A39556B@semihalf.com>
 <201003072125.o27LPfFb000968@casselton.net>
 <20100308002704.GL11192@cicely7.cicely.de>
 <20100308013105.GP11192@cicely7.cicely.de>
On Mon, Mar 08, 2010 at 02:31:05AM +0100, Bernd Walter wrote:
> On Mon, Mar 08, 2010 at 01:27:04AM +0100, Bernd Walter wrote:
> > On Sun, Mar 07, 2010 at 03:25:41PM -0600, Mark Tinguely wrote:
> > >
> > > FreeBSD-current has kernel and user witness turned on. Witness is for
> > > locks, so it should not change the performance of a tight arithmetic
> > > loop like this.
> >
> > I have no kernel debugging enabled.
> > I have no malloc.conf on current, but I have on the 8.0-current system,
> > so malloc debugging is enabled on one machine, but it shouldn't hurt in
> > this case since it is not allocating anything.
> >
> > > I don't know the Marvell internals, and from what I can tell, their
> > > technical docs require an NDA. That said, many of the ARM processors
> > > also have an internal instruction cache (instruction prefetch) in
> > > addition to the instruction cache. I don't think the prefetch has an
> > > enable/disable.
> > >
> > > It looks from the CPU identification like the branch prediction
> > > is turned on. Branch prediction compensates for the longer pipelines.
> > > I can't see how that could go astray in a tight loop.
> > >
> > > Thus says the ARM ARM:
> > >
> > >     ARM implementations are free to choose how far ahead of the
> > >     current point of execution they prefetch instructions; either
> > >     a fixed or a dynamically varying number of instructions. As well
> > >     as being free to choose how many instructions to prefetch, an ARM
> > >     implementation can choose which possible future execution path to
> > >     prefetch along. For example, after a branch instruction, it can
> > >     choose to prefetch either the instruction following the branch
> > >     or the instruction at the branch target. This is known as branch
> > >     prediction.
> > >
> > > There are a few dangling data allocations that I would like to see
> > > closed from the multiple kernel allocation fix. *IN THEORY, IF* a page
> > > is allocated via the arm_nocache (DMA COHERENT) or a sendfile, then
> > > it is never marked as unallocated. *IN THEORY*, if that page is used
> > > again, then we could falsely believe that page is being shared and
> > > turn off the cache, even though it is not shared.
> > >
> > > http://www.casselton.net/~tinguely/arm_pmap_unmanaged.diff
> > >
> > > * Disclaimer: I am not sure if DMA COHERENT or sendfiles are used in
> > > the Sheeva implementation. This is a theoretical observation of a side
> > > effect of the multiple kernel mapping patch that we did just before
> > > FreeBSD 8-release.
>
> This sounds possible.
> My 8.0-current system should be before that change and it is much faster
> than my current system.
> It is still slower than the calculated ~80s and the difference looks
> a bit large to just think it is a stalled pipeline because of the branch.
> Has anyone access to a RM9200 system running Linux?

With your patch my current system is faster as well.

[55]chipmunk.cicely.de# ./test
207.000u 0.000s 4:01.13 86.0% 46+1516k 0+0io 0pf+0w
[56]chipmunk.cicely.de# ./test
207.000u 0.000s 3:55.66 87.9% 45+1516k 0+0io 0pf+0w

It is still puzzling me why it is not near 80 seconds.
This would mean it is losing about 5-6 cycles per iteration.
Well - Ok - the pipeline might be that long, and real loops are mostly
a few instructions longer.
But I would still be interested to see Linux results on RM9200.

-- 
B.Walter <bernd@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD computers, and much more.
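
The cycle estimate in the message can be sanity-checked with a small sketch. It takes the two figures given in the message (207 s measured user time, ~80 s calculated) and asks how many cycles per loop iteration would be lost for a given ideal loop cost. The message does not state the loop length or the clock rate, so the ideal cycle counts tried below are purely hypothetical, and the function name is made up for illustration.

```python
# Rough check of the "5-6 cycles" estimate from the message.
# MEASURED_USER_S and EXPECTED_S come from the message; the ideal
# cycles-per-iteration values are ASSUMPTIONS, not stated anywhere.

MEASURED_USER_S = 207.0  # user time reported for ./test
EXPECTED_S = 80.0        # the "calculated ~80s" figure


def extra_cycles_per_iteration(measured_s, expected_s, ideal_cycles):
    """Cycles lost per iteration if the ideal loop costs `ideal_cycles`.

    The clock rate and iteration count cancel out, so only the ratio
    of measured to expected run time matters.
    """
    return ideal_cycles * (measured_s / expected_s - 1.0)


# Hypothetical ideal loop costs (cycles per iteration).
for ideal in (2, 3, 4, 5):
    lost = extra_cycles_per_iteration(MEASURED_USER_S, EXPECTED_S, ideal)
    print(f"ideal {ideal} cycles/iteration -> about {lost:.2f} lost")
```

Under these assumptions, a loss of 5-6 cycles per iteration would correspond to an ideal loop of roughly 3 to 4 cycles; other loop lengths would give a different figure.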