Date:      Mon, 8 Mar 2010 02:31:05 +0100
From:      Bernd Walter <ticso@cicely7.cicely.de>
To:        Mark Tinguely <tinguely@casselton.net>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: Performance of SheevaPlug on 8-stable
Message-ID:  <20100308013105.GP11192@cicely7.cicely.de>
In-Reply-To: <20100308002704.GL11192@cicely7.cicely.de>
References:  <FB81E027-0CCC-4DF6-A29F-88920A39556B@semihalf.com> <201003072125.o27LPfFb000968@casselton.net> <20100308002704.GL11192@cicely7.cicely.de>

On Mon, Mar 08, 2010 at 01:27:04AM +0100, Bernd Walter wrote:
> On Sun, Mar 07, 2010 at 03:25:41PM -0600, Mark Tinguely wrote:
> > 
> > FreeBSD-current has kernel and user witness turned on. Witness is for
> > locks, so it should not change the performance of a tight arithmetic loop
> > like this.
> 
> I have no kernel debugging enabled.
> I have no malloc.conf on current, but I have on the 8.0-current system,
> so malloc debugging is enabled on one machine, but it shouldn't hurt in
> this case since it is not allocating anything.
> 
> > I don't know the Marvell internals, and from what I can tell, their
> > technical docs require an NDA. That said, many ARM processors also have
> > an internal instruction cache (instruction prefetch) in addition to the
> > instruction cache. I don't think the prefetch has an enable/disable.
> > 
> > From the CPU identification, it looks like branch prediction is
> > turned on. Branch prediction compensates for the longer pipelines.
> > I can't see how that could go astray in a tight loop.
> > 
> > Thus says the ARM ARM:
> > 
> > 	ARM implementations are free to choose how far ahead of the
> > 	current point of execution they prefetch instructions; either
> > 	a fixed or a dynamically varying number of instructions. As well
> > 	as being free to choose how many instructions to prefetch, an ARM
> > 	implementation can choose which possible future execution path to
> > 	prefetch along. For example, after a branch instruction, it can
> > 	choose to prefetch either the instruction following the branch
> > 	or the instruction at the branch target. This is known as branch
> > 	prediction.
> > 
> > There are a few dangling data allocations that I would like to see
> > closed from the multiple kernel allocation fix. *IN THEORY, IF* a page
> > is allocated via the arm_nocache (DMA COHERENT) or a sendfile, then
> > it is never marked as unallocated. *IN THEORY*, if that page is used
> > again, then we could falsely believe that the page is being shared and
> > turn off the cache, even though it is not shared.
> > 
> > 	http://www.casselton.net/~tinguely/arm_pmap_unmanaged.diff
> > 
> > * Disclaimer: I am not sure whether DMA COHERENT or sendfile is used in
> > the Sheeva implementation. This is a theoretical observation of a side
> > effect of the multiple kernel mapping patch that we did just before
> > FreeBSD 8-release.

This sounds possible.
My 8.0-current system should be before that change and it is much faster
than my current system.
It is still slower than the calculated ~80s, and the difference looks
too large to be explained by a pipeline stall on the branch alone.
Does anyone have access to an RM9200 system running Linux?
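
For anyone who wants to repeat the comparison, here is a minimal sketch of
how a measured run time could be turned into cycles per iteration. The loop
body, the function names, and the clock frequency passed in are assumptions
for illustration only; the actual benchmark loop is not shown in this thread.

```c
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical stand-in for the benchmark: a tight arithmetic loop that
 * allocates nothing. The accumulator is volatile so the compiler cannot
 * fold the loop away.
 */
static uint32_t
tight_loop(uint32_t iterations)
{
    volatile uint32_t acc = 0;
    uint32_t i;

    for (i = 0; i < iterations; i++)
        acc += i ^ (acc << 1);
    return (acc);
}

/* Seconds elapsed between two clock_gettime() samples. */
static double
elapsed_seconds(const struct timespec *start, const struct timespec *end)
{
    return ((double)(end->tv_sec - start->tv_sec) +
        (double)(end->tv_nsec - start->tv_nsec) / 1e9);
}

/*
 * Estimated cycles per loop iteration, given the elapsed time and an
 * assumed core clock in Hz (the caller supplies the clock; it is not
 * read from the hardware).
 */
static double
cycles_per_iteration(double seconds, double clock_hz, uint32_t iterations)
{
    return (seconds * clock_hz / (double)iterations);
}
```

Sample the time with clock_gettime(CLOCK_MONOTONIC, ...) before and after
tight_loop(), then pass the result of elapsed_seconds() and the board's
nominal clock to cycles_per_iteration(). Running the same binary on both
systems gives a per-iteration figure that can be compared directly, which
should make a cache that has been switched off easier to tell apart from a
stalled pipeline.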

-- 
B.Walter <bernd@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD computers, and much more.


