Date: Mon, 08 Mar 2010 16:50:58 +0100
From: Grzegorz Bernacki <gjb@semihalf.com>
To: Mark Tinguely <tinguely@casselton.net>
Cc: freebsd-arm@freebsd.org
Subject: Re: Performance of SheevaPlug on 8-stable
Message-ID: <4B951CE2.6040507@semihalf.com>
In-Reply-To: <201003072125.o27LPfFb000968@casselton.net>
References: <201003072125.o27LPfFb000968@casselton.net>

Mark Tinguely wrote:
> FreeBSD-current has kernel and user witness turned on. Witness is for
> locks, so it should not change the performance of a tight arithmetic loop
> like this.
>
> I don't know the Marvell internals, and from what I can tell, their
> technical docs require an NDA. That said, many of the ARM processors also
> have an internal instruction cache (instruction prefetch) in addition to
> the regular instruction cache. I don't think the prefetch has an
> enable/disable.
>
> It looks from the CPU identification like branch prediction is turned
> on. Branch prediction compensates for the longer pipelines. I can't see
> how that could go astray in a tight loop.
>
> Thus says the ARM ARM:
>
>    ARM implementations are free to choose how far ahead of the
>    current point of execution they prefetch instructions; either
>    a fixed or a dynamically varying number of instructions. As well
>    as being free to choose how many instructions to prefetch, an ARM
>    implementation can choose which possible future execution path to
>    prefetch along. For example, after a branch instruction, it can
>    choose to prefetch either the instruction following the branch
>    or the instruction at the branch target. This is known as branch
>    prediction.
>
> There are a few dangling data allocations that I would like to see
> closed from the multiple kernel allocation fix. *IN THEORY, IF* a page
> is allocated via arm_nocache (DMA COHERENT) or a sendfile, then it is
> never marked as unallocated. *IN THEORY*, if that page is used again,
> then we could falsely believe that the page is being shared and turn
> off the cache, even though it is not shared.
>
> http://www.casselton.net/~tinguely/arm_pmap_unmanaged.diff
>
> * Disclaimer: I am not sure whether DMA COHERENT or sendfile is used in
> the Sheeva implementation. This is a theoretical observation of a side
> effect of the multiple kernel mapping patch that we did just before
> FreeBSD 8-release.
>
> --Mark Tinguely

This is probably caused by the mechanism which turns off the cache for
shared pages. When I applied the following patch:

diff --git a/sys/arm/arm/pmap.c b/sys/arm/arm/pmap.c
index 390dc3c..d17c0cc 100644
--- a/sys/arm/arm/pmap.c
+++ b/sys/arm/arm/pmap.c
@@ -1401,6 +1401,8 @@ pmap_fix_cache(struct vm_page *pg, pmap_t pm, vm_offset_t va)
 	 */
 
 	TAILQ_FOREACH(pv, &pg->md.pv_list, pv_list) {
+		if (pv->pv_flags & PVF_EXEC)
+			return;
 			/* generate a count of the pv_entry uses */
 		if (pv->pv_flags & PVF_WRITE) {
 			if (pv->pv_pmap == pmap_kernel())

the execution time of the 'test' program is:

mv78100-4# time ./test
5.000u 0.000s 0:05.40 99.8% 40+1324k 0+0io 0pf+0w

and without this patch it is:

mv78100-4# time ./test
295.000u 0.000s 4:56.01 99.7% 40+1322k 0+0io 0pf+0w

I think we need to handle executable pages in a different way.

grzesiek