Date:      Mon, 8 Mar 2010 13:37:23 -0600 (CST)
From:      Mark Tinguely <tinguely@casselton.net>
To:        ticso@cicely.de
Cc:        freebsd-arm@freebsd.org
Subject:   Re: Performance of SheevaPlug on 8-stable
Message-ID:  <201003081937.o28JbNcU049003@casselton.net>
In-Reply-To: <20100308184147.GB11192@cicely7.cicely.de>


	<deleted>
>
>  This puzzled me as well.
>  What is the requirement for such a handling with shared pages?
>  I thought handing over shared data is done by cache flushes, barriers or
>  whatever an architecture has for this.
>  Most systems we talk about are single CPU, so it is just DMA and
>  handing over dcache writes to icache, but we don't support self
>  modifying code, so it is always done in a controlled way.
>  And even for SMP systems handing over data requires using
>  cache coherence mechanisms - e.g. those embedded in mutexes.
>  So what is wrong in my picture and requires us to do special handling
>  for shared pages on ARM?
>
>  > And if there's only one copy of 'test' running, why does it hit the
>  > 'shared' case for this code?
>  > 
>  > Warner
>
>  -- 
>  B.Walter <bernd@bwct.de> http://www.bwct.de
>  Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD computers, and more.

ARMv4/ARMv5 use virtually indexed / virtually tagged (VIVT) level one caches.
They may or may not have level two caches. These are the ARM chips
that we currently support, and I will explain the rules below.

The newer processors differ. ARMv6 can have virtually indexed / physically
tagged (VIPT) or physically indexed / physically tagged (PIPT) level one
caches; ARMv7 must have PIPT level one caches. ARMv6 and ARMv7 have more
PDE/PTE bits describing the cache attributes of the "inner" and "outer"
caches. ARMv7 has the more mature cache management; it defines the "level
of unification" and the "level of coherency" for the caches. There is also
some level of snooping for ARMv7 multi-core, which I will just dance around.

A PIPT cache must be synced to the level of coherency before DMA and when
it is modified from another process (think of a debugger in another address
space modifying instruction code). ARMv6/ARMv7 have address space
identifiers (ASIDs) to avoid TLB flushes; if they are not used, then the
TLBs have to be flushed on context switch. This is close to i386/amd64,
with the exception of DMA: i386/amd64 have self-snooping cache buses, so
DMA does not need explicit cache syncs there.

VIVT cache rules:

 1) Flush the cache and the TLB on context change.

 2) USER caching must be disabled for a physical page if the page has AT LEAST
    one writable user mapping AND is mapped more than once in the same user
    address space. (Multiple read mappings and no writes are fine; they just
    take up multiple cache entries. Obviously, a single read or a single write
    mapping is fine. If the mappings are in different user address spaces, we
    are okay because the flush on context change syncs things up.)

 3) KERNEL mappings are global (present in every address space).
	a) If the page is mapped writable AT LEAST ONCE in kernel space
	   AND the page is mapped more than once, no matter whether the second
	   mapping is in user or kernel space, all mappings must be
	   uncached.

	b) If the page has only readable kernel mappings but at least one
	   writable user mapping, the cache must be disabled for the mappings
	   of the page in that address space. This is slightly different from
	   rule 2. Kernel mappings are typically writable, so this case does
	   not really happen in practice.

It gets a little tricky to implement, because we have to catch the transition
from cached -> uncached (change the PTE and write back/invalidate the data
cache or invalidate the instruction cache) and from uncached -> cached
(change the PTE).

--Mark.


