From: Marcel Moolenaar <xcllnt@mac.com>
Date: Wed, 09 Feb 2011 11:28:00 -0800
To: Mark Tinguely
Cc: arm@freebsd.org
Subject: Re: Elimination of cpu_l2cache_* functions
In-reply-to: <4D52D01D.7060204@gmail.com>
Message-id: <5DBD21BB-5C7A-4389-BF28-D508B06DB656@mac.com>
References: <857AA8D9-5C41-4D80-A3B5-0D29BE051014@mac.com> <20110209095630.GA57320@ci0.org> <4D52D01D.7060204@gmail.com>

On Feb 9, 2011, at 9:34 AM, Mark Tinguely wrote:

> On 2/9/2011 10:25 AM, Marcel Moolenaar wrote:
>> On Feb 9, 2011, at 1:56 AM, Olivier Houchard wrote:
>>
>>> Hi Marcel,
>>>
>>> On Mon, Feb 07, 2011 at 10:43:54AM -0800, Marcel Moolenaar wrote:
>>>> All,
>>>>
>>>> I've been reviewing the use of the cpu_l2cache_* functions and found
>>>> that 1) they're missing from cpu_switch() and 2) they are always used
>>>> in conjunction with either cpu_idcache_* or cpu_dcache_*.
>>>>
>>>> Since most CPU variants define them as null ops, isn't it better to
>>>> incorporate the functionality of cpu_l2cache_* into cpu_idcache_* and
>>>> cpu_dcache_* and eliminate them altogether?
>>>>
>>>> Any objections to me removing cpu_l2cache_* and therefore changing
>>>> the semantics of cpu_idcache_* and cpu_dcache_* to apply to all
>>>> relevant cache levels?
>>
>> Hi Olivier, good to hear from you,
>>
>>> I chose to make the l2cache functions separate from the [i]dcache
>>> functions because there are a number of cases where an L1 cache flush
>>> was needed but not an L2 flush, and it would be a performance penalty
>>> to do both.
>>
>> I'll take it from this that the L2 is PIPT for the XScale core 3
>> as well, right?
>>
>>> Also, more CPU variants define them as null ops now, but most new ARM
>>> CPUs come with an L2 cache, so we need to think about it carefully.
>>
>> Agreed. If the L2 cache is PIPT, then we should not tie L1 & L2
>> together, and I'd like to change the code to remove the L2 cache
>> operations from most places where we have them now.
>
> My point is that the L2 caches had better be PIPT. If the L2 caches are
> virtually indexed and we do not flush them on a context change, then we
> could have multiple copies in the L2 cache when we share a page and the
> width of the level 2 cache is larger than a page.
>
> It only makes sense from the hardware design side to make the L2 cache
> PIPT.

I have no problem with VIVT L2 caches. You deal with L2 anywhere you
deal with L1. In other words, you deal with them in cpu_idcache_* and
cpu_dcache_*. That's probably also why cpu_l2cache_* is a sub-optimal
name: it's not so much the distinction between L1 & L2, but rather the
distinction between VIVT & PIPT that is significant here.
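To make that a bit more concrete, this is roughly what I mean by folding
the L2 maintenance into the existing primitives. It's an untested sketch
of what the cpu_dcache_wbinv_all() wrapper in <machine/cpufunc.h> could
turn into; the "l2cache_is_virtual" flag is made up for illustration and
the cf_* hooks stand in for whatever the cpufuncs switch table ends up
providing:

/*
 * Untested sketch only.  A PIPT L2 does not care about virtual
 * mappings, so it only has to be touched here when it is virtually
 * indexed.  Callers then never need a separate cpu_l2cache_* call.
 */
extern int l2cache_is_virtual;          /* made-up flag, set at attach time */

static __inline void
cpu_dcache_wbinv_all(void)
{
        /* L1 write-back and invalidate, exactly as today. */
        cpufuncs.cf_dcache_wbinv_all();

        /* Fold in the L2 maintenance only when the L2 is VIVT/VIPT. */
        if (l2cache_is_virtual)
                cpufuncs.cf_l2cache_wbinv_all();
}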
>> What I'm thinking about is the following: introduce pmap_switch(),
>> as we have it on ia64. This function is called from cpu_switch() to
>> replace the existing active pmap with a new one. In pmap_switch()
>> we flush the VIVT caches *IFF* the new pmap is different from the
>> old (= currently active) pmap. Consequently, we're not going to
>> flush the VIVT caches when we switch between kernel threads, nor
>> do we flush the caches when we switch between threads in the
>> same process. In all other cases we'll flush the VIVT caches.
>>
>> pmap_switch() is also called when a pmap interface function gets
>> a pmap to work on. The interface function switches the pmap (if
>> applicable), which may or may not force a VIVT cache operation.
>> The pmap interface function does its work, after which it switches
>> back to the pmap that was active on entry to the function. This
>> then could also trigger VIVT cache operations.
>>
>> In any case: I'm thinking that this removes most of the explicit
>> calls to the cache functions while still guaranteeing coherency.
>>
>> I need to look into the aliasing case to see how that is handled.
>> I have some ideas for that too...
>>
>> Thoughts?
>
> There are places we can remove redundant cache operations;
> pmap_qenter() comes to mind.
>
> A lot of the cache operations outside of a context switch occur
> because we share a page within the same memory map (I think that is
> what you mean by the aliasing case), because we turn access or writing
> off, and for DMA. For VIVT caches, I can't see these operations going
> away. Page copying and zeroing are other examples, and it seems like
> they need cache operations.

As I said, I need to look at the current implementation, but in general
my thinking is that you allow only one of the aliased VAs to be
"active" or mapped. All other VAs that map to the same PA should cause
a page fault. Handling the page fault should then:

1. Flush the VIVT caches for the currently mapped VA.
2. Remove the currently mapped VA.
3. Add the new VA->PA mapping to satisfy the page fault.

This assumes that we're not concurrently accessing the data through
multiple aliased VAs. This, combined with pmap_switch(), seems a lot
more transparent and easier to comprehend, hopefully solves all the
outstanding problems we still have, and is hopefully more optimal than
what we have now.

I have no proof that these ideas actually work or work more
efficiently. That's why I'm discussing it here :-) ...

--
Marcel Moolenaar
xcllnt@mac.com
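P.S. To give pmap_switch() a bit more shape, here is a very rough,
untested sketch, modelled on what ia64 does. The "cache_is_vivt" flag,
the "pm_l1_phys" field and the single "active_pmap" variable are all
placeholders (the latter would really be a per-CPU field); this is only
meant to illustrate the intent, not to compile against the tree as-is:

#include <sys/param.h>
#include <vm/vm.h>
#include <vm/pmap.h>
#include <machine/cpufunc.h>

extern int cache_is_vivt;       /* placeholder: true on VIVT cache parts */
static pmap_t active_pmap;      /* placeholder: really a per-CPU field */

pmap_t
pmap_switch(pmap_t newpm)
{
        pmap_t oldpm = active_pmap;

        if (oldpm != newpm) {
                /*
                 * Only flush when the address space really changes;
                 * kernel<->kernel and same-process switches skip this.
                 */
                if (cache_is_vivt)
                        cpu_idcache_wbinv_all();
                cpu_setttb(newpm->pm_l1_phys);  /* placeholder field */
                active_pmap = newpm;
        }
        return (oldpm); /* callers restore this on their way out */
}

And, in the same spirit, the aliasing idea from the list above
(pmap_find_active_alias() and pmap_remove_alias() are made up as well):

static void
pmap_realias(pmap_t pm, vm_offset_t new_va, vm_page_t m, vm_prot_t prot)
{
        vm_offset_t old_va;

        if (pmap_find_active_alias(m, &old_va)) {
                /* 1. Flush the VIVT caches for the currently mapped VA. */
                cpu_dcache_wbinv_range(old_va, PAGE_SIZE);
                /* 2. Remove the currently mapped VA. */
                pmap_remove_alias(m, old_va);
        }
        /* 3. Add the new VA->PA mapping to satisfy the page fault. */
        pmap_enter_quick(pm, new_va, m, prot);
}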