From owner-freebsd-current  Tue Jul 13  3:15:35 1999
Delivered-To: freebsd-current@freebsd.org
Received: from alcanet.com.au (border.alcanet.com.au [203.62.196.10])
	by hub.freebsd.org (Postfix) with ESMTP id 9F11614D70
	for <freebsd-current@FreeBSD.ORG>; Tue, 13 Jul 1999 03:15:21 -0700 (PDT)
	(envelope-from jeremyp@gsmx07.alcatel.com.au)
Received: by border.alcanet.com.au id <40353>; Tue, 13 Jul 1999 19:55:41 +1000
Date: Tue, 13 Jul 1999 20:13:31 +1000
From: Peter Jeremy <jeremyp@gsmx07.alcatel.com.au>
Subject: Re: "objtrm" problem probably found (was Re: Stuck in "objtrm")
In-reply-to: <199907130501.WAA74171@apollo.backplane.com>
To: dillon@apollo.backplane.com
Cc: freebsd-current@FreeBSD.ORG
Message-Id: <99Jul13.195541est.40353@border.alcanet.com.au>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Matthew Dillon <dillon@apollo.backplane.com> wrote:
>:I'm not sure there's any reason why you shouldn't.  If you changed the
>:semantics of a stack segment so that memory addresses below the stack
>:pointer were irrelevant, you could implement a small, 0-cycle, on-chip
>:stack (that overflowed into memory).
>
>    This would be relatively complex and also results in cache coherency
>    problems.

I agree that there would be additional complexity.  I believe that the
`on-chip stack cache' part has been implemented on some Forth chips
(where stack performance is rather critical), though I don't know
whether any of them were MP-capable.

My reason for suggesting the change to stack semantics was also to
allow cache line allocation without a memory fetch (i.e. if SP=1000
hex, a push would result in ff0..fff (or fe0..fff) being allocated as
a cache line without bothering to fetch ff0..ffb).  I'm not sure
whether this change would actually provide a measurable improvement,
though (I suspect that it wouldn't).
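
As a rough illustration of the arithmetic above, here is a minimal
sketch.  The 16-byte line, the 4-byte push, and the function name
push_line_effect are my assumptions for the example, not a description
of any real cache.  It also assumes a line is only newly allocated when
the push crosses into a lower line.

```python
LINE_SIZE = 16  # assumed cache line size in bytes (the ff0..fff case)
WORD_SIZE = 4   # assumed width of a push in bytes


def push_line_effect(sp, line_size=LINE_SIZE, word_size=WORD_SIZE):
    """Describe the cache line touched by a push at stack pointer sp.

    Returns (line_base, line_end, skipped_fetch).  skipped_fetch is the
    inclusive byte range below the new stack pointer that a conventional
    write-allocate cache would fetch and that the proposed semantics
    would skip, or None if no fetch would be saved.
    """
    new_sp = sp - word_size
    line_base = new_sp & ~(line_size - 1)   # line_size must be a power of 2
    line_end = line_base + line_size - 1
    # Hypothetical simplification: only a push that enters a new, lower
    # line allocates one; otherwise the line is taken to be resident.
    crosses = (sp & ~(line_size - 1)) != line_base
    if not crosses or new_sp == line_base:
        return line_base, line_end, None
    return line_base, line_end, (line_base, new_sp - 1)
```

With sp = 0x1000 this gives the line 0xff0..0xfff and a skipped fetch
of 0xff0..0xffb; with line_size=32 the line becomes 0xfe0..0xfff and
the skipped range 0xfe0..0xffb.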

In this case, I believe cache coherency can be bypassed.  The stack
segment is only needed on one processor at a time.  If there were an
interrupt on that CPU, the on-chip stack would be flushed to memory so
that the memory image remained consistent.
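
The flush-on-interrupt behaviour can be modelled with a toy
simulation.  This is only a sketch under my own assumptions: the class
name OnChipStack, the four-entry capacity, and the spill-oldest-first
policy are all hypothetical, and real hardware would differ.

```python
class OnChipStack:
    """Toy model of a small on-chip stack that overflows into memory."""

    def __init__(self, capacity=4):
        self.capacity = capacity  # assumed number of on-chip entries
        self.on_chip = []         # newest entries, oldest first
        self.memory = []          # backing store, oldest first

    def push(self, value):
        # When the on-chip part is full, spill its oldest entry to memory.
        if len(self.on_chip) == self.capacity:
            self.memory.append(self.on_chip.pop(0))
        self.on_chip.append(value)

    def pop(self):
        # Prefer the on-chip copy; fall back to memory once it is empty.
        if self.on_chip:
            return self.on_chip.pop()
        return self.memory.pop()

    def interrupt(self):
        # Flush everything so the memory image is complete and in order.
        self.memory.extend(self.on_chip)
        self.on_chip.clear()
```

After interrupt() the memory list holds the whole stack in push order,
so another context looking only at memory sees a consistent image.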

At the minimal end, another way of looking at it would be as an
`invisible' branch-and-link register - capable of saving a single
return address as long as nothing else was pushed onto the stack.
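
A minimal sketch of that minimal case, again with hypothetical names
(LinkedStack, mem_ops) and a simple count of memory accesses as the
only cost model: the hidden register holds one return address and is
spilled to the in-memory stack as soon as anything else is pushed.

```python
class LinkedStack:
    """Toy model of a stack with an invisible one-entry link register."""

    def __init__(self):
        self.link = None   # hidden register holding one return address
        self.memory = []   # ordinary in-memory stack
        self.mem_ops = 0   # count of memory reads and writes

    def _spill(self):
        # Anything else going onto the stack forces the link out first.
        if self.link is not None:
            self.memory.append(self.link)
            self.mem_ops += 1
            self.link = None

    def call(self, return_addr):
        self._spill()
        self.link = return_addr  # no memory traffic for the call itself

    def push(self, value):
        self._spill()
        self.memory.append(value)
        self.mem_ops += 1

    def ret(self):
        if self.link is not None:
            addr, self.link = self.link, None
            return addr          # leaf return: served from the register
        self.mem_ops += 1
        return self.memory.pop()
```

In this model a leaf call followed by its return leaves mem_ops at 0,
while a call made from inside another call costs one spill and one
later reload.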

> A solution already exists:  It's called branch-and-link,
One case where the IBM/360 accidentally got it right :-).

>    but Intel cpu's do not use it because Intel cpu's do not have enough
>    registers (makes you just want to throw up -- all that MMX junk and they
>    couldn't add a branch and link register! ).
But all that MMX junk makes Doom (or whatever) look much better
and that's far more critical :-).

>  The key with branch-and-link
>    is that the lowest subroutine level does not have to save/restore the 
>    register, making entry and return two or three times faster then 
>    subroutine calls that make other subroutine calls.
I seem to recall reading somewhere that leaf subroutine performance
is also fairly important for overall performance (though that may
have been before C compilers learnt how to inline functions).

Peter

