From owner-freebsd-current Tue Jul 13 3:15:35 1999
Delivered-To: freebsd-current@freebsd.org
Received: from alcanet.com.au (border.alcanet.com.au [203.62.196.10])
        by hub.freebsd.org (Postfix) with ESMTP id 9F11614D70
        for ; Tue, 13 Jul 1999 03:15:21 -0700 (PDT)
        (envelope-from jeremyp@gsmx07.alcatel.com.au)
Received: by border.alcanet.com.au id <40353>; Tue, 13 Jul 1999 19:55:41 +1000
Date: Tue, 13 Jul 1999 20:13:31 +1000
From: Peter Jeremy
Subject: Re: "objtrm" problem probably found (was Re: Stuck in "objtrm")
In-reply-to: <199907130501.WAA74171@apollo.backplane.com>
To: dillon@apollo.backplane.com
Cc: freebsd-current@FreeBSD.ORG
Message-Id: <99Jul13.195541est.40353@border.alcanet.com.au>
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Matthew Dillon wrote:
>:I'm not sure there's any reason why you shouldn't.  If you changed the
>:semantics of a stack segment so that memory addresses below the stack
>:pointer were irrelevant, you could implement a small, 0-cycle, on-chip
>:stack (that overflowed into memory).
>
>    This would be relatively complex and also results in cache coherency
>    problems.

I agree that there would be additional complexity.  I believe that
the `on-chip stack cache' part has been implemented on some Forth
chips (where stack performance is rather critical), though I don't
know whether any of them were MP-capable.

My reason for suggesting the change to stack semantics was also to
allow cache line allocation without a memory fetch (ie if SP=1000, a
push would result in ff0..fff (or fe0..fff) being allocated as a cache
line without bothering to fetch ff0..ffb).  I'm not sure whether this
change would actually provide a measurable improvement though (I
suspect that it wouldn't).

In this case, I believe cache coherency can be bypassed.  The stack
segment is only needed on one processor at a time.  If there's an
interrupt on that CPU, the on-chip stack would flush to memory so
that the memory image was consistent.
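
A toy model of the on-chip stack idea above, purely for illustration.
Everything in it is an assumption made for the sketch: the class name
OnChipStack, the default of 4 on-chip entries, the 4-byte word size and
the use of a Python dict to stand in for RAM.  It does not describe any
real Forth chip or x86 mechanism; it only shows pushes staying on-chip
until the small stack overflows, and an interrupt flushing everything
so that the memory image is consistent.

```python
class OnChipStack:
    """Toy model: the top few stack words live on-chip, older ones spill to memory."""

    def __init__(self, capacity=4, sp=0x1000, word=4):
        self.capacity = capacity  # hypothetical number of on-chip entries
        self.sp = sp              # stack pointer; the stack grows downwards
        self.word = word          # hypothetical word size in bytes
        self.on_chip = []         # (address, value) pairs, oldest first
        self.memory = {}          # address -> value, stands in for RAM
        self.mem_writes = 0       # counts words written out to memory

    def _spill_oldest(self):
        # Write the oldest on-chip word out to memory.
        addr, value = self.on_chip.pop(0)
        self.memory[addr] = value
        self.mem_writes += 1

    def push(self, value):
        # Overflow into memory only when the on-chip part is full.
        if len(self.on_chip) == self.capacity:
            self._spill_oldest()
        self.sp -= self.word
        self.on_chip.append((self.sp, value))

    def pop(self):
        if self.on_chip:
            _, value = self.on_chip.pop()
        else:
            # On-chip part is empty: read the word back from memory.
            # Addresses below SP are treated as irrelevant, so the
            # entry is simply dropped from the model.
            value = self.memory.pop(self.sp)
        self.sp += self.word
        return value

    def interrupt(self):
        # Flush every on-chip word so the memory image is consistent.
        while self.on_chip:
            self._spill_oldest()


# A call/return pair that fits on-chip causes no memory traffic.
stack = OnChipStack(capacity=2)
stack.push(0x1234)  # e.g. a return address
stack.pop()
```

With a capacity of 1 and nothing else pushed, the same model behaves
like a single saved return address that only reaches memory when a
second push or an interrupt forces it out.
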

At the minimal end, another way of looking at it would be as an
`invisible' branch-and-link register - capable of saving a single
return address as long as nothing else was pushed onto the stack.

>    A solution already exists:  It's called branch-and-link,

One case where the IBM/360 accidentally got it right :-).

>    but Intel cpu's do not use it because Intel cpu's do not have enough
>    registers (makes you just want to throw up -- all that MMX junk and they
>    couldn't add a branch and link register! ).

But all that MMX junk makes Doom (or whatever) look much better and
that's far more critical :-).

>    The key with branch-and-link
>    is that the lowest subroutine level does not have to save/restore the
>    register, making entry and return two or three times faster then
>    subroutine calls that make other subroutine calls.

I seem to recall reading somewhere that leaf subroutine performance
is also fairly important for overall performance (though that may
have been before C-compilers learnt how to in-line functions).

Peter

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message