From owner-freebsd-smp  Thu May 30 13:28:50 1996
Return-Path: owner-smp
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id NAA20535
          for smp-outgoing; Thu, 30 May 1996 13:28:50 -0700 (PDT)
Received: from tfs.com (tfs.com [140.145.250.1])
          by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id NAA20521
          for <freebsd-smp@freebsd.org>; Thu, 30 May 1996 13:28:45 -0700 (PDT)
Received: from critter.tfs.com by tfs.com (smail3.1.28.1) with SMTP
	id m0uPEKq-0003wnC; Thu, 30 May 96 13:28 PDT
Received: from critter.tfs.com (localhost [127.0.0.1]) by critter.tfs.com (8.7.5/8.6.12) with ESMTP id UAA00779; Thu, 30 May 1996 20:28:42 GMT
To: erich@uruk.org
cc: freebsd-smp@freebsd.org
Subject: Re: How do you get the SMP code 
In-reply-to: Your message of "Thu, 30 May 1996 12:05:31 MST."
             <199605301905.MAA07776@uruk.org> 
Date: Thu, 30 May 1996 20:28:41 +0000
Message-ID: <777.833488121@critter.tfs.com>
From: Poul-Henning Kamp <phk@critter.tfs.com>
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> You have to do at least one level of pointer indirection for indexing
> which CPU you're using anyway, so this is mostly a redesign of how the
> mapping works.
I mostly agree with this; I just want to make sure that we don't
overengineer it.

> > Nasty question time:  Can part of the local apic registers be used
> > as "per-cpu registers" without too much performance penalty ?
> 
> It is particular to each processor.  On the Pentium and Pentium Pro,
> accesses are part of the external bus and L2 DCU path respectively.
> I.e. I think it is slower than the L1.
but faster than doing too much arithmetic on the apic_id.  My point being
that any attempt to find the per-cpu data starts out with trying to read
the APIC_ID, so we might as well cache a pointer in the APIC and read
that instead...
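
To make the comparison concrete, here is a minimal sketch in plain C that
simulates the two lookup paths.  Everything in it is an assumption made for
illustration: `struct percpu`, `struct sim_lapic`, the `scratch` field
(standing in for a spare per-CPU APIC register, which may or may not exist on
real hardware) and both lookup functions are hypothetical, and the local APIC
is modelled as ordinary memory rather than a real device mapping.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 4

/* Hypothetical per-CPU data block. */
struct percpu {
    uint32_t cpu_id;
    uint32_t ticks;
};

/* Simulated local APIC: an ID register plus an assumed spare register. */
struct sim_lapic {
    uint32_t id_reg;          /* APIC ID kept in the bits from 24 upward */
    struct percpu *scratch;   /* assumption: a register we may cache a pointer in */
};

static struct percpu percpu_table[NCPU];
static struct sim_lapic sim_lapic[NCPU];

/* Set up one simulated APIC and one data block per CPU. */
static void percpu_init(void)
{
    for (uint32_t i = 0; i < NCPU; i++) {
        percpu_table[i].cpu_id = i;
        percpu_table[i].ticks = 0;
        sim_lapic[i].id_reg = i << 24;
        sim_lapic[i].scratch = &percpu_table[i];
    }
}

/* Path 1: read the ID register, do arithmetic on it, then index a table. */
static struct percpu *percpu_by_apic_id(const struct sim_lapic *lapic)
{
    uint32_t id = (lapic->id_reg >> 24) & 0x0f;

    assert(id < NCPU);
    return &percpu_table[id];
}

/* Path 2: read the cached pointer straight out of the (simulated) APIC. */
static struct percpu *percpu_by_cached_ptr(const struct sim_lapic *lapic)
{
    return lapic->scratch;
}
```

Both paths start with one APIC read; the second simply skips the shift, mask
and table index that follow it in the first.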

I have thought quite a bit about the fact that we don't use %[gf]s in the
kernel: one could make a segment per CPU and have the CPUs differ only in
%gs's contents.  That way we just need to set %gs on entry to the kernel
(in trap/syscall/irq &c) and everything is (moderately) downhill from there,
with the footnote that we have no way of telling CC that it should use the
"gs:" prefix, so a lot of ugly inline assembler is needed for it.
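
A rough sketch of what such an accessor might look like, assuming GCC-style
inline assembler.  All names here (`struct cpu_private`, `cpu_private_read32`,
`CPU_PRIVATE_GET`, `sim_set_gs`, the `PERCPU_USE_GS` switch) are hypothetical.
The %gs path is only compiled when `PERCPU_USE_GS` is defined, and it is only
meaningful once kernel entry code has loaded %gs with a per-CPU segment; the
default path simulates the segment base with an ordinary pointer so the sketch
can be built and tried in user space.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-CPU data, located at offset 0 of each CPU's segment. */
struct cpu_private {
    uint32_t cpu_id;
    uint32_t ticks;
};

#if defined(PERCPU_USE_GS) && defined(__GNUC__) && \
    (defined(__i386__) || defined(__x86_64__))
/* Kernel-style path: load 32 bits from offset 'off' within the %gs segment. */
static inline uint32_t cpu_private_read32(size_t off)
{
    uint32_t v;

    __asm__ volatile ("movl %%gs:(%1), %0"
                      : "=r" (v)
                      : "r" (off)
                      : "memory");
    return v;
}
#else
/* Simulation path: a plain pointer stands in for the %gs segment base. */
static const struct cpu_private *sim_gs_base;

/* Plays the role of "set %gs on entry to the kernel". */
static void sim_set_gs(const struct cpu_private *base)
{
    sim_gs_base = base;
}

static inline uint32_t cpu_private_read32(size_t off)
{
    uint32_t v;

    memcpy(&v, (const char *)sim_gs_base + off, sizeof v);
    return v;
}
#endif

/* Field access by name; valid only for the 32-bit fields used here. */
#define CPU_PRIVATE_GET(field) \
    cpu_private_read32(offsetof(struct cpu_private, field))
```

With the simulation path, calling `sim_set_gs(&some_cpu)` and then
`CPU_PRIVATE_GET(cpu_id)` returns that CPU's ID; code using the macro does not
change when the accessor underneath is swapped for the %gs version.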

> > I would really love to have one or two 32bit registers local per CPU
> > to speed up all this stuff...
> 
> Yes, yes.  We've heard that quite a bit.
Oh, and while you're at it: add a nanosecond clock.  It doesn't have to
tick in nanosecond increments, just count in units of nanoseconds.  And if
you have more space on your silicon, we have more ideas as well :-)

> FWIW:  GCC has a lot of room for improvement...  the "Pentium GCC" work
>        showed that just taking advantage of some x86-isms can get you
>        BOTH 10-30% denser code that's also 10-30% faster.

Oh, sure!

But performance by design is even better :-)

If I can design it so that I only have one level of indirection, then
any version of any compiler will do better :-)

--
Poul-Henning Kamp           | phk@FreeBSD.ORG       FreeBSD Core-team.
http://www.freebsd.org/~phk | phk@login.dknet.dk    Private mailbox.
whois: [PHK]                | phk@ref.tfs.com       TRW Financial Systems, Inc.
Future will arrive by its own means, progress not so.