Date:      Thu, 30 May 1996 20:28:41 +0000
From:      Poul-Henning Kamp <phk@critter.tfs.com>
To:        erich@uruk.org
Cc:        freebsd-smp@freebsd.org
Subject:   Re: How do you get the SMP code 
Message-ID:  <777.833488121@critter.tfs.com>
In-Reply-To: Your message of "Thu, 30 May 1996 12:05:31 MST." <199605301905.MAA07776@uruk.org> 

> You have to do at least one level of pointer indirection for indexing
> which CPU you're using anyway, so this is mostly a redesign of how the
> mapping works.
I mostly agree with this; I just want to make sure that we don't
over-engineer it.

> > Nasty question time:  Can part of the local apic registers be used
> > as "per-cpu registers" without too much performance penalty ?
> 
> It is particular to each processor.  On the Pentium and Pentium Pro,
> accesses are part of the external bus and L2 DCU path respectively.
> I.e. I think it is slower than the L1.
But faster than doing too much arithmetic on the apic_id.  My point being
that any attempt to find the per-cpu data starts out with trying to read
the APIC_ID, so we might as well cache a pointer in the APIC and read
that instead...
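
A minimal sketch of the two lookup paths, in portable C, for illustration
only.  Everything here is an assumption made up for the example: struct
percpu, the apic_to_index table, and the read_apic_id()/read_cached_ptr()
helpers, which simulate the local APIC register reads with plain variables.
It is not FreeBSD code and says nothing about which APIC register could
actually hold the pointer.

```c
#include <assert.h>
#include <stddef.h>

#define NCPU        4
#define MAX_APIC_ID 16

/* Hypothetical per-CPU data block. */
struct percpu {
    int  cpu_index;
    long ticks;
};

static struct percpu percpu_data[NCPU];

/* Hypothetical map from (possibly sparse) APIC IDs to dense CPU indices. */
static int apic_to_index[MAX_APIC_ID];

/* Simulated hardware state; a real kernel would read local APIC registers. */
static unsigned       sim_apic_id;
static struct percpu *sim_apic_scratch;

static unsigned read_apic_id(void)          { return sim_apic_id; }
static struct percpu *read_cached_ptr(void) { return sim_apic_scratch; }

/* Record which per-CPU slot belongs to a given APIC ID. */
static void percpu_register(unsigned apic_id, int index)
{
    assert(apic_id < MAX_APIC_ID && index >= 0 && index < NCPU);
    apic_to_index[apic_id] = index;
    percpu_data[index].cpu_index = index;
}

/* Simulate "executing on" a CPU: set its ID and cache its pointer once. */
static void sim_run_on(unsigned apic_id)
{
    assert(apic_id < MAX_APIC_ID);
    sim_apic_id = apic_id;
    sim_apic_scratch = &percpu_data[apic_to_index[apic_id]];
}

/* Path A: slow register read, then a table lookup, then indexing. */
static struct percpu *lookup_by_id(void)
{
    return &percpu_data[apic_to_index[read_apic_id()]];
}

/* Path B: the same slow register read, but the value is already the pointer. */
static struct percpu *lookup_cached(void)
{
    return read_cached_ptr();
}
```

Both paths pay for one register read; path B simply skips the table lookup
and address arithmetic that follow it in path A.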

I thought quite a bit about the fact that we don't use %[gf]s in the
kernel: one could make a segment per CPU and have the CPUs differ only in
%gs's contents.  That way we just need to set %gs on entry to the kernel
(in trap/syscall/irq &c) and everything is (moderately) downhill from there,
with the footnote that we have no way of explaining to cc that it should
use the "gs:" prefix, so a lot of ugly inline assembler is needed for it.
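
One way to keep that inline assembler in a single place is an accessor
macro.  The sketch below is an assumption-laden illustration, not existing
kernel code: struct percpu, percpu_read32(), PERCPU_GET() and the
PERCPU_USE_GS switch are all hypothetical names.  The %gs path is only
compiled for GCC on i386 when PERCPU_USE_GS is defined, and it assumes the
kernel entry code has already loaded %gs with that CPU's segment; otherwise
a plain variable stands in for the segment base so the example runs anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-CPU block; each CPU's %gs segment would be based here. */
struct percpu {
    uint32_t cpu_index;
    uint32_t curproc_id;
};

#if defined(PERCPU_USE_GS) && defined(__GNUC__) && defined(__i386__)
/* Segment-relative load: the only place the "gs:" prefix appears. */
static inline uint32_t percpu_read32(size_t off)
{
    uint32_t v;
    __asm__ volatile("movl %%gs:(%1), %0" : "=r"(v) : "r"(off) : "memory");
    return v;
}
#else
/* Portable stand-in: this variable plays the role of the %gs base. */
static struct percpu *sim_gs_base;

static inline uint32_t percpu_read32(size_t off)
{
    uint32_t v;
    assert(sim_gs_base != NULL);
    memcpy(&v, (const unsigned char *)sim_gs_base + off, sizeof v);
    return v;
}
#endif

/* What the rest of the code would use; no assembler leaks out of here. */
#define PERCPU_GET(field) percpu_read32(offsetof(struct percpu, field))
```

With this shape, a caller writes PERCPU_GET(curproc_id) and never sees the
segment override; only 32-bit fields are handled in this sketch.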

> > I would really love to have one or two 32bit registers local per CPU
> > to speed up all this stuff...
> 
> Yes, yes.  We've heard that quite a bit.
Oh, and while you're at it: add a nanosecond clock.  It doesn't have to
tick in nanosecond increments, it just has to count in units of nanoseconds.
And if you have more space on your silicon, we have more ideas as well :-)

> FWIW:  GCC has a lot of room for improvement...  the "Pentium GCC" work
>        showed that just taking advantage of some x86-isms can get you
>        BOTH 10-30% denser code that's also 10-30% faster.

Oh, sure!

But performance by design is even better :-)

If I can design it so that I only have one level of indirection, then
any version of any compiler will do better :-)

--
Poul-Henning Kamp           | phk@FreeBSD.ORG       FreeBSD Core-team.
http://www.freebsd.org/~phk | phk@login.dknet.dk    Private mailbox.
whois: [PHK]                | phk@ref.tfs.com       TRW Financial Systems, Inc.
Future will arrive by its own means, progress not so.
