Date:      Thu, 30 May 1996 12:05:31 -0700
From:      erich@uruk.org
To:        Poul-Henning Kamp <phk@critter.tfs.com>
Cc:        freebsd-smp@freebsd.org
Subject:   Re: How do you get the SMP code 
Message-ID:  <199605301905.MAA07776@uruk.org>
In-Reply-To: Your message of "Thu, 30 May 1996 18:01:28 -0000." <511.833479288@critter.tfs.com> 


Poul-Henning Kamp <phk@critter.tfs.com> writes:

> > It turns out that the whole "NCPU" thing in Linux was (and is, as I
> > have to submit patches because people are constantly breaking it)
> > a pain in the ass.
> > 
> > At the very least, it should be changed to "MAXCPU", with dynamic
> > activation of CPUs up to the maximum (this is what Linux does, though
> > they still call it NCPU).
> That's how our NCPU is intended.  We hope to keep the size to < 64bytes
> per cpu if we can...

Linux only uses one GDT (and it uses one TSS per process).  The locking in
the x86 chips is sufficient for this to work reasonably...  so the "per CPU"
data structures were small enough for "NCPUS" (that's what it was called)
to be 32.

> > Ideally (this is what I'm going to do for Utah's Mach4 distribution),
> > it should just be "options SMP" and dynamically allocate CPU structures
> > as necessary...  and if it doesn't find an MPS configuration, it
> > will still function without getting confused by the possible lack of
> > a local APIC on the CPU.
> 
> Yes, that would be nice, but until I have some numbers that tell me
> what the extra indirection means, I'm kind of reluctant to do so.

If you look at the typical operations done in most routines, and
set up the indirection carefully in the first place (you may need two
levels...  but it can still be done carefully), then compared with the
overhead of what's done in the routine (certainly if you had to do a
trap to get there in the first place), one or two pointer indirections
for some parts of the code seem OK.

You have to do at least one level of pointer indirection for indexing
which CPU you're using anyway, so this is mostly a redesign of how the
mapping works.
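
As a rough illustration of the cost being discussed, here is a minimal
sketch of dynamically allocated per-CPU structures reached through one
table of pointers.  The names (struct percpu, percpu_tab, percpu_activate,
percpu_tick), the field layout, and the MAXCPU value are all hypothetical
and are not taken from FreeBSD, Linux, or Mach4.

```c
#include <assert.h>
#include <stdlib.h>

#define MAXCPU 32                 /* assumed upper bound, not a real kernel value */

struct percpu {                   /* hypothetical per-CPU state */
    int           cpu;            /* logical CPU number */
    unsigned long ticks;          /* example counter */
};

/* One pointer per possible CPU; an entry stays NULL until that CPU is activated. */
static struct percpu *percpu_tab[MAXCPU];

/* Allocate the structure for a CPU on first activation; later calls return it. */
static struct percpu *percpu_activate(int cpu)
{
    assert(cpu >= 0 && cpu < MAXCPU);
    if (percpu_tab[cpu] == NULL) {
        struct percpu *p = calloc(1, sizeof(*p));
        if (p == NULL)
            return NULL;
        p->cpu = cpu;
        percpu_tab[cpu] = p;
    }
    return percpu_tab[cpu];
}

/* Costs one extra pointer load compared with a statically sized array of structs. */
static void percpu_tick(int cpu)
{
    percpu_tab[cpu]->ticks++;
}
```

Only CPUs that are actually activated consume memory for a structure; the
price is the extra load through percpu_tab on each access.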

In essence, the cleanest method I've figured out is two tables:

  [logical CPU number]  -->  [local APIC id]

and

  [local APIC id]  -->  [logical CPU number]

All the structures are then allocated using the logical CPU number
(the boot CPU is #0, counting up from there through the installed
and/or operational CPUs).

The important point is that the MPS document allows both a non-zero
boot CPU id and gaps in the numbering sequence in general.  The
main guarantee is that one CPU will be APIC id 0.
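
A minimal sketch of the two tables, assuming an 8-bit local APIC id space
and a fixed MAXCPU.  Both sizes, and the names cpu_map_init, cpu_map_add,
cpu_to_apic, and apic_to_cpu, are assumptions made for illustration, not
anything from the MPS document or an existing kernel.

```c
#include <assert.h>
#include <stdint.h>

#define MAXCPU      32     /* assumed upper bound on logical CPUs */
#define MAX_APIC_ID 256    /* assumed size of the local APIC id space */
#define CPU_NONE    (-1)   /* marks an APIC id with no CPU behind it */

static int     ncpus_found;                /* logical CPUs registered so far */
static uint8_t cpu_to_apic[MAXCPU];        /* [logical CPU number] --> [local APIC id] */
static int16_t apic_to_cpu[MAX_APIC_ID];   /* [local APIC id] --> [logical CPU number] */

/* Mark every APIC id as unused so gaps in the numbering are detectable. */
static void cpu_map_init(void)
{
    ncpus_found = 0;
    for (int i = 0; i < MAX_APIC_ID; i++)
        apic_to_cpu[i] = CPU_NONE;
}

/*
 * Register one CPU by its local APIC id and return its logical number.
 * The boot CPU has to be registered first so that it becomes logical CPU #0,
 * whatever its APIC id is.  Returns CPU_NONE on a duplicate id or a full table.
 */
static int cpu_map_add(unsigned apic_id)
{
    assert(apic_id < MAX_APIC_ID);
    if (ncpus_found >= MAXCPU || apic_to_cpu[apic_id] != CPU_NONE)
        return CPU_NONE;
    int cpu = ncpus_found++;
    cpu_to_apic[cpu] = (uint8_t)apic_id;
    apic_to_cpu[apic_id] = (int16_t)cpu;
    return cpu;
}
```

With a boot CPU at APIC id 4 and others at ids 0 and 7, registering them in
that order gives logical CPUs 0, 1, and 2; the unused APIC ids keep mapping
to CPU_NONE, so the gaps do no harm.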

> Nasty question time:  Can part of the local apic registers be used
> as "per-cpu registers" without too much performance penalty ?

It is particular to each processor.  On the Pentium and Pentium Pro,
accesses are part of the external bus and L2 DCU path respectively.
I.e. I think it is slower than the L1.

> I would really love to have one or two 32bit registers local per CPU
> to speed up all this stuff...

Yes, yes.  We've heard that quite a bit.

FWIW:  GCC has a lot of room for improvement...  the "Pentium GCC" work
       showed that just taking advantage of some x86-isms can get you
       BOTH 10-30% denser code that's also 10-30% faster.

--
  Erich Stefan Boleyn                 \_ E-mail (preferred):  <erich@uruk.org>
Mad Genius wanna-be, CyberMuffin        \__      (finger me for other stats)
Web:  http://www.uruk.org/~erich/     Motto: "I'll live forever or die trying"
  This is my home system, so I'm speaking only for myself, not for Intel.


