From owner-freebsd-smp  Mon Sep  2 17:56:51 1996
Return-Path: owner-smp
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id RAA21990
          for smp-outgoing; Mon, 2 Sep 1996 17:56:51 -0700 (PDT)
Received: from uruk.org (uruk.org [198.145.95.253])
          by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id RAA21985
          for <freebsd-smp@freebsd.org>; Mon, 2 Sep 1996 17:56:46 -0700 (PDT)
From: erich@uruk.org
Received: from loopback (loopback [127.0.0.1]) by uruk.org (8.7.4/8.7.3) with SMTP id RAA19083; Mon, 2 Sep 1996 17:56:49 -0700 (PDT)
Message-Id: <199609030056.RAA19083@uruk.org>
X-Authentication-Warning: uruk.org: Host loopback [127.0.0.1] didn't use HELO protocol
To: Steve Passe <smp@csn.net>
cc: terry@lambert.org, freebsd-smp@freebsd.org, rv@groa.uct.ac.za,
        erich@uruk.org
Subject: Re: SMP on Intel MG15 
In-reply-to: Your message of "Mon, 02 Sep 1996 16:27:37 MDT."
             <199609022227.QAA07488@clem.systemsix.com> 
Date: Mon, 02 Sep 1996 17:56:49 -0700
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk


[The messages are flying pretty fast on this thread...  I can barely keep
 up ;-)]

Steve Passe <smp@csn.net> writes:

> Hi,
> 
> again, answering 2 messages:
> 
> ===========================================================================
> #1:
> 
> >The evil motherboard that has the problems doesn't have an 82489DX, right?
> >And it is running a local APIC of version 1.x or higher, right?
> correct , and correct
> 
> >In 1.1 compliant mode, I would expect the STARTUP IPI method to work
> >on the board, without any change to the Warm Reset vector, or use of
> >the INIT IPI.  Clearly this fails.

> very clearly, but we just got the 2nd CPU to run to the point where it
> spins on "kern.smp_active == 2".  this was accomplished by setting the
> warmstart vector to point @ the boot code (bootMP), doing the INIT/RESET
> IPI, AND then going on to do the STARTUP IPI.  note that having the
> warmstart vector point to a HLT instruction had NO effect.  it would
> appear that this machine NEEDS (as erich said, thanx erich!) the
> INIT/RESET IPI, AND that vector must actually run the boot code, AND
> it looks like it might even ignore the following STARTUP IPI!!!

It does ignore the following STARTUP IPI completely.

The Intel CPUs only pay attention to the first STARTUP IPI after a reset
(I can't remember where this is documented, but I've both seen it and
talked with a few of the hardware geeks who implemented it in the first
place).

> >I might be misunderstanding the 1.1 specification, but I don't think so

> As I said b4, the 1.4 document contradicts itself in several places, so
> I wouldn't take everything in the 1.1 document as gospel.

I've read the 1.4 document pretty carefully.  In my experience it doesn't
contradict itself, but it is somewhat ambiguous in a few spots.  If you
have some real problems, I can probably dredge up my connections at Intel
and get the complaints to them.

> ===========================================================================
> #2:
> 
> >I think you can HALT everything, and do it one at a time with different
> >"VV" values for the 000VV000h real mode start address as part of the
> >STARTUP IPI with various VV's as vector.
> >
> >This would mean chopping the heck out of the first meg of memory, but
> >it *is* possible to do (assuming the damn thing listens to the STARTUP
> >IPI like it is supposed to for version 1.x or higher local APIC's).

Huh?  Since in general you have to use the warm reset vector, you can
only start ONE of the other CPUs at a time.  Why go around having separate
bootstrap areas for each one?  (you only need a separate stack for each
one, and that's pretty easy)
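
For reference, the arithmetic behind the quoted "VV" scheme is small: a
STARTUP IPI with vector VV starts the target CPU in real mode at physical
address 000VV000h, so the entry point has to be 4K-aligned and below 1MB.
A rough sketch of that mapping follows; the helper names are made up for
illustration and are not from the FreeBSD-SMP or Linux-SMP sources.

```c
#include <stdint.h>

/* Hypothetical helper: physical real-mode entry point selected by
   STARTUP IPI vector vv, i.e. 000VV000h. */
uint32_t sipi_entry_addr(uint8_t vv)
{
    return (uint32_t)vv << 12;
}

/* Hypothetical helper: STARTUP IPI vector for a trampoline at physical
   address phys.  Returns -1 if phys is not 4K-aligned or is not below
   1MB, since such an address cannot be written as 000VV000h. */
int sipi_vector_for(uint32_t phys)
{
    if ((phys & 0xFFFu) != 0 || phys >= 0x100000u)
        return -1;
    return (int)(phys >> 12);
}
```

A single trampoline page is enough under these assumptions: each CPU can
enter through the same vector and pick up its own stack afterwards.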

I'm getting confused here.  Are you saying that for every context
switch (or at least for some of them) you're re-running the startup
sequence?

The only time the startup sequence needs to be used is at bootstrap time.
It's perfectly fine to send some other interrupt via IPIs to wake up
a CPU out of the HLT instruction.

> specifically what we get is:
> 
> the 2nd CPU runs the boot code then waits for the sysctl.
> 
> the 2nd CPU sees kern.smp_active == 2 and calls cpu_switch(curproc);
> 
> the system wedges tight.
> 
> Russel suggests that it is because the 2nd CPU's ID is 2, NOT 1, and this
> might cause the lockup.  I believe he is correct, the ID is probably being
> used as an index into some array(s) somewhere (manywheres?).  Russel and I
> are off chasing the source now.

Yes, as I mentioned before:

  --  FreeBSD-SMP presumes the boot CPU is APIC id #0, and the second CPU
      is APIC id #1.  Yes, there are several tables where the APIC id is
      used as an index.

  --  The Intel XXPRESS box has CPUs numbered 0, 2, 3, 4.

I encountered a similar problem until I changed Linux-SMP to use a
"virtual CPU" numbering scheme, where there was an APIC id to internal
CPU number mapping.  The "virtual number" of the boot CPU was always
0, and the others were numbered consecutively from there.
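
A minimal sketch of that kind of mapping, assuming a small fixed upper
bound on APIC ids.  The names (apic_to_cpu, cpu_map_init,
cpu_num_from_apic_id, MAX_APIC_ID) are invented for this example and are
not the ones used in Linux-SMP or FreeBSD-SMP.

```c
#include <stdint.h>

#define MAX_APIC_ID 16      /* assumed bound, for this sketch only */
#define NO_CPU      (-1)    /* APIC id with no CPU behind it */

static int apic_to_cpu[MAX_APIC_ID];   /* APIC id -> virtual CPU number */
static int ncpus;                      /* virtual CPU numbers handed out */

/* Build the map.  The boot CPU always gets virtual number 0; the other
   APIC ids are numbered consecutively in the order they are listed. */
void cpu_map_init(uint8_t boot_apic_id, const uint8_t *apic_ids, int n)
{
    int i;

    for (i = 0; i < MAX_APIC_ID; i++)
        apic_to_cpu[i] = NO_CPU;
    ncpus = 0;

    if (boot_apic_id < MAX_APIC_ID)
        apic_to_cpu[boot_apic_id] = ncpus++;

    for (i = 0; i < n; i++) {
        uint8_t id = apic_ids[i];

        /* skip out-of-range ids and ids already numbered (boot CPU) */
        if (id < MAX_APIC_ID && apic_to_cpu[id] == NO_CPU)
            apic_to_cpu[id] = ncpus++;
    }
}

/* Look up the virtual CPU number for an APIC id, or NO_CPU. */
int cpu_num_from_apic_id(uint8_t apic_id)
{
    return apic_id < MAX_APIC_ID ? apic_to_cpu[apic_id] : NO_CPU;
}
```

With APIC ids 0, 2, 3, 4 and the boot CPU at id 0, this hands out
virtual numbers 0, 1, 2, 3, so per-CPU tables can stay densely indexed.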

Since reading the APIC registers already takes a large number of clocks,
the extra hit of this table lookup is pretty minimal by comparison.  Most
code sequences would end up looking like:

	/* gets APIC id of current CPU, then maps it to the virtual
	   CPU number */
	int cpu_num = get_current_cpu_num();

	...

	/* do rest of stuff here using "cpu_num" */

	...

This eliminates most of the overhead in a lot of the SMP code I've seen,
which plops the "get_current_cpu_id" thing into every spot where it
needs to get at the per-CPU tables.  (That is just carelessness on the
part of the programmers in the first place, presuming that the APIC
registers are as fast as internal registers.)
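
To make the difference concrete, here is a small sketch that counts
simulated APIC reads for both styles.  Everything in it is a stand-in
for illustration: the fake APIC id register, the read counter, and the
runq_len/idle_ticks tables are assumptions, not real kernel structures.

```c
#include <stdint.h>

#define NCPU 4

/* Stand-ins for this sketch only: a fake APIC id register and a count
   of how many times it has been read. */
static uint8_t fake_apic_id = 2;
int apic_reads;

static uint8_t read_apic_id(void)
{
    apic_reads++;               /* each call models one slow APIC access */
    return fake_apic_id;
}

/* APIC id -> virtual CPU number for the 0, 2, 3, 4 numbering. */
static const int apic_to_cpu[8] = { 0, -1, 1, 2, 3, -1, -1, -1 };

static int get_current_cpu_num(void)
{
    return apic_to_cpu[read_apic_id() & 7];
}

/* Hypothetical per-CPU tables. */
int runq_len[NCPU];
int idle_ticks[NCPU];

/* Careless style: one APIC read for every table access. */
void account_careless(void)
{
    runq_len[get_current_cpu_num()]++;
    idle_ticks[get_current_cpu_num()]++;
}

/* Careful style: read once, then reuse the local "cpu_num". */
void account_careful(void)
{
    int cpu_num = get_current_cpu_num();

    runq_len[cpu_num]++;
    idle_ticks[cpu_num]++;
}
```

Both functions update the same table slots; the careful one simply does
it with a single APIC read instead of one per access.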

--
  Erich Stefan Boleyn                 \_ E-mail (preferred):  <erich@uruk.org>
Mad Genius wanna-be, CyberMuffin        \__      (finger me for other stats)
Web:  http://www.uruk.org/~erich/     Motto: "I'll live forever or die trying"