Date:      Sun, 22 Sep 2002 15:21:24 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        beemern <beemern@ksu.edu>
Cc:        smp@freebsd.org
Subject:   Re: For those with P4 SMP problems..
Message-ID:  <3D8E4264.D7BE0E80@mindspring.com>
References:  <Pine.GSO.4.33L.0209221549140.9893-100000@unix2.cc.ksu.edu>

beemern wrote:
> Terry Lambert wrote:
> > The theory is that the BIOS has the correct information, but in the
> > wrong order,
> 
> so the bios is fubar

No.  You said Linux and Windows don't have problems starting the
other processors on this box, correct?


> so couldn't we simply re-init from the register? since apparently things
> are getting confused in the intermediate data structures?
> (perhaps thru mptable_pass1 and/or mptable_pass2 ?)

You could try this.  I expect that FreeBSD is taking the BIOS
contents more literally than the other OSs you claimed ran
without problems.

I think this probably won't work, though, because it's (kind of)
what John Baldwin already suggested:

<http://docs.freebsd.org/cgi/getmsg.cgi?fetch=77474+0+archive/2002/freebsd-smp/20020922.freebsd-smp>

"...so it is consistent everywhere..."

...in other words, it's based on the BIOS contents being known to
be wrong, and that the *precise* way in which they are wrong is
that the BIOS idea of how the APIC ID pins are wired up or down
on the motherboard doesn't match the 5 bit APIC ID value that's
recorded in the BIOS.
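
To make that mismatch concrete, here is a rough sketch of checking the
BIOS claim against the hardware.  The processor entry layout follows the
Intel MultiProcessor Specification 1.4 (20 bytes, entry type 0); the
struct, macro, and function names are hypothetical, this is not FreeBSD's
mptable code, and it assumes the processor entries have already been
picked out of the MP configuration table into an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MP spec 1.4 processor entry (20 bytes).  Names are hypothetical. */
struct mp_proc_entry {
    uint8_t  type;           /* 0 = processor entry */
    uint8_t  apic_id;        /* local APIC ID as claimed by the BIOS */
    uint8_t  apic_version;
    uint8_t  cpu_flags;      /* bit 0: enabled, bit 1: bootstrap CPU */
    uint32_t cpu_signature;
    uint32_t feature_flags;
    uint32_t reserved[2];
};

#define MP_CPU_ENABLED 0x01
#define MP_CPU_BSP     0x02

/*
 * Return the APIC ID the BIOS claims for the bootstrap processor,
 * or -1 if no enabled entry carries the BSP flag.
 */
int bios_bsp_apic_id(const struct mp_proc_entry *entries, size_t n)
{
    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct mp_proc_entry *e = &entries[i];
        if (e->type != 0 || !(e->cpu_flags & MP_CPU_ENABLED))
            continue;
        if (e->cpu_flags & MP_CPU_BSP)
            return e->apic_id;
    }
    return -1;
}

/*
 * Return 1 if the BIOS table agrees with the ID the BSP reads from
 * its own local APIC (hw_apic_id), 0 otherwise.
 */
int bsp_id_consistent(const struct mp_proc_entry *entries, size_t n,
                      uint8_t hw_apic_id)
{
    return bios_bsp_apic_id(entries, n) == (int)hw_apic_id;
}
```

This only checks the BSP, since that is the one processor that can read
its own ID before any AP has been started.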

John's "Grrr." there is related to the fact that just futzing the
number isn't as global as it should be; to get around this, you
could hack the value that gets stored in the physical-to-logical
table.

One problem here is that there is not a reverse reference out of
there that's used internally; instead, the local APIC ID is read
from the CPU itself, either by reading the local APIC ID register,
or by executing a CPUID with an EAX of 1, and then taking the
value from the EBX register.  If the forward mapping doesn't match
the reverse, then there will be a problem there as well.  The
FreeBSD APIC timer loop will fail, actually, because the IPI that
was expected doesn't happen (FreeBSD cares about the source ID,
because it IPI's the BSP to indicate it's alive).
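
A minimal sketch of that forward/reverse check, assuming two hypothetical
tables (logical CPU to APIC ID, and APIC ID to logical CPU); the names and
sizes are made up for illustration and are not the kernel's.  The one hard
fact used is that CPUID with EAX = 1 returns the initial local APIC ID in
bits 31..24 of EBX.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS     4      /* hypothetical limits for this sketch */
#define MAX_APIC_IDS 256
#define NO_CPU       (-1)   /* marks an unused reverse-map slot */

/* CPUID leaf 1 reports the initial local APIC ID in EBX bits 31..24. */
unsigned apic_id_from_cpuid1_ebx(uint32_t ebx)
{
    return (ebx >> 24) & 0xffu;
}

/*
 * Return 1 if every logical CPU's APIC ID maps back to that same
 * logical CPU through the reverse table, 0 otherwise.
 */
int maps_agree(const int cpu_to_apic[MAX_CPUS],
               const int apic_to_cpu[MAX_APIC_IDS], int ncpus)
{
    assert(ncpus >= 0 && ncpus <= MAX_CPUS);
    for (int cpu = 0; cpu < ncpus; cpu++) {
        int id = cpu_to_apic[cpu];
        if (id < 0 || id >= MAX_APIC_IDS || apic_to_cpu[id] != cpu)
            return 0;
    }
    return 1;
}

/*
 * What a processor effectively does when it identifies itself: take
 * the ID it reads from its own hardware and look up its logical CPU.
 */
int logical_cpu_for(uint32_t ebx, const int apic_to_cpu[MAX_APIC_IDS])
{
    return apic_to_cpu[apic_id_from_cpuid1_ebx(ebx)];
}
```

On real hardware with GCC or Clang, the EBX value could come from
__get_cpuid(1, &eax, &ebx, &ecx, &edx) in <cpuid.h>.  If only the forward
table is hacked, maps_agree() fails and logical_cpu_for() returns NO_CPU
for the AP, which is the kind of mismatch described above.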

If you really want to understand this, rather than just fixing it,
then this is probably not the correct place to look; a mailing
list will give you people's answers to your questions, but every
person is going to be operating from an imperfect understanding
of their own, anyway (yes, even me ;^)).  So you should probably
get a copy of the official documentation:

	IA-32 Intel Architecture Software Developer's Manual
	Volume 3: System Programming Guide
	Intel Order Number 245472-007

And pay special attention to:

	Chapter 8
	Advanced Programmable Interrupt Controller (APIC)
	8.4.3 - 8.4.7


> > them start the APs simultaneously, and a side effect of this is that
> > they don't care about order of start, they just care *that* they start
> 
> but we've got only one AP, so simultaneous startup shouldnt be an issue?

No, but the same solution solves your problem, too (I think, if what
you said about Linux working is true).


> dont mean to be obtuse, just tryin to get a grip on whats happening and
> what is supposed to be

No problem; my suggestions have all been based on the idea that
"Linux works", as you claimed in:

<http://docs.freebsd.org/cgi/getmsg.cgi?fetch=1878+0+archive/2002/freebsd-smp/20020908.freebsd-smp>

The other possibility here is that the APICs are software disabled
by the disabling of Hyperthreading.  So if you enable Hyperthreading
in the BIOS, it may be that the non-hyperthreading Local APIC is no
longer disabled (i.e. maybe they are coupled).

Since no one else has complained, except for the chipset (John
Baldwin asked about this, and so far there have been no responses),
it *could* be the programming of the I/O APIC.  The I/O APIC for
P4 and Xeon processors is on the other side of a PCI bridge, not
the 3 wire APIC bus; if it were *all* P4 and Xeon based boxes,
then we could blame that, but it's not, so we can't.  That leaves
a chipset specific problem, which, while possible, is pretty hard
to credit, or a BIOS vs. motherboard wiring mismatch (which is
what we've all been concentrating on in this discussion, since there
is a long and glorious history of BIOS people who don't get the
hardware information right).


> right now i'm just doing a lot of rebooting with checkpoints in the code
> so i can get a handle on the boot process and what code is doing what etc
> etc

The most important thing to keep in mind, IMO, is that the only
way you know what's happening on anything other than the BSP, at least
at the start, is that the AP sends an IPI to the BSP, and the BSP
does the reporting.  So checkpoints aren't necessarily going to be
very helpful to you in tracking down the problem here.  8-(.

-- Terry
