Message-ID: <3D8E4264.D7BE0E80@mindspring.com>
Date: Sun, 22 Sep 2002 15:21:24 -0700
From: Terry Lambert
To: beemern
Cc: smp@freebsd.org
Subject: Re: For those with P4 SMP problems..

beemern wrote:
> Terry Lambert wrote:
> > The theory is that the BIOS has the correct information, but in the
> > wrong order,
>
> so the bios is fubar

No. You said Linux and Windows don't have problems starting the other processors on this box, correct?

> so couldn't we simply re-init from the register? since apparently things
> are getting confused in the intermediate data structures?
> (perhaps thru mptable_pass1 and/or mptable_pass2 ?)

You could try this. I expect that FreeBSD is taking the BIOS contents more literally than the other OSs you claimed ran without problems.

I think this probably won't work, though, because it's (kind of) what John Baldwin already suggested:

    "...so it is consistent everywhere..."
...in other words, it's based on the BIOS contents being known to be wrong, and on the *precise* way in which they are wrong: the BIOS's idea of how the APIC ID pins are wired up or down on the motherboard doesn't match the 5 bit APIC ID value that's recorded in the BIOS.

John's "Grrr." there is related to the fact that just futzing the number isn't as global as it should be; to get around this, you could hack the value that gets stored in the physical-to-logical table. One problem here is that there is no reverse reference out of that table that's used internally; instead, the local APIC ID is read from the CPU itself, either by reading the local APIC ID register, or by executing CPUID with EAX set to 1 and taking the initial APIC ID from bits 31-24 of EBX. If the forward mapping doesn't match the reverse, then there will be a problem there as well. The FreeBSD APIC timer loop will actually fail, because the expected IPI doesn't arrive (FreeBSD cares about the source ID, because each AP IPIs the BSP to indicate it's alive).

If you really want to understand this, rather than just fixing it, then this is probably not the correct place to look; a mailing list will give you people's answers to your questions, but every person is going to be operating from an imperfect understanding of their own, anyway (yes, even me ;^)).

So you should probably get a copy of the official documentation:

    IA-32 Intel Architecture Software Developer's Manual
    Volume 3: System Programming Guide
    Intel Order Number 245472-007

And pay special attention to:

    Chapter 8 Advanced Programmable Interrupt Controller (APIC)
    8.4.3 - 8.4.7

> > them start the APs simultaneously, and a side effect of this is that
> > they don't care about order of start, they just care *that* they start
>
> but we've got only one AP, so simultaneous startup shouldn't be an issue?

No, but the same solution solves your problem, too (I think, if what you said about Linux working is true).
> don't mean to be obtuse, just tryin to get a grip on what's happening and
> what is supposed to be

No problem; my suggestions have all been based on the idea that "Linux works", as you claimed in:

The other possibility here is that the APICs are software-disabled as a side effect of disabling Hyperthreading. So if you enable Hyperthreading in the BIOS, it may be that the non-hyperthreading local APIC is no longer disabled (i.e. maybe the two settings are coupled).

Since no one else has complained, then apart from the chipset (John Baldwin asked about this, and so far there have been no responses), it *could* be the programming of the I/O APIC. The I/O APIC for P4 and Xeon processors is on the other side of a PCI bridge, not on the 3-wire APIC bus; if it were *all* P4 and Xeon based boxes, then we could blame that, but it's not, so we can't.

That leaves a chipset-specific problem, which, while possible, is pretty hard to credit, or a BIOS vs. motherboard wiring mismatch (which is what we've all been concentrating on in this discussion, since there is a long and glorious history of BIOS people who don't get the hardware information right).

> right now i'm just doing a lot of rebooting with checkpoints in the code
> so i can get a handle on the boot process and what code is doing what etc
> etc

The most important thing to keep in mind, IMO, is that the only way you know what's happening on anything other than the BSP, at least at the start, is that the AP sends an IPI to the BSP, and the BSP does the reporting. So checkpoints aren't necessarily going to be very helpful to you in tracking down the problem here. 8-(.

-- Terry