From owner-freebsd-smp Mon Sep 2 17:56:51 1996
Return-Path: owner-smp
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id RAA21990
          for smp-outgoing; Mon, 2 Sep 1996 17:56:51 -0700 (PDT)
Received: from uruk.org (uruk.org [198.145.95.253])
          by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id RAA21985
          for ; Mon, 2 Sep 1996 17:56:46 -0700 (PDT)
From: erich@uruk.org
Received: from loopback (loopback [127.0.0.1])
          by uruk.org (8.7.4/8.7.3) with SMTP id RAA19083;
          Mon, 2 Sep 1996 17:56:49 -0700 (PDT)
Message-Id: <199609030056.RAA19083@uruk.org>
X-Authentication-Warning: uruk.org: Host loopback [127.0.0.1] didn't use HELO protocol
To: Steve Passe
cc: terry@lambert.org, freebsd-smp@freebsd.org, rv@groa.uct.ac.za, erich@uruk.org
Subject: Re: SMP on Intel MG15
In-reply-to: Your message of "Mon, 02 Sep 1996 16:27:37 MDT." <199609022227.QAA07488@clem.systemsix.com>
Date: Mon, 02 Sep 1996 17:56:49 -0700
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

[The messages are flying pretty fast on this thread... I can barely
 keep up ;-)]

Steve Passe writes:

> Hi,
>
> again, answering 2 messages:
>
> ===========================================================================
> #1:
>
> >The evil motherboard that has the problems doesn't have an 82489DX, right?
> >And it is running a local APIC of version 1.x or higher, right?
> correct , and correct
>
> >In 1.1 compliant mode, I would expect the STARTUP IPI method to work
> >on the board, without any change to the Warm Reset vector, or use of
> >the INIT IPI. Clearly this fails.
> very clearly, but we just got the 2nd CPU to run to the point where it
> spins on "kern.smp_active == 2". this was accomplished by setting the
> warmstart vector to point @ the boot code (bootMP), doing the INIT/RESET
> IPI, AND then going on to do the STARTUP IPI. note that having the
> warmstart vector point to a HLT instruction had NO effect. it would
> appear that this machine NEEDS (as erich said, thanx erich!) the
> INIT/RESET IPI, AND that vector must actually run the boot code, AND
> it looks like it might even ignore the following STARTUP IPI!!!

It does ignore the following STARTUP IPI completely. The Intel CPUs only
pay attention to the first STARTUP IPI after a reset (I can't remember
where this is documented, but I've both seen it and talked with a few of
the hardware geeks who implemented it in the first place).

> >I might be misunderstanding the 1.1 specification, but I don't think so
> As I said b4, the 1.4 document contradicts itself in several places, so
> I wouldn't take everything in the 1.1 document as gospel.

I've read the 1.4 document pretty carefully. In my experience it doesn't
contradict itself, but it is somewhat ambiguous in a few spots. If you
have some real problems, I can probably dredge up my connections at Intel
and get the complaints to them.

> ===========================================================================
> #2:
>
> >I think you can HALT everything, and do it one at a time with different
> >"VV" values for the 000VV000h real mode start address as part of the
> >STARTUP IPI with various VV's as vector.
> >
> >This would mean chopping the heck out of the first meg of memory, but
> >it *is* possible to do (assuming the damn thing listens to the STARTUP
> >IPI like it is supposed to for version 1.x or higher local APIC's).

Huh? Since in general you have to use the warm reset vector, you can
only start ONE of the other CPUs at a time. Why go around having separate
bootstrap areas for each one? (you only need a separate stack for each
one, and that's pretty easy)

I'm getting confused here. Is this trying to state that for every context
switch (or at least for some of them) you're re-running the startup
sequence? The only time the startup sequence needs to be used is at
bootstrap time. It's perfectly fine to send some other interrupt via IPIs
to wake up a CPU out of the HLT instruction.

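To make the point about a shared bootstrap area concrete, here is a rough sketch (not code from FreeBSD-SMP or Linux-SMP) of the 000VV000h address computation for a STARTUP IPI vector, plus one boot stack per non-boot CPU. The names MAX_APS, AP_STACK_SIZE, startup_vector_to_addr, ap_stacks and ap_stack_top are all hypothetical, and the sizes are placeholders.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_APS        3      /* hypothetical: number of non-boot CPUs */
#define AP_STACK_SIZE  4096   /* hypothetical: bytes of boot stack per CPU */

/* A STARTUP IPI carrying vector VV starts the target CPU in real mode
 * at physical address 000VV000h, i.e. VV shifted left by 12 bits. */
static uint32_t startup_vector_to_addr(uint8_t vv)
{
    return (uint32_t)vv << 12;
}

/* The bootstrap code is shared by all CPUs; only the stacks differ. */
static uint8_t ap_stacks[MAX_APS][AP_STACK_SIZE];

/* Return the initial stack pointer for non-boot CPU number n.
 * x86 stacks grow downward, so this is the end of that CPU's area. */
static uint8_t *ap_stack_top(unsigned n)
{
    assert(n < MAX_APS);
    return ap_stacks[n] + AP_STACK_SIZE;
}
```

With this layout every CPU can be started through the same vector, one at a time, and the boot code only has to pick up a different ap_stack_top(n) for each.
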
> specifically what we get is:
>
> the 2nd CPU runs the boot code then waits for the sysctl.
>
> the 2nd CPU sees kern.smp_active == 2 and calls cpu_switch(curproc);
>
> the system wedges tight.
>
> Russel suggests that it is because the 2nd CPU's ID is 2, NOT 1, and this
> might cause the lockup. I believe he is correct, the ID is probably being
> used as an index into some array(s) somewhere (manywheres?). Russel and I
> are off chasing the source now.

Yes, as I mentioned before:

  -- FreeBSD-SMP presumes the boot CPU is APIC id #0, and the second
     CPU is APIC id #1. Yes, there are several tables where the APIC id
     is used as an index.

  -- The Intel XXPRESS box has CPUs numbered 0, 2, 3, 4.

I encountered a similar problem until I changed Linux-SMP to use a
"virtual CPU" numbering scheme, where there was an APIC id to internal
CPU number mapping. The "virtual number" of the boot CPU was always 0,
and the others were numbered consecutively from there.

Since reading APIC registers takes a large number of clocks, the hit of
this table lookup will be pretty minimal. Most code sequences would end
up looking like:

    /* gets APIC id of current CPU, then maps it to the virtual CPU number */
    int cpu_num = get_current_cpu_num();

    ...
    /* do rest of stuff here using "cpu_num" */
    ...

This will eliminate most of the overhead with lots of SMP code I've seen
which plops the "get_current_cpu_id" thing in every spot where getting to
the CPUs tables is needed. (This is just carelessness on the part of the
programmers in the first place, presuming that the APIC registers are as
fast as internal registers)

--
  Erich Stefan Boleyn                 \_ E-mail (preferred):
Mad Genius wanna-be, CyberMuffin        \__ (finger me for other stats)
Web: http://www.uruk.org/~erich/
Motto: "I'll live forever or die trying"
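The "virtual CPU" numbering described in the message could look roughly like the sketch below. This is an illustration only, not the actual Linux-SMP or FreeBSD-SMP code: MAX_APIC_ID, NO_CPU, apic_to_cpu, ncpus, cpu_map_init and apic_id_to_cpu_num are all hypothetical names, and the table size is an assumption. Reading the hardware APIC id register is left out; a get_current_cpu_num() routine would read it once and pass the result through apic_id_to_cpu_num().

```c
#include <assert.h>
#include <stdint.h>

#define MAX_APIC_ID  16    /* hypothetical: assumes APIC ids below 16 */
#define NO_CPU       (-1)  /* marks an APIC id with no CPU behind it */

static int apic_to_cpu[MAX_APIC_ID];  /* APIC id -> virtual CPU number */
static int ncpus;                     /* number of CPUs mapped so far */

/* Build the mapping: the boot CPU is always virtual CPU 0, and the
 * remaining APIC ids are numbered consecutively in the order given.
 * An id seen twice (including the boot id) keeps its first number. */
static void cpu_map_init(uint8_t boot_apic_id,
                         const uint8_t *apic_ids, int count)
{
    for (int i = 0; i < MAX_APIC_ID; i++)
        apic_to_cpu[i] = NO_CPU;

    assert(boot_apic_id < MAX_APIC_ID);
    apic_to_cpu[boot_apic_id] = 0;
    ncpus = 1;

    for (int i = 0; i < count; i++) {
        uint8_t id = apic_ids[i];
        assert(id < MAX_APIC_ID);
        if (apic_to_cpu[id] == NO_CPU)
            apic_to_cpu[id] = ncpus++;
    }
}

/* Map an APIC id to its virtual CPU number, or NO_CPU if unknown. */
static int apic_id_to_cpu_num(uint8_t apic_id)
{
    return apic_id < MAX_APIC_ID ? apic_to_cpu[apic_id] : NO_CPU;
}
```

On a box with APIC ids 0, 2, 3, 4 and boot CPU 0, this gives virtual CPU numbers 0, 1, 2, 3, so per-CPU tables can be indexed densely no matter how the hardware numbers its APICs.
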