Date: Fri, 05 Jul 1996 10:42:23 +0800
From: Peter Wemm <peter@spinner.dialix.com>
To: Terry Lambert <terry@lambert.org>
Cc: erich@uruk.org, freebsd-smp@freebsd.org
Subject: Re: Running SMP
Message-ID: <199607050242.KAA18450@spinner.DIALix.COM>
In-Reply-To: Your message of "Thu, 04 Jul 1996 17:32:29 MST." <199607050032.RAA13959@phaeton.artisoft.com>
>>>Terry Lambert said:
> > It's still important to get the details on FreeBSD-SMP's model of how
> > it uses the x86 data structures such as the TSS. Getting an accurate
> > image of this is particularly important to how the > 2 CPU generalization
> > goes (i.e. what static data is really necessary here... allocating
> > something upwards of 64K per CPU statically at compile-time would be
> > very annoying). I still haven't looked through that part yet, admittedly.
> >
> > Linux-SMP has one GDT with one TSS per process, simply guaranteeing that
> > no more than one processor will access any particular TSS at a time.
> >
> > Could someone comment on how FreeBSD currently does this in more detail ?
> > David? John? Pohl?

We have one tss structure per process, it is stored at the beginning of
the user area in a fixed virtual address. The GDT has a single slot
pointing to the virtual address of the tss, and all processes use this
same slot number in their task register.

I want to change this so that we have a hybrid dynamic GDT (ie: some
fixed entries, and an extensible component) and have the 'struct i386tss'
in kernel space at a floating address (referenced by proc->p_addr rather
than the fixed virtual address). This means doing a 'ltr' at task switch
time, which is no big deal. This simply causes the process context to be
saved in the new location, it's not hardware (slow) task switching.

A dynamic GDT means we can *really* do VM86 properly, as well as support
the dos emulator and the willows stuff properly.

However, Bruce Evans is/was working in a different direction. He wants
to remove the 'struct i386tss' from the process context and use a much
smaller structure. The structure that holds the user process context at
interrupt would instead be in a static place and itself context switched
on traps etc.
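
[Illustration, not from the original message.] The dynamic-GDT proposal
above (a per-process 'struct i386tss' at a floating kernel address, with
a 'ltr' at task switch time) implies building one TSS descriptor per
process and loading the matching selector. A minimal sketch of that
packing follows. The names make_tss_desc and gdt_selector are
hypothetical and are not FreeBSD's; only the bit layout, which is the
standard i386 system-segment descriptor format, is taken as given. The
privileged 'ltr' itself is not shown since it cannot run in user space.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for illustration; not FreeBSD definitions. */
#define DESC_PRESENT      0x80u  /* P bit of the access byte */
#define TSS_TYPE_AVAIL32  0x09u  /* system type: available 32-bit TSS */

/*
 * Pack an 8-byte TSS descriptor for a TSS located at linear address
 * 'base' with byte-granular 'limit' (size of the TSS minus one).
 * DPL is 0 and the granularity flag is left clear.
 */
static uint64_t make_tss_desc(uint32_t base, uint32_t limit)
{
    uint64_t d = 0;

    assert(limit <= 0xFFFFFu);                       /* limit is 20 bits */

    d |= (uint64_t)(limit & 0xFFFFu);                /* limit 15:0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;         /* base  23:0  */
    d |= (uint64_t)(DESC_PRESENT | TSS_TYPE_AVAIL32) << 40; /* access */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;     /* limit 19:16 */
    d |= (uint64_t)(base >> 24) << 56;               /* base  31:24 */
    return d;
}

/*
 * Selector for GDT slot 'slot': index in bits 15:3, TI = 0 (GDT),
 * RPL = 0. This is the value a context switch would hand to 'ltr'.
 */
static uint16_t gdt_selector(unsigned slot)
{
    return (uint16_t)(slot << 3);
}
```

With a single fixed slot (the current scheme described above) the
selector never changes; with per-process slots the switch code would
pass gdt_selector(slot) for the incoming process to 'ltr'.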
If I understand what he's doing, this will make SMP support harder, and
makes VM86, user_ldt etc stuff harder too, because some processes will
need a full i386tss, while others will use the shared one etc, and as
well, for SMP, you need a shared tss per cpu, and need to figure out
which one you are using at trap/context switch time etc. I'm not sold on
Bruce's idea yet, I've got my "wait and see" hat on, to see how much
work it'll cause us on SMP.

What I've implemented is removing the per-process kernel stack and
'struct user' pages from the process address space, and cleaning up the
code as a result. This will allow pure address space sharing between
processes for the libc_r pthread support, since when one process changes
its VM space, it automatically appears in the others by virtue of the
fact that they have the *same* page tables, vm_map and vmspace. This can
only be completed once the user pages are completely gone.

As for SMP, I've left things on the back burner for an uncomfortably
long time now, I must get back to it. What I've written (but not
polished/tested/debugged yet) is:

- MP config table parsing
- Safe boot straps of the non-booting cpus (ie: generate a PTD with the
  0MB P==V mapping)
- this PTD is used on a per-cpu basis for the idle loop (yes, it's back!)
- the idle procs are gone.
- As far as I can tell, it'll support N cpus cleanly.
- a hack sysctl handler that takes string writes and parses them,
  theoretically it should allow fine-grained halt/start/on/offline etc
  of cpus, but for now it just understands "boot"
- code to do message passing via IPI (eg: force reschedule of
  non-primary processors, this is suboptimal as there will be a
  thundering herd trying to enter the kernel. This would probably be
  better done with the timer in the local apic)
- move most of the remaining initialisation code out of locore and into
  a separate module.
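
[Illustration, not from the original message.] The first step of the
"MP config table parsing" item is finding the MP floating pointer
structure. Per the Intel MultiProcessor Specification it is a 16-byte
block on a 16-byte boundary, starting with the signature "_MP_", whose
bytes sum to zero modulo 256. A rough sketch of that scan over an
arbitrary memory region is below. The function names mp_scan and
byte_sum are hypothetical and this is not the code described above; a
kernel would run such a scan over the EBDA, the top of base memory and
the BIOS ROM area rather than over a test buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MP_FPS_PARAGRAPH 16u   /* structure length is given in 16-byte units */
#define MP_FPS_LEN_OFF    8u   /* offset of the length byte in the structure */

/* Sum 'n' bytes modulo 256. */
static uint8_t byte_sum(const uint8_t *p, size_t n)
{
    uint8_t s = 0;

    while (n--)
        s = (uint8_t)(s + *p++);
    return s;
}

/*
 * Scan 'region' on 16-byte boundaries for a valid MP floating pointer
 * structure. Returns its byte offset within the region, or -1 if no
 * candidate has both the signature and a zero checksum.
 */
static long mp_scan(const uint8_t *region, size_t len)
{
    size_t off;

    assert(region != NULL);

    for (off = 0; off + MP_FPS_PARAGRAPH <= len; off += MP_FPS_PARAGRAPH) {
        const uint8_t *p = region + off;
        size_t bytes;

        if (memcmp(p, "_MP_", 4) != 0)
            continue;                       /* no signature here */

        bytes = (size_t)p[MP_FPS_LEN_OFF] * MP_FPS_PARAGRAPH;
        if (bytes == 0 || off + bytes > len)
            continue;                       /* implausible length */

        if (byte_sum(p, bytes) != 0)
            continue;                       /* checksum mismatch */

        return (long)off;
    }
    return -1;
}
```

Checking the checksum as well as the signature matters because the four
signature bytes can turn up by chance in the scanned areas.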
There is very little SMP activity at boot, and it should be possible to
run almost (but not quite) identically as a non-smp kernel would.
(curproc runtime etc would still be intercepted). The boot code does not
do much other than locate the MP block and preserve it to try and
prevent the VM system from trashing it if it's below 640K, and reserve
some space for the trampoline code.

I've thought of a few ways (and talked to phk about it some time ago) to
do scheduling biasing to try and get processes on the same cpu where
possible to get the benefit of the on-chip cache. I think this will be
critical for P6 support as the 256K/512K L2 cache will generate a lot of
MESI traffic for invalidations when we miss the cpu.

Does anybody have documentation on the IO apic? I've got enough detail
on the local apic, but the IO apic is a real problem. We're stuck in the
painful "dumb" mode where all cpus get all interrupts in parallel with
each other until we can get some details on the IO apic. (somebody
pointed me to something from intel, but I can't find the reference
anymore)

> Terry Lambert
> terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.

Cheers,
-Peter