Date:      Fri, 05 Jul 1996 10:42:23 +0800
From:      Peter Wemm <peter@spinner.dialix.com>
To:        Terry Lambert <terry@lambert.org>
Cc:        erich@uruk.org, freebsd-smp@freebsd.org
Subject:   Re: Running SMP 
Message-ID:  <199607050242.KAA18450@spinner.DIALix.COM>
In-Reply-To: Your message of "Thu, 04 Jul 1996 17:32:29 MST." <199607050032.RAA13959@phaeton.artisoft.com> 

>>>Terry Lambert said:
> > It's still important to get the details on FreeBSD-SMP's model of how
> > it uses the x86 data structures such as the TSS.  Getting an accurate
> > image of this is particularly important to how the > 2 CPU generalization
> > goes (i.e. what static data is really necessary here...  allocating
> > something upwards of 64K per CPU statically at compile-time would be
> > very annoying).  I still haven't looked through that part yet, admittedly.
> > 
> > Linux-SMP has one GDT with one TSS per process, simply guaranteeing that
> > no more than one processor will access any particular TSS at a time.
> > 
> > Could someone comment on how FreeBSD currently does this in more detail ?
> 
> David?  John?  Pohl?

We have one tss structure per process; it is stored at the beginning of 
the user area at a fixed virtual address.  The GDT has a single slot 
pointing to the virtual address of the tss, and all processes use this 
same slot number in their task register.
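
To illustrate (the slot number, the fixed address and the descriptor
helper below are made up, not the real identifiers):

/*
 * Minimal sketch of the current scheme: one GDT slot describes a tss
 * living at a fixed virtual address inside the per-process user area.
 * Because the u-area pages are remapped at every context switch,
 * neither the descriptor nor the task register ever has to change.
 */
#define UAREA_TSS_VA    0xFE000000UL    /* made-up fixed VA of the u-area tss */
#define GDT_TSS_SLOT    6               /* made-up GDT index */
#define TSS_SIZE        104             /* size of a 386 tss in bytes */

struct sysseg_desc {
        unsigned long lo, hi;           /* packed 386 system segment descriptor */
};

static void
set_tss_desc(struct sysseg_desc *gdt, int slot, unsigned long base,
             unsigned long limit)
{
        gdt[slot].lo = (limit & 0xffff) | ((base & 0xffff) << 16);
        gdt[slot].hi = ((base >> 16) & 0xff)    /* base 23:16 */
            | (0x89UL << 8)                     /* present, available 32-bit tss */
            | (limit & 0xf0000)                 /* limit 19:16 */
            | (base & 0xff000000);              /* base 31:24 */
}

/* done once at boot; ltr is executed once and never again */
void
init_single_tss_slot(struct sysseg_desc *gdt)
{
        set_tss_desc(gdt, GDT_TSS_SLOT, UAREA_TSS_VA, TSS_SIZE - 1);
        __asm__ __volatile__("ltr %0" : : "r" ((unsigned short)(GDT_TSS_SLOT << 3)));
}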

I want to change this so that we have a hybrid dynamic GDT (ie: some fixed 
entries, and an extensible component) and have the 'struct i386tss' in 
kernel space at a floating address (referenced by proc->p_addr rather than 
the fixed virtual address).  This means doing an 'ltr' at task switch time, 
which is no big deal.  This simply causes the process context to be saved 
in the new location; it's not hardware (slow) task switching.  A dynamic 
GDT means we can *really* do VM86 properly, as well as support the dos 
emulator and the willows stuff properly.
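
The switch-time part of that would look something like this (again,
just a sketch with made-up names; the descriptor layout is the
standard 386 system-segment format):

/*
 * Illustrative switch-time fragment for the proposed scheme: each
 * process' u-area carries its own tss, a GDT slot describes it, and
 * the task register is reloaded on every switch.  The descriptor's
 * busy bit (type 0xB) has to be cleared back to "available" (type
 * 0x9) before the ltr, or the cpu will fault.
 */
struct sysseg_desc {
        unsigned long lo, hi;           /* packed 386 system segment descriptor */
};

static void
switch_tss(struct sysseg_desc *gdt, int slot)
{
        gdt[slot].hi &= ~(0x2UL << 8);          /* clear the tss busy bit */
        __asm__ __volatile__("ltr %0" : : "r" ((unsigned short)(slot << 3)));
}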

However, Bruce Evans is/was working in a different direction.  He wants to 
remove the 'struct i386tss' from the process context and use a much 
smaller structure.  The structure that holds the user process context at 
interrupt would instead be in a static place and itself context switched 
on traps etc.  If I understand what he's doing, this will make SMP 
support harder, and makes VM86, user_ldt etc harder too, because some 
processes will need a full i386tss while others will use the shared one, 
and as well, for SMP, you need a shared tss per cpu and need to figure 
out which one you are using at trap/context switch time.  I'm not sold 
on Bruce's idea yet; I've got my "wait and see" hat on, to see how much 
work it'll cause us on SMP.

What I've implemented is removing the per-process kernel stack and 'struct 
user' pages from the process address space, and cleaning up the code as a 
result.  This will allow pure address space sharing between processes for 
the libc_r pthread support: when one process changes its VM space, the 
change automatically appears in the others by virtue of the fact that they 
have the *same* page tables, vm_map and vmspace.  This can only be 
completed once the user pages are completely gone.
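
In other words, a thread-style fork would just share the vmspace
instead of copying it; very roughly (the field names here are
illustrative, not the real structures):

/*
 * Rough illustration of pure address-space sharing: two "processes"
 * point at one reference-counted vmspace, so a mapping made through
 * either is instantly visible to the other (same page tables, same
 * vm_map).  Names are illustrative, not the real kernel fields.
 */
struct vmspace {
        int              vm_refcnt;    /* how many procs share this space */
        void            *vm_pmap;      /* stand-in for the shared page tables */
};

struct proc {
        struct vmspace  *p_vmspace;
        void            *p_addr;       /* u-area now lives outside p_vmspace */
};

/* a thread-style fork bumps the refcount instead of copying the map */
static void
share_vmspace(struct proc *parent, struct proc *child)
{
        child->p_vmspace = parent->p_vmspace;
        child->p_vmspace->vm_refcnt++;
}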

As for SMP, I've left things on the back burner for an uncomfortably long 
time now; I must get back to it.  What I've written (but not 
polished/tested/debugged yet) is:
- MP config table parsing (there's a rough sketch of the signature scan 
after this list)
- Safe bootstraps of the non-booting cpus (ie: generate a PTD with the 
0MB P==V mapping)
- this PTD is used on a per-cpu basis for the idle loop (yes, it's back!)
- the idle procs are gone.
- As far as I can tell, it'll support N cpus cleanly.
- a hack sysctl handler that takes string writes and parses them; 
theoretically it should allow fine-grained halt/start/on/offline etc of 
cpus, but for now it just understands "boot"
- code to do message passing via IPI (eg: force reschedule of non-primary 
processors; see the ICR sketch after this list.  This is suboptimal as 
there will be a thundering herd trying to enter the kernel.  It would 
probably be better done with the timer in the local apic)
- move most of the remaining initialisation code out of locore and into a 
separate module.  There is very little SMP activity at boot, and it should 
be possible to run almost (but not quite) identically to the way a non-smp 
kernel would.  (curproc runtime etc would still be intercepted).  The boot 
code does not do much other than locate the MP block and preserve it to 
try and prevent the VM system from trashing it if it's below 640K, and 
reserve some space for the trampoline code.
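
For reference, the MP config table parsing boils down to the signature 
scan from Intel's MP spec; roughly like this (the helper names are made 
up, and the real code has to deal with mapping the physical ranges - 
the first KB of the EBDA, the top KB of base memory and the BIOS ROM 
at 0xF0000-0xFFFFF - before scanning them):

#include <stddef.h>

/*
 * Sketch of the MP Floating Pointer search: look for the "_MP_"
 * signature on 16-byte boundaries and verify the byte checksum.
 */
struct mpfps {                          /* MP Floating Pointer Structure */
        char            signature[4];   /* "_MP_" */
        unsigned int    physaddr;       /* phys addr of the MP config table */
        unsigned char   length;         /* structure length in 16-byte units */
        unsigned char   spec_rev;
        unsigned char   checksum;       /* all bytes must sum to 0 mod 256 */
        unsigned char   feature[5];
};

static int
mpfps_ok(const unsigned char *p)
{
        const struct mpfps *fps = (const struct mpfps *)p;
        size_t len = (size_t)fps->length * 16;
        unsigned char sum = 0;
        size_t i;

        if (p[0] != '_' || p[1] != 'M' || p[2] != 'P' || p[3] != '_')
                return 0;
        if (len == 0)
                return 0;
        for (i = 0; i < len; i++)
                sum += p[i];
        return sum == 0;
}

static const struct mpfps *
scan_for_mpfps(const unsigned char *base, size_t len)
{
        size_t off;

        for (off = 0; off + sizeof(struct mpfps) <= len; off += 16)
                if (mpfps_ok(base + off))
                        return (const struct mpfps *)(base + off);
        return NULL;
}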
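
And the forced-reschedule IPI is little more than one write to the 
local apic's interrupt command register; something like the following, 
assuming the apic registers are reachable at their default physical 
address and with a made-up vector number:

/*
 * Sketch of the "force everyone else to reschedule" IPI: write the
 * ICR with the "all excluding self" shorthand and a fixed-delivery
 * vector whose handler just marks its curproc for resched.
 */
#define LAPIC_BASE      0xFEE00000UL            /* default local apic phys base */
#define LAPIC_ICR_LO    (LAPIC_BASE + 0x300)
#define ICR_ALL_EXCL    0x000C0000UL            /* dest shorthand: all excluding self */
#define ICR_FIXED       0x00000000UL            /* fixed delivery mode */
#define IPI_RESCHED_VEC 0xF5UL                  /* made-up vector number */

static void
smp_force_resched(void)
{
        volatile unsigned int *icr_lo = (volatile unsigned int *)LAPIC_ICR_LO;

        /* every other cpu takes the interrupt and reschedules - herd-prone */
        *icr_lo = ICR_ALL_EXCL | ICR_FIXED | IPI_RESCHED_VEC;
}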

I've thought of a few ways (and talked to phk about it some time ago) to 
do scheduling biasing to try and get processes back onto the same cpu 
where possible, to get the benefit of the on-chip cache.  I think this 
will be critical for P6 support, as the 256K/512K L2 cache will generate 
a lot of MESI traffic for invalidations whenever a process lands on a 
different cpu than the one it last ran on. 
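
One simple form of the biasing would be to handicap a process's 
priority when picking it up on a cpu other than the one it last ran 
on; e.g. (the field and penalty names below are hypothetical, not 
anything that exists in the scheduler today):

#define OFF_CPU_PENALTY 4       /* made-up priority handicap for migrating */

struct proc_sched {
        int     p_priority;     /* lower is better, as in the 4.4BSD scheduler */
        int     p_lastcpu;      /* cpu this proc last ran on */
};

/* effective priority of p as seen from cpu `me': migrating costs a little */
static int
biased_priority(const struct proc_sched *p, int me)
{
        return p->p_priority + (p->p_lastcpu == me ? 0 : OFF_CPU_PENALTY);
}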

Does anybody have documentation on the IO apic?  I've got enough detail on 
the local apic, but the IO apic is a real problem.  We're stuck in the 
painful "dumb" mode where all cpus get all interrupts in parallel with 
each other until we can get some details on the IO apic.  (somebody 
pointed me to something from intel, but I can't find the reference anymore)

> 					Terry Lambert
> 					terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.

Cheers,
-Peter




