Date:      Mon, 25 Nov 1996 22:42:35 +0800
From:      Peter Wemm <peter@spinner.dialix.com>
To:        Poul-Henning Kamp <phk@critter.tfs.com>
Cc:        freebsd-smp@freebsd.org
Subject:   Re: cvs commit: sys/i386/i386 locore.s swtch.s sys/i386/include pmap.h 
Message-ID:  <199611251442.WAA01613@spinner.DIALix.COM>
In-Reply-To: Your message of "Mon, 25 Nov 1996 14:01:32 +0100." <1858.848926892@critter.tfs.com> 

Poul-Henning Kamp wrote:
> I don't doubt that, I just hate the perspectives, that's all :-)

:-)
Actually, it doesn't seem to be turning out so badly after all..

> Yes, the initial stuff was certainly not smart about this, and avoiding
> disturbing the pmap as much as possible is undoubtedly a good idea.

You got that right about the "hands off!" nature of the pmap stuff.. :-)
John supposedly understands it, and it still bites him regularly.
 
> Yeah, well, you still have to keep separate PTD's for each CPU, and
> make sure to update them all, AND to tell all the cpu's about it when
> you do -- that's the real trouble I bet.

Not necessarily..  What I've done so far is have a per-cpu page table
page, not a top-level page directory.

Currently pmap_growkernel() already walks all processes in the proc list
and modifies their PTD's.  This gets more complicated once we have real
idle procs, but that's a feature of eliminating them, it's not a side
effect of having per-cpu pages.
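
For reference, the shape of that walk is roughly this (paraphrased from
memory, so the field names are approximate and fixup_all_ptds() is just a
name I made up for the loop):

	/* paraphrased sketch of the PTD fixup done by pmap_growkernel() */
	static void
	fixup_all_ptds(int pdir_index, pd_entry_t newpdir)
	{
		struct proc *p;

		for (p = allproc.lh_first; p != NULL; p = p->p_list.le_next) {
			if (p->p_vmspace == NULL)
				continue;
			/* stuff the new kernel PDE into this process's PTD */
			p->p_vmspace->vm_pmap.pm_pdir[pdir_index] = newpdir;
		}
	}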

Anyway, I'm not afraid of blood...  I could live with the prospect of
setting a "pause" flag, sending an IPI to make all the cpu's drop everything
and wait, updating all the PTD's, and then letting them go again.  After all,
pmap_growkernel() only happens a maximum of 54 times during the lifetime of
the kernel on a standard system, and most of those seem to happen during the
first 30 seconds or so as it boots up.
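
Something like this, say (every name here is invented on the spot, purely
to show the shape of the idea):

	/* sketch only: park the other cpus, fix the PTD's, let them go */
	static volatile int pause_flag;
	static volatile int cpus_stopped;

	static void
	pause_cpus_and_fix_ptds(int pdir_index, pd_entry_t newpdir)
	{
		pause_flag = 1;
		smp_ipi_all_but_self(IPI_PAUSE);   /* hypothetical "pause" IPI */
		while (cpus_stopped < ncpus - 1)
			;		/* wait for the other cpus to park */
		fixup_all_ptds(pdir_index, newpdir);	/* the walk from above */
		pause_flag = 0;		/* and away they go again */
	}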
 
> I'm still of the opinion that sticking the logical cpu# in %gs or
> some other >=8bit register we abduct for the purpose, at least whenever
> we enter the kernel, and using that as index into arrays will be less 
> pain, and maybe more efficient on top of that, but since I'm not SMPactive
> at this time I'll not stand in your way...
> And if it works, then hey... I'm game.

.. until we fix and use %fs and %gs.. :-)

> The only real benefit I see to this scheme is that you can put the
> per-cpu idle-kernel-stack somewhere and not worry about it.  As long
> as it fits in the 4K minus the data we stick there.

Actually, I had a more evil plan in mind to start with.  I was thinking
of using fork() to create the process contexts, teaching schedule() not
to put them on the run queues, and double-mapping their pages into the
private pages to hold the contexts for the idle processes.  It sure beats
initialising the
stack, pcb etc.  But on the other hand, we know how many cpu's are online
when doing pmap_bootstrap(), so we could just as easily allocate the space
then.

> My one particular grief about this is that we will still have to make it
> 
> 	extern struct mpstruct mps;
> 
> 	#define curproc	mps.mp_curproc
> 	...
> 
> To avoid debugger people shooting us.

Again, not necessarily..  I have already thought about this.  We *definitely*
need to be able to read the variables in ddb, and not having a variable
"_curproc" etc. is a real pain, to say the least.

I think it's more usable to have both: the struct, plus the individual
variables placed at fixed virtual locations that correspond with the C
structure packing.
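
Something along these lines, maybe (the struct and all the names here are
invented, just to show what I mean):

	/* the per-cpu private data page, mapped at the same fixed VA on
	 * every cpu; layout and names are for illustration only */
	struct privatespace {
		struct proc	*ps_curproc;
		struct pcb	*ps_curpcb;
		int		ps_cpuid;
		/* ... */
	};

	#define	PRIVSPACE	((struct privatespace *)PRIVATE_DATA_VA)
	#define	curproc		(PRIVSPACE->ps_curproc)
	#define	curpcb		(PRIVSPACE->ps_curpcb)

.. plus, say, an assembler-level symbol for each member at its fixed
address, so that ddb still has a "_curproc" it can read.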

Since we have 4MB of space to play with, I had thought of mapping all
the private pages into each cpu's private space at the same address.

So, a map could look like this (sketched as header constants below the list):
page 0: this cpu's private data
page 1: this cpu's PT
page 2: this cpu's idle pcb	- these two pages serve as the "UPAGES"
page 3: this cpu's idle "stack"	  for the idle process
.. etc
page 64: cpu 0's private page
page 65: cpu 1's private page
...
page 80: cpu 0's private PT
page 81: cpu 1's private PT
..
page 96: cpu 0's "idle" stack
.. etc..
page 512: entire local apic "space" from 0xfee00000->0xfeefffff
page 768: entire IO apic "space" from 0xfec00000->0xfecfffff
page 1023: "end" of per cpu space.
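
In header form that might come out something like this (the base address
and the names are placeholders; the page numbers are the ones from the list):

	#define	PRIVATE_SPACE_VA	0xff800000	/* some free 4MB-aligned slot */
	#define	PRIVPAGE(n)		(PRIVATE_SPACE_VA + (n) * PAGE_SIZE)

	#define	PRIVATE_DATA_VA		PRIVPAGE(0)	/* this cpu's private data */
	#define	PRIVATE_PT_VA		PRIVPAGE(1)	/* this cpu's PT */
	#define	IDLE_UPAGES_VA		PRIVPAGE(2)	/* idle pcb + "stack" */
	#define	OTHER_PRIVATE_DATA(cpu)	PRIVPAGE(64 + (cpu))
	#define	OTHER_PRIVATE_PT(cpu)	PRIVPAGE(80 + (cpu))
	#define	OTHER_IDLE_STACK(cpu)	PRIVPAGE(96 + (cpu))
	#define	LAPIC_WINDOW_VA		PRIVPAGE(512)	/* 0xfee00000 mapped here */
	#define	IOAPIC_WINDOW_VA	PRIVPAGE(768)	/* 0xfec00000 mapped here */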

Yes, this is overkill, but it's for free.  The mappings consume no extra
memory: even the minimal layout would need 12 bytes' worth of PTEs out of
the 4KB page table page anyway, so the spare entries cost nothing.  We can
point the spares at something useful and access it there rather than
consume more precious kernel VM.

This stuff is static after initial boot, with the exception of the idle
PTD's.

But then again, the result would look a damn lot like a hall of mirrors..
 
> And performance/debugging comparisons will be much simpler.  (I know,
> there will be some uglyness in some .s code but that is a minor
> nuisance compared to the benefit I think.
> 
> Please at least consider this detour for a moment.

Yes... I've not burned down any bridges yet.  What I'm trying to do at the
moment is create a convenient place to store this stuff.  Then we can compare
the results.

Going this way eliminates (for each reference to curproc):
- a 32 bit IO operation to the local apic (which Eric said should be
minimised if possible since IO is slow)
- a 32 bit 'and' operation
- a 24/32 bit barrel roll
- a multiply-by-4 index, 32 bit table lookup and dereference
- another 32 bit table lookup and dereference
.. and replaces all of that with a single fetch from cacheable ram (see the
sketch below).  The same goes for curpcb, runtime, possibly the 32 run
queues, inter-cpu message passing scratch space, and other stuff that will
turn up.
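
i.e. roughly the difference between these two (paraphrased, not the real
macros -- the names on both sides are approximate):

	#ifdef OLD_WAY
	/* read the local apic ID register, mask/shift out the ID, map
	 * apic ID -> cpu number, then index the per-cpu curproc array */
	#define	cpunumber()	(apic_id_to_cpu[(lapic_id_reg >> 24) & 0x0f])
	#define	curproc		(SMPcurproc[cpunumber()])
	#else
	/* one cacheable memory fetch from the private data page */
	#define	curproc		(PRIVSPACE->ps_curproc)
	#endif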

Hmm, why am I sleepy?

Cheers,
-Peter


