FreeBSD Mail Archives

Date:      Thu, 05 Nov 1998 22:28:46 +0800
From:      Peter Wemm <peter@netplex.com.au>
To:        Ville-Pertti Keinonen <will@iki.fi>
Cc:        james@westongold.com, current@FreeBSD.ORG
Subject:   Re: Kernel threading (was Re: Thread Scheduler bug) 
Message-ID:  <199811051428.WAA06376@spinner.netplex.com.au>
In-Reply-To: Your message of "05 Nov 1998 11:20:45 GMT." <19981105112045.668.qmail@ns.oeno.com>


Ville-Pertti Keinonen wrote:
> 
> > Not quite..  Each process could have one page directory for each thread, 
> > up to the number of cpus.  If you only have two threads, but 4 cpus, then 
> > you still only need 2 directories.
> 
> True, I was silently assuming that there are more threads than cpus.
> It doesn't change the fact that the resource requirements don't seem
> reasonable.  Pages are huge.

Yes.

> > We have to do something like this already because of the per-cpu pages in 
> > kernel space.  However, the PTD slot is outside the reach of the user 
> > process segment limits so we can't use one of the unused page table page 
> > slots because the address that it corresponds to is outside the user 
> > address space.
> 
> The existing per-cpu pages might be worth getting rid of.
> 
> I'm not sure how much data is currently kept there, but the overhead
> of APIC ID indexing shouldn't be too bad.

We've done it that way before and it was a real pain in the backside. There
are a number of disincentives:
 - APIC IDs are all over the place: 0, 12 and 13 are common with P6s.
   Having 16 slots in arrays for every per-cpu variable is not nice.
 - converting physical IDs back into compacted logical IDs is OK from
   C but a pain in assembler.
 - converting things from variables to macros exposes other problems.  I
   seem to recall some places where "curproc" was referenced over and
   over again in loops and the like.
 - we have to have different binary modules/LKMs for SMP and non-SMP
   kernels.
 - accessing the local APIC is *much* slower than a memory access (according
   to one of the Intel people, who told us to try to do it this way if we
   could).
 - it was a lot of pain to get working in the first place.

> > The only option would be to take over another PTD slot in reach of user 
> > space, that would cost 256K of virtual address space from the user and 
> > would cost a 4K page for the PTP as well as the data page.
> 
> 256k?  Don't you mean 4MB?

Sorry, yes.
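For reference, the arithmetic behind the correction, assuming classic i386 paging without PAE (4 KiB pages and 4-byte entries, so one page-table page holds 1024 entries and one PTD slot covers all of them):

```latex
\frac{4096\ \text{bytes}}{4\ \text{bytes/PTE}} = 1024\ \text{PTEs}, \qquad
1024 \times 4\,\text{KiB} = 4\,\text{MiB of user address space per PTD slot}

\text{physical cost} = \underbrace{4\,\text{KiB}}_{\text{page-table page}}
  + \underbrace{4\,\text{KiB}}_{\text{data page}} = 8\,\text{KiB}
```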

> I don't think a user address space should contain magic unless it has
> been explicitly requested by the program.
> 
> Multithreaded programs are going to have to perform user-space
> initialization, in any case.  Explicitly mmapping special kernel pages
> might not be out of the question.
> 
> It still wouldn't be nice to either waste a whole page for a bit of
> thread-specific data or reveal inconsistent information on the rest of
> the page.

Yep.

> > Hmmmmm..  I wonder..  We might be able to create a GDT slot that maps up 
> > into the per-cpu pages with user privileges, then have an assembler 
> > routine that (say) loads %fs with the descriptor index, accesses the data 
> > relative to the %fs segment, then restores the %fs register.  We could get 
> 
> That's starting to sound like a reasonable idea.
> 
> FreeBSD didn't even save %fs, did it?  The kernel could always set it
> before going into user mode, so that it doesn't need to be loaded for
> each access.  The kernel must either save or set it to make it usable
> at all.

It used to be neither preserved nor context-switched.  It is now.

> Setting it in the kernel is probably better, to avoid changing things
> like sigcontext, and if per-cpu pages are not used, things can still
> work since the kernel can set the register to a cpu-specific segment.

It still needs assembler support in the userland threading system, 
because gcc will not (I think) generate code to use segment loads and 
prefixes by itself.  The point is that if a descriptor table slot is 
available, any segment register can be used to select it.

John wrote something to use the LDT on a per-thread basis; I never really 
sat down to see what he did.

But if we provide  void *thread_getpointer()  and
void thread_setpointer(void *)  in libc, each of these needs only be a few
assembler instructions: load %fs, make an %fs-relative data reference,
restore %fs and return.

rfork() could "set" %fs for the child to tell it which slot to use.  If the
child wished to leave %fs untouched, it could use it at any time; otherwise
it would have to save the selector somewhere.  The selector would be the
same value for all processes on the system.

> IMHO the alternative of using aligned thread stacks is not a bad idea,
> either.  It's fast and portable and it doesn't require anything evil
> to be done by the kernel.  The restrictions placed on what can be done
> with the threads are a bit nasty, though.

The bit that I don't like about it is that it forces all the stacks to have
the same upper limit size.  That could be a bit wasteful of address space,
or could leave you short on room to grow the stack.  Incidentally, I'd like a
special mmap() option to provide a real grow-down stack in a specified
region.  mmap()ing a few hundred kb of stack from anonymous swap times a
few hundred threads adds up on the size counter.
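To put rough numbers on "adds up" (the 256 KiB per stack and 300 threads are illustrative values, not figures from this thread):

```latex
300\ \text{threads} \times 256\,\text{KiB} = 76\,800\,\text{KiB} = 75\,\text{MiB}
```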

Also, while on the subject, something Julian said has got me thinking about
creating kernel threads on the fly when and if a user thread happens to
block.  This sounds rather interesting; I think async IO is done this way
too.  It would require a fair amount of cooperation between the thread
"engine" and the kernel, perhaps by having an executive thread of sorts
that handled the kernel interaction and thread activation.

Cheers,
-Peter


