Date: Fri, 11 Jan 2002 21:07:02 -0500
From: Dan Eischen <eischen@vigrid.com>
To: Peter Wemm <peter@wemm.org>
Cc: Alfred Perlstein <bright@mu.org>, Kelly Yancey <kbyanc@posi.net>, Nate Williams <nate@yogotech.com>, Terry Lambert <tlambert2@mindspring.com>, Archie Cobbs <archie@dellroad.org>, arch@FreeBSD.ORG
Subject: Re: Request for review: getcontext, setcontext, etc
Message-ID: <3C3F9A46.BBA1A1D5@vigrid.com>
References: <20020112005212.5CB2038FF@overcee.netplex.com.au>

Peter Wemm wrote:
>
> Alfred Perlstein wrote:
> > * Kelly Yancey <kbyanc@posi.net> [020110 13:14] wrote:
> > > On Thu, 10 Jan 2002, Nate Williams wrote:
> > >
> > > > See above. Even in 5.0, we're going to have some threads being switched
> > > > in userland context, while others are switched in the kernel. (KSE is a
> > > > hybrid approach that attempts to gain both the efficiency of userland
> > > > threads with the ability to parallelize the efficiency gains of multiple
> > > > CPU && I/O processing from kernel threads.)
> > > >
> > >
> > > OK, I'm going to stick my head in and show my ignorance. If
> > > {get,set}context have to be implemented as system calls, then doesn't
> > > that eliminate much, if not all, the gains assumed by having a separate
> > > userland scheduler? I mean if we've got to go to the kernel to switch
> > > thread contexts, why not just have the kernel track all of the threads
> > > and restore context once, just for the current thread, rather than twice
> > > (once for the scheduler and another for the scheduler to switch to the
> > > current thread context)?
> >
> > That's the point of this discussion, we're trying to figure out
> > why and if possible how to avoid them being system calls. :)
> >
> > Basically what it seems to come down to are two points:
> >
> > 1) Is atomicity required? (looks like a "no")
>
> Question, why do we have a sigreturn(2) syscall if atomicity isn't required?
> setcontext() is supposed to be able to be used in place of sigreturn().
>
> sigreturn() atomically restores the signal mask and context so that
> unmasking the signal doesn't re-trigger a pending signal before we've
> finished restoring.
There's nothing in the spec about atomicity. NetBSD seems to be
implementing get,resume context in user space also. I think you
need to protect accessing your contexts before they are initialized
anyways and a syscall doesn't do that for you (see Bruce's example
near the beginning of this whole thread).
> > 2) Are states like FP usage trackable from userspace?
> > (looks like a "yes" with some kernel help)
>
> With kernel help, yes. But if you are going to use the kernel to find out
> when to save/restore fp context then you may as well do it all in the
> kernel.
>
> The biggest problem on the x86 implementation is that once you touch the
> fpu at all, you now own a fpu context forever. When we context switch
> a process, we save its active FPU state [if it has an active one] into
> the pcb. When we return to the process, we do *not* load the fpu state
> into the FPU until the process touches it again.
>
> For a userland application to do a swapcontext(), it would have to look
> at the present fpu state (causing a kernel trap, which loads the fpu state
> into the fpu), dump out the registers, switch contexts and load the
> fpu state from the new context into the active fpu registers. If the old
> context hadn't used the FPU and the new context doesn't actually use it before
> switching out to another process, then we've wasted a kernel trap, two
> fpu state loads and two fpu state saves.
You're assuming that getcontext() gets and saves the current FPU
state. So far we are assuming that it doesn't have to, and swapcontext
wouldn't have to either. swapcontext() would only have to load the
FPU state if the context were gotten by being passed to a signal
handler. [ And I want to fix the kernel so that it places the FPU
state in the sigcontext/ucontext passed to the signal handler. ]
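
The rule described here (getcontext() skips the FPU, and setcontext()/swapcontext() load FPU state only for a context handed to a signal handler) could be modelled roughly as below. This is a toy sketch under stated assumptions, not FreeBSD code: the struct layout, the SK_FPU_VALID flag, and every sk_* name are hypothetical, and the "FPU" is just a byte array with counters so the behaviour can be checked.

```c
#include <string.h>

#define SK_FPU_VALID 0x1            /* hypothetical: context carries FPU state */

struct sk_fpu { unsigned char regs[16]; };   /* opaque stand-in for FPU state */

struct sk_ctx {
    unsigned      flags;
    long          gregs[4];         /* stand-in for general registers */
    struct sk_fpu fpu;
};

static long          live_gregs[4]; /* simulated CPU registers */
static struct sk_fpu live_fpu;      /* simulated FPU registers */
static int           fpu_loads, fpu_saves;

/* Synchronous save: general registers only, FPU state deliberately skipped. */
void sk_getcontext(struct sk_ctx *c)
{
    memcpy(c->gregs, live_gregs, sizeof(c->gregs));
    c->flags = 0;
}

/* What the kernel would build for a signal handler: FPU state included. */
void sk_signal_context(struct sk_ctx *c)
{
    memcpy(c->gregs, live_gregs, sizeof(c->gregs));
    c->fpu = live_fpu;
    fpu_saves++;
    c->flags = SK_FPU_VALID;
}

/* Restore: touch the FPU only when the context says it holds FPU state. */
void sk_setcontext(const struct sk_ctx *c)
{
    memcpy(live_gregs, c->gregs, sizeof(live_gregs));
    if (c->flags & SK_FPU_VALID) {
        live_fpu = c->fpu;
        fpu_loads++;
    }
}

void sk_swapcontext(struct sk_ctx *oc, const struct sk_ctx *nc)
{
    sk_getcontext(oc);
    sk_setcontext(nc);
}
```

In this model a swap between two contexts obtained with sk_getcontext() never touches the FPU counters; only a swap into a context built by sk_signal_context() costs one load.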
> Specifically:
> 0: cpu_switch() to new process. fpu state not loaded (lazy)
> [no fpu activity at all, so the fpu state is still sitting in the pcb]
> 1: user does swapcontext()
> [process does a sigprocmask(2) syscall when being used outside of libc_r]
> 2: userland swapcontext blindly attempts to save fpu state
Not true.
> 3: kernel traps, and loads fpu context from pcb into fpu registers
> 4: userland swapcontext blindly copies fpu registers to old ucontext_t
> [process does a sigprocmask(2) syscall when being used outside of libc_r]
Again, only true if the context came from a signal handler.
> 5: userland swapcontext blindly copies new ucontext fpu state into fpu regs
> 6: new context is running...
> [no more fpu activity until timeslice ends]
> 7: cpu_switch copies the active fpu regs into the pcb
>
> So, for no actual fpu activity, we had one kernel trap (stage 3), one
> fpu load context (stage 3), one fpu save context (stage 4), another fpu
> load context (stage 5) and yet another fpu save context (stage 7).
> And when being used outside of libc_r, there are also two system calls!
>
> And all this with not one FPU operation in userland!
>
> Contrast this to a kernel getsetcontext(2) call:
> 0: cpu_switch() to new process, fpu state is not loaded (lazy)
> [no fpu activity at all, so the fpu state is still sitting in the pcb]
> 1: user does swapcontext()
> 2: system call getsetcontext(SWAPCONTEXT, ucontext_t *ocp, ucontext_t *ncp)
> 3: kernel copies old registers into ocp
> 4: kernel copies fpu state from *pcb* into ocp
> [kernel saves sigprocmask if told to via ocp flags, libc_r saves it itself]
> 5: kernel copies new registers from ncp
> 6: kernel copies new fpu state from ncp into *pcb*
> [kernel restores sigprocmask if told to via ncp flags, libc_r saves it itself]
> [return to user in new context]
> [no fpu activity at all, so the fpu state is still sitting in the pcb]
> 7: cpu_switch notices the fpu state is still lazily sitting in the pcb
>
> This time through we don't waste one kernel trap and four fpu load/save
> contexts and enter the kernel only 1 time, versus 1 or 3 times depending on
> whether we're in libc_r or not.
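
The bookkeeping in the two quoted step lists can be tallied mechanically. The sketch below only restates the counts given in the quoted text (it measures nothing), and it encodes the quoted assumption that a userland swapcontext() always saves and restores FPU state. The struct and function names are invented for the illustration.

```c
struct tally {
    int traps;       /* kernel traps taken */
    int syscalls;    /* system calls made */
    int fpu_loads;   /* FPU state loaded into the FPU registers */
    int fpu_saves;   /* FPU state saved out of the FPU registers */
};

/* First list: userland swapcontext that blindly saves/restores FPU state. */
struct tally tally_userland_blind(int inside_libc_r)
{
    struct tally t = {0, 0, 0, 0};

    t.traps++;                  /* step 3: touching the FPU traps ...      */
    t.fpu_loads++;              /* step 3: ... and loads state from pcb    */
    t.fpu_saves++;              /* step 4: copy FPU regs to old ucontext_t */
    t.fpu_loads++;              /* step 5: load new context's FPU state    */
    t.fpu_saves++;              /* step 7: cpu_switch saves active regs    */
    if (!inside_libc_r)
        t.syscalls += 2;        /* two sigprocmask(2) calls */
    return t;
}

/* Second list: one getsetcontext(2) call, FPU state stays in the pcb. */
struct tally tally_syscall(void)
{
    struct tally t = {0, 1, 0, 0};
    return t;
}

/* Number of times the kernel is entered, by trap or by system call. */
int tally_kernel_entries(struct tally t)
{
    return t.traps + t.syscalls;
}
```

With these counts, tally_kernel_entries() gives 1 inside libc_r and 3 outside it for the first list, and 1 for the second, which matches the "1 time, versus 1 or 3 times" figure quoted above.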
If getcontext (and therefore swapcontext) had to save the FPU state, then
I would agree that a syscall would be better, at least in the case of a
non-threaded application. And the first time I implemented
{get,set,swap}context was with syscalls, so it's not like I'm that biased towards doing
it in userspace :-) I just want fast context switches for the threads
library, and so far I don't see the real need for syscalls anyways.
--
Dan Eischen
