Date:      Fri, 11 Jan 2002 18:54:17 -0700
From:      Nate Williams <nate@yogotech.com>
To:        Peter Wemm <peter@wemm.org>
Cc:        Alfred Perlstein <bright@mu.org>, Kelly Yancey <kbyanc@posi.net>, Nate Williams <nate@yogotech.com>, Terry Lambert <tlambert2@mindspring.com>, Daniel Eischen <eischen@pcnet1.pcnet.com>, Dan Eischen <eischen@vigrid.com>, Archie Cobbs <archie@dellroad.org>, arch@FreeBSD.ORG
Subject:   Re: Request for review: getcontext, setcontext, etc 
Message-ID:  <15423.38729.468367.222867@caddis.yogotech.com>
In-Reply-To: <20020112005212.5CB2038FF@overcee.netplex.com.au>
References:  <20020110135217.M7984@elvis.mu.org> <20020112005212.5CB2038FF@overcee.netplex.com.au>

Peter Wemm writes:

[ Thanks for shedding some light on the subject, Peter! ]

> The biggest problem on the x86 implementation is that once you touch the
> fpu at all, you now own an fpu context forever.  When we context switch
> a process, we save its active FPU state [if it has an active one] into
> the pcb.  When we return to the process, we do *not* load the fpu state
> into the FPU until the process touches it again.

So, the 'lazy binding' is done whenever the process touches the FPU
again, not at process context switch time?  This is different from what
I expected, but it makes sense, since it means that we may not have to
save/load the context on every context switch from that point forward
(if the process no longer does any FPU operations).
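
A rough sketch of the lazy scheme described above, as a toy C model
rather than kernel code.  All struct and function names here are
hypothetical, and the FPU state is reduced to a single int.  The
assumption is the usual x86 arrangement, where the first FPU
instruction after a switch faults into the kernel and the kernel then
loads the saved state from the pcb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real FreeBSD structures. */
struct proc {
    bool used_fpu;  /* has this process ever touched the FPU? */
    int  pcb_fpu;   /* FPU state parked in the pcb */
};

struct cpu {
    struct proc *curproc;    /* process currently running */
    struct proc *fpu_owner;  /* whose state is live in the FPU, or NULL */
    int fpu_regs;            /* the "hardware" FPU registers */
    int traps, loads, saves; /* event counters */
};

/* Context switch: save live FPU state, but never load the new state. */
void cpu_switch(struct cpu *c, struct proc *next)
{
    if (c->fpu_owner != NULL) {
        c->fpu_owner->pcb_fpu = c->fpu_regs;
        c->fpu_owner = NULL;
        c->saves++;
    }
    c->curproc = next;
}

/* An FPU instruction: traps and loads only if the state is not live. */
void fpu_touch(struct cpu *c, int value)
{
    assert(c->curproc != NULL);
    if (c->fpu_owner != c->curproc) {
        c->traps++;
        c->fpu_regs = c->curproc->pcb_fpu;
        c->loads++;
        c->fpu_owner = c->curproc;
        c->curproc->used_fpu = true;
    }
    c->fpu_regs += value;
}
```

In this model a process that touches the FPU once and then never again
costs one trap, one load and one save in total, no matter how many
context switches follow.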

> For a userland application to do a swapcontext(), it would have to
> look at the present fpu state (causing a kernel trap, which loads the
> fpu state into the fpu), dump out the registers, switch contexts and
> load the fpu state from the new context into the active fpu registers.
> If the old context hadn't used the FPU and the new context doesn't
> actually use it before switching out to another process, then we've
> wasted a kernel trap, two fpu state loads and two fpu state saves.

If we are stupid, we waste two loads and two saves, but avoid all of the
overhead of a kernel trap and such.  It would be interesting to measure
this overhead vs. the overhead of making a kernel trap to check if the
saves/loads are necessary.
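
One possible starting point for such a measurement is a ping-pong
between two contexts using the standard <ucontext.h> calls.  This is
only a sketch: run_pingpong() and worker() are made-up names, the
64 KB stack is an arbitrary choice, and clock() gives coarse CPU time.
It times whatever swapcontext() the local libc provides, so it says
nothing by itself about which of the two designs is in use.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];  /* arbitrary size for the sketch */
static long switches_seen;
static long rounds_wanted;

/* Runs in the second context: count each entry, then switch back. */
static void worker(void)
{
    for (long i = 0; i < rounds_wanted; i++) {
        switches_seen++;
        swapcontext(&worker_ctx, &main_ctx);
    }
}

/*
 * Hypothetical helper: bounce between two contexts `rounds` times.
 * Returns the number of times the worker was entered, or -1 on error.
 * If cpu_seconds is non-NULL it receives the CPU time used.
 */
long run_pingpong(long rounds, double *cpu_seconds)
{
    assert(rounds >= 0);
    switches_seen = 0;
    rounds_wanted = rounds;

    if (getcontext(&worker_ctx) != 0)
        return -1;
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;
    makecontext(&worker_ctx, worker, 0);

    clock_t start = clock();
    for (long i = 0; i < rounds; i++) {
        if (swapcontext(&main_ctx, &worker_ctx) != 0)
            return -1;
    }
    clock_t end = clock();

    if (cpu_seconds != NULL)
        *cpu_seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return switches_seen;
}
```

Each round is two swapcontext() calls (there and back), so dividing the
reported time by 2 * rounds gives a rough per-switch cost.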

> Specifically:
> 0: cpu_switch() to new process. fpu state not loaded (lazy)
> [no fpu activity at all, so the fpu state is still sitting in the pcb]
> 1: user does swapcontext()
> [process does a sigprocmask(2) syscall when being used outside of libc_r]
> 2: userland swapcontext blindly attempts to save fpu state

Unless it does the FPU state load without the kernel's help, which all
userland thread libraries have done (up till this point, perhaps
bogusly).

> 3: kernel traps, and loads fpu context from pcb into fpu registers

Why is there a kernel trap here?  Is it because we're doing FPU
operations, and hence the state must be loaded?

> 4: userland swapcontext blindly copies fpu registers to old ucontext_t
> [process does a sigprocmask(2) syscall when being used outside of libc_r]
> 5: userland swapcontext blindly copies new ucontext fpu state into fpu regs
> 6: new context is running...
> [no more fpu activity until timeslice ends]
> 7: cpu_switch copies the active fpu regs into the pcb
> 
> So, for no actual fpu activity, we had one kernel trap (stage 3)

I'm still not sure I completely understand why the kernel trap happens.
Can you help out here?

> one
> fpu load context (stage 3), one fpu save context (stage 4), another fpu
> load context (stage 5) and yet another fpu save context (stage 7).
> And when being used outside of libc_r, there are also two system calls!
> 
> And all this with not one FPU operation in userland!

Yep.

> Contrast this to a kernel getsetcontext(2) call:
> 0: cpu_switch() to new process, fpu state is not loaded (lazy)
> [no fpu activity at all, so the fpu state is still sitting in the pcb]
> 1: user does swapcontext()
> 2: system call getsetcontext(SWAPCONTEXT, ucontext_t *ocp, ucontext_t *ncp)

We have a system call here, with the overhead that this entails.

> 3: kernel copies old registers into ocp
> 4: kernel copies fpu state from *pcb* into ocp

Won't the *PCB* state be invalid at this point in the case where there
*IS* FPU activity in the process?  I would think you would need to
copy *from* ocp into the PCB at this point.

> [kernel saves sigprocmask if told to via ocp flags, libc_r saves it itself]
> 5: kernel copies new registers from ncp
> 6: kernel copies new fpu state from ncp into *pcb*

See above.  Isn't this backwards?

> [kernel restores sigprocmask if told to via ncp flags, libc_r saves it itself]
> [return to user in new context]

> [no fpu activity at all, so the fpu state is still sitting in the pcb]
> 7: cpu_switch notices the fpu state is still lazily sitting in the pcb
> 
> This time through we don't waste one kernel trap and four fpu load/save
> contexts

Actually, we still have the *pcb* load/saves, which are essentially the
same as the four fpu load/save contexts, no?

> whether we're in libc_r or not.
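
To make the two tallies easier to compare, here is a small counting
model of both sequences in C.  It is a sketch only: every name is
hypothetical, and it ASSUMES that the kernel path first flushes any
live FPU registers into the pcb before copying the pcb into ocp, which
the quoted steps do not state.  The pcb <-> ucontext_t copies are
counted separately from real FPU register loads and saves.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event counters; none of these names come from the kernel. */
struct tally {
    int traps;      /* FPU-not-available traps into the kernel */
    int fpu_loads;  /* state loaded into the FPU registers */
    int fpu_saves;  /* state stored out of the FPU registers */
    int syscalls;   /* system calls made */
    int mem_copies; /* plain memory copies between pcb and ucontext_t */
};

struct model {
    bool fpu_live;  /* true while the state is in the FPU registers */
    struct tally t;
};

/* Any FPU access while the state is still parked in the pcb traps. */
static void touch_fpu(struct model *m)
{
    if (!m->fpu_live) {
        m->t.traps++;
        m->t.fpu_loads++;
        m->fpu_live = true;
    }
}

/* End of timeslice: only live FPU state has to be written back. */
void cpu_switch_out(struct model *m)
{
    if (m->fpu_live) {
        m->t.fpu_saves++;
        m->fpu_live = false;
    }
}

/* Userland path, steps 1-5 of the first list. */
void userland_swapcontext(struct model *m, bool outside_libc_r)
{
    if (outside_libc_r)
        m->t.syscalls++;  /* first sigprocmask(2) */
    touch_fpu(m);         /* steps 2-3: the save attempt is an FPU access */
    m->t.fpu_saves++;     /* step 4: registers -> old ucontext_t */
    if (outside_libc_r)
        m->t.syscalls++;  /* second sigprocmask(2) */
    m->t.fpu_loads++;     /* step 5: new ucontext_t -> registers */
    assert(m->fpu_live);
}

/* Kernel path, steps 1-6 of the second list. */
void kernel_swapcontext(struct model *m)
{
    m->t.syscalls++;      /* step 2: getsetcontext(2) */
    cpu_switch_out(m);    /* ASSUMPTION: flush live registers to the pcb */
    m->t.mem_copies++;    /* step 4: pcb -> ocp */
    m->t.mem_copies++;    /* step 6: ncp -> pcb */
}
```

Starting from a zeroed struct model (no FPU activity), calling
userland_swapcontext(&m, true) followed by cpu_switch_out(&m) tallies
one trap, two loads, two saves and two system calls.  The same
sequence with kernel_swapcontext(&m) tallies one system call and two
memory copies, with no traps and no FPU register traffic.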



Nate

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message



