Date: Fri, 11 Jan 2002 18:54:17 -0700
From: Nate Williams <nate@yogotech.com>
To: Peter Wemm <peter@wemm.org>
Cc: Alfred Perlstein <bright@mu.org>, Kelly Yancey <kbyanc@posi.net>, Nate Williams <nate@yogotech.com>, Terry Lambert <tlambert2@mindspring.com>, Daniel Eischen <eischen@pcnet1.pcnet.com>, Dan Eischen <eischen@vigrid.com>, Archie Cobbs <archie@dellroad.org>, arch@FreeBSD.ORG
Subject: Re: Request for review: getcontext, setcontext, etc
Message-ID: <15423.38729.468367.222867@caddis.yogotech.com>
In-Reply-To: <20020112005212.5CB2038FF@overcee.netplex.com.au>
References: <20020110135217.M7984@elvis.mu.org> <20020112005212.5CB2038FF@overcee.netplex.com.au>

Peter Wemm writes:

[ Thanks for shedding some light on the subject Peter! ]

> The biggest problem on the x86 implementation is that once you touch the
> fpu at all, you now own a fpu context forever. When we context switch
> a process, we save its active FPU state [if it has an active one] into
> the pcb. When we return to the process, we do *not* load the fpu state
> into the FPU until the process touches it again.

So, the 'lazy binding' is done whenever the process touches the FPU
again, not at process context switch time? This is different than what
I expected, but it makes sense since it means that we may not have to
save/load the context for every context switch from that point forward
(if the process no longer does any FPU operations).

> For a userland application to do a swapcontext(), it would have to
> look at the present fpu state (causing a kernel trap, which loads the
> fpu state into the fpu), dump out the registers, switch contexts and
> load the fpu state from the new context into the active fpu registers.
> If the old context hadn't used the FPU and the new context doesn't
> actually use it before switching out to another process, then we've
> wasted a kernel trap, two fpu state loads and two fpu state saves.

If we are stupid, we waste two loads and two saves, saving all of the
overhead of a kernel trap and such. It would be interesting to measure
this overhead vs. the overhead of making a kernel trap to check if the
saves/loads are necessary.

> Specifically:
> 0: cpu_switch() to new process. fpu state not loaded (lazy)
> [no fpu activity at all, so the fpu state is still sitting in the pcb]
> 1: user does swapcontext()
> [process does a sigprocmask(2) syscall when being used outside of libc_r]
> 2: userland swapcontext blindly attempts to save fpu state

Unless it does the FPU state load w/out the kernel's help, which all
userland thread libraries have done (up until this point, perhaps
bogusly).

> 3: kernel traps, and loads fpu context from pcb into fpu registers

Why is there a kernel trap here? Is it because we're doing FPU
operations, and hence the state must be loaded?

> 4: userland swapcontext blindly copies fpu registers to old ucontext_t
> [process does a sigprocmask(2) syscall when being used outside of libc_r]
> 5: userland swapcontext blindly copies new ucontext fpu state into fpu regs
> 6: new context is running...
> [no more fpu activity until timeslice ends]
> 7: cpu_switch copies the active fpu regs into the pcb
>
> So, for no actual fpu activity, we had one kernel trap (stage 3)

I'm still not sure I completely understand why the kernel trap happens.
Can you help out here?

> one
> fpu load context (stage 3), one fpu save context (stage 4), another fpu
> load context (stage 5) and yet another fpu save context (stage 7).
> And when being used outside of libc_r, there are also two system calls!
>
> And all this with not one FPU operation in userland!

Yep.

> Contrast this to a kernel getsetcontext(2) call:
> 0: cpu_switch() to new process, fpu state is not loaded (lazy)
> [no fpu activity at all, so the fpu state is still sitting in the pcb]
> 1: user does swapcontext()
> 2: system call getsetcontext(SWAPCONTEXT, ucontext_t *ocp, ucontext_t *ncp)

We have a system call here, with the overhead that this entails.

> 3: kernel copies old registers into ocp
> 4: kernel copies fpu state from *pcb* into ocp

Won't the *PCB* state be invalid at this point in the case where there
*IS* FPU activity in the process? I would think you would need to copy
*from* ocp into the PCB at this point.

> [kernel saves sigprocmask if told to via ocp flags, libc_r saves it itself]
> 5: kernel copies new registers from ncp
> 6: kernel copies new fpu state from ncp into *pcb*

See above. Isn't this backwards?

> [kernel restores sigprocmask if told to via ncp flags, libc_r saves it itself]
> [return to user in new context]
> [no fpu activity at all, so the fpu state is still sitting in the pcb]
> 7: cpu_switch notices the fpu state is still lazily sitting in the pcb
>
> This time through we don't waste one kernel trap and four fpu load/save
> contexts

Actually, we still have the *pcb* load/saves, which are essentially the
same as the four fpu load/save contexts, no?

> whether we're in libc_r or not.

Nate

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message
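
The two walkthroughs quoted in this message can be tallied with a small
model. What follows is only a sketch in C under stated assumptions: it
covers just the case Peter describes, where the process does no FPU work
of its own during the timeslice, and every name in it (fpu_counts,
proc_model, the model_* and run_* functions) is hypothetical, not a
FreeBSD kernel or libc interface. It counts events; it does not touch a
real FPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Event tallies for one timeslice of the modelled process. */
struct fpu_counts {
    int traps;       /* FPU-unavailable traps into the kernel */
    int fpu_loads;   /* state loaded into the FPU registers */
    int fpu_saves;   /* state stored out of the FPU registers */
    int pcb_copies;  /* memory copies between the pcb and a ucontext_t */
    int syscalls;    /* system calls issued */
};

struct proc_model {
    bool fpu_live;   /* true: state is in the FPU; false: parked in the pcb */
    struct fpu_counts n;
};

/* Stage 0: lazy switch-in, the FPU state stays in the pcb. */
void model_switch_in(struct proc_model *p) { p->fpu_live = false; }

/* First FPU access after a lazy switch-in traps; the kernel loads the state. */
void model_touch_fpu(struct proc_model *p)
{
    if (!p->fpu_live) {
        p->n.traps++;
        p->n.fpu_loads++;
        p->fpu_live = true;
    }
}

/* Switch-out: the FPU registers are saved to the pcb only if they are live. */
void model_switch_out(struct proc_model *p)
{
    if (p->fpu_live) {
        p->n.fpu_saves++;
        p->fpu_live = false;
    }
}

/* Userland swapcontext, stages 1-5 of the first walkthrough. */
void model_user_swapcontext(struct proc_model *p, bool outside_libc_r)
{
    if (outside_libc_r)
        p->n.syscalls++;    /* sigprocmask(2) before the save */
    model_touch_fpu(p);     /* stages 2-3: the blind save attempt traps */
    p->n.fpu_saves++;       /* stage 4: registers copied to the old ucontext_t */
    if (outside_libc_r)
        p->n.syscalls++;    /* sigprocmask(2) before the restore */
    p->n.fpu_loads++;       /* stage 5: new ucontext state into the registers */
}

/* Kernel swapcontext, stages 2-6 of the second walkthrough. */
void model_kernel_swapcontext(struct proc_model *p)
{
    /* Assumption: only the idle-FPU case is modelled; the live-FPU case
       asked about in the reply is outside this sketch. */
    assert(!p->fpu_live);
    p->n.syscalls++;        /* the single getsetcontext(2) call */
    p->n.pcb_copies += 2;   /* stage 4: pcb to ocp, stage 6: ncp to pcb */
}

struct fpu_counts run_userland(bool outside_libc_r)
{
    struct proc_model p = {0};
    model_switch_in(&p);
    model_user_swapcontext(&p, outside_libc_r);
    model_switch_out(&p);   /* stage 7 */
    return p.n;
}

struct fpu_counts run_kernel(void)
{
    struct proc_model p = {0};
    model_switch_in(&p);
    model_kernel_swapcontext(&p);
    model_switch_out(&p);   /* stage 7: nothing live, so nothing to save */
    return p.n;
}
```

Calling run_userland(true) and run_kernel() from a small main() and
printing the fields gives the two tallies side by side. The pcb_copies
field is kept separate from the FPU loads and saves so that the question
raised above, whether the pcb copies cost about the same as the four FPU
load/save operations, stays visible in the numbers instead of being
folded into one count.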