Date: Sat, 12 Jan 2002 09:15:58 -0500
From: Dan Eischen <eischen@vigrid.com>
To: Bruce Evans <bde@zeta.org.au>
Cc: Peter Wemm <peter@wemm.org>, Alfred Perlstein <bright@mu.org>,
    Kelly Yancey <kbyanc@posi.net>, Nate Williams <nate@yogotech.com>,
    Terry Lambert <tlambert2@mindspring.com>,
    Archie Cobbs <archie@dellroad.org>, arch@FreeBSD.ORG
Subject: Re: Request for review: getcontext, setcontext, etc
Message-ID: <3C40451E.5AC30582@vigrid.com>
References: <20020112162603.X4598-100000@gamplex.bde.org>

Bruce Evans wrote:
>
> On Fri, 11 Jan 2002, Dan Eischen wrote:
>
> > Peter Wemm wrote:
> > > switching out to another process, then we've wasted a kernel trap, a two
> > > fpu state loads and two fpu state saves.
> >
> > You're assuming that getcontext() gets and saves the current FPU
> > state.  So far we are assuming that it doesn't have to, and swapcontext
> > wouldn't have to either.  swapcontext() would only have to load the
> > FPU state if the context were gotten by being passed to a signal
> > handler.  [ And I want to fix the kernel so that it places the FPU
> > state in the sigcontext/ucontext passed to the signal handler. ]
>
> The current getcontext/setcontext touch the FPU state even when they don't
> preserve it (using fnstcw/fninit+fldcw, like setjmp/longjmp), so they
> cause the same inefficiencies.

Hmm.  That sucks.  But they are no different than setjmp/longjmp,
and no one is complaining about them ;-)

> > > Specifically:
> > > 0: cpu_switch() to new process. fpu state not loaded (lazy)
> > >    [no fpu activity at all, so the fpu state is still sitting in the pcb]
> > > 1: user does swapcontext()
> > >    [process does a sigprocmask(2) syscall when being used outside of libc_r]
> > > 2: userland swapcontext blindly attempts to save fpu state
> >
> > Not true.
>
> True enough :-).  It (the i386 version) blindly attempts to save either the
> whole FPU state or just the control word.  If this causes a trap to load

Just the control word right now.

> the state from the pcb, then efficiency gained from not saving the whole
> state is almost irrelevant, since the trap overhead takes longer than
> fnsave.
>
> > If getcontext (and therefore swapcontext) had to save the FPU state, then
> > I would agree that a syscall would be better, at least in the case of a
> > non-threaded application.  And the first time I implemented get,set,swap
> > context was with syscalls, so it's not like I'm that biased towards doing
> > it in userspace :-)  I just want fast context switches for the threads
> > library, and so far I don't see the real need for syscalls anyways.
>
> For really fast context switches, I think we need to avoid both FPU
> switching and ucontext_t-based interfaces.  ucontext_t has mounds of
> stuff in it that is only relevant for switching in signal handlers.
> Even the limited part of it that is switched by getcontext/setcontext
> is larger than the part switched by sigsetjmp/siglongjmp (much larger
> in cycles if switching of the signal mask is not needed, since there
> is no way to avoid switching it).

Solaris seems to have fast traps (from <sys/trap.h>):

    #define ST_GETCC        0x20
    #define ST_SETCC        0x21
    #define ST_GETPSR       0x22
    #define ST_SETPSR       0x23
    #define ST_GETHRTIME    0x24
    #define ST_GETHRVTIME   0x25
    #define ST_GETHRESTIME  0x27

I wonder if ST_GETCC/ST_SETCC are get/set current context.  A Google
search yields the following:

  http://groups.google.com/groups?q=solaris+fast+traps&hl=en&selm=3820421F.AA1E813%40rentec.com&rnum=1

    Programs using gethrtime(3C) to do some fine-grained timing in
    their code expect a very fast system call to get the job done.
    gethrtime(3C) is one of a few fast trap system calls implemented
    in Solaris.  This means that an invocation of gethrtime(3C) does
    not incur the normal overhead of typical system call processing.
    Rather, it generates a fast trap into the kernel, which reads the
    hardware TICK register value and returns.  While many system calls
    may take microseconds to execute (non-I/O system calls, that is;
    I/O system calls will be throttled by the speed of the device
    they're reading or writing), gethrtime(3C) takes a few hundred
    nanoseconds on a 300 MHz UltraSPARC processor.  It's about 1,000
    times faster than a typical system call.

Is this something that is limited to SPARC?
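
For readers who have not used the interfaces under discussion, here is a
minimal sketch of a cooperative switch between two user-level contexts
with the standard getcontext/makecontext/swapcontext calls.  It is not
the implementation under review in this thread.  The names demo_worker,
demo_ping_pong and DEMO_STACK_SIZE are made up for illustration, and
whether swapcontext() touches the FPU state or the signal mask (possibly
via a sigprocmask(2) syscall, as noted in the thread) depends entirely
on the platform's libc.

```c
/* Sketch only: ping-pong between a caller and one worker context.
 * All demo_* names and DEMO_STACK_SIZE are hypothetical. */
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define DEMO_STACK_SIZE (64 * 1024)  /* assumed to be large enough */

static ucontext_t demo_main_ctx;     /* saved state of the caller */
static ucontext_t demo_worker_ctx;   /* saved state of the worker */
static int demo_rounds;              /* number of times the worker yields */
static int demo_worker_steps;        /* work done so far by the worker */

/* Worker body: do one step, then yield back to the caller. */
static void demo_worker(void)
{
    for (int i = 0; i < demo_rounds; i++) {
        demo_worker_steps++;
        /* Save the worker's state and resume the caller. */
        swapcontext(&demo_worker_ctx, &demo_main_ctx);
    }
    /* Returning here resumes uc_link, i.e. the caller's context. */
}

/* Switch into the worker until it finishes.  Returns the number of
 * switches into the worker (rounds + 1), or -1 on failure. */
int demo_ping_pong(int rounds)
{
    char *stack = malloc(DEMO_STACK_SIZE);
    int switches = 0;

    if (stack == NULL)
        return -1;
    if (getcontext(&demo_worker_ctx) == -1) {
        free(stack);
        return -1;
    }
    demo_worker_ctx.uc_stack.ss_sp = stack;
    demo_worker_ctx.uc_stack.ss_size = DEMO_STACK_SIZE;
    demo_worker_ctx.uc_link = &demo_main_ctx;
    makecontext(&demo_worker_ctx, demo_worker, 0);

    demo_rounds = rounds;
    demo_worker_steps = 0;

    /* One switch per yield, plus a final one that lets the worker return. */
    for (int i = 0; i <= rounds; i++) {
        if (swapcontext(&demo_main_ctx, &demo_worker_ctx) == -1) {
            free(stack);
            return -1;
        }
        switches++;
    }

    assert(demo_worker_steps == rounds);
    free(stack);
    return switches;
}
```

Calling demo_ping_pong(3) from a program's main() switches into the
worker four times: three yields plus the final resume that lets the
worker function return through uc_link.  Timing a large number of such
round trips is one rough way to see what each swapcontext() costs on a
given system.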
--
Dan Eischen
