Date:      Sat, 12 Jan 2002 09:15:58 -0500
From:      Dan Eischen <eischen@vigrid.com>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        Peter Wemm <peter@wemm.org>, Alfred Perlstein <bright@mu.org>, Kelly Yancey <kbyanc@posi.net>, Nate Williams <nate@yogotech.com>, Terry Lambert <tlambert2@mindspring.com>, Archie Cobbs <archie@dellroad.org>, arch@FreeBSD.ORG
Subject:   Re: Request for review: getcontext, setcontext, etc
Message-ID:  <3C40451E.5AC30582@vigrid.com>
References:  <20020112162603.X4598-100000@gamplex.bde.org>

Bruce Evans wrote:
> 
> On Fri, 11 Jan 2002, Dan Eischen wrote:
> 
> > Peter Wemm wrote:
> > > switching out to another process, then we've wasted a kernel trap, two
> > > fpu state loads and two fpu state saves.
> >
> > You're assuming that getcontext() gets and saves the current FPU
> > state.  So far we are assuming that it doesn't have to, and swapcontext
> > wouldn't have to either.  swapcontext() would only have to load the
> > FPU state if the context were gotten by being passed to a signal
> > handler.  [ And I want to fix the kernel so that it places the FPU
> > state in the sigcontext/ucontext passed to the signal handler. ]
> 
> The current getcontext/setcontext touch the FPU state even when they don't
> preserve it (using fnstcw/fninit+fldcw, like setjmp/longjmp), so they
> cause the same inefficiencies.

Hmm.  That sucks.  But they are no different from setjmp/longjmp, and no one
is complaining about them ;-)
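
For reference, here is a minimal sketch of the kind of userland switch
being discussed, using only the standard getcontext/makecontext/swapcontext
interface.  The names worker and run_worker_demo and the 64 KB stack size
are assumptions made for illustration; nothing here is taken from libc_r.
Whether a given libc's swapcontext touches the FPU or makes a sigprocmask
syscall is implementation-specific and not shown.

```c
#define _XOPEN_SOURCE 600   /* some platforms hide ucontext without this */
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;     /* hypothetical names */
static char worker_stack[64 * 1024];        /* assumed stack size */
static int steps;

/* Runs on its own stack; yields once, then returns. */
static void worker(void)
{
    steps++;                                /* first slice */
    swapcontext(&worker_ctx, &main_ctx);    /* yield back to the caller */
    steps++;                                /* second slice */
    /* Returning here resumes the context named by uc_link. */
}

/* Returns the number of slices the worker ran, or -1 on error. */
int run_worker_demo(void)
{
    steps = 0;
    if (getcontext(&worker_ctx) == -1)
        return -1;
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof(worker_stack);
    worker_ctx.uc_link = &main_ctx;         /* where to go when worker returns */
    makecontext(&worker_ctx, worker, 0);

    if (swapcontext(&main_ctx, &worker_ctx) == -1)   /* run until the yield */
        return -1;
    assert(steps == 1);
    if (swapcontext(&main_ctx, &worker_ctx) == -1)   /* resume until return */
        return -1;
    return steps;
}
```

Calling run_worker_demo() switches into the worker twice and back again
without any thread library involved.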

> > > Specifically:
> > > 0: cpu_switch() to new process. fpu state not loaded (lazy)
> > > [no fpu activity at all, so the fpu state is still sitting in the pcb]
> > > 1: user does swapcontext()
> > > [process does a sigprocmask(2) syscall when being used outside of libc_r]
> > > 2: userland swapcontext blindly attempts to save fpu state
> >
> > Not true.
> 
> True enough :-).  It (the i386 version) blindly attempts to save either the
> whole FPU state or just the control word.  If this causes a trap to load

Just the control word right now.

> the state from the pcb, then efficiency gained from not saving the whole
> state is almost irrelevant, since the trap overhead takes longer than
> fnsave.
> 
> > If getcontext (and therefore swapcontext) had to save the FPU state, then
> > I would agree that a syscall would be better, at least in the case of a
> > non-threaded application.  And the first time I implemented get,set,swap
> > context was with syscalls, so it's not like I'm that biased towards doing
> > it in userspace :-)  I just want fast context switches for the threads
> > library, and so far I don't see the real need for syscalls anyways.
> 
> For really fast context switches, I think we need to avoid both FPU
> switching and ucontext_t-based interfaces.  ucontext_t has mounds of
> stuff in it that is only relevant for switching in signal handlers.
> Even the limited part of it that is switched by getcontext/setcontext
> is larger than the part switched by sigsetjmp/siglongjmp (much larger
> in cycles if switching of the signal mask is not needed, since there
> is no way to avoid switching it).

Solaris seems to have fast traps (from <sys/trap.h>):

#define ST_GETCC                0x20
#define ST_SETCC                0x21
#define ST_GETPSR               0x22
#define ST_SETPSR               0x23
#define ST_GETHRTIME            0x24
#define ST_GETHRVTIME           0x25
#define ST_GETHRESTIME          0x27

I wonder if ST_GETCC/ST_SETCC are get/set current context.

A Google search yields the following:

  http://groups.google.com/groups?q=solaris+fast+traps&hl=en&selm=3820421F.AA1E813%40rentec.com&rnum=1

  Programs using gethrtime(3C) to do some fine-grained timing in their
  code expect a very fast system call to get the job done.
  gethrtime(3C) is one of a few fast trap system calls implemented in
  Solaris. This means that an invocation of gethrtime(3C) does not
  incur the normal overhead of typical system call processing. Rather,
  it generates a fast trap into the kernel, which reads the hardware
  TICK register value and returns. While many system calls may take
  microseconds to execute (non-I/O system calls, that is; I/O system
  calls will be throttled by the speed of the device they're reading or
  writing), gethrtime(3C) takes a few hundred nanoseconds on a 300 MHz
  UltraSPARC processor. It's about 1,000 times faster than a typical
  system call.

Is this something that is limited to SPARC?
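
A rough way to compare the per-call cost of a cheap timing call on a given
system is sketched below.  gethrtime(3C) is Solaris-specific, so this
sketch substitutes POSIX clock_gettime(CLOCK_MONOTONIC) as a stand-in; the
function name avg_call_ns is an assumption for illustration, and the
result says nothing about how the call is implemented underneath.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime */
#include <assert.h>
#include <time.h>

/*
 * Average cost, in nanoseconds, of one clock_gettime() call,
 * measured over iters back-to-back calls.
 */
double avg_call_ns(long iters)
{
    struct timespec start, end, scratch;
    double elapsed;
    long i;

    assert(iters > 0);                       /* caller must ask for work */

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iters; i++)
        clock_gettime(CLOCK_MONOTONIC, &scratch);   /* the call being timed */
    clock_gettime(CLOCK_MONOTONIC, &end);

    elapsed = (double)(end.tv_sec - start.tv_sec) * 1e9
            + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed / (double)iters;
}
```

Something like avg_call_ns(1000000) gives a number that can be set against
the cost of an ordinary syscall measured the same way.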

-- 
Dan Eischen
