Date:      Mon, 16 Jun 2003 17:54:47 -0400 (EDT)
From:      Andy Ritger <ARitger@nvidia.com>
To:        Daniel Eischen <eischen@pcnet.com>
Cc:        Gareth Hughes <gareth@nvidia.com>
Subject:   Re: NVIDIA and TLS
Message-ID:  <Pine.LNX.4.44.0306161744510.4675-100000@stravinsky.nvidia.com>
In-Reply-To: <Pine.GSO.4.10.10306161556500.19940-100000@pcnet5.pcnet.com>


On Mon, 16 Jun 2003, Daniel Eischen wrote:

> On Sat, 14 Jun 2003, Andy Ritger wrote:
> > 
> > I'd like to add a few comments to what has recently been said here
> > and on freebsd-current to clarify what the %gs register is used
> > for by the NVIDIA driver and what ideally would be done to allow
> > the NVIDIA driver to coexist with FreeBSD threading implementations.
> > 
> > The NVIDIA driver does not need %gs to maintain internal state in the
> > kernel; rather, it uses it to maintain fast thread local data for the
> > OpenGL implementation, where it is critically important to have fast
> > (single instruction) access to such data.
> 
> Take 2.  After thinking a bit more...
> 
> I guess OpenGL (or OpenGL implementation interfaces to the
> NVIDIA driver) doesn't have thread-safe interfaces.  Instead
> of trying to change the interface of the OS, why not change
> (or add) OpenGL or NVIDIA driver interfaces so that there
> is not a need for TLS?

Thanks, Dan.

Sorry for the slow response.  I had hoped to put some tests together
to have some hard numbers to back up my argument.  Unfortunately,
I had to put those tests on hold.

The issue is really that each thread has its own rendering context,
and an OpenGL implementation needs fast access to that thread-local
rendering context data.

OpenGL is a state machine.  Applications call commands like:

    glColor3f(1.0, 0.0, 0.0);  /* set the current color */
    glTexCoord2f(1.0, 1.0);    /* set the current texture coordinate */
    glVertex3f(0.0, 0.0, 0.0); /* draw a vertex, using the current
                                * accumulated state in the rendering
                                * context
                                */
 
Also, the effect that an OpenGL command has may vary based on
previously accumulated state or on the modes of operation that
have been enabled.  A common implementation technique to deal
with this is to use dispatch tables that get "plugged in" based
on the mode of operation.  Because modes of operation are enabled
by a thread for the rendering context that it is currently using,
these dispatch tables must be thread specific.
 
Consider this example: 
 
    void glFoo(GLint bar) 
    { 
        gl_dispatch_t *dispatch = GET_CURRENT_DISPATCH();
 
        dispatch->Foo(bar); 
    } 
 
where the current dispatch table is fetched from thread-local
storage.  Having to perform a function call to do this lookup can
severely impact performance, particularly when the function that will
be executed after the lookup is less than ten instructions long.
OpenGL's immediate-mode rendering API falls into this category,
and it is used heavily in workstation applications and benchmarks
like Viewperf.

To give you an idea of how this dispatch mechanism can be
implemented, here is how it could be implemented on Linux using
the new TLS implementation:

    glFoo:
        mov %gs:__gl_dispatch@ntpoff, %eax
        jmp *__foo_offset(%eax)

Here, we have a __thread variable "__gl_dispatch" which holds the
current thread's dispatch table pointer.  This is fetched using
the Local Exec TLS access model (a single instruction per access),
and the required function pointer is jumped through to execute the
correct backend function.
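
In rough C terms, the stub above corresponds to something like the
following (the type, member, and symbol names are only illustrative,
and the variable would have to be compiled with the local-exec TLS
model to get single-instruction access):

    #include <GL/gl.h>              /* for GLint */

    typedef struct {
        void (*Foo)(GLint bar);     /* backend implementation of glFoo() */
        /* ... one pointer per OpenGL entry point ... */
    } gl_dispatch_t;

    /* One dispatch table pointer per thread; with the Local Exec TLS
     * model each read of __gl_dispatch is a single %gs-relative load. */
    static __thread gl_dispatch_t *__gl_dispatch;

    void glFoo(GLint bar)
    {
        __gl_dispatch->Foo(bar);    /* load via %gs, call through table */
    }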

I think adding new OpenGL interfaces to eliminate the need for TLS
is out of the question: this would require source code changes to
applications that already work fine on other operating systems.
The TLS problem has been solved numerous times already on x86; it's
hard to make the case that it can't be solved this time without
changing the OpenGL API.

I can appreciate that you might not currently have the resources to
pursue the ELF TLS mechanism that the glibc folks have implemented.
Hopefully, this might be something to pursue in the future.  While I
have very little bandwidth to spare, I'd be willing to work with
anyone else interested to investigate further the idea of ELF TLS
on FreeBSD.

So from an OpenGL point of view, here are several alternatives that
I see for at least the near term:

    - make NVIDIA's OpenGL implementation not thread-safe (just
      use global data rather than thread-local data)

    - accept the performance hit of using pthread_getspecific()
      on FreeBSD (a sketch of what that lookup looks like follows
      this list).  From talking to other OpenGL engineers,
      conservative estimates of the performance impact on
      applications like Viewperf range from 10% - 15%.  I'd like
      to quantify that, but certainly there will be a performance
      penalty.
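
To make the comparison concrete, here is roughly what the
pthread_getspecific() variant of the same entry point would look
like (key creation omitted; gl_dispatch_t is the illustrative type
from the earlier sketch):

    #include <pthread.h>

    static pthread_key_t __gl_dispatch_key;  /* created once at startup */

    void glFoo(GLint bar)
    {
        /* a call into the threads library on every OpenGL entry point,
         * instead of a single %gs-relative load */
        gl_dispatch_t *dispatch = pthread_getspecific(__gl_dispatch_key);

        dispatch->Foo(bar);
    }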

Both of these options are regressions from the current driver.
Another alternative, which is admittedly hacky, would be for the
userland KSE implementation to reserve a chunk of data (say, 16
words) at a fixed offset in the thread environment block for an
OpenGL implementation to use.
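
To be explicit about what that last idea would mean on the OpenGL
side, access to the reserved area could look something like this
(the offset and the amount of space are entirely hypothetical and
would be chosen by the threads library, not by us):

    /* Hypothetical fixed offset, relative to %gs, of the 16 words the
     * threads library would reserve for OpenGL.  The real value would
     * be defined and documented by the KSE implementation. */
    #define GL_RESERVED_OFFSET  0x40

    static __inline void *gl_reserved_word0(void)
    {
        void *p;
        /* a single %gs-relative load, same cost as the Linux TLS path */
        __asm__ ("movl %%gs:%c1, %0" : "=r" (p) : "i" (GL_RESERVED_OFFSET));
        return p;
    }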

Thanks,
- Andy
 
> -- 
> Dan Eischen
> 
