Date: Mon, 16 Jun 2003 17:54:47 -0400 (EDT)
From: Andy Ritger <ARitger@nvidia.com>
To: Daniel Eischen <eischen@pcnet.com>
Cc: Gareth Hughes <gareth@nvidia.com>
Subject: Re: NVIDIA and TLS
Message-ID: <Pine.LNX.4.44.0306161744510.4675-100000@stravinsky.nvidia.com>
In-Reply-To: <Pine.GSO.4.10.10306161556500.19940-100000@pcnet5.pcnet.com>
On Mon, 16 Jun 2003, Daniel Eischen wrote:

> On Sat, 14 Jun 2003, Andy Ritger wrote:
> >
> > I'd like to add a few comments to what has recently been said here
> > and on freebsd-current to clarify what the %gs register is used
> > for by the NVIDIA driver and what ideally would be done to allow
> > the NVIDIA driver to coexist with FreeBSD threading implementations.
> >
> > The NVIDIA driver does not need %gs to maintain internal state in
> > the kernel; rather, it uses it to maintain fast thread-local data
> > for the OpenGL implementation, where it is critically important to
> > have fast (single instruction) access to such data.
>
> Take 2.  After thinking a bit more...
>
> I guess OpenGL (or OpenGL implementation interfaces to the
> NVIDIA driver) doesn't have thread-safe interfaces.  Instead
> of trying to change the interface of the OS, why not change
> (or add) OpenGL or NVIDIA driver interfaces so that there
> is not a need for TLS?

Thanks, Dan.  Sorry for the slow response.  I had hoped to put some
tests together to have some hard numbers to back up my argument, but
unfortunately I had to put those tests on hold.

The issue is really that each thread has its own rendering context,
and an OpenGL implementation needs fast access to that thread-local
rendering context data.

OpenGL is a state machine.  Applications call commands like:

    glColor3f(1.0, 0.0, 0.0);    /* set the current color */

    glTexCoord2f(1.0, 1.0);      /* set the current texture
                                  * coordinate */

    glVertex3f(0.0, 0.0, 0.0);   /* draw a vertex, using the current
                                  * accumulated state in the rendering
                                  * context */

Also, the effect that an OpenGL command has may vary based on
previously accumulated state, or on which modes of operation have
been enabled.  A common implementation technique to deal with this
is to use dispatch tables that get "plugged in" based on the mode of
operation.  Because modes of operation are enabled by a thread for
the rendering context it is currently using, these dispatch tables
must be thread-specific.  Consider this example:

    void glFoo(GLint bar)
    {
        gl_dispatch_t *dispatch = GET_CURRENT_DISPATCH();
        dispatch->Foo(bar);
    }

where the current dispatch table is fetched from thread-local
storage.  Having to perform a function call to do this lookup can
severely impact performance, particularly when the function executed
after the lookup is less than ten instructions long.  OpenGL's
immediate mode rendering API falls into this category, and it is
used heavily in workstation applications and benchmarks like
Viewperf.

To give you an idea of how this dispatch mechanism can be
implemented, here is how it could look on Linux using the new TLS
implementation:

    glFoo:
        mov %gs:__gl_dispatch@ntpoff, %eax
        jmp *__foo_offset(%eax)

Here, we have a __thread variable "__gl_dispatch" which holds the
current thread's dispatch table pointer.  This is fetched using the
Local Exec TLS access model (a single instruction per access), and
the required function pointer is jumped through to execute the
correct backend function.

I think adding new OpenGL interfaces to eliminate the need for TLS
is out of the question: it would require source code changes to
applications that already work fine on other operating systems.  The
TLS problem has been solved numerous times already on x86; it's hard
to make the case that it can't be solved this time without changing
the OpenGL API.
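To make the comparison concrete, here is a rough C sketch of the two
ways the dispatch pointer can be fetched.  The gl_dispatch_t type and
the __gl_dispatch variable are the names used above; glFoo_tls,
glFoo_getspecific, and __gl_dispatch_key are made up purely for
illustration, and the real entry points don't look exactly like this:

    #include <pthread.h>

    typedef struct gl_dispatch {
        void (*Foo)(int bar);      /* one slot per OpenGL entry point */
    } gl_dispatch_t;

    /* Fast path: with ELF TLS this access can compile down to a
     * single %gs-relative load (the Local Exec model shown above). */
    static __thread gl_dispatch_t *__gl_dispatch;

    void glFoo_tls(int bar)
    {
        __gl_dispatch->Foo(bar);   /* one load, one indirect call */
    }

    /* Slow path: without usable TLS we fall back on a thread-specific
     * data key (created once with pthread_key_create() at library
     * initialization), which costs an out-of-line call into the
     * threading library on every OpenGL command. */
    static pthread_key_t __gl_dispatch_key;

    void glFoo_getspecific(int bar)
    {
        gl_dispatch_t *dispatch = pthread_getspecific(__gl_dispatch_key);
        dispatch->Foo(bar);
    }

The difference is a load and an indirect call versus a full function
call into the threading library on every OpenGL command, which is
where estimates like the 10-15% Viewperf impact mentioned below come
from.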
I can appreciate that you might not currently have the resources to
pursue the ELF TLS mechanism that the glibc folks have implemented.
Hopefully, this is something that can be pursued in the future.
While I have very little bandwidth to spare, I'd be willing to work
with anyone else interested in investigating the idea of ELF TLS on
FreeBSD further.

So from an OpenGL point of view, here are the alternatives that I
see for at least the near term:

    - make NVIDIA's OpenGL implementation not thread-safe (just use
      global data rather than thread-local data)

    - accept the performance hit of using pthread_getspecific() on
      FreeBSD; from talking to other OpenGL engineers, conservative
      estimates of the performance impact on applications like
      Viewperf range from 10% to 15%.  I'd like to quantify that, but
      there will certainly be a performance penalty.

Both of these options are regressions from the current driver.

Another alternative, which is admittedly hacky, would be for the
userland KSE implementation to reserve a chunk of data (say, 16
words) at a fixed offset in the thread environment block for an
OpenGL implementation to use; a rough sketch of what I mean follows.
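Here is roughly what the OpenGL side of such an arrangement might
look like.  The offset value, macro, and function name below are all
made up for illustration; the real KSE thread structures are of
course not laid out this way:

    /* Hypothetical: the threading library would publish a fixed,
     * ABI-stable offset of a few reserved words within the %gs-based
     * thread environment block.  The offset below is invented. */
    #define GL_RESERVED_SLOT_OFFSET  64

    static inline void *gl_get_reserved_slot(void)
    {
        void *p;

        /* A single %gs-relative load, comparable to the Local Exec
         * TLS access shown earlier, instead of a call to
         * pthread_getspecific() on every OpenGL entry point. */
        __asm__ ("movl %%gs:%c1, %0"
                 : "=r" (p)
                 : "i" (GL_RESERVED_SLOT_OFFSET));
        return p;
    }

The only guarantee the threading library would need to make is that
the offset of those reserved words never changes.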
Thanks,
- Andy

> --
> Dan Eischen