Date: Mon, 16 Jun 2003 16:34:44 -0700 (PDT)
From: Julian Elischer <julian@elischer.org>
To: Gareth Hughes <gareth@nvidia.com>
Cc: Andy Ritger <ARitger@nvidia.com>
Subject: RE: NVIDIA and TLS
Message-ID: <Pine.BSF.4.21.0306161609550.19977-100000@InterJet.elischer.org>
In-Reply-To: <2D32959E172B8F4D9B02F68266BE421401A6D7DB@mail-sc-3.nvidia.com>

On Mon, 16 Jun 2003, Gareth Hughes wrote:

> On Mon, 16 Jun 2003, Daniel Eischen wrote:
> >
> > Again, %gs isn't per-thread; it's per-KSE.  Plus, we're reserving
> > TLS for one vendor/library.  What happens when someone else comes
> > along and wants the same thing?  I'd much rather see someone push
> > for a new OpenGL spec with better interfaces/APIs.
>

I think that the problem is that the access method for TLS is
dependent on which library is used.

In the multiplexed thread library, %gs points to the current KSE
(Kernel Schedulable Entity; think virtual CPU), and THAT has a pointer
to the current thread. With thousands of threads going in and out of
runnability each tick (without notifying the kernel), we don't want to
keep changing %gs in userland, as that's slow. (There are plenty of
apps that have MANY threads.) These threads run entirely in userland,
and control switches between them at an alarming rate. Anything that
slows down context switches has a bad effect on the speed at which
these programs (e.g. some Java implementations and programs) run.

In the 1:1 (or N:N) thread library the context switch times are
considerably larger, as there is kernel interaction on each and every
context switch. Overhead from switching %gs in this library would
probably be buried in the noise. It would probably be an acceptable
solution.

In the single-streamed pthreads library (libc_r) that is currently in
use, %gs is not used, so your code is not colliding. But then there
isn't explicit support for __thread, though (curthread->TLS) would be
all that is required, since it is not multithreaded from a real
perspective.

The trouble is that each of these would require a different mechanism
to reach TLS, and the compiler cannot know ahead of time which one to
use.

> I don't think there's a library out there that has the strict
> performance requirements that OpenGL does.  Of course, if FreeBSD
> supported the ELF TLS standard.

I may be wrong but I don't think it is a standard yet..
especially for the reason that we see here.. It requires that the
compiler know what threading library is in use. We could certainly
implement efficient TLS code generation for each library, but which
one would be compiled in when you compile a .o file that may be used
with any library?

> this point would be moot because
> applications and libraries would automatically get fast
> thread-local storage.  If not, and another library really did need
> the same kind of fast TLS access, what's wrong with just allocating
> another static block after the libGL one?  Your internal data
> structures would work fine, libGL would work fine because you
> haven't changed the location of its data block, and the new library
> would access its data directly.  The only problem with this scheme
> is if you move the block, or change the way it is accessed, this
> would break binary compatibility.

A single library is not going to get its own block in the system
thread descriptor, but it makes great sense to allocate a pointer
there for the system to support TLS in a uniform manner for many
applications. If we place a pointer (actually a couple) there for the
.tbss and .tdata segments, then we can get to those segments quickly
(though possibly requiring 2 instructions). (I'm not an x86 assembler
guru.)

I wouldn't think of adding such for a single app or library, but
support for ELF TLS is probably of high enough importance that it is
worthwhile. As I said though: "which library gets to support it?"
The one with one indirection? The one with 2 indirections, or the one
with no indirections?

The trick is that because it is a pre-link-time thing, it cannot
handle the case where there is more than one binary interface to the
threads. That is why it is not in the POSIX threading model and
probably never really will be, except in a slow form. (BTW have you
looked at the speed of function calls on modern PCs? I think you'll
be surprised.)
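
To make the "different mechanism per library" point concrete, here is a rough C sketch of the three access paths described above. It is only an illustration under stated assumptions: the struct layouts, the field gl_context_id, and every function name are hypothetical and are not taken from any FreeBSD thread library. Plain pointers (gs_kse, gs_thread) stand in for whatever %gs would reference, since the segment register cannot be expressed in portable C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread TLS block (stands in for a thread's
 * .tdata/.tbss image). */
struct tls_block {
    int gl_context_id;              /* example thread-local variable */
};

/* Hypothetical thread descriptor with a pointer to its TLS block. */
struct thread {
    struct tls_block *tls;
};

/* Hypothetical KSE ("virtual CPU") descriptor. */
struct kse {
    struct thread *curthread;       /* thread currently running on it */
};

/* Assumed stand-ins for what each library would reach through %gs
 * (or, for libc_r, through an ordinary global). */
struct kse    *gs_kse;              /* multiplexed library: %gs -> KSE    */
struct thread *gs_thread;           /* 1:1 library:         %gs -> thread */
struct thread *curthread;           /* libc_r:              plain global  */

/* Multiplexed library: KSE -> current thread -> TLS block. */
int *tls_addr_multiplexed(void)
{
    return &gs_kse->curthread->tls->gl_context_id;
}

/* 1:1 library: thread -> TLS block. */
int *tls_addr_one_to_one(void)
{
    return &gs_thread->tls->gl_context_id;
}

/* libc_r: curthread->tls, no segment register involved. */
int *tls_addr_libc_r(void)
{
    return &curthread->tls->gl_context_id;
}

/* Userland switch in the multiplexed model: only the KSE's pointer
 * changes; the %gs stand-in is left untouched. */
void kse_switch_to(struct kse *k, struct thread *next)
{
    assert(k != NULL && next != NULL);
    k->curthread = next;
}
```

All three functions return the address of the same logical variable, but each follows a different chain of loads. A compiler building a .o file would have to commit to one of these sequences before it knows which thread library the object will be linked against.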