Date: Mon, 16 Jun 2003 17:02:28 -0700 (PDT)
From: Julian Elischer <julian@elischer.org>
To: Gareth Hughes <gareth@nvidia.com>
Cc: Andy Ritger <ARitger@nvidia.com>
Subject: RE: NVIDIA and TLS
Message-ID: <Pine.BSF.4.21.0306161655400.19977-100000@InterJet.elischer.org>
In-Reply-To: <2D32959E172B8F4D9B02F68266BE421401A6D7E2@mail-sc-3.nvidia.com>
On Mon, 16 Jun 2003, Gareth Hughes wrote:

> On Mon, 16 Jun 2003, Julian Elischer wrote:
> >
> > I think that the problem is that the access method for TLS is dependent
> > on which library is used.
> >
> > [snip]
> >
> > The trouble is that each of these would require a different mechanism to
> > reach TLS and the compiler cannot know ahead of time which one to use.
> >
> > [snip]
> >
> > I may be wrong but I don't think it is a standard yet..
> > especially for the reason that we see here..
> > It requires that the compiler know what threading library is in use.
> >
> > We could certainly implement efficient TLS code generation for each
> > library, but which one would be compiled in when you compile a .o file
> > that may be used with any library?
>
> Please read Ulrich Drepper's document, if you haven't done so already.
> You'll see that the general case of __thread variable access involves
> a function call to look up the variable's address. There are
> optimizations to this access model that allow one of the other three
> models to be used (ranging from a function call the first time a
> __thread variable is accessed, down to a single instruction per
> access). FreeBSD could trivially implement __thread variables with
> the General Dynamic model (involving a function call per access).
> Our driver uses the Local Exec model (single instruction per access)
> because GNU libc has an optimization on x86 that allows shared
> libraries to use this model, which is normally reserved for
> applications. The key thing is that they're still __thread variables;
> the access model depends on the compile-time options used and what's
> available at runtime. Please, I urge you to read Drepper's document
> carefully.
>

I have read most of it already.

What I'm saying is that we can and probably should implement TLS using
the general model to provide TLS at "sane" speed (e.g. 5 instructions),
but that I don't think we can implement the "1 instruction" version that
you are asking for without breaking the binary compatibility that we
currently have between our 3 pthread libraries.

We can currently switch between 3 very different threads libraries
without recompiling the app or any other libraries involved. In fact we
have a config file for the loader that specifies which one to use at run
time **per application**.

Without using an entrypoint (or maybe self-modifying code) (*EEK!*) I
don't see how we can do it and keep that *very useful* functionality.
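
To make the entrypoint idea concrete, here is a rough sketch in portable
C. It is only an illustration under assumptions: every name in it
(tls_lookup, lookup_via_pthread_key, lookup_via_compiler_tls,
run_two_threads) is made up for this example, and it does not show how
any of our threads libraries or the loader actually work. The point is
that code which reaches its per-thread data through a function pointer
does not need to know, at compile time, which mechanism sits behind it,
whereas a direct __thread access has its addressing baked into the .o.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Direct access: the compiler picks the TLS addressing at compile time. */
static __thread int direct_counter;

/* Entry-point access: callers only know they must call through a pointer.
 * (Hypothetical interface, invented for this sketch.) */
typedef int *(*tls_lookup_fn)(void);

static pthread_key_t counter_key;
static pthread_once_t counter_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    /* free() releases each thread's slot when that thread exits. */
    int rc = pthread_key_create(&counter_key, free);
    assert(rc == 0);
}

/* Hypothetical backend A: per-thread slot kept in a pthread key. */
static int *lookup_via_pthread_key(void)
{
    pthread_once(&counter_once, make_key);
    int *slot = pthread_getspecific(counter_key);
    if (slot == NULL) {
        slot = calloc(1, sizeof *slot);
        if (slot == NULL)
            abort();
        pthread_setspecific(counter_key, slot);
    }
    return slot;
}

/* Hypothetical backend B: per-thread slot is a compiler __thread variable. */
static int *lookup_via_compiler_tls(void)
{
    return &direct_counter;
}

/* The "entrypoint": chosen at run time, not when callers were compiled. */
static tls_lookup_fn tls_lookup = lookup_via_pthread_key;

static void *worker(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++)
        ++*tls_lookup();               /* one function call per access */
    return (void *)(intptr_t)*tls_lookup();
}

/* Runs two threads against the chosen backend; returns 1 if each thread
 * saw only its own increments and the calling thread saw none. */
int run_two_threads(int use_compiler_tls)
{
    tls_lookup = use_compiler_tls ? lookup_via_compiler_tls
                                  : lookup_via_pthread_key;
    int counts[2] = { 3, 5 };
    pthread_t tid[2];
    void *result[2];

    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&tid[i], NULL, worker, &counts[i]);
        assert(rc == 0);
    }
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], &result[i]);

    return (intptr_t)result[0] == 3 &&
           (intptr_t)result[1] == 5 &&
           *tls_lookup() == 0;
}
```

The worker is identical for both backends, which is the property we get
from going through an entrypoint: run_two_threads(0) and
run_two_threads(1) exercise the same compiled worker code against two
different storage mechanisms. The cost is the call on every access,
which is what the single-instruction model avoids.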