Date: Mon, 16 Jun 2003 23:34:25 -0700
From: Terry Lambert <tlambert2@mindspring.com>
To: Alexander Kabaev <kabaev@mail.ru>
Cc: Andy Ritger <ARitger@nvidia.com>
Subject: Re: NVIDIA and TLS
Message-ID: <3EEEB671.2172F5D8@mindspring.com>
References: <2D32959E172B8F4D9B02F68266BE421401A6D7E6@mail-sc-3.nvidia.com> <20030616211835.10d0fba4.kabaev@mail.ru>

Alexander Kabaev wrote:
> Gareth Hughes <gareth@nvidia.com> wrote:
> > If FreeBSD support ELF TLS and __thread variables in ANY form, our
> > driver will use this support. If the best you can do is a function
> > call per access, so be it. It doesn't sound like there are any other
> > options, given the fact that you ship with three different thread
> > libraries.
>
> Three different libraries can attempt to coordinate and reserve exactly
> the same TLS segment size.

Gareth's point is that a function call is needed to abstract that,
unless all approaches use the same mechanism.

Right now, he's said that he can't use %fs, because he has a version
of WINE that uses OpenGL to implement its video, and the threads
people have said that he can't use %gs because it points to a per-KSE
value, not a per-thread value, and there are potentially many threads
associated with each KSE, so it's no good for TLS, only for KSELS.

I've suggested that he use the compiler flags to reserve another
*different* register (not %fs and not %gs) for his purpose; the
compiler has explicit support for this. If he is willing to burn a
register in order to get this functionality, it should be one that's
normally allocated for user programs to burn anyway.

Julian has pointed out that the TLS data should probably be cached
following a return from a context switch as a result of the user
space threads scheduler, if there is one, scheduling the thread to
run. So far, it seems that Gareth has mistaken what Julian meant by
this: that the reload happens high up in the OpenGL code, using an
explicit call to do the register switch, so that it's available lower
down. This would work with the %fs approach, without breaking WINE.

In addition, it seems that Gareth's Linux approach is assuming that a
thread that is involuntarily context switched will be run again when
the next quantum becomes available.
This is an invalid assumption for libkse (though it's currently
hacked, locally, by most people to keep non-strict POSIX applications
happy), and could be an invalid assumption in Linux and libthr in
FreeBSD, should the scheduler code change in the future to recalculate
priority prior to giving the quantum to a particular thread.

Add to this that the model that Gareth is suggesting as a de facto
standard is one advocated by a company that's suing IBM over the
Linux OS containing uniform code in 60 lines of its scheduler, which
might very well be related to this publication, for all we know. Add
also the assumptions about implementation implicit in the standard,
and there's a significant issue that needs to be resolved here, and
I'm not just talking about the legal implications of implementing to
a Caldera/SCO specification that they may feel contains their
intellectual property.

At a minimum, we are likely talking about a "first call" optimization,
in order to set a global offset as valid relative to a register common
to all three threads libraries in FreeBSD, and, potentially, two of
them, if we *must* go indirect through %gs to get the TLS from the
KSELS.

Does that jibe with what everyone else understands of this mailing
list thread so far?

-- Terry
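
For reference, here is a minimal sketch of the "function call per access" fallback combined with a one-time "first call" setup step. It assumes only the portable POSIX thread-specific-data calls (pthread_key_create, pthread_once, pthread_getspecific, pthread_setspecific) and uses no segment register at all. The names gl_thread_state, gl_thread_state_get and gl_demo_worker are hypothetical, invented for illustration; this is not the NVIDIA driver's code, nor what any of the FreeBSD threads libraries does internally.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread state a driver might keep (illustrative only). */
struct gl_thread_state {
    int current_context;
};

static pthread_key_t  state_key;
static pthread_once_t state_once = PTHREAD_ONCE_INIT;

/* "First call" setup: runs exactly once, whichever thread gets here first. */
static void state_key_init(void)
{
    /* free() is the destructor run at thread exit for a non-NULL slot. */
    if (pthread_key_create(&state_key, free) != 0)
        abort();
}

/*
 * One function call per access.  The slot is allocated zero-filled the
 * first time a given thread asks for it.
 */
struct gl_thread_state *gl_thread_state_get(void)
{
    struct gl_thread_state *st;

    pthread_once(&state_once, state_key_init);
    st = pthread_getspecific(state_key);
    if (st == NULL) {
        st = calloc(1, sizeof(*st));
        if (st == NULL || pthread_setspecific(state_key, st) != 0)
            abort();
    }
    return st;
}

/*
 * Demo thread body: returns the value found in this thread's own slot.
 * A fresh thread sees 0, whatever other threads stored in theirs.
 */
void *gl_demo_worker(void *arg)
{
    struct gl_thread_state *st = gl_thread_state_get();

    (void)arg;
    assert(st != NULL);
    return (void *)(intptr_t)st->current_context;
}
```

The cost is the pthread_once check plus a pthread_getspecific call on every access, which is exactly the overhead a register-relative __thread access is meant to avoid. A caller that fetches the pointer once high up and passes it down avoids repeating that cost lower down.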