Date: Fri, 26 Dec 2014 12:54:40 -0500 (EST) From: Daniel Eischen <deischen@freebsd.org> To: Konstantin Belousov <kostikbel@gmail.com> Cc: threads@freebsd.org, arch@freebsd.org Subject: Re: Fixing dlopen("libpthread.so") Message-ID: <Pine.GSO.4.64.1412261249160.22867@sea.ntplx.net> In-Reply-To: <20141226165337.GJ1754@kib.kiev.ua> References: <20141226165337.GJ1754@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 26 Dec 2014, Konstantin Belousov wrote: > [Long] > > It is somewhat well-known that our libthr.so cannot be loaded > dynamically into the process. Or rather, it can be, but the > consequences are catastrophic. We recommend to link any program which > may load modules, explicitely with -lpthread; the known workaround is > to do LD_PRELOAD=libthr.so.3 for binaries which were not. I > implemented support for ld -z nodlopen some time ago, but attempt to > mark libthr.so as non-loadable caused extreme roar. > > A common opinion is that the proper way to fix the problem is > to merge the actual code from libthr into libc, leaving libthr as the > filter to preserve the current ABI. Unfortunately, there are some > non-trivial and undesirable consequences of doing this. > > First, all pthread mutexes (and other kind of locks) would become > fully initialized and used even for single-threaded programs, at least > I do not see a way to work around this. Right now, libc shims for > pthread_mutex_init() and pthread_mutex_lock(3) are nop. After the > merge, init needs to allocate memory and lock/unlock operations, > although uncontested, will start costing one atomic each. In > particular, malloc(3) and stdio(3) are affected. > > Another very delicate issue is introducing unwanted cancellation > points into libc functions after libthr wrappers become mandatory. > This is fixable, but requires lot of mundane work and probably a long > time to find missed places (i.e. bugs). > > There are probably more problems, and this brings an obvious > alternative: fix the issues which make dlopen("libthr.so") so > destructive. > > One known show-stopper is the broken errno after the load. The libthr > provides the interposer for the errno and all cancellable functions > from libc. If any interposed symbols have been resolved before the > libthr.so was loaded, or non-lazy binding mode is requested, the > bindings cannot be undonde. In particular, references to __error(), > which implements errno, are bound to return locate of the main thread > errno variable. Similarly, code referencing cancellable functions > still gets the uncancellable libc implementations of them. > > Another issue is the recursion between malloc(3) and mutex_init(). > The statically initialized pthread_mutex_t needs some further > initialization before first use. Jemalloc calls pthread_mutex_init(3) > for internally-used mutexes, which is nop stub from libc until libthr > is loaded. After the load, first use of any mutex by malloc(3) leads > to the thr_mutex.c initialization code, which needs calloc(3). This > immediately leads to hang due to recursion on some internal libthr > umtx. Making the lock recursive does not solve the problem, which is > the infinite mutual recursion between malloc and pthread_mutex_lock() > for uninitialized malloc mutex. > > Yet another issue is the signal handlers. The libthr routes signal > delivery through its internal signal handler, to avoid interrupting > critical sections. Any signal handler installed prior to libthr is > loaded misses the wrapper, potentially breaking cancellation and > critical sections. > > Proposed patch does the following: > > - Remove libthr interposers of the libc functions, including > __error(). Instead, functions calls are indirected through the > interposing table, similar to how pthread stubs in libc are already > done. Libc by default points either to syscall trampolines or to > existing libc implementations. On libthr load, it rewrites the > pointers to the cancellable implementations already in libthr. > > - Postpone the malloc(3) internal mutexes initialization until libthr > is loaded. > > - Reinstall signal handlers with wrapper on libthr load. > > The signal handler reinstallation on libthr initialization is only > needed when libthr.so is dlopened. Performing 128*2 sigaction(2) > calls on the startup of the binary which is linked to libthr, and thus > libthr is guaranteed to install proper sighandler wrappers, is huge > waste. So, I perform the hand-over of signal handlers only for the > dlopen-ed libthr, which now needs to detect loading at startup > vs. dlopen. I was unable to distinguish the cases using existing > facilities, so new private rtld interface is implemented, > _rtld_is_dlopened(), to query the way library was brought into the > process address space. > > Without some special measures, static binaries would pull in the whole > set of the interposed syscalls due to references from the > interposition table. To fix it, the references are made weak. Also, > to not pull in the pthread stubs, the interposition table is separate > from pthreads stubs indirection table. > > The patch is available at > https://www.kib.kiev.ua/kib/libthr_dlopen.1.patch . > Among other things, I tested it with the program illustrating the > issues https://www.kib.kiev.ua/kib/threaded_errno.c . > Note that you must use matching versions of rtld, libc and libthr. > Using old ld-elf.so.1 or old libc.so.7 with new libthr.so.3 will > break the system. > > Work was sponsored by The FreeBSD Foundation. I took a once-over look at the patch and it looks good. I never liked the intimacy between malloc and libpthread, but there's nothing we can currently do about that until mutexes become real objects instead of pointers. -- DE
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.GSO.4.64.1412261249160.22867>