Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 26 Dec 2014 12:54:40 -0500 (EST)
From:      Daniel Eischen <deischen@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        threads@freebsd.org, arch@freebsd.org
Subject:   Re: Fixing dlopen("libpthread.so")
Message-ID:  <Pine.GSO.4.64.1412261249160.22867@sea.ntplx.net>
In-Reply-To: <20141226165337.GJ1754@kib.kiev.ua>
References:  <20141226165337.GJ1754@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, 26 Dec 2014, Konstantin Belousov wrote:

> [Long]
>
> It is somewhat well-known that our libthr.so cannot be loaded
> dynamically into the process.  Or rather, it can be, but the
> consequences are catastrophic.  We recommend to link any program which
> may load modules, explicitely with -lpthread; the known workaround is
> to do LD_PRELOAD=libthr.so.3 for binaries which were not.  I
> implemented support for ld -z nodlopen some time ago, but attempt to
> mark libthr.so as non-loadable caused extreme roar.
>
> A common opinion is that the proper way to fix the problem is
> to merge the actual code from libthr into libc, leaving libthr as the
> filter to preserve the current ABI.  Unfortunately, there are some
> non-trivial and undesirable consequences of doing this.
>
> First, all pthread mutexes (and other kind of locks) would become
> fully initialized and used even for single-threaded programs, at least
> I do not see a way to work around this.  Right now, libc shims for
> pthread_mutex_init() and pthread_mutex_lock(3) are nop.  After the
> merge, init needs to allocate memory and lock/unlock operations,
> although uncontested, will start costing one atomic each.  In
> particular, malloc(3) and stdio(3) are affected.
>
> Another very delicate issue is introducing unwanted cancellation
> points into libc functions after libthr wrappers become mandatory.
> This is fixable, but requires lot of mundane work and probably a long
> time to find missed places (i.e. bugs).
>
> There are probably more problems, and this brings an obvious
> alternative: fix the issues which make dlopen("libthr.so") so
> destructive.
>
> One known show-stopper is the broken errno after the load.  The libthr
> provides the interposer for the errno and all cancellable functions
> from libc.  If any interposed symbols have been resolved before the
> libthr.so was loaded, or non-lazy binding mode is requested, the
> bindings cannot be undonde.  In particular, references to __error(),
> which implements errno, are bound to return locate of the main thread
> errno variable.  Similarly, code referencing cancellable functions
> still gets the uncancellable libc implementations of them.
>
> Another issue is the recursion between malloc(3) and mutex_init().
> The statically initialized pthread_mutex_t needs some further
> initialization before first use.  Jemalloc calls pthread_mutex_init(3)
> for internally-used mutexes, which is nop stub from libc until libthr
> is loaded.  After the load, first use of any mutex by malloc(3) leads
> to the thr_mutex.c initialization code, which needs calloc(3).  This
> immediately leads to hang due to recursion on some internal libthr
> umtx.  Making the lock recursive does not solve the problem, which is
> the infinite mutual recursion between malloc and pthread_mutex_lock()
> for uninitialized malloc mutex.
>
> Yet another issue is the signal handlers.  The libthr routes signal
> delivery through its internal signal handler, to avoid interrupting
> critical sections.  Any signal handler installed prior to libthr is
> loaded misses the wrapper, potentially breaking cancellation and
> critical sections.
>
> Proposed patch does the following:
>
> - Remove libthr interposers of the libc functions, including
>  __error(). Instead, functions calls are indirected through the
>  interposing table, similar to how pthread stubs in libc are already
>  done.  Libc by default points either to syscall trampolines or to
>  existing libc implementations.  On libthr load, it rewrites the
>  pointers to the cancellable implementations already in libthr.
>
> - Postpone the malloc(3) internal mutexes initialization until libthr
>  is loaded.
>
> - Reinstall signal handlers with wrapper on libthr load.
>
> The signal handler reinstallation on libthr initialization is only
> needed when libthr.so is dlopened.  Performing 128*2 sigaction(2)
> calls on the startup of the binary which is linked to libthr, and thus
> libthr is guaranteed to install proper sighandler wrappers, is huge
> waste.  So, I perform the hand-over of signal handlers only for the
> dlopen-ed libthr, which now needs to detect loading at startup
> vs. dlopen.  I was unable to distinguish the cases using existing
> facilities, so new private rtld interface is implemented,
> _rtld_is_dlopened(), to query the way library was brought into the
> process address space.
>
> Without some special measures, static binaries would pull in the whole
> set of the interposed syscalls due to references from the
> interposition table.  To fix it, the references are made weak.  Also,
> to not pull in the pthread stubs, the interposition table is separate
> from pthreads stubs indirection table.
>
> The patch is available at
> https://www.kib.kiev.ua/kib/libthr_dlopen.1.patch .
> Among other things, I tested it with the program illustrating the
> issues https://www.kib.kiev.ua/kib/threaded_errno.c .
> Note that you must use matching versions of rtld, libc and libthr.
> Using old ld-elf.so.1 or old libc.so.7 with new libthr.so.3 will
> break the system.
>
> Work was sponsored by The FreeBSD Foundation.

I took a once-over look at the patch and it looks good.

I never liked the intimacy between malloc and libpthread, but
there's nothing we can currently do about that until mutexes
become real objects instead of pointers.

-- 
DE



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.GSO.4.64.1412261249160.22867>