Date:      Sat, 12 Dec 2015 15:47:15 +0100
From:      Tijl Coosemans <tijl@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-threads@FreeBSD.org
Subject:   Re: Nvidia libGL crash in libthr
Message-ID:  <20151212154715.0b3bb9e6@kalimero.tijl.coosemans.org>
In-Reply-To: <20151211175439.GJ82577@kib.kiev.ua>
References:  <20151211181809.29c64399@kalimero.tijl.coosemans.org> <20151211175439.GJ82577@kib.kiev.ua>

On Fri, 11 Dec 2015 19:54:39 +0200 Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Fri, Dec 11, 2015 at 06:18:09PM +0100, Tijl Coosemans wrote:
>> This is taken from https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205149
>> 
>> /usr/local/lib/kde4/libexec/kwin_opengl_test (from kde-workspace package)
>> crashes in libthr when Nvidia libGL is installed:
>> 
>> #0  0x000000080697d201 in pthread_mutexattr_setkind_np () from /lib/libthr.so.3
>> #1  0x0000000801a6c9c7 in glXCreateNewContext () from /usr/local/lib/libGL.so.1
>> #2  0x0000000804bd958c in _nv021glcore () from /usr/local/lib/libnvidia-glcore.so.1
>> #3  0x0000000804f4821e in _nv015glcore () from /usr/local/lib/libnvidia-glcore.so.1
>> #4  0x0000000801a4cefb in glXCreateNewContext () from /usr/local/lib/libGL.so.1
>> #5  0x0000000801a4da0a in glXCreateNewContext () from /usr/local/lib/libGL.so.1
>> #6  0x0000000800605a9f in r_debug_state () from /libexec/ld-elf.so.1
>> #7  0x00000008006050ee in __tls_get_addr () from /libexec/ld-elf.so.1
>> #8  0x0000000800603439 in .text () from /libexec/ld-elf.so.1
>> #9  0x0000000000000000 in ?? ()
>> 
>> libthr is pulled in via kwin_opengl_test -> libXft -> libfontconfig ->
>> libthr.  Nothing else links to it.  Nvidia libGL seems to be using
>> dlopen(NULL,..) and then dlsym to look up pthread_* symbols.
>> 
>> The output of ldd kwin_opengl_test:
>> 
>> libSM.so.6 => /usr/local/lib/libSM.so.6 (0x800820000)
>> libICE.so.6 => /usr/local/lib/libICE.so.6 (0x800a27000)
>> libX11.so.6 => /usr/local/lib/libX11.so.6 (0x800c41000)
>> libXext.so.6 => /usr/local/lib/libXext.so.6 (0x800f80000)
>> libXft.so.2 => /usr/local/lib/libXft.so.2 (0x801191000)
>> libXau.so.6 => /usr/local/lib/libXau.so.6 (0x8013a6000)
>> libXdmcp.so.6 => /usr/local/lib/libXdmcp.so.6 (0x8015a9000)
>> libXpm.so.4 => /usr/local/lib/libXpm.so.4 (0x8017ae000)
>> libGL.so.1 => /usr/local/lib/libGL.so.1 (0x8019c1000)
>> libc++.so.1 => /usr/lib/libc++.so.1 (0x801cbd000)
>> libcxxrt.so.1 => /lib/libcxxrt.so.1 (0x801f7a000)
>> libm.so.5 => /lib/libm.so.5 (0x802197000)
>> libc.so.7 => /lib/libc.so.7 (0x8023c1000)
>> libxcb.so.1 => /usr/local/lib/libxcb.so.1 (0x80276c000)
>> librpcsvc.so.5 => /usr/lib/librpcsvc.so.5 (0x80298d000)
>> libfontconfig.so.1 => /usr/local/lib/libfontconfig.so.1 (0x802b96000)
>> libfreetype.so.6 => /usr/local/lib/libfreetype.so.6 (0x802dd6000)
>> libXrender.so.1 => /usr/local/lib/libXrender.so.1 (0x803076000)
>> libnvidia-tls.so.1 => /usr/local/lib/libnvidia-tls.so.1 (0x80327f000)
>> libnvidia-glcore.so.1 => /usr/local/lib/libnvidia-glcore.so.1 (0x803600000)
>> libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x80588c000)
>> libpthread-stubs.so.0 => /usr/local/lib/libpthread-stubs.so.0 (0x805a9a000)
>> libexpat.so.1 => /usr/local/lib/libexpat.so.1 (0x805c9b000)
>> libthr.so.3 => /lib/libthr.so.3 (0x805ec2000)
>> libz.so.6 => /lib/libz.so.6 (0x8060e6000)
>> libbz2.so.4 => /usr/lib/libbz2.so.4 (0x8062fc000)
>> libpng16.so.16 => /usr/local/lib/libpng16.so.16 (0x80650f000)
>> 
>> libthr appears after libc so it looks like dlsym returns libc symbols
>> except for pthread_mutexattr_setkind_np which doesn't exist in libc.
>> Nvidia libGL ends up calling libc pthread_mutexattr_init (no-op) and
>> then calls libthr pthread_mutexattr_setkind_np with an uninitialised
>> pthread_mutexattr_t and crashes.  
> It is more complicated; take a look at libc/gen/_pthread_stubs.c.
> The libc pthread_* stubs do redirect calls to the libthr after libthr
> is initialized.  The _thr_jtable in libc is overwritten by libthr, see
> the memcpy(_thr_jtable, ...) call in _libpthread_init().
> 
> BTW, the backtrace you demonstrated was obtained from libthr without
> debugging symbols, and it might be that pthread_mutexattr_setkind_np
> just happens to be the closest defined dynamic symbol, while the problem
> is elsewhere.
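The quoted report says Nvidia's libGL appears to use `dlopen(NULL, ...)` and then `dlsym` to find pthread_* symbols. The pattern looks roughly like the sketch below. It is not Nvidia's code, and `lookup_global` is a hypothetical helper. On older glibc systems you may need to link with `-ldl`. A handle from `dlopen(NULL, ...)` searches the program's global symbol scope in load order. With the `ldd` order above (libc.so.7 before libthr.so.3), a name that both libraries define as weak is therefore expected to resolve to the libc definition.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: look up `name` in the global scope of the running
 * program.  dlopen(NULL, ...) returns a handle for the main program image;
 * dlsym() on that handle searches the program, its load-time dependencies
 * and any RTLD_GLOBAL objects, returning the first definition it finds.
 */
void *lookup_global(const char *name)
{
    assert(name != NULL);

    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }

    dlerror();                      /* clear any stale error state */
    void *sym = dlsym(self, name);
    const char *err = dlerror();    /* non-NULL if the lookup failed */
    if (err != NULL) {
        fprintf(stderr, "dlsym(%s): %s\n", name, err);
        sym = NULL;
    }

    /* The handle refers to the already-loaded program image, so closing
       it only drops the reference taken by dlopen(). */
    dlclose(self);
    return sym;
}
```

A caller then casts the result to the right function-pointer type, for example `int (*init)(pthread_mutexattr_t *) = (int (*)(pthread_mutexattr_t *))lookup_global("pthread_mutexattr_init");`. POSIX permits this cast for `dlsym` results.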

I did some of my own debugging now (see below).  It looks like libGL
looks up pthread_* symbols and initialises mutexes from _init() which is
called before libthr is initialised.  Technically it's wrong to call any
function from a library that hasn't been initialised yet, but I suppose
initialising mutexes is simple enough that it should be safe.
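The stub mechanism Konstantin describes can be modelled in plain C. So can the reason a call made from libGL's `_init()`, before libthr has initialised, lands in a do-nothing stub. This is a simplified sketch, not the libc source. `jtable`, `model_mutexattr_init`, `model_thread_lib_init` and `struct mattr` are hypothetical stand-ins for `_thr_jtable`, the `STUB_FUNC1` wrappers, `_libpthread_init()` and libthr's private attribute struct.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pointer-typed pthread_mutexattr_t that
   points at a library-private struct. */
struct mattr { int m_type; };
typedef struct mattr *mattr_t;

typedef int (*attr_fn)(mattr_t *);

/* libc-side stub: touches nothing and reports success (cf. stub_zero). */
static int stub_zero(mattr_t *attr)
{
    (void)attr;
    return 0;
}

/* Stand-in for the thread library's real implementation. */
static int real_mutexattr_init(mattr_t *attr)
{
    assert(attr != NULL);
    struct mattr *a = malloc(sizeof(*a));
    if (a == NULL)
        return ENOMEM;
    a->m_type = 0;
    *attr = a;
    return 0;
}

enum { J_MUTEXATTR_INIT, J_COUNT };

/* The dispatch table starts out pointing at the stubs. */
static attr_fn jtable[J_COUNT] = { stub_zero };

/* Public entry point: always dispatches through the table. */
int model_mutexattr_init(mattr_t *attr)
{
    return jtable[J_MUTEXATTR_INIT](attr);
}

/* What the thread library does when it initialises: copy its own
   implementations over the stubs (cf. memcpy(_thr_jtable, ...)). */
void model_thread_lib_init(void)
{
    static const attr_fn impls[J_COUNT] = { real_mutexattr_init };
    memcpy(jtable, impls, sizeof(jtable));
}
```

Before `model_thread_lib_init()` runs, `model_mutexattr_init()` returns 0 but leaves its argument untouched. Whatever garbage was on the stack therefore survives. The gdb session in this message shows `0x4`, a value that passes a `*attr == NULL` check.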

>> I think the problem is that libthr declares pthread_* symbols weak.
>> Shouldn't they be ordinary global symbols?  
> Our rtld treatment of non-weak symbols as having higher priority
> over weak symbols in dynamic resolution is the bug.  The ELF
> standard specifies that the first symbol from the namespace found in the
> resolution order is the right target.

I'm not sure I understand what you are saying here.  Both libc and
libthr declare pthread_* symbols weak.  There are no non-weak symbols
involved right now.
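The two lookup policies discussed here can be modelled in a few lines. This is a simplified sketch, not rtld code. The structs and function names are hypothetical, each object's symbol table is a flat list, and symbol versioning and visibility are ignored. `lookup_first` follows the ELF rule Konstantin cites. `lookup_prefer_strong` follows the rtld behaviour he calls the bug: a weak definition is used only if no non-weak one is found later.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum binding { B_WEAK, B_GLOBAL };

struct sym { const char *name; enum binding bind; };
struct obj { const char *path; const struct sym *syms; size_t nsyms; };

/* ELF rule: the first definition in search order wins, whatever its binding. */
const struct obj *lookup_first(const struct obj *objs, size_t n,
                               const char *name)
{
    assert(name != NULL && (objs != NULL || n == 0));
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < objs[i].nsyms; j++)
            if (strcmp(objs[i].syms[j].name, name) == 0)
                return &objs[i];
    return NULL;
}

/* Prefer-non-weak rule: remember the first weak hit, but keep searching
   for a global definition and return that instead if one exists. */
const struct obj *lookup_prefer_strong(const struct obj *objs, size_t n,
                                       const char *name)
{
    assert(name != NULL && (objs != NULL || n == 0));
    const struct obj *weak_hit = NULL;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < objs[i].nsyms; j++) {
            const struct sym *s = &objs[i].syms[j];
            if (strcmp(s->name, name) != 0)
                continue;
            if (s->bind == B_GLOBAL)
                return &objs[i];
            if (weak_hit == NULL)
                weak_hit = &objs[i];
        }
    return weak_hit;
}
```

Suppose everything is weak, as now, and libc.so.7 precedes libthr.so.3. The prefer-non-weak policy then resolves `pthread_mutexattr_init` to libc but `pthread_mutexattr_setkind_np` to libthr, the mismatched pair seen in the crash. Making libthr's definitions non-weak flips the first lookup to libthr under that policy. The strict first-match rule would keep returning libc regardless of binding.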

What I'm proposing is that all __weak_reference(..., pthread_*) in
lib/libthr/ should be changed into __strong_reference.  Doing so for
pthread_mutexattr_init fixes the crash.  It makes sense that the libc
stubs are weak, but it's not immediately obvious to me why the libthr
implementation needs to be weak as well.  Why is that?
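The difference between a weak and a strong alias can be sketched with the GCC/Clang `alias` attribute. This assumes an ELF target such as FreeBSD or Linux. The `demo_*` names are hypothetical, and the attributes only roughly correspond to what `__weak_reference` and `__strong_reference` produce via `<sys/cdefs.h>`.

```c
#include <assert.h>
#include <stddef.h>

/* The implementation, under an internal name (libthr uses names such as
   _pthread_mutexattr_setkind_np for this role). */
int demo_setkind_impl(int *attr, int kind)
{
    assert(attr != NULL);
    *attr = kind;
    return 0;
}

/* Weak alias, roughly like __weak_reference(): exported with STB_WEAK
   binding, so a resolver that prefers non-weak definitions may pick
   another object's definition of the same name instead. */
extern int demo_setkind_weak(int *attr, int kind)
    __attribute__((weak, alias("demo_setkind_impl")));

/* Strong alias, roughly like __strong_reference(): an ordinary global
   symbol bound to the same code. */
extern int demo_setkind_strong(int *attr, int kind)
    __attribute__((alias("demo_setkind_impl")));
```

Both aliases run the same code. The binding only matters when another loaded object also defines the name, which is exactly the libc/libthr situation above.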



Reading symbols from /usr/local/lib/kde4/libexec/kwin_opengl_test...(no debugging symbols found)...done.
(gdb)  b pthread_mutexattr_init
Function "pthread_mutexattr_init" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (pthread_mutexattr_init) pending.
(gdb) r
Starting program: /usr/local/lib/kde4/libexec/kwin_opengl_test 
[Switching to LWP 100197]

Breakpoint 1, pthread_mutexattr_init_exp (p0=0x7fffffffe2b0)
    at /usr/home/tijl/freebsd/base/head/lib/libc/gen/_pthread_stubs.c:230
230	STUB_FUNC1(pthread_mutexattr_init, PJT_MUTEXATTR_INIT, int, void *)
(gdb) s
stub_zero ()
    at /usr/home/tijl/freebsd/base/head/lib/libc/gen/_pthread_stubs.c:282
282		return (0);
(gdb) stepi
0x000000080261cee7 in stub_zero ()
    at /usr/home/tijl/freebsd/base/head/lib/libc/gen/_pthread_stubs.c:282
282		return (0);
(gdb) 
0x000000080261d598 in pthread_mutexattr_init_exp (p0=0x7fffffffe2b0)
    at /usr/home/tijl/freebsd/base/head/lib/libc/gen/_pthread_stubs.c:230
230	STUB_FUNC1(pthread_mutexattr_init, PJT_MUTEXATTR_INIT, int, void *)
(gdb) 
0x000000080261d59c	230	STUB_FUNC1(pthread_mutexattr_init, PJT_MUTEXATTR_INIT, int, void *)
(gdb) 
0x000000080261d59d in pthread_mutexattr_init_exp (
    p0=<error reading variable: Cannot access memory at address 0xfffffffffffffffb>)
    at /usr/home/tijl/freebsd/base/head/lib/libc/gen/_pthread_stubs.c:230
230	STUB_FUNC1(pthread_mutexattr_init, PJT_MUTEXATTR_INIT, int, void *)
(gdb) 
0x0000000801a66876 in ?? () from /usr/local/lib/libGL.so.1
(gdb) 
0x0000000801a6687d in ?? () from /usr/local/lib/libGL.so.1
(gdb) 
0x0000000801a66882 in ?? () from /usr/local/lib/libGL.so.1
(gdb) 
0x0000000801a66885 in ?? () from /usr/local/lib/libGL.so.1
(gdb) 
_pthread_mutexattr_setkind_np (
    attr=<error reading variable: Cannot access memory at address 0xfffffffffffffffb>, 
    kind=<error reading variable: Cannot access memory at address 0xfffffffffffffff7>)
    at /usr/home/tijl/freebsd/base/head/lib/libthr/thread/thr_mutexattr.c:105
105	{
(gdb) n
107		if (attr == NULL || *attr == NULL) {
(gdb) p *attr
$1 = (pthread_mutexattr_t) 0x4
(gdb) n
111			(*attr)->m_type = kind;
(gdb) 

Program received signal SIGSEGV, Segmentation fault.
0x0000000805ed4d99 in _pthread_mutexattr_setkind_np (attr=0x7fffffffe2b0, 
    kind=2)
    at /usr/home/tijl/freebsd/base/head/lib/libthr/thread/thr_mutexattr.c:111
111			(*attr)->m_type = kind;
(gdb) bt
#0  0x0000000805ed4d99 in _pthread_mutexattr_setkind_np (attr=0x7fffffffe2b0, 
    kind=2)
    at /usr/home/tijl/freebsd/base/head/lib/libthr/thread/thr_mutexattr.c:111
#1  0x0000000801a66887 in ?? () from /usr/local/lib/libGL.so.1
#2  0x000000080471db33 in ?? () from /usr/local/lib/libnvidia-glcore.so.1
#3  0x0000000804958b6e in ?? () from /usr/local/lib/libnvidia-glcore.so.1
#4  0x0000000801a4afce in ?? () from /usr/local/lib/libGL.so.1
#5  0x0000000801a4b62f in ?? () from /usr/local/lib/libGL.so.1
#6  0x0000000800606c2e in objlist_call_init (list=0x7fffffffe8f0, 
    lockstate=0x7fffffffe888)
    at /usr/home/tijl/freebsd/base/head/libexec/rtld-elf/rtld.c:2442
#7  0x00000008006052eb in _rtld (sp=0x7fffffffeb68, exit_proc=0x7fffffffea40, 
    objp=0x7fffffffea48)
    at /usr/home/tijl/freebsd/base/head/libexec/rtld-elf/rtld.c:669
#8  0x00000008006034c9 in .rtld_start ()
    at /usr/home/tijl/freebsd/base/head/libexec/rtld-elf/amd64/rtld_start.S:39
#9  0x0000000000000000 in ?? ()
(gdb) 


