From: bugzilla-noreply@freebsd.org
To: freebsd-threads@FreeBSD.org
Subject: [Bug 200992] proccess won't die in thread_suspend_switch
Date: Tue, 08 Dec 2015 09:46:22 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200992

--- Comment #24 from johans ---
I suspect that the patch (patch 1) attached here doesn't fully fix the
problem, but only greatly reduces how often it happens.

I've encountered two issues since yesterday on different machines:

(1) This one is pretty clearly related: one puppet process got stuck in
STOP state. Unfortunately my colleague didn't run procstat to get the exact
trace, but rebooted the machine instead.

(2) This one I'm unsure about: I'm now debugging a hang in unmount which
traces to a wait in zfs on dp->dp_spaceavail_cv.

Running dtrace on txg group syncing shows that there is no dirty data left,
or at least that it's below the max:

2015 Dec 8 10:39:21 : 0KB of 8MB used
2015 Dec 8 10:39:26 : 0KB of 8MB used
2015 Dec 8 10:39:31 : 0KB of 8MB used

Wake-up should be done by 'dsl_pool_dirty_delta':

        if (dp->dp_dirty_total <= zfs_dirty_data_max)
                cv_signal(&dp->dp_spaceavail_cv);

This condition has clearly been met. With this bug in the back of my head,
it seemed this might be related.
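
For readers less familiar with the wake-up pattern described in comment #24, it can be sketched in userland with POSIX threads. This is only an analogue under stated assumptions: `demo_pool`, `demo_pool_init`, `demo_dirty_delta` and `demo_wait_for_space` are hypothetical names, and neither the kernel's cv_signal nor the real dsl_pool code is reproduced here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical userland stand-in for the pool state; not the ZFS structures. */
struct demo_pool {
    pthread_mutex_t lock;
    pthread_cond_t  spaceavail_cv;
    int64_t         dirty_total;
    int64_t         dirty_max;
};

void demo_pool_init(struct demo_pool *dp, int64_t dirty_max)
{
    pthread_mutex_init(&dp->lock, NULL);
    pthread_cond_init(&dp->spaceavail_cv, NULL);
    dp->dirty_total = 0;
    dp->dirty_max = dirty_max;
}

/* Adjust the dirty counter and wake one waiter once the counter is
 * at or below the limit (the check quoted in the comment above). */
void demo_dirty_delta(struct demo_pool *dp, int64_t delta)
{
    pthread_mutex_lock(&dp->lock);
    dp->dirty_total += delta;
    if (dp->dirty_total <= dp->dirty_max)
        pthread_cond_signal(&dp->spaceavail_cv);
    pthread_mutex_unlock(&dp->lock);
}

/* Block until the counter is at or below the limit; returns the value seen.
 * The predicate is re-checked in a loop after every wake-up. */
int64_t demo_wait_for_space(struct demo_pool *dp)
{
    int64_t seen;

    pthread_mutex_lock(&dp->lock);
    while (dp->dirty_total > dp->dirty_max)
        pthread_cond_wait(&dp->spaceavail_cv, &dp->lock);
    seen = dp->dirty_total;
    pthread_mutex_unlock(&dp->lock);
    return seen;
}
```

In this sketch a waiter only stays asleep while the counter is above the limit and no signal has arrived, which is why a thread that remains blocked with the counter well below the limit looks suspicious.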

From: Tijl Coosemans
To: Konstantin Belousov
Cc: freebsd-threads@FreeBSD.org
Subject: Re: Nvidia libGL crash in libthr
Date: Sat, 12 Dec 2015 15:47:15 +0100
Message-ID: <20151212154715.0b3bb9e6@kalimero.tijl.coosemans.org>
In-Reply-To: <20151211175439.GJ82577@kib.kiev.ua>

On Fri, 11 Dec 2015 19:54:39 +0200 Konstantin Belousov wrote:
> On Fri, Dec 11, 2015 at 06:18:09PM +0100, Tijl Coosemans wrote:
>> This is taken from https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205149
>>
>> /usr/local/lib/kde4/libexec/kwin_opengl_test (from kde-workspace package)
>> crashes in libthr when Nvidia libGL is installed:
>>
>> #0 0x000000080697d201 in pthread_mutexattr_setkind_np () from /lib/libthr.so.3
>> #1 0x0000000801a6c9c7 in glXCreateNewContext () from /usr/local/lib/libGL.so.1
>> #2 0x0000000804bd958c in _nv021glcore () from /usr/local/lib/libnvidia-glcore.so.1
>> #3 0x0000000804f4821e in _nv015glcore () from /usr/local/lib/libnvidia-glcore.so.1
>> #4 0x0000000801a4cefb in glXCreateNewContext () from /usr/local/lib/libGL.so.1
>> #5 0x0000000801a4da0a in glXCreateNewContext () from /usr/local/lib/libGL.so.1
>> #6 0x0000000800605a9f in r_debug_state () from /libexec/ld-elf.so.1
>> #7 0x00000008006050ee in __tls_get_addr () from /libexec/ld-elf.so.1
>> #8 0x0000000800603439 in .text () from /libexec/ld-elf.so.1
>> #9 0x0000000000000000 in ?? ()
>>
>> libthr is pulled in via kwin_opengl_test -> libXft -> libfontconfig ->
>> libthr. Nothing else links to it. Nvidia libGL seems to be using
>> dlopen(NULL,..) and then dlsym to look up pthread_* symbols.
>>
>> The output of ldd kwin_opengl_test:
>>
>>     libSM.so.6 => /usr/local/lib/libSM.so.6 (0x800820000)
>>     libICE.so.6 => /usr/local/lib/libICE.so.6 (0x800a27000)
>>     libX11.so.6 => /usr/local/lib/libX11.so.6 (0x800c41000)
>>     libXext.so.6 => /usr/local/lib/libXext.so.6 (0x800f80000)
>>     libXft.so.2 => /usr/local/lib/libXft.so.2 (0x801191000)
>>     libXau.so.6 => /usr/local/lib/libXau.so.6 (0x8013a6000)
>>     libXdmcp.so.6 => /usr/local/lib/libXdmcp.so.6 (0x8015a9000)
>>     libXpm.so.4 => /usr/local/lib/libXpm.so.4 (0x8017ae000)
>>     libGL.so.1 => /usr/local/lib/libGL.so.1 (0x8019c1000)
>>     libc++.so.1 => /usr/lib/libc++.so.1 (0x801cbd000)
>>     libcxxrt.so.1 => /lib/libcxxrt.so.1 (0x801f7a000)
>>     libm.so.5 => /lib/libm.so.5 (0x802197000)
>>     libc.so.7 => /lib/libc.so.7 (0x8023c1000)
>>     libxcb.so.1 => /usr/local/lib/libxcb.so.1 (0x80276c000)
>>     librpcsvc.so.5 => /usr/lib/librpcsvc.so.5 (0x80298d000)
>>     libfontconfig.so.1 => /usr/local/lib/libfontconfig.so.1 (0x802b96000)
>>     libfreetype.so.6 => /usr/local/lib/libfreetype.so.6 (0x802dd6000)
>>     libXrender.so.1 => /usr/local/lib/libXrender.so.1 (0x803076000)
>>     libnvidia-tls.so.1 => /usr/local/lib/libnvidia-tls.so.1 (0x80327f000)
>>     libnvidia-glcore.so.1 => /usr/local/lib/libnvidia-glcore.so.1 (0x803600000)
>>     libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x80588c000)
>>     libpthread-stubs.so.0 => /usr/local/lib/libpthread-stubs.so.0 (0x805a9a000)
>>     libexpat.so.1 => /usr/local/lib/libexpat.so.1 (0x805c9b000)
>>     libthr.so.3 => /lib/libthr.so.3 (0x805ec2000)
>>     libz.so.6 => /lib/libz.so.6 (0x8060e6000)
>>     libbz2.so.4 => /usr/lib/libbz2.so.4 (0x8062fc000)
>>     libpng16.so.16 => /usr/local/lib/libpng16.so.16 (0x80650f000)
>>
>> libthr appears after libc so it looks like dlsym returns libc symbols
>> except for pthread_mutexattr_setkind_np which doesn't exist in libc.
>> Nvidia libGL ends up calling libc pthread_mutexattr_init (no-op) and
>> then calls libthr pthread_mutexattr_setkind_np with an uninitialised
>> pthread_mutexattr_t and crashes.
> It is more complicated, take a look at libc/gen/_pthread_stubs.c.
> The libc pthread_* stubs do redirect calls to libthr after libthr
> is initialized. The _thr_jtable in libc is overwritten by libthr, see
> the memcpy(_thr_jtable, ...) call in _libpthread_init().
>
> BTW, the backtrace you demonstrated was obtained from libthr without
> debugging symbols, and it might be that pthread_mutexattr_setkind_np
> just happens to be the closest defined dynamic symbol, while the problem
> is elsewhere.

I did some of my own debugging now (see below). It looks like libGL
looks up pthread_* symbols and initialises mutexes from _init(), which is
called before libthr is initialised. Technically it's wrong to call any
function from a library that hasn't been initialised yet, but I suppose
initialising mutexes is simple enough that it should be safe.

>> I think the problem is that libthr declares pthread_* symbols weak.
>> Shouldn't they be ordinary global symbols?
> Our rtld's treatment of non-weak symbols as having higher priority
> over weak symbols in dynamic resolution is the bug. The ELF
> standard specifies that the first matching symbol found in the
> resolution order is the right target.

I'm not sure I understand what you are saying here. Both libc and
libthr declare pthread_* symbols weak. There are no non-weak symbols
involved right now.

What I'm proposing is that all __weak_reference(..., pthread_*) in
lib/libthr/ should be changed into __strong_reference. Doing so for
pthread_mutexattr_init fixes the crash. It makes sense that the libc
stubs are weak, but it's not immediately obvious to me why the libthr
implementation needs to be weak as well. Why is that?



Reading symbols from /usr/local/lib/kde4/libexec/kwin_opengl_test...(no debugging symbols found)...done.
(gdb) b pthread_mutexattr_init
Function "pthread_mutexattr_init" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (pthread_mutexattr_init) pending.
(gdb) r
Starting program: /usr/local/lib/kde4/libexec/kwin_opengl_test
[Switching to LWP 100197]

Breakpoint 1, pthread_mutexattr_init_exp (p0=0x7fffffffe2b0)
    at /usr/home/tijl/freebsd/base/head/lib/libc/gen/_pthread_stubs.c:230
230 STUB_FUNC1(pthread_mutexattr_init, PJT_MUTEXATTR_INIT, int, void *)
(gdb) s
stub_zero ()
    at /usr/home/tijl/freebsd/base/head/lib/libc/gen/_pthread_stubs.c:282
282 return (0);
(gdb) stepi
0x000000080261cee7 in stub_zero ()
    at /usr/home/tijl/freebsd/base/head/lib/libc/gen/_pthread_stubs.c:282
282 return (0);
(gdb)
0x000000080261d598 in pthread_mutexattr_init_exp (p0=0x7fffffffe2b0)
    at /usr/home/tijl/freebsd/base/head/lib/libc/gen/_pthread_stubs.c:230
230 STUB_FUNC1(pthread_mutexattr_init, PJT_MUTEXATTR_INIT, int, void *)
(gdb)
0x000000080261d59c 230 STUB_FUNC1(pthread_mutexattr_init, PJT_MUTEXATTR_INIT, int, void *)
(gdb)
0x000000080261d59d in pthread_mutexattr_init_exp (
    p0=)
    at /usr/home/tijl/freebsd/base/head/lib/libc/gen/_pthread_stubs.c:230
230 STUB_FUNC1(pthread_mutexattr_init, PJT_MUTEXATTR_INIT, int, void *)
(gdb)
0x0000000801a66876 in ?? () from /usr/local/lib/libGL.so.1
(gdb)
0x0000000801a6687d in ?? () from /usr/local/lib/libGL.so.1
(gdb)
0x0000000801a66882 in ?? () from /usr/local/lib/libGL.so.1
(gdb)
0x0000000801a66885 in ?? () from /usr/local/lib/libGL.so.1
(gdb)
_pthread_mutexattr_setkind_np (
    attr=,
    kind=)
    at /usr/home/tijl/freebsd/base/head/lib/libthr/thread/thr_mutexattr.c:105
105 {
(gdb) n
107 if (attr == NULL || *attr == NULL) {
(gdb) p *attr
$1 = (pthread_mutexattr_t) 0x4
(gdb) n
111 (*attr)->m_type = kind;
(gdb)

Program received signal SIGSEGV, Segmentation fault.
0x0000000805ed4d99 in _pthread_mutexattr_setkind_np (attr=0x7fffffffe2b0,
    kind=2)
    at /usr/home/tijl/freebsd/base/head/lib/libthr/thread/thr_mutexattr.c:111
111 (*attr)->m_type = kind;
(gdb) bt
#0 0x0000000805ed4d99 in _pthread_mutexattr_setkind_np (attr=0x7fffffffe2b0,
    kind=2)
    at /usr/home/tijl/freebsd/base/head/lib/libthr/thread/thr_mutexattr.c:111
#1 0x0000000801a66887 in ?? () from /usr/local/lib/libGL.so.1
#2 0x000000080471db33 in ?? () from /usr/local/lib/libnvidia-glcore.so.1
#3 0x0000000804958b6e in ?? () from /usr/local/lib/libnvidia-glcore.so.1
#4 0x0000000801a4afce in ?? () from /usr/local/lib/libGL.so.1
#5 0x0000000801a4b62f in ?? () from /usr/local/lib/libGL.so.1
#6 0x0000000800606c2e in objlist_call_init (list=0x7fffffffe8f0,
    lockstate=0x7fffffffe888)
    at /usr/home/tijl/freebsd/base/head/libexec/rtld-elf/rtld.c:2442
#7 0x00000008006052eb in _rtld (sp=0x7fffffffeb68, exit_proc=0x7fffffffea40,
    objp=0x7fffffffea48)
    at /usr/home/tijl/freebsd/base/head/libexec/rtld-elf/rtld.c:669
#8 0x00000008006034c9 in .rtld_start ()
    at /usr/home/tijl/freebsd/base/head/libexec/rtld-elf/amd64/rtld_start.S:39
#9 0x0000000000000000 in ?? ()
(gdb)

From: Konstantin Belousov
To: Tijl Coosemans
Cc: freebsd-threads@FreeBSD.org
Subject: Re: Nvidia libGL crash in libthr
Date: Sun, 13 Dec 2015 00:33:33 +0200
Message-ID: <20151212223333.GO82577@kib.kiev.ua>
In-Reply-To: <20151212154715.0b3bb9e6@kalimero.tijl.coosemans.org>

On Sat, Dec 12, 2015 at 03:47:15PM +0100, Tijl Coosemans wrote:
> On Fri, 11 Dec 2015 19:54:39 +0200 Konstantin Belousov wrote:
> > On Fri, Dec 11, 2015 at 06:18:09PM +0100, Tijl Coosemans wrote:
> >> This is taken from https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205149
> >>
> >> /usr/local/lib/kde4/libexec/kwin_opengl_test (from kde-workspace package)
> >> crashes in libthr when Nvidia libGL is installed:
> >>
> >> #0 0x000000080697d201 in pthread_mutexattr_setkind_np () from /lib/libthr.so.3
> >> #1 0x0000000801a6c9c7 in glXCreateNewContext () from /usr/local/lib/libGL.so.1
> >> #2 0x0000000804bd958c in _nv021glcore () from /usr/local/lib/libnvidia-glcore.so.1
> >> #3 0x0000000804f4821e in _nv015glcore () from /usr/local/lib/libnvidia-glcore.so.1
> >> #4 0x0000000801a4cefb in glXCreateNewContext () from /usr/local/lib/libGL.so.1
> >> #5 0x0000000801a4da0a in glXCreateNewContext () from /usr/local/lib/libGL.so.1
> >> #6 0x0000000800605a9f in r_debug_state () from /libexec/ld-elf.so.1
> >> #7 0x00000008006050ee in __tls_get_addr () from /libexec/ld-elf.so.1
> >> #8 0x0000000800603439 in .text () from /libexec/ld-elf.so.1
> >> #9 0x0000000000000000 in ?? ()
> >>
> >> libthr is pulled in via kwin_opengl_test -> libXft -> libfontconfig ->
> >> libthr. Nothing else links to it. Nvidia libGL seems to be using
> >> dlopen(NULL,..) and then dlsym to look up pthread_* symbols.
> >>
> >> The output of ldd kwin_opengl_test:
> >>
> >>     libSM.so.6 => /usr/local/lib/libSM.so.6 (0x800820000)
> >>     libICE.so.6 => /usr/local/lib/libICE.so.6 (0x800a27000)
> >>     libX11.so.6 => /usr/local/lib/libX11.so.6 (0x800c41000)
> >>     libXext.so.6 => /usr/local/lib/libXext.so.6 (0x800f80000)
> >>     libXft.so.2 => /usr/local/lib/libXft.so.2 (0x801191000)
> >>     libXau.so.6 => /usr/local/lib/libXau.so.6 (0x8013a6000)
> >>     libXdmcp.so.6 => /usr/local/lib/libXdmcp.so.6 (0x8015a9000)
> >>     libXpm.so.4 => /usr/local/lib/libXpm.so.4 (0x8017ae000)
> >>     libGL.so.1 => /usr/local/lib/libGL.so.1 (0x8019c1000)
> >>     libc++.so.1 => /usr/lib/libc++.so.1 (0x801cbd000)
> >>     libcxxrt.so.1 => /lib/libcxxrt.so.1 (0x801f7a000)
> >>     libm.so.5 => /lib/libm.so.5 (0x802197000)
> >>     libc.so.7 => /lib/libc.so.7 (0x8023c1000)
> >>     libxcb.so.1 => /usr/local/lib/libxcb.so.1 (0x80276c000)
> >>     librpcsvc.so.5 => /usr/lib/librpcsvc.so.5 (0x80298d000)
> >>     libfontconfig.so.1 => /usr/local/lib/libfontconfig.so.1 (0x802b96000)
> >>     libfreetype.so.6 => /usr/local/lib/libfreetype.so.6 (0x802dd6000)
> >>     libXrender.so.1 => /usr/local/lib/libXrender.so.1 (0x803076000)
> >>     libnvidia-tls.so.1 => /usr/local/lib/libnvidia-tls.so.1 (0x80327f000)
> >>     libnvidia-glcore.so.1 => /usr/local/lib/libnvidia-glcore.so.1 (0x803600000)
> >>     libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x80588c000)
> >>     libpthread-stubs.so.0 => /usr/local/lib/libpthread-stubs.so.0 (0x805a9a000)
> >>     libexpat.so.1 => /usr/local/lib/libexpat.so.1 (0x805c9b000)
> >>     libthr.so.3 => /lib/libthr.so.3 (0x805ec2000)
> >>     libz.so.6 => /lib/libz.so.6 (0x8060e6000)
> >>     libbz2.so.4 => /usr/lib/libbz2.so.4 (0x8062fc000)
> >>     libpng16.so.16 => /usr/local/lib/libpng16.so.16 (0x80650f000)
> >>
> >> libthr appears after libc so it looks like dlsym returns libc symbols
> >> except for pthread_mutexattr_setkind_np which doesn't exist in libc.
> >> Nvidia libGL ends up calling libc pthread_mutexattr_init (no-op) and
> >> then calls libthr pthread_mutexattr_setkind_np with an uninitialised
> >> pthread_mutexattr_t and crashes.
> > It is more complicated, take a look at libc/gen/_pthread_stubs.c.
> > The libc pthread_* stubs do redirect calls to libthr after libthr
> > is initialized. The _thr_jtable in libc is overwritten by libthr, see
> > the memcpy(_thr_jtable, ...) call in _libpthread_init().
> >
> > BTW, the backtrace you demonstrated was obtained from libthr without
> > debugging symbols, and it might be that pthread_mutexattr_setkind_np
> > just happens to be the closest defined dynamic symbol, while the problem
> > is elsewhere.
>
> I did some of my own debugging now (see below). It looks like libGL
> looks up pthread_* symbols and initialises mutexes from _init() which is
> called before libthr is initialised. Technically it's wrong to call any
> function from a library that hasn't been initialised yet, but I suppose
> initialising mutexes is simple enough that it should be safe.
I think that if this works at all now, it is only by chance.

>
> >> I think the problem is that libthr declares pthread_* symbols weak.
> >> Shouldn't they be ordinary global symbols?
> > Our rtld's treatment of non-weak symbols as having higher priority
> > over weak symbols in dynamic resolution is the bug. The ELF
> > standard specifies that the first matching symbol found in the
> > resolution order is the right target.
>
> I'm not sure I understand what you are saying here. Both libc and
> libthr declare pthread_* symbols weak. There are no non-weak symbols
> involved right now.
Your proposal to fix the problem involves making one symbol non-weak and
then depending on the priority of non-weak symbols over weak symbols in
the resolution. I noted that the behaviour of our rtld that you propose
to utilize (or abuse) is actually a bug in our rtld.
Fixing it is very hard, both because many places in the source tree
depend on the bug and because of the ABI change the fix would introduce
for third-party binaries.

>
> What I'm proposing is that all __weak_reference(..., pthread_*) in
> lib/libthr/ should be changed into __strong_reference. Doing so for
> pthread_mutexattr_init fixes the crash. It makes sense that the libc
> stubs are weak, but it's not immediately obvious to me why the libthr
> implementation needs to be weak as well. Why is that?
You should account for the history; this might be a consequence of the
existence of libc_r. Changing the weakness of the symbols might solve the
immediate problem, but it makes other changes (see above) harder. And we
still end up with the issue of calling a function from a library whose
constructors have not executed yet.

Why doesn't NVidia libGL link to libthr? This is a question for NVidia,
I understand.

>
>
>
> Reading symbols from /usr/local/lib/kde4/libexec/kwin_opengl_test...(no debugging symbols found)...done.
> (gdb) b pthread_mutexattr_init
> Function "pthread_mutexattr_init" not defined.
> Make breakpoint pending on future shared library load? (y or [n]) y
> Breakpoint 1 (pthread_mutexattr_init) pending.
> (gdb) r
> Starting program: /usr/local/lib/kde4/libexec/kwin_opengl_test
> [Switching to LWP 100197]
>
> Breakpoint 1, pthread_mutexattr_init_exp (p0=0x7fffffffe2b0)
>     at /usr/home/tijl/freebsd/base/head/lib/libc/gen/_pthread_stubs.c:230
> 230 STUB_FUNC1(pthread_mutexattr_init, PJT_MUTEXATTR_INIT, int, void *)
> (gdb) s
> stub_zero ()
>     at /usr/home/tijl/freebsd/base/head/lib/libc/gen/_pthread_stubs.c:282
> 282 return (0);
> (gdb) stepi
> 0x000000080261cee7 in stub_zero ()
>     at /usr/home/tijl/freebsd/base/head/lib/libc/gen/_pthread_stubs.c:282
> 282 return (0);
> (gdb)
> 0x000000080261d598 in pthread_mutexattr_init_exp (p0=0x7fffffffe2b0)
>     at /usr/home/tijl/freebsd/base/head/lib/libc/gen/_pthread_stubs.c:230
> 230 STUB_FUNC1(pthread_mutexattr_init, PJT_MUTEXATTR_INIT, int, void *)
> (gdb)
> 0x000000080261d59c 230 STUB_FUNC1(pthread_mutexattr_init, PJT_MUTEXATTR_INIT, int, void *)
> (gdb)
> 0x000000080261d59d in pthread_mutexattr_init_exp (
>     p0=)
>     at /usr/home/tijl/freebsd/base/head/lib/libc/gen/_pthread_stubs.c:230
> 230 STUB_FUNC1(pthread_mutexattr_init, PJT_MUTEXATTR_INIT, int, void *)
> (gdb)
> 0x0000000801a66876 in ?? () from /usr/local/lib/libGL.so.1
> (gdb)
> 0x0000000801a6687d in ?? () from /usr/local/lib/libGL.so.1
> (gdb)
> 0x0000000801a66882 in ?? () from /usr/local/lib/libGL.so.1
> (gdb)
> 0x0000000801a66885 in ?? () from /usr/local/lib/libGL.so.1
> (gdb)
> _pthread_mutexattr_setkind_np (
>     attr=,
>     kind=)
>     at /usr/home/tijl/freebsd/base/head/lib/libthr/thread/thr_mutexattr.c:105
> 105 {
> (gdb) n
> 107 if (attr == NULL || *attr == NULL) {
> (gdb) p *attr
> $1 = (pthread_mutexattr_t) 0x4
> (gdb) n
> 111 (*attr)->m_type = kind;
> (gdb)
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000805ed4d99 in _pthread_mutexattr_setkind_np (attr=0x7fffffffe2b0,
>     kind=2)
>     at /usr/home/tijl/freebsd/base/head/lib/libthr/thread/thr_mutexattr.c:111
> 111 (*attr)->m_type = kind;
> (gdb) bt
> #0 0x0000000805ed4d99 in _pthread_mutexattr_setkind_np (attr=0x7fffffffe2b0,
>     kind=2)
>     at /usr/home/tijl/freebsd/base/head/lib/libthr/thread/thr_mutexattr.c:111
> #1 0x0000000801a66887 in ?? () from /usr/local/lib/libGL.so.1
> #2 0x000000080471db33 in ?? () from /usr/local/lib/libnvidia-glcore.so.1
> #3 0x0000000804958b6e in ?? () from /usr/local/lib/libnvidia-glcore.so.1
> #4 0x0000000801a4afce in ?? () from /usr/local/lib/libGL.so.1
> #5 0x0000000801a4b62f in ?? () from /usr/local/lib/libGL.so.1
> #6 0x0000000800606c2e in objlist_call_init (list=0x7fffffffe8f0,
>     lockstate=0x7fffffffe888)
>     at /usr/home/tijl/freebsd/base/head/libexec/rtld-elf/rtld.c:2442
> #7 0x00000008006052eb in _rtld (sp=0x7fffffffeb68, exit_proc=0x7fffffffea40,
>     objp=0x7fffffffea48)
>     at /usr/home/tijl/freebsd/base/head/libexec/rtld-elf/rtld.c:669
> #8 0x00000008006034c9 in .rtld_start ()
>     at /usr/home/tijl/freebsd/base/head/libexec/rtld-elf/amd64/rtld_start.S:39
> #9 0x0000000000000000 in ?? ()
> (gdb)
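
The stub redirection discussed in this thread can be illustrated with a small, self-contained sketch. It is a simplified model under assumptions, not the code from libc/gen/_pthread_stubs.c or libthr: all `demo_*` names, the one-entry table and the `struct demo_mutexattr` type are hypothetical. The attribute is set to NULL before use so the sketch stays well defined; in the gdb session above the variable held stack garbage (0x4), which passed the NULL check and led to the SIGSEGV.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; not the real libc/libthr definitions. */
struct demo_mutexattr { int m_type; };
typedef struct demo_mutexattr *demo_mutexattr_t;
typedef int (*demo_attr_fn)(demo_mutexattr_t *);

enum { DEMO_JT_MUTEXATTR_INIT, DEMO_JT_MAX };

/* "libc" side: every slot starts out as a do-nothing stub returning 0. */
static int demo_stub_zero(demo_mutexattr_t *attr)
{
    (void)attr;
    return 0;
}

static demo_attr_fn demo_jtable[DEMO_JT_MAX] = { demo_stub_zero };

/* "libc" entry point: dispatches through the jump table. */
int demo_mutexattr_init(demo_mutexattr_t *attr)
{
    return demo_jtable[DEMO_JT_MUTEXATTR_INIT](attr);
}

/* "libthr" side: the real implementation allocates the attribute. */
static int demo_real_mutexattr_init(demo_mutexattr_t *attr)
{
    demo_mutexattr_t a = malloc(sizeof(*a));

    if (a == NULL)
        return ENOMEM;
    a->m_type = 0;
    *attr = a;
    return 0;
}

/* "libthr" constructor: overwrites the table, as the memcpy() mentioned
 * in the thread does for _thr_jtable. */
void demo_libthr_init(void)
{
    static const demo_attr_fn real[DEMO_JT_MAX] = { demo_real_mutexattr_init };

    memcpy(demo_jtable, real, sizeof(demo_jtable));
}

/* Exists only in "libthr", so it is reached directly even before init. */
int demo_mutexattr_setkind(demo_mutexattr_t *attr, int kind)
{
    if (attr == NULL || *attr == NULL)
        return EINVAL;
    (*attr)->m_type = kind;
    return 0;
}
```

Before `demo_libthr_init()` runs, `demo_mutexattr_init()` reports success without touching the attribute, so a following `demo_mutexattr_setkind()` sees whatever the caller left there. After the table is overwritten the same two calls work as expected.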