From: Andriy Gapon
Date: Thu, 28 Aug 2008 23:29:12 +0300
To: freebsd-threads@freebsd.org
Subject: mysterious hang in pthread_create
Message-ID: <48B70A98.5060501@icyb.net.ua>

I've got quite a strange issue with only one particular threaded program.

My system is 7-STABLE from around July 6 (rather old, I know). I, of course, use libthr as a thread library.

I have plenty of threaded programs and all of them except one are working perfectly (firefox, thunderbird, KDE). The bad one is linuxdcpp (net-p2p/linuxdcpp, linuxdcpp-1.0.2).

I built it with debug enabled and also rebuilt libthr with debug. It seems that the program hangs in the very first call to pthread_create.

Here's a stack trace of the hanging program:

#0  _thr_umtx_wait (mtx=0x838774c, id=0, timeout=0x0) at /system/src/lib/libthr/thread/thr_umtx.c:93
#1  0x28f9168c in _thr_rtld_rlock_acquire (lock=0x8387740) at /system/src/lib/libthr/thread/thr_rtld.c:129
#2  0x282a6a27 in dlopen () from /libexec/ld-elf.so.1
#3  0x282a491d in dladdr () from /libexec/ld-elf.so.1
#4  0x282a1469 in ?? () from /libexec/ld-elf.so.1
#5  0x289b3600 in ?? ()
#6  0x000000d8 in ?? ()
#7  0x000186d1 in ?? ()
#8  0x00000000 in ?? ()
#9  0x08303a00 in ?? ()
#10 0x00000246 in ?? ()
#11 0x289b3600 in ?? ()
#12 0x000000d8 in ?? ()
#13 0x28f94b5f in _tcb_ctor (thread=0x8303a00, initial=0) at /system/src/lib/libthr/arch/i386/i386/pthread_md.c:46
#14 0x28f94215 in _thr_alloc (curthread=0x8302100) at /system/src/lib/libthr/thread/thr_list.c:169
#15 0x28f8d22e in _pthread_create (thread=0x831cb90, attr=0x0, start_routine=0x8170ce0, arg=0x831cb8c) at /system/src/lib/libthr/thread/thr_create.c:68
#16 0x08170bd8 in Thread::start (this=0x831cb8c) at client/Thread.cpp:41
#17 0x080abfb4 in HashManager::startup (this=0x831cb60) at HashManager.h:97
#18 0x0809f4d6 in startup (f=0x827a2c0, p=0x0) at client/DCPlusPlus.cpp:82
#19 0x0827a571 in main (argc=1, argv=0xbfbfe844) at linux/wulfor.cc:61

I tracked all calls to functions of the _thr_rtld_*lock* family and it seems that the lock in question gets acquired for writing before the above access.
The stack:

#0  _thr_rtld_wlock_acquire (lock=0x8387740) at /system/src/lib/libthr/thread/thr_rtld.c:144
#1  0x282a6dcc in _rtld_thread_init () from /libexec/ld-elf.so.1
#2  0x28f91af6 in _thr_rtld_init () at /system/src/lib/libthr/thread/thr_rtld.c:238
#3  0x28f938db in _thr_setthreaded (threaded=1) at /system/src/lib/libthr/thread/thr_kern.c:56
#4  0x28f8d208 in _pthread_create (thread=0x831cb90, attr=0x0, start_routine=0x8170ce0, arg=0x831cb8c) at /system/src/lib/libthr/thread/thr_create.c:64
#5  0x08170bd8 in Thread::start (this=0x831cb8c) at client/Thread.cpp:41
#6  0x080abfb4 in HashManager::startup (this=0x831cb60) at HashManager.h:97
#7  0x0809f4d6 in startup (f=0x827a2c0, p=0x0) at client/DCPlusPlus.cpp:82
#8  0x0827a571 in main (argc=1, argv=0xbfbfe844) at linux/wulfor.cc:61

It seems that for all other programs there is no such call as the above (I mean wlock_acquire).

I didn't have debug symbols in rtld when I executed this test, unfortunately. The problem is 100% reproducible, so I can get any additional debugging info.

I wonder what can be so special about this program and what can be going wrong. I didn't quite get the logic with flags and masks in _rtld_thread_init (especially when lockinfo is still default), but the issue seems to be related to that.

-- 
Andriy Gapon