From: Dale Hagglund
To: freebsd-questions@freebsd.org
Cc: db@freebsd.org, Mel
Subject: Re: processes hanging in _umtx_op
Date: Sun, 05 Oct 2008 21:33:03 -0600
Message-ID: <86skra4cuo.fsf@ponoka.ab.hsia.telus.net>
In-Reply-To: <200810052019.01920.fbsd.questions@rachie.is-a-geek.net>
        (Mel's message of "Sun, 5 Oct 2008 20:19:01 +0200")
References: <86r66v6gsj.fsf@ponoka.ab.hsia.telus.net>
        <200810051546.28440.fbsd.questions@rachie.is-a-geek.net>
        <86bpxz58l9.fsf@ponoka.ab.hsia.telus.net>
        <200810052019.01920.fbsd.questions@rachie.is-a-geek.net>

[Mel, the last time I replied to your @rachie address, I got a bounce.
I'm still including it here on the CC list.  Should I remove it and
just reply to you via this list? --rdh]

Diane, Mel, thanks for your suggestions so far.

Mel> If upgrading ports is a possible solution, then you have the
Mel> fine task of finding out, which library in everything that's
Mel> being loaded is *NOT* linked with libthr, cause a likely
Mel> candidate would be two different threading libraries being
Mel> used. I would start with ldd -a /path/to/python/wx.so and see
Mel> if both libthr.so and libpthread.so (or maybe even libkse) show
Mel> up.

What I did was this:

    $ python -c "import wx"&

which hangs.  Then I did

    $ lsof -p $pid | grep '\.so'

to get a list of open shared objects.  The only matches for "thr" are

    /lib/libthr.so.3
    /usr/local/lib/libgthread-2.0.so.0

There are no matches for "kse".  Then I started doing

    $ lsof -p $pid |
    > grep '\.so' |
    > awk '{print $NF}' |
    > xargs -n 1 ldd -a | less

When I looked closely at the many libthr.so.3 references, though, I
saw something quite interesting.  As far as I can tell, not all are
loaded at the same address.  This is quite confusing to me.

    $ lsof -p $pid |
    > grep '\.so' |
    > awk '{print $NF}' |
    > xargs -n 1 ldd -f '\t%o %p %x\n' -a |
    > awk 'NF==1 {prefix=$1; next} {print prefix, $0}' |
    > awk '$2 ~ /libthr/ { print $4 }' |
    > sort |
    > uniq -c |
    > sort -nr
      22 0x28bc8000
       7 0x2953f000
       5 0x2945f000
       5 0x29371000
       4 0x29407000
       4 0x293fa000
       3 0x2934d000
       2 0x28952000
       2 0x2894b000
       1 0x29a79000
       1 0x2960d000
       1 0x289fb000
       1 0x28921000
       1 0x28548000
       1 0x281b6000
    $

However, closer inspection shows that, confusing as it is, this
behaviour is common to almost all the shared libraries loaded into the
stuck python process.  Indeed, only libc seems to have just one loaded
address.  Also, this pipeline is actually inspecting the results from
many different runs of ldd, one per .so, instead of looking at the
state of the running process, so the differing addresses may well be
an artifact of that.

A little more poking leads to the following result, which is again
confusing to me:

    $ lsof -p 79117 |
    > grep '\.so' |
    > awk '{print $NF}' |
    > sort | uniq -c | sort -nr |
    > head
       2 /usr/local/lib/python2.5/site-packages/wx-2.8-gtk2-ansi/wx/_core_.so
       1 /usr/local/lib/libxml2.so.5
       1 /usr/local/lib/libwx_gtk2_xrc-2.8.so.0.2.0
       1 /usr/local/lib/libwx_gtk2_qa-2.8.so.0.2.0
       1 /usr/local/lib/libwx_gtk2_html-2.8.so.0.2.0
       1 /usr/local/lib/libwx_gtk2_core-2.8.so.0.2.0
       1 /usr/local/lib/libwx_gtk2_aui-2.8.so.0.2.0
       1 /usr/local/lib/libwx_gtk2_adv-2.8.so.0.2.0
       1 /usr/local/lib/libwx_base_xml-2.8.so.0.2.0
       1 /usr/local/lib/libwx_base_net-2.8.so.0.2.0
    $

The python wx core library seems to have been opened twice, unlike
every other shared object that the python process has opened.

Anyway, I don't know what to make of these results.  Also, they seem
at least somewhat unlikely to be related to seeing the same hang in
ooo3.

Mel> Also inspect /etc/libmap.conf for entries you may have added in
Mel> a not too recent past and forgot about.

No such file on my system.

Mel> Unfortunately, I see no obvious candidates in your package list (ie:
Mel> compat-[456]x, *flash*).

I had compat-5x installed and removed it, but the problem persisted.
I still have compat-6x installed.

So, the upshot is I still don't see a smoking gun anywhere, but I
certainly see some things that are confusing, although that has no
bearing on whether or not they're actually problems.

If anything above inspires you with more questions, let me know and I
can do more poking around.  The next step, I guess, is to rebuild with
ULE and/or try out 7.1 prerelease.

Thanks again for your help so far.

Dale.
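
For anyone who wants to follow the two awk stages in the libthr
pipeline above without a hung process to hand, here is a minimal
sketch that runs them on canned text.  The function names (tag_deps,
distinct_addrs, sample) and the sample paths and addresses are made up
for illustration; the only assumption about the input is that each
object is introduced by a one-field header line ("path:") followed by
its dependency lines in "name path address" form, as in the ldd -a -f
invocation above.

```shell
#!/bin/sh
# tag_deps: prefix each dependency line with the object it belongs to.
# A line with exactly one field is treated as an object header ("path:").
tag_deps() {
    awk 'NF == 1 { prefix = $1; next }   # header line: remember the object
         { print prefix, $0 }'
}

# distinct_addrs: count how often each libthr load address shows up.
# After tagging, field 2 is the library name and field 4 its address.
distinct_addrs() {
    tag_deps | awk '$2 ~ /libthr/ { print $4 }' | sort | uniq -c | sort -nr
}

# Hypothetical sample input, NOT output from the real system.
sample() {
    printf '%s\n' \
        '/tmp/a.so:' \
        'libthr.so.3 /lib/libthr.so.3 0x28bc8000' \
        'libc.so.7 /lib/libc.so.7 0x28100000' \
        '/tmp/b.so:' \
        'libthr.so.3 /lib/libthr.so.3 0x2953f000' \
        '/tmp/c.so:' \
        'libthr.so.3 /lib/libthr.so.3 0x28bc8000'
}

sample | distinct_addrs
```

On the sample data this reports 0x28bc8000 twice and 0x2953f000 once.
Each address in such a count comes from a separate ldd run, so a
spread of addresses here says nothing by itself about what is mapped
in the stuck process.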