Date:      Sun, 05 Oct 2008 21:33:03 -0600
From:      Dale Hagglund <dale.hagglund@gmail.com>
To:        freebsd-questions@freebsd.org
Cc:        db@freebsd.org, Mel <fbsd.questions@rachie.is-a-geek.net>
Subject:   Re: processes hanging in _umtx_op
Message-ID:  <86skra4cuo.fsf@ponoka.ab.hsia.telus.net>
In-Reply-To: <200810052019.01920.fbsd.questions@rachie.is-a-geek.net> (Mel's message of "Sun, 5 Oct 2008 20:19:01 +0200")
References:  <86r66v6gsj.fsf@ponoka.ab.hsia.telus.net> <200810051546.28440.fbsd.questions@rachie.is-a-geek.net> <86bpxz58l9.fsf@ponoka.ab.hsia.telus.net> <200810052019.01920.fbsd.questions@rachie.is-a-geek.net>

[Mel, the last time I replied to your @rachie address, I got a bounce.
I'm still including it here on the CC list.  Should I remove it and just
reply to you via this list?  --rdh]

Diane, Mel, thanks for your suggestions so far.

    Mel> If upgrading ports is a possible solution, then you have the
    Mel> fine task of finding out, which library in everything that's
    Mel> being loaded is *NOT* linked with libthr, cause a likely
    Mel> candidate would be two different threading libraries being
    Mel> used.  I would start with ldd -a /path/to/python/wx.so and see
    Mel> if both libthr.so and libpthread.so (or maybe even libkse) show
    Mel> up.

What I did was this:

        $ python -c "import wx"&

which hangs.  Then I did

        $ lsof -p $pid | grep '\.so'

to get a list of open shared objects.  The only matches for "thr" are

        /lib/libthr.so.3
        /usr/local/lib/libgthread-2.0.so.0

There are no matches for "kse".  Then I started doing

        $ lsof -p $pid | 
        > grep '\.so' | 
        > awk '{print $NF}' | 
        > xargs -n 1 ldd -a | less
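
Rather than paging through all of that by eye, the check Mel suggested
could be automated with a small filter.  This is only a sketch: the
function name threading_libs is made up for illustration, and it
assumes ldd's default output format ("libfoo.so.N => /path (0xaddr)").
It prints each distinct threading library name that appears, so more
than one line of output would point at a mix of threading libraries.

```shell
# Hypothetical helper: read `ldd -a` output on stdin and print the
# distinct threading libraries (libthr, libpthread, libkse) it mentions.
# The anchored pattern deliberately skips names like libgthread-2.0.so.0.
threading_libs() {
    awk '$2 == "=>" && $1 ~ /^lib(thr|pthread|kse)\.so/ { print $1 }' |
        sort -u
}
```

It would take the place of "less" at the end of the pipeline above:

        $ lsof -p $pid | grep '\.so' | awk '{print $NF}' |
        > xargs -n 1 ldd -a | threading_libs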

When I looked closely at the many libthr.so.3 references, though, I saw
something quite interesting.  As far as I can tell, not all are loaded
at the same address.  This is quite confusing to me.

        $ lsof -p $pid | 
        > grep '\.so' | 
        > awk '{print $NF}' | 
        > xargs -n 1 ldd -f '\t%o %p %x\n' -a | 
        > awk 'NF==1 {prefix=$1; next} {print prefix, $0}' | 
        > awk '$2 ~ /libthr/ { print $4 }' | 
        > sort |
        > uniq -c | 
        > sort -nr
          22 0x28bc8000
           7 0x2953f000
           5 0x2945f000
           5 0x29371000
           4 0x29407000
           4 0x293fa000
           3 0x2934d000
           2 0x28952000
           2 0x2894b000
           1 0x29a79000
           1 0x2960d000
           1 0x289fb000
           1 0x28921000
           1 0x28548000
           1 0x281b6000
        $

However, closer inspection shows that, confusing as it is, this
behaviour is common to almost all the shared libraries loaded into the
stuck python process.  Indeed, only libc seems to have just one loaded
address.
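
To check that for every library at once rather than just libthr, the
tail of the pipeline can be generalized.  The following is a sketch
under the same assumptions as above: input lines have the four fields
"object: libname path address" produced by the prefix-adding awk step,
and the function name count_load_addrs is invented here.  It prints,
for each library, how many distinct load addresses were reported.

```shell
# Hypothetical helper: read "object: libname path address" lines on
# stdin and print "<distinct address count> <libname>", highest first.
count_load_addrs() {
    awk '{
             key = $2 SUBSEP $4          # one (library, address) pair
             if (!(key in seen)) {       # count each pair only once
                 seen[key] = 1
                 n[$2]++
             }
         }
         END { for (lib in n) print n[lib], lib }' |
        sort -k1,1nr -k2,2
}
```

Used in place of the libthr-specific stages:

        $ lsof -p $pid | grep '\.so' | awk '{print $NF}' |
        > xargs -n 1 ldd -f '\t%o %p %x\n' -a |
        > awk 'NF==1 {prefix=$1; next} {print prefix, $0}' |
        > count_load_addrs

A library that reports "1" here, as libc seems to, was given the same
address in every ldd run.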

Also, this pipeline is actually inspecting the output of many separate
ldd runs, one per .so, rather than looking at the state of the running
process.

A little more poking leads to the following result, which is again
confusing to me:

        $ lsof -p 79117 | 
        > grep '\.so' | 
        > awk '{print $NF}' | 
        > sort | uniq -c | sort -nr | 
        > head
           2 /usr/local/lib/python2.5/site-packages/wx-2.8-gtk2-ansi/wx/_core_.so
           1 /usr/local/lib/libxml2.so.5
           1 /usr/local/lib/libwx_gtk2_xrc-2.8.so.0.2.0
           1 /usr/local/lib/libwx_gtk2_qa-2.8.so.0.2.0
           1 /usr/local/lib/libwx_gtk2_html-2.8.so.0.2.0
           1 /usr/local/lib/libwx_gtk2_core-2.8.so.0.2.0
           1 /usr/local/lib/libwx_gtk2_aui-2.8.so.0.2.0
           1 /usr/local/lib/libwx_gtk2_adv-2.8.so.0.2.0
           1 /usr/local/lib/libwx_base_xml-2.8.so.0.2.0
           1 /usr/local/lib/libwx_base_net-2.8.so.0.2.0
        $

The python wx core library seems to have been opened twice, unlike every
other shared object that the python process has opened.
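
One way to see how the two entries differ would be to look at lsof's
FD column for each of them.  A rough sketch, assuming lsof's default
column layout (FD is the fourth field, the name is the last) and using
an invented function name, show_dup_objects:

```shell
# Hypothetical helper: read `lsof -p PID` output on stdin and, for each
# .so listed more than once, print its name followed by the FD column
# of every occurrence.  Rows with blank columns would shift the fields,
# so treat the result as a hint only.
show_dup_objects() {
    awk '$NF ~ /\.so/ {
             count[$NF]++
             fds[$NF] = fds[$NF] " " $4
         }
         END {
             for (name in count)
                 if (count[name] > 1)
                     print name ":" fds[name]
         }' | sort
}
```

It would be run as:

        $ lsof -p 79117 | show_dup_objects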

Anyway, I don't know what to make of these results.  Also, they seem at
least somewhat unlikely to be related to seeing the same hang in ooo3.

    Mel> Also inspect /etc/libmap.conf for entries you may have added in
    Mel> a not too recent past and forgot about.

No such file on my system.

    Mel> Unfortunately, I see no obvious candidates in your package list (ie: 
    Mel> compat-[456]x, *flash*).

I had compat-5x installed and removed it, but the problem persisted.  I
still have compat-6x installed.

So, the upshot is I still don't see a smoking gun anywhere, but I
certainly see some things that are confusing, although that has no
bearing on whether or not they're actually problems.

If anything above inspires you with more questions, let me know and I
can do more poking around.  The next step, I guess, is to rebuild with
ULE and/or try out 7.1 prerelease.

Thanks again for your help so far.

Dale.
