Date: Mon, 21 May 2012 11:28:09 +0800 From: David Xu <listlog2011@gmail.com> To: davidxu@freebsd.org Cc: Konstantin Belousov <kostikbel@gmail.com>, Alberto Villa <avilla@freebsd.org>, Gustau P?rez i Querol <gperez@entel.upc.edu>, FreeBSD current <freebsd-current@freebsd.org> Subject: Re: RFC: jemalloc: qdbus sigsegv in malloc_init Message-ID: <4FB9B649.8020906@gmail.com> In-Reply-To: <4FB9AE7E.6090109@gmail.com> References: <4F9E9E06.4070004@entel.upc.edu> <4FB88925.4070008@gmail.com> <CAJp7RHaOkEzyfD5e6pLMSBxvCBYCn9BWv=9BWu0CYsQHzGyFdg@mail.gmail.com> <20120520172419.GQ2358@deviant.kiev.zoral.com.ua> <4FB9AE7E.6090109@gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On 2012/5/21 10:54, David Xu wrote: > On 2012/5/21 1:24, Konstantin Belousov wrote: >> On Sun, May 20, 2012 at 06:42:35PM +0200, Alberto Villa wrote: >>> On Sun, May 20, 2012 at 8:03 AM, David Xu<listlog2011@gmail.com> >>> wrote: >>>> qdbus segfaults on my machine too, I tracked it down, and found the >>>> problem >>>> is in QT, >>>> it deleted current_thread_data_key, but it still uses it in some >>>> cxa hooks, >>>> I applied the >>>> following patch, and it works fine. >>> Thanks for the analysis David! >>> >>>> I think the bug depends on linking order in QT library ? if the >>>> qthread_unix.cpp is linked >>>> as lastest module, the key will be deleted after all cxa hooks run, >>>> then it >>>> will be fine, >>>> otherwise, it would crash. >>> Is this really possible? >> No, I do not think it is possible. >> >> The only possibility for something weird happen is for >> atexit/__cxa_atexit >> functions to be registered from another atexit function, and then we >> indeed could call the newly registered function too late. >> >> I wonder if the following hack makes any change in the observed >> behaviour. >> >> diff --git a/lib/libc/stdlib/atexit.c b/lib/libc/stdlib/atexit.c >> index 511172a..bab850c 100644 >> --- a/lib/libc/stdlib/atexit.c >> +++ b/lib/libc/stdlib/atexit.c >> @@ -72,6 +72,7 @@ struct atexit { >> }; >> >> static struct atexit *__atexit; /* points to head of LIFO >> stack */ >> +static int atexit_gen; >> >> /* >> * Register the function described by 'fptr' to be called at >> application >> @@ -107,6 +108,7 @@ atexit_register(struct atexit_fn *fptr) >> __atexit = p; >> } >> p->fns[p->ind++] = *fptr; >> + atexit_gen++; >> _MUTEX_UNLOCK(&atexit_mutex); >> return 0; >> } >> @@ -162,7 +164,7 @@ __cxa_finalize(void *dso) >> struct dl_phdr_info phdr_info; >> struct atexit *p; >> struct atexit_fn fn; >> - int n, has_phdr; >> + int atexit_gen_prev, n, has_phdr; >> >> if (dso != NULL) >> has_phdr = _rtld_addr_phdr(dso,&phdr_info); >> @@ -170,6 +172,8 @@ __cxa_finalize(void *dso) >> has_phdr = 0; >> >> _MUTEX_LOCK(&atexit_mutex); >> +retry: >> + atexit_gen_prev = atexit_gen; >> for (p = __atexit; p; p = p->next) { >> for (n = p->ind; --n>= 0;) { >> if (p->fns[n].fn_type == ATEXIT_FN_EMPTY) >> @@ -196,6 +200,8 @@ __cxa_finalize(void *dso) >> _MUTEX_LOCK(&atexit_mutex); >> } >> } >> + if (atexit_gen_prev != atexit_gen) >> + goto retry; >> _MUTEX_UNLOCK(&atexit_mutex); >> if (dso == NULL) >> _MUTEX_DESTROY(&atexit_mutex); > I have tried your patch, it does not fix the problem. As I said, it > is a bug in QT, > the bug is pthread key current_thread_data_key is deleted by a global > C++ object > too early, other C++ global objects still need this pthread key. The > following procedure > shows how I found the problem: > > davidxu@xyf:~%gdb qdbus > GNU gdb 6.1.1 [FreeBSD] > Copyright 2004 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and > you are > welcome to change it and/or distribute copies of it under certain > conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for > details. > This GDB was configured as "i386-marcel-freebsd"...(no debugging > symbols found)... > (gdb) break __cxa_finalize > Function "__cxa_finalize" not defined. > Make breakpoint pending on future shared library load? (y or [n]) y > Breakpoint 1 (__cxa_finalize) pending. > (gdb) run > Starting program: /usr/local/bin/qdbus > (no debugging symbols found)...(no debugging symbols found)...(no > debugging symbols found)...(no debugging symbols found)...(no > debugging symbols found)...(no debugging symbols found)...(no > debugging symbols found)...[New LWP 100077] > (no debugging symbols found)...(no debugging symbols found)...(no > debugging symbols found)...(no debugging symbols found)...(no > debugging symbols found)...(no debugging symbols found)...(no > debugging symbols found)...(no debugging symbols found)...Breakpoint 2 > at 0x2864ac26 > Pending breakpoint "__cxa_finalize" resolved > (no debugging symbols found)...[New Thread 29007300 (LWP 100077/qdbus)] > (no debugging symbols found)...:1.0 > org.gnome.SessionManager > :1.11 > :1.111 > :1.12 > :1.13 > org.gtk.vfs.Daemon > :1.143 > :1.15 > org.pulseaudio.Server > :1.17 > org.gnome.Panel > :1.18 > :1.19 > :1.20 > org.gtk.Private.HalVolumeMonitor > :1.21 > org.gtk.Private.GPhoto2VolumeMonitor > :1.22 > :1.24 > org.gnome.ScreenSaver > :1.25 > :1.27 > :1.28 > :1.29 > :1.30 > :1.31 > org.gnome.panel.applet.WnckletFactory > :1.32 > :1.33 > :1.34 > :1.35 > org.gnome.panel.applet.CPUFreqAppletFactory > :1.36 > org.gnome.panel.applet.NotificationAreaAppletFactory > :1.37 > org.gnome.panel.applet.MultiLoadAppletFactory > :1.38 > :1.39 > :1.4 > org.gnome.GConf > :1.41 > org.gnome.panel.applet.ClockAppletFactory > :1.49 > :1.5 > org.gnome.SettingsDaemon > :1.50 > :1.53 > :1.64 > :1.7 > org.freedesktop.secrets > org.gnome.keyring > :1.75 > org.gtk.vfs.Metadata > :1.76 > org.gnome.Terminal.Display_0_0 > :1.77 > org.freedesktop.DBus > [Switching to Thread 29007300 (LWP 100077/qdbus)] > > Breakpoint 2, 0x2864ac26 in __cxa_finalize () from /lib/libc.so.7 > (gdb) print current_thread_data_key > $1 = 0 > (gdb) thread tsd > Key 0, destructor 0x281d77f0 <_Z27destroy_current_thread_dataPv> > Key 1, destructor 0x28732dc0 <g_thread_create_full> > Key 2, destructor 0x28726a00 <g_slice_get_config> > Key 3, destructor 0x0 <???> > > > Here you can find that the function destroy_current_thread_data() is > registered. > > > (gdb) bt > #0 0x2864ac26 in __cxa_finalize () from /lib/libc.so.7 > #1 0x285efe1a in exit () from /lib/libc.so.7 > #2 0x08051db5 in main () > (gdb) break QThreadData::current() > Breakpoint 3 at 0x281d7856 > (gdb) info breakpoints > Num Type Disp Enb Address What > 2 breakpoint keep y 0x2864ac26 <__cxa_finalize+6> > breakpoint already hit 1 time > 3 breakpoint keep y 0x281d7856 <QThreadData::current()+6> > (gdb) delete 2 > (gdb) cont > Continuing. > > Breakpoint 3, 0x281d7856 in QThreadData::current () from > /usr/local/lib/qt4/libQtCore.so.4 > (gdb) bt > #0 0x281d7856 in QThreadData::current () from > /usr/local/lib/qt4/libQtCore.so.4 > #1 0x281d4747 in QThread::currentThread () from > /usr/local/lib/qt4/libQtCore.so.4 > #2 0x28097248 in QDBusConnectionPrivate::deleteYourself () from > /usr/local/lib/qt4/libQtDBus.so.4 > #3 0x2808f2ea in QDBusConnection::~QDBusConnection () from > /usr/local/lib/qt4/libQtDBus.so.4 > #4 0x2864ad8f in __cxa_finalize () from /lib/libc.so.7 > #5 0x285efe1a in exit () from /lib/libc.so.7 > #6 0x08051db5 in main () > (gdb) thread tsd > Key 1, destructor 0x0 <???> > Key 2, destructor 0x0 <???> > Key 3, destructor 0x0 <???> > > Here you can see the destroy_current_thread_data() was executed, and > unregistered. > the key current_thread_key_data which is index 0 is deleted. > > > (gdb) cont > Continuing. > > Breakpoint 3, 0x281d7856 in QThreadData::current () from > /usr/local/lib/qt4/libQtCore.so.4 > (gdb) bt > #0 0x281d7856 in QThreadData::current () from > /usr/local/lib/qt4/libQtCore.so.4 > #1 0x282f58e3 in QObject::QObject () from > /usr/local/lib/qt4/libQtCore.so.4 > #2 0x281d4710 in QThread::QThread () from > /usr/local/lib/qt4/libQtCore.so.4 > #3 0x281d5a9e in QAdoptedThread::QAdoptedThread () from > /usr/local/lib/qt4/libQtCore.so.4 > #4 0x281d7934 in QThreadData::current () from > /usr/local/lib/qt4/libQtCore.so.4 > #5 0x281d4747 in QThread::currentThread () from > /usr/local/lib/qt4/libQtCore.so.4 > #6 0x28097248 in QDBusConnectionPrivate::deleteYourself () from > /usr/local/lib/qt4/libQtDBus.so.4 > #7 0x2808f2ea in QDBusConnection::~QDBusConnection () from > /usr/local/lib/qt4/libQtDBus.so.4 > #8 0x2864ad8f in __cxa_finalize () from /lib/libc.so.7 > #9 0x285efe1a in exit () from /lib/libc.so.7 > #10 0x08051db5 in main () > > now the stupid code starts to create a new thread... > > (gdb) cont > Continuing. > > Breakpoint 3, 0x281d7856 in QThreadData::current () from > /usr/local/lib/qt4/libQtCore.so.4 > (gdb) bt > #0 0x281d7856 in QThreadData::current () from > /usr/local/lib/qt4/libQtCore.so.4 > #1 0x282f58e3 in QObject::QObject () from > /usr/local/lib/qt4/libQtCore.so.4 > #2 0x281d4710 in QThread::QThread () from > /usr/local/lib/qt4/libQtCore.so.4 > #3 0x281d5a9e in QAdoptedThread::QAdoptedThread () from > /usr/local/lib/qt4/libQtCore.so.4 > #4 0x281d7934 in QThreadData::current () from > /usr/local/lib/qt4/libQtCore.so.4 > #5 0x282f58e3 in QObject::QObject () from > /usr/local/lib/qt4/libQtCore.so.4 > #6 0x281d4710 in QThread::QThread () from > /usr/local/lib/qt4/libQtCore.so.4 > #7 0x281d5a9e in QAdoptedThread::QAdoptedThread () from > /usr/local/lib/qt4/libQtCore.so.4 > #8 0x281d7934 in QThreadData::current () from > /usr/local/lib/qt4/libQtCore.so.4 > #9 0x281d4747 in QThread::currentThread () from > /usr/local/lib/qt4/libQtCore.so.4 > #10 0x28097248 in QDBusConnectionPrivate::deleteYourself () from > /usr/local/lib/qt4/libQtDBus.so.4 > #11 0x2808f2ea in QDBusConnection::~QDBusConnection () from > /usr/local/lib/qt4/libQtDBus.so.4 > #12 0x2864ad8f in __cxa_finalize () from /lib/libc.so.7 > #13 0x285efe1a in exit () from /lib/libc.so.7 > #14 0x08051db5 in main () > (gdb) > #0 0x281d7856 in QThreadData::current () from > /usr/local/lib/qt4/libQtCore.so.4 > #1 0x282f58e3 in QObject::QObject () from > /usr/local/lib/qt4/libQtCore.so.4 > #2 0x281d4710 in QThread::QThread () from > /usr/local/lib/qt4/libQtCore.so.4 > #3 0x281d5a9e in QAdoptedThread::QAdoptedThread () from > /usr/local/lib/qt4/libQtCore.so.4 > #4 0x281d7934 in QThreadData::current () from > /usr/local/lib/qt4/libQtCore.so.4 > #5 0x282f58e3 in QObject::QObject () from > /usr/local/lib/qt4/libQtCore.so.4 > #6 0x281d4710 in QThread::QThread () from > /usr/local/lib/qt4/libQtCore.so.4 > #7 0x281d5a9e in QAdoptedThread::QAdoptedThread () from > /usr/local/lib/qt4/libQtCore.so.4 > #8 0x281d7934 in QThreadData::current () from > /usr/local/lib/qt4/libQtCore.so.4 > #9 0x281d4747 in QThread::currentThread () from > /usr/local/lib/qt4/libQtCore.so.4 > #10 0x28097248 in QDBusConnectionPrivate::deleteYourself () from > /usr/local/lib/qt4/libQtDBus.so.4 > #11 0x2808f2ea in QDBusConnection::~QDBusConnection () from > /usr/local/lib/qt4/libQtDBus.so.4 > #12 0x2864ad8f in __cxa_finalize () from /lib/libc.so.7 > #13 0x285efe1a in exit () from /lib/libc.so.7 > #14 0x08051db5 in main () > (gdb) > > dead-loop in QT library until the stack overflow. > > As I said, it depends on ordering the global objects are destructed, > if the object which deleting > the current_thread_data_key is destructed lastly, the problem wont > happen, but now > it is destructed too early. I believe there is no specification said > that which C++ object should be > destructed first if they are in different compiled module and then are > linked together to generated > a shared object, .so file. > > Now let me dig into qthread_unix.cpp, see how QThreadData::current() works: QThreadData *QThreadData::current() { QThreadData *data = get_thread_data(); if (!data) { void *a; if (QInternal::activateCallbacks(QInternal::AdoptCurrentThread, &a)) { QThread *adopted = static_cast<QThread*>(a); Q_ASSERT(adopted); data = QThreadData::get2(adopted); set_thread_data(data); adopted->d_func()->running = true; adopted->d_func()->finished = false; static_cast<QAdoptedThread *>(adopted)->init(); } else { data = new QThreadData; QT_TRY { set_thread_data(data); data->thread = new QAdoptedThread(data); } QT_CATCH(...) { clear_thread_data(); data->deref(); data = 0; QT_RETHROW; } data->deref(); } if (!QCoreApplicationPrivate::theMainThread) QCoreApplicationPrivate::theMainThread = data->thread; } return data; } it calls get_thread_data(), if it returns NULL, it create a new thread, and try to set the new thread as "current thread data", it calls set_thread_data(). let's see how get_thread_data() and set_thread_data() work : static QThreadData *get_thread_data() { #ifdef Q_OS_SYMBIAN return reinterpret_cast<QThreadData *>(Dll::Tls()); #else pthread_once(¤t_thread_data_once, create_current_thread_data_key); return reinterpret_cast<QThreadData *>(pthread_getspecific(current_thread_data_key)); #endif } static void set_thread_data(QThreadData *data) { #ifdef Q_OS_SYMBIAN qt_symbian_throwIfError(Dll::SetTls(data)); #endif pthread_once(¤t_thread_data_once, create_current_thread_data_key); pthread_setspecific(current_thread_data_key, data); } They just use pthread_getspecific and pthread_setspecific, the current_thread_data_key was only created once which is guarded by pthread_once(), but as you know, the key has already been deleted by Q_DESTRUCTOR_FUNCTION(destroy_current_thread_data_key) which is a global object which has been destructed early, the key is no longer recreated, it is a stale key. So pthread_setspecific should fail, and pthread_getspecific would return NULL, this causes QThreadData::current() create a new thread, it seems QAdoptedThread also calls QThreadData::current(), so it is a recursion, the recursion is never ended until stack overflow.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4FB9B649.8020906>