Date: Wed, 26 Jun 2013 22:31:33 +0200
From: Michael Gmelin <freebsd@grem.de>
To: Dimitry Andric <dim@FreeBSD.org>
Cc: Kostik Belousov <kostikbel@gmail.com>, Brooks Davis <brooks@FreeBSD.org>, David Chisnall <theraven@freebsd.org>, "freebsd-ports@freebsd.org Ports" <freebsd-ports@freebsd.org>, Matthias Andree <mandree@FreeBSD.org>
Subject: Re: Global destructor order problems (was: Re: Are ports supposed to build and run on 10-CURRENT?)
Message-ID: <20130626223133.1cc1e009@bsd64.grem.de>
In-Reply-To: <7CD9075C-F8D6-41C1-8D21-8B10DF866ECE@FreeBSD.org>
References: <20130613031535.4087d7f9@bsd64.grem.de> <EF830CD7-00F1-4628-8515-76133BBE85E7@FreeBSD.org> <C1CC40FC-4489-4164-96B7-5E1A25DCB37F@FreeBSD.org> <20130626015508.426ab5b9@bsd64.grem.de> <51CAADB8.7090603@FreeBSD.org> <20130626133149.4835f14a@bsd64.grem.de> <7CD9075C-F8D6-41C1-8D21-8B10DF866ECE@FreeBSD.org>
On Wed, 26 Jun 2013 21:26:09 +0200
Dimitry Andric <dim@FreeBSD.org> wrote:

> On Jun 26, 2013, at 13:31, Michael Gmelin <freebsd@grem.de> wrote:
> > On Wed, 26 Jun 2013 11:00:40 +0200
> > Dimitry Andric <dim@FreeBSD.org> wrote:
> >> On 2013-06-26 01:55, Michael Gmelin wrote:
> >> ...
> >>> The problem is that static initialization happens in the expected
> >>> order (same translation unit), but termination does *not* happen
> >>> in the reverse order of initialization,
> ...
> > Yep, strange indeed - my test cases didn't use -fPIC at first, so it
> > took a while to figure it out. It seems to be some sort of
> > combined link/runtime problem, since the same executable built on 10
> > runs fine on 9.1-RELEASE when copied over. I tried replacing various
> > system libraries with their versions from 9.1 in a jail to see if I
> > could make it run on 10, but without success.
> >
> > By the way, the same code built on 9.1 using clang 3.1 or clang 3.3
> > runs fine on 10 as well, so the only case that does NOT work is
> > building on 10 and running on 10 using clang. Also, when I link
> > copies of main.o and libout.so that have been built on 10 on 9.1
> > using clang33, the problem doesn't happen either. So it appears that
> > the problem happens when linking the executable when one of the
> > objects is position independent, and then only surfaces on 10.
>
> So I did a bit of investigation, and the root cause is that both clang
> and newer versions of gcc register the destructors of global objects
> directly, while older versions of gcc, such as the one in base,
> generate anonymous wrapper functions, which in turn call the
> destructors.
>
> Registering the destructors directly does not work correctly if the
> destructors are located in shared objects, the global objects
> themselves are in the main program, and the main program is
> compiled with -fPIC.
> This problem appeared after the following
> revision, which changed the behavior of __cxa_finalize():
>
> http://svnweb.freebsd.org/base?view=revision&revision=211706
>
> This revision is not in 9.1-RELEASE, but it is in 9-STABLE, so the
> problem can also be reproduced there.
>
> To illustrate: if you compile your test program's main.cpp with gcc
> -fPIC, it produces the following (assembly excerpted a bit for
> readability):
>
>         .section        .ctors,"aw",@progbits
>         .align 4
>         .long   _GLOBAL__I_main
> [...]
> __tcf_1:
>         pushl   %ebp
>         movl    %esp, %ebp
>         pushl   %ebx
>         call    __i686.get_pc_thunk.bx
>         addl    $_GLOBAL_OFFSET_TABLE_, %ebx
>         subl    $16, %esp
>         leal    innerInstance@GOTOFF(%ebx), %eax
>         pushl   %eax
>         call    _ZN5InnerD1Ev@PLT
>         addl    $16, %esp
>         movl    -4(%ebp), %ebx
>         leave
>         ret
> [...]
> _Z41__static_initialization_and_destruction_0ii:
>         pushl   %ebp
>         movl    %esp, %ebp
>         pushl   %esi
>         pushl   %ebx
>         call    __i686.get_pc_thunk.bx
>         addl    $_GLOBAL_OFFSET_TABLE_, %ebx
>         decl    %eax
>         jne     .L14
>         cmpl    $65535, %edx
>         jne     .L14
>         subl    $12, %esp
>         leal    outerInstance@GOTOFF(%ebx), %eax
>         pushl   %eax
>         call    _ZN5OuterC1Ev@PLT
>         movl    __dso_handle@GOT(%ebx), %esi
>         addl    $12, %esp
>         leal    __tcf_0@GOTOFF(%ebx), %eax
>         pushl   %esi
>         pushl   $0
>         pushl   %eax
>         call    __cxa_atexit@PLT
>         leal    innerInstance@GOTOFF(%ebx), %eax
>         movl    %eax, (%esp)
>         call    _ZN5InnerC1Ev@PLT
>         addl    $12, %esp
>         pushl   %esi
>         pushl   $0
>         leal    __tcf_1@GOTOFF(%ebx), %eax
>         pushl   %eax
>         call    __cxa_atexit@PLT
>         addl    $16, %esp
> .L14:
>         leal    -8(%ebp), %esp
>         popl    %ebx
>         popl    %esi
>         popl    %ebp
>         ret
> [...]
> _GLOBAL__I_main:
>         pushl   %ebp
>         movl    $65535, %edx
>         movl    %esp, %ebp
>         movl    $1, %eax
>         popl    %ebp
>         jmp     _Z41__static_initialization_and_destruction_0ii
> [...]
> __tcf_0:
>         pushl   %ebp
>         movl    %esp, %ebp
>         pushl   %ebx
>         call    __i686.get_pc_thunk.bx
>         addl    $_GLOBAL_OFFSET_TABLE_, %ebx
>         subl    $16, %esp
>         leal    outerInstance@GOTOFF(%ebx), %eax
>         pushl   %eax
>         call    _ZN5OuterD1Ev@PLT
>         addl    $16, %esp
>         movl    -4(%ebp), %ebx
>         leave
>         ret
> [...]
>
> Summarizing:
> - the startup code calls _GLOBAL__I_main, a.k.a. "global constructors
>   keyed to main"
> - which jumps to _Z41__static_initialization_and_destruction_0ii,
>   a.k.a. __static_initialization_and_destruction_0(int, int)
> - which calls _ZN5OuterC1Ev, a.k.a. Outer::Outer(), to construct the
>   outerInstance object
> - then calls __cxa_atexit(), registering a generated wrapper function
>   __tcf_0(), which actually calls _ZN5OuterD1Ev, a.k.a.
>   Outer::~Outer()
> - and does the same for the innerInstance object
>
> In contrast, clang produces the following:
>
> _GLOBAL__I_a:                           # @_GLOBAL__I_a
>         pushl   %ebp
>         movl    %esp, %ebp
>         pushl   %ebx
>         pushl   %edi
>         pushl   %esi
>         subl    $12, %esp
>         calll   .L2$pb
> .L2$pb:
>         popl    %ebx
>         addl    $_GLOBAL_OFFSET_TABLE_+(.Ltmp13-.L2$pb), %ebx
>         leal    _ZL13outerInstance@GOTOFF(%ebx), %edi
>         movl    %edi, (%esp)
>         calll   _ZN5OuterC1Ev@PLT
>         movl    __dso_handle@GOT(%ebx), %esi
>         movl    %esi, 8(%esp)
>         movl    %edi, 4(%esp)
>         movl    _ZN5OuterD1Ev@GOT(%ebx), %eax
>         movl    %eax, (%esp)
>         calll   __cxa_atexit@PLT
>         leal    .Lstr5@GOTOFF(%ebx), %eax
>         movl    %eax, (%esp)
>         calll   puts@PLT
>         movl    %esi, 8(%esp)
>         leal    _ZL13innerInstance@GOTOFF(%ebx), %eax
>         movl    %eax, 4(%esp)
>         movl    _ZN5InnerD1Ev@GOT(%ebx), %eax
>         movl    %eax, (%esp)
>         calll   __cxa_atexit@PLT
>         addl    $12, %esp
>         popl    %esi
>         popl    %edi
>         popl    %ebx
>         popl    %ebp
>         ret
> [...]
>         .section        .ctors,"aw",@progbits
>         .align 4
>         .long   _GLOBAL__I_a
>
> Summarizing:
> - the startup code calls _GLOBAL__I_a, a.k.a. "global constructors
>   keyed to a"
> - which calls _ZN5OuterC1Ev, a.k.a. Outer::Outer(), to construct the
>   outerInstance object
> - then calls __cxa_atexit(), directly registering _ZN5OuterD1Ev,
>   a.k.a. Outer::~Outer()
> - and does the same for the innerInstance object (though the
>   constructor is inlined)
>
> The crucial difference is that clang *directly* registers the
> destructor's function pointer, instead of using a locally generated
> wrapper.
> Newer versions of gcc behave the same way, since this
> upstream revision:
>
> http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=125253
>
> This is roughly gcc 4.3.0 and later.  For example, gcc 4.8 generates:
>
> _GLOBAL__sub_I_main.cpp:
>         pushl   %ebp
>         movl    %esp, %ebp
>         pushl   %edi
>         pushl   %esi
>         pushl   %ebx
>         call    __x86.get_pc_thunk.bx
>         addl    $_GLOBAL_OFFSET_TABLE_, %ebx
>         subl    $24, %esp
>         leal    _ZL13outerInstance@GOTOFF(%ebx), %edi
>         pushl   %edi
>         call    _ZN5OuterC1Ev@PLT
>         leal    __dso_handle@GOTOFF(%ebx), %esi
>         addl    $12, %esp
>         pushl   %esi
>         pushl   %edi
>         pushl   _ZN5OuterD1Ev@GOT(%ebx)
>         call    __cxa_atexit@PLT
>         leal    .LC2@GOTOFF(%ebx), %eax
>         movl    %eax, (%esp)
>         call    puts@PLT
>         addl    $12, %esp
>         pushl   %esi
>         leal    _ZL13innerInstance@GOTOFF(%ebx), %eax
>         pushl   %eax
>         pushl   _ZN5InnerD1Ev@GOT(%ebx)
>         call    __cxa_atexit@PLT
>         addl    $16, %esp
>         leal    -12(%ebp), %esp
>         popl    %ebx
>         popl    %esi
>         popl    %edi
>         popl    %ebp
>         ret
> [...]
>         .section        .ctors,"aw",@progbits
>         .align 4
>         .long   _GLOBAL__sub_I_main.cpp
>
> In both the clang and the gcc 4.8 output, __cxa_atexit() is called
> with the following three arguments: the address of the destructor,
> the pointer to the object ('this'), and the dso handle, which in this
> case belongs to main.
>
> Now, when the program exits, it will repeatedly call __cxa_finalize()
> to actually call the registered exit functions, each time passing a
> pointer to the dso being unloaded (or NULL for main):
>
> void
> __cxa_finalize(void *dso)
> {
>         struct dl_phdr_info phdr_info;
>         struct atexit *p;
>         struct atexit_fn fn;
>         int n, has_phdr;
>
>         if (dso != NULL)
>                 has_phdr = _rtld_addr_phdr(dso, &phdr_info);
>         else
>                 has_phdr = 0;
>
>         _MUTEX_LOCK(&atexit_mutex);
>         for (p = __atexit; p; p = p->next) {
>                 for (n = p->ind; --n >= 0;) {
>                         if (p->fns[n].fn_type == ATEXIT_FN_EMPTY)
>                                 continue; /* already been called */
>                         fn = p->fns[n];
>                         if (dso != NULL && dso != fn.fn_dso) {
>                                 /* wrong DSO ? */
>                                 if (!has_phdr || !__elf_phdr_match_addr(
>                                     &phdr_info, fn.fn_ptr.cxa_func))
>                                         continue;
>                         }
>                         /*
>                           Mark entry to indicate that this particular
>                           handler has already been called.
>                         */
>                         p->fns[n].fn_type = ATEXIT_FN_EMPTY;
>                         _MUTEX_UNLOCK(&atexit_mutex);
>
>                         /* Call the function of correct type. */
>                         if (fn.fn_type == ATEXIT_FN_CXA)
>                                 fn.fn_ptr.cxa_func(fn.fn_arg);
>                         else if (fn.fn_type == ATEXIT_FN_STD)
>                                 fn.fn_ptr.std_func();
> [...]
>
> The problem is in the part with the comment "wrong DSO?".  When the
> main program is compiled with -fPIC, and __cxa_finalize() is called
> for libout.so (which is the first dso to be processed), it will
> encounter the entry for Outer::~Outer().
>
> Then, the "wrong DSO?" part will be entered, and because has_phdr is
> true, it will call __elf_phdr_match_addr() with the address of the
> destructor.  Since that address was loaded through _ZN5OuterD1Ev@GOT,
> it points at the destructor inside libout.so, so it matches, and the
> destructor is called.
>
> In contrast, if the main program is not compiled with -fPIC, the
> destructor will be registered with _ZN5OuterD1Ev (i.e. without @GOT;
> in a non-PIC executable this address is typically the executable's
> own PLT entry), so __elf_phdr_match_addr() will not match, and the
> loop continues without calling the destructor.
>
> Finally, if the main program is compiled with the older gcc from base
> and -fPIC, the destructor itself is never considered in the
> __cxa_finalize() loop, only the locally generated wrapper function.
> That function will only be called in the __cxa_finalize() call for
> the main program, and so the destructor will be called at the right
> time.
>
> I am not entirely sure what can be done to remedy this scenario, and I
> also do not know the exact reasons for r211706 changing the behavior.
>
> For example, before r211706, if the atexit_fn's fn_dso did not match
> the dso being unloaded, the loop would unconditionally continue
> without calling the handler.
> On the other hand, r211706 seems to make sure
> functions from DSOs will be called before those DSOs are unloaded,
> as calling them afterwards would not always make sense... :-)
>

Thanks for the in-depth analysis; it is a very interesting read that
makes a lot of sense and matches the gut feeling that "it's destroying
everything defined in the shared lib first".

Call me Mr. Obvious, but I assume clang and gcc won't change the way
destructors are registered, so we need a fix in FreeBSD. Maybe kib@
could shed some light on this?

Cheers,
Michael

--
Michael Gmelin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130626223133.1cc1e009>