Date: Wed, 17 Oct 2018 02:59:51 -0700
From: Mark Millard <marklmi@yahoo.com>
To: FreeBSD Toolchain <freebsd-toolchain@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Cc: Justin Hibbits <chmeeedalf@gmail.com>
Subject: /lib/libgcc_s.so.1 mishandles eh_frame information that /usr/local/lib/gcc8/libgcc_s.so.1 handles (powerpc64 test context): a simple example program
Message-ID: <4D444DB3-A472-42BC-973E-3E468C07757B@yahoo.com>

(I happen to be using head -r339076 and ports -r480180 vintage
materials, not that I expect such narrow vintage ties.)

I finally have a simple example of the issue on powerpc64 . . .

The following simple C++ program shows a significant difference
for powerpc64 depending on which libgcc_s.so is used (system's
vs. gcc8's):

# more exception_test1.cpp
#include <exception>

// -O2 context used.

volatile unsigned int v = 1;

extern int f()
{
    volatile unsigned char c = 'a';
    v++; // Despite "volatile" the access to v in g
         // was otherwise optimized out and the
         // std::exception was not followed by
         // code for f(). So force g's use.
    return c;
}

extern void g()
{
    if (v) throw std::exception();
    f(); // ends up inlined but the problem is demonstrated.
}

int main(void)
{
    try {g();} // Used a separate function to avoid any potential
               // special handling of code in main. Call not
               // optimized out.
    catch (std::exception& e) {}
    return 0;
}

(gcc8 just happens to be the lang/gcc* that I have installed.
Similar points likely apply to gcc[?-8]. The same problem can
be demonstrated by devel/powerpc64-gcc use, which ends up
using /lib/libgcc_s.so.1 as well --but does not provide the
contrasting "it works" case.)

The only reason for the try/catch is to keep the "it works"
case from doing:

# ./a.out
terminate called after throwing an instance of 'std::exception'
  what():  std::exception
Abort trap (core dumped)

Just calling g() is enough to have the problem with
/lib/libgcc_s.so.1 .

The program works fine when built via:

# g++8 -Wl,-rpath=/usr/local/lib/gcc8 -g -O2 exception_test1.cpp
# ldd a.out
a.out:
    libstdc++.so.6 => /usr/local/lib/gcc8/libstdc++.so.6 (0x81006e000)
    libm.so.5 => /lib/libm.so.5 (0x8102c7000)
    libgcc_s.so.1 => /usr/local/lib/gcc8/libgcc_s.so.1 (0x810307000)
    libc.so.7 => /lib/libc.so.7 (0x810330000)

But it fails, stuck looping in _Unwind_RaiseException, when built via:

# g++8 -g -O2 exception_test1.cpp
# ldd a.out
a.out:
    libstdc++.so.6 => /usr/local/lib/gcc8/libstdc++.so.6 (0x81006e000)
    libm.so.5 => /lib/libm.so.5 (0x8102c7000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x810307000)
    libc.so.7 => /lib/libc.so.7 (0x81032d000)

The only difference (other than detailed addresses) is:

    libgcc_s.so.1 => /usr/local/lib/gcc8/libgcc_s.so.1 (0x810307000)
vs.
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x810307000)

The dwarfdump -v -v -F reports match exactly for the two builds
of the program, as does the code for the function g where the
problem is observed.

What is different is that /lib/libgcc_s.so.1 misinterprets the
.eh_frame information (disagreeing with the dwarfdump report
and with /usr/local/lib/gcc8/libgcc_s.so.1 behavior).

# dwarfdump -v -v -F a.out | more

.eh_frame

fde:
< 0><0x100007a0:0x10000840><><cie offset 0x00000018::cie index 0><fde offset 0x00000014 length: 0x00000010>
    <eh aug data len 0x0>
    0x100007a0: <off cfa=00(r1) >
 fde section offset 20 0x00000014 cie offset for fde: 24 0x00000018
     0 DW_CFA_nop
     1 DW_CFA_nop
     2 DW_CFA_nop
< 1><0x10000840:0x10000894><main><cie offset 0x00000024::cie index 1><fde offset 0x00000098 length: 0x00000024>
    <eh aug data len 0x8 bytes 0x00 00 00 00 00 00 00 1b >
    0x10000840: <off cfa=00(r1) >
    0x1000084c: <off cfa=112(r1) > <off r65=16(cfa) >
    0x10000854: <off cfa=00(r1) > <off r65=16(cfa) >
    0x10000860: <off cfa=00(r1) >
    0x10000864: <off cfa=112(r1) > <off r65=16(cfa) >
 fde section offset 152 0x00000098 cie offset for fde: 36 0x00000024
     0 DW_CFA_advance_loc 12 (3 * 4)
     1 DW_CFA_def_cfa_offset 112
     3 DW_CFA_offset_extended_sf r65 16 (-2 * -8)
     6 DW_CFA_advance_loc 8 (2 * 4)
     7 DW_CFA_remember_state
     8 DW_CFA_def_cfa_offset 0
    10 DW_CFA_advance_loc 12 (3 * 4)
    11 DW_CFA_restore_extended r65
    13 DW_CFA_advance_loc 4 (1 * 4)
    14 DW_CFA_restore_state
< 2><0x10000db0:0x10000ddc><f><cie offset 0x00000044::cie index 0><fde offset 0x00000040 length: 0x00000010>
    <eh aug data len 0x0>
    0x10000db0: <off cfa=00(r1) >
 fde section offset 64 0x00000040 cie offset for fde: 68 0x00000044
     0 DW_CFA_nop
     1 DW_CFA_nop
     2 DW_CFA_nop
< 3><0x10000de0:0x10000e5c><g><cie offset 0x00000058::cie index 0><fde offset 0x00000054 length: 0x00000020>
    <eh aug data len 0x0>
    0x10000de0: <off cfa=00(r1) >
    0x10000de8: <off cfa=128(r1) >
    0x10000e14: <off cfa=00(r1) >
    0x10000e18: <off cfa=128(r1) >
    0x10000e1c: <off cfa=128(r1) > <off r65=r0 >
    0x10000e24: <off cfa=128(r1) > <off r65=16(cfa) >
 fde section offset 84 0x00000054 cie offset for fde: 88 0x00000058
     0 DW_CFA_advance_loc 8 (2 * 4)
     1 DW_CFA_def_cfa_offset 128
     4 DW_CFA_advance_loc 44 (11 * 4)
     5 DW_CFA_remember_state
     6 DW_CFA_def_cfa_offset 0
     8 DW_CFA_advance_loc 4 (1 * 4)
     9 DW_CFA_restore_state
    10 DW_CFA_advance_loc 4 (1 * 4)
    11 DW_CFA_register r65 = r0
    14 DW_CFA_advance_loc 8 (2 * 4)
    15 DW_CFA_offset_extended_sf r65 16 (-2 * -8)
    18 DW_CFA_nop
< 4><0x10000ee0:0x10000f34><><cie offset 0x0000002c::cie index 0><fde offset 0x00000028 length: 0x00000014>
    <eh aug data len 0x0>
    0x10000ee0: <off cfa=00(r1) >
    0x10000ee4: <off cfa=00(r1) > <off r65=r12 >
    0x10000ef8: <off cfa=00(r1) >
 fde section offset 40 0x00000028 cie offset for fde: 44 0x0000002c
     0 DW_CFA_advance_loc 4 (1 * 4)
     1 DW_CFA_register r65 = r12
     4 DW_CFA_advance_loc 20 (5 * 4)
     5 DW_CFA_restore_extended r65

cie:
< 0> version 1
 cie section offset 0 0x00000000
 augmentation zR
 code_alignment_factor 4
 data_alignment_factor -8
 return_address_register 65
 eh aug data len 0x1 bytes 0x1b
 bytes of initial instructions 3
 cie length 16
 initial instructions
     0 DW_CFA_def_cfa r1 0
< 1> version 1
 cie section offset 120 0x00000078
 augmentation zPLR
 code_alignment_factor 4
 data_alignment_factor -8
 return_address_register 65
 eh aug data len 0xb bytes 0x94 00 00 00 00 00 01 04 c9 14 1b
 bytes of initial instructions 3
 cie length 28
 initial instructions
     0 DW_CFA_def_cfa r1 0

In:

< 3><0x10000de0:0x10000e5c><g><cie offset 0x00000058::cie index 0><fde offset 0x00000054 length: 0x00000020>
    <eh aug data len 0x0>
    0x10000de0: <off cfa=00(r1) >
    0x10000de8: <off cfa=128(r1) >
    0x10000e14: <off cfa=00(r1) >
    0x10000e18: <off cfa=128(r1) >
    0x10000e1c: <off cfa=128(r1) > <off r65=r0 >
    0x10000e24: <off cfa=128(r1) > <off r65=16(cfa) >

the last 3 128's come from the DW_CFA_restore_state in the
sequence:

     1 DW_CFA_def_cfa_offset 128
. . .
     5 DW_CFA_remember_state
. . .
     9 DW_CFA_restore_state

But with /lib/libgcc_s.so.1 the 128 is not saved and restored,
leaving default 0's in place instead. The wrong stack addresses
then get used, which in turn prevents the stack from unwinding
past g()'s frame.

[Note: For FreeBSD on powerpc64 r1 is the stack-pointer.]

The code described by the:

< 3><0x10000de0:0x10000e5c><g> . . .

is as follows. Note the stdu r1,-128(r1) and the addi r1,r1,128,
which code is reached only via bne cr7,0x10000e18 <g()+56>, and
that this code runs with the stdu r1,-128(r1) as prior context,
not the addi r1,r1,128:

(gdb) disass g
Dump of assembler code for function g():
   0x0000000010000de0 <+0>:     nop
   0x0000000010000de4 <+4>:     stdu    r1,-128(r1)
   0x0000000010000de8 <+8>:     lwz     r9,-32536(r2)
   0x0000000010000dec <+12>:    cmpdi   cr7,r9,0
   0x0000000010000df0 <+16>:    bne     cr7,0x10000e18 <g()+56>
   0x0000000010000df4 <+20>:    li      r9,97
   0x0000000010000df8 <+24>:    nop
   0x0000000010000dfc <+28>:    stb     r9,112(r1)
   0x0000000010000e00 <+32>:    lwz     r9,-32536(r2)
   0x0000000010000e04 <+36>:    addi    r9,r9,1
   0x0000000010000e08 <+40>:    stw     r9,-32536(r2)
   0x0000000010000e0c <+44>:    lbz     r9,112(r1)
   0x0000000010000e10 <+48>:    addi    r1,r1,128
   0x0000000010000e14 <+52>:    blr
   0x0000000010000e18 <+56>:    mflr    r0
   0x0000000010000e1c <+60>:    li      r3,8
   0x0000000010000e20 <+64>:    std     r0,144(r1)
   0x0000000010000e24 <+68>:    bl      0x100007a0 <0000004b.plt_call.__cxa_allocate_exception@@CXXABI_1.3>
   0x0000000010000e28 <+72>:    ld      r2,40(r1)
   0x0000000010000e2c <+76>:    nop
   0x0000000010000e30 <+80>:    nop
   0x0000000010000e34 <+84>:    ld      r9,-32720(r2)
   0x0000000010000e38 <+88>:    ld      r5,-32712(r2)
   0x0000000010000e3c <+92>:    nop
   0x0000000010000e40 <+96>:    ld      r4,-32704(r2)
   0x0000000010000e44 <+100>:   std     r9,0(r3)
   0x0000000010000e48 <+104>:   bl      0x10000820 <0000004b.plt_call.__cxa_throw@@CXXABI_1.3>
   0x0000000010000e4c <+108>:   ld      r2,40(r1)
   0x0000000010000e50 <+112>:   .long 0x0
   0x0000000010000e54 <+116>:   .long 0x90001
   0x0000000010000e58 <+120>:   lwz     r0,0(0)

[Note: more than the 128's might not be handled right for more
general code, but the example only shows the 128's issue (i.e.,
the cfa_offset mishandling issue).]

I'll note that throw_exception in /lib/libgcc_s.so.1 has the
same sort of machine-code structure as g relative to
cfa_offset's and that, without a workaround to avoid that
structure being generated, all thrown C++ exceptions fail on
powerpc64, with _Unwind_RaiseException stuck in a loop.
In order to test the simple program I used the workaround:

# svnlite diff /usr/src/contrib/libcxxrt/
Index: /usr/src/contrib/libcxxrt/exception.cc
===================================================================
--- /usr/src/contrib/libcxxrt/exception.cc	(revision 339076)
+++ /usr/src/contrib/libcxxrt/exception.cc	(working copy)
@@ -772,10 +772,71 @@
 	info->globals.uncaughtExceptions++;
 
 	_Unwind_Reason_Code err = _Unwind_RaiseException(&ex->unwindHeader);
+#if !defined(__powerpc64__) && !defined(__ppc64__)
 	// The _Unwind_RaiseException() function should not return, it should
 	// unwind the stack past this function.  If it does return, then something
 	// has gone wrong.
 	report_failure(err, ex);
+#else
+// NOTE: Only tested for devel/powerpc64-gcc based buildworld
+//       because clang still silently ignores
+//       __builtin_eh_return(offset,handler) for powerpc64
+//       (and powerpc), thus not generating correct output.
+//
+// NOTE: I've no clue if other architectures might have
+//       analogous issues to powerpc64. I'm not sure
+//       about powerpc because of it still being stuck
+//       at gcc 4.2.1 . (clang problems and no devel/powerpc-gcc .)
+//
+// The above/normal code produced the following sort of structure
+// for throw_exception. r1 is the stack pointer, note its adjustments
+// via stdu r1,-128(r1) and via addi r1,r1,128 .
+//
+// <throw_exception+0>:    mflr    r0
+// <throw_exception+4>:    std     r31,-8(r1)
+// <throw_exception+8>:    mr      r31,r3
+// <throw_exception+12>:   std     r0,16(r1)
+// <throw_exception+16>:   stdu    r1,-128(r1)
+// . . .
+// <throw_exception+140>:  bl      <00000018.plt_call._Unwind_RaiseException@@GCC_3.0>
+// <throw_exception+144>:  ld      r2,40(r1)
+// <throw_exception+148>:  addi    r1,r1,128
+// <throw_exception+152>:  mr      r4,r31
+// <throw_exception+156>:  ld      r0,16(r1)
+// <throw_exception+160>:  ld      r31,-8(r1)
+// <throw_exception+164>:  mtlr    r0
+// <throw_exception+168>:  b       <report_failure>
+//
+// The loop in _Unwind_RaiseException had its "fs"
+// used with uw_frame_state_for and uw_update_context get
+// stuck with the pc field having the address for
+// throw_exception+152 (just after the stack adjustment
+// addi r1,r1,128). Effectively, throw_exception unwinds
+// its stack use before calling report_failure in a
+// way that throw_exception is no longer on the stack.
+// The exception unwinding logic did not handle this
+// correctly and got stuck looping.
+//
+// The below avoids having any such stack adjustment here
+// by avoiding the report_failure call and directly doing
+// what case _URC_END_OF_STACK in report_failure does for
+// its first couple of lines. (It is also the kind of
+// thing that src/contrib/libstdc++/libsupc++/eh_throw.cc
+// has in its __cxxabiv1::__cxa_throw after the
+// _Unwind_RaiseException call.)
+//
+// Another option could be to turn report_failure into
+// a macro so that no subroutine call could be involved.
+// That should avoid the early stack pointer adjustment.
+//
+// Also: For the other architectures that I looked at, no
+//       such stack adjustments were involved in the code
+//       generated (or the matching dwarfdump output).
+//       But I did not look at many.
+
+	__cxa_begin_catch (&(ex->unwindHeader));
+	std::terminate();
+#endif
 }

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)