Date:      Wed, 17 Oct 2018 02:59:51 -0700
From:      Mark Millard <marklmi@yahoo.com>
To:        FreeBSD Toolchain <freebsd-toolchain@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Cc:        Justin Hibbits <chmeeedalf@gmail.com>
Subject:   /lib/libgcc_s.so.1 mishandles eh_frame information that /usr/local/lib/gcc8/libgcc_s.so.1 handles (powerpc64 test context): a simple example program
Message-ID:  <4D444DB3-A472-42BC-973E-3E468C07757B@yahoo.com>

(I happen to be using head -r339076 and ports -r480180,
though I do not expect the problem to be tied to such
specific revisions.)

I finally have a simple example of the
issue on powerpc64 . . .

The following simple C++ program shows
a significant difference for powerpc64
depending on which libgcc_s.so is used
(system's vs. gcc8's):

# more exception_test1.cpp
#include <exception>

// -O2 context used.

volatile unsigned int v = 1;

extern int f()
{
    volatile unsigned char c = 'a';
    v++; // Despite "volatile" the access to v in g
         // was otherwise optimized out and the
         // std::exception was not followed by
         // code for f(). So force g's use.
    return c;
}

extern void g()
{
    if (v) throw std::exception();
    f(); // ends up inlined but the problem is demonstrated.
}

int main(void)
{
    try {g();} // Used a separate function to avoid any potential
               // special handling of code in main. Call not
               // optimized out.
    catch (std::exception& e) {}
    return 0;
}


(gcc8 just happens to be the lang/gcc* that I have installed.
Similar points likely apply to gcc[?-8]. The same problem
can be demonstrated by devel/powerpc64-gcc use, which ends
up using /lib/libgcc_s.so.1 as well, but that does not provide
the contrasting "it works" case.)

The only reason for the try/catch is to keep the "it
works" case from doing:

# ./a.out
terminate called after throwing an instance of 'std::exception'
  what():  std::exception
Abort trap (core dumped)

Just calling g() is enough to have the problem with
/lib/libgcc_s.so.1 .

The program works fine when built via:

# g++8 -Wl,-rpath=/usr/local/lib/gcc8 -g -O2 exception_test1.cpp
# ldd a.out
a.out:
	libstdc++.so.6 => /usr/local/lib/gcc8/libstdc++.so.6 (0x81006e000)
	libm.so.5 => /lib/libm.so.5 (0x8102c7000)
	libgcc_s.so.1 => /usr/local/lib/gcc8/libgcc_s.so.1 (0x810307000)
	libc.so.7 => /lib/libc.so.7 (0x810330000)

But it fails, stuck looping in _Unwind_RaiseException, when built via:

# g++8 -g -O2 exception_test1.cpp
# ldd a.out
a.out:
	libstdc++.so.6 => /usr/local/lib/gcc8/libstdc++.so.6 (0x81006e000)
	libm.so.5 => /lib/libm.so.5 (0x8102c7000)
	libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x810307000)
	libc.so.7 => /lib/libc.so.7 (0x81032d000)

The only difference (other than detailed addresses) is:

	libgcc_s.so.1 => /usr/local/lib/gcc8/libgcc_s.so.1 (0x810307000)
vs.
	libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x810307000)
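
Because the failing build hangs rather than aborts, a watchdog
can make the failure finite when scripting the comparison. The
following is only a sketch under assumptions: g_like() and
throw_is_caught() are hypothetical names that are not part of
exception_test1.cpp, and it relies on POSIX alarm() with the
default SIGALRM action of terminating the process. It does not
try to reproduce the exact code structure of g().

```cpp
#include <cassert>
#include <exception>
#include <unistd.h>  // alarm(), POSIX

volatile unsigned int watchdog_v = 1;

// Same shape as g() in exception_test1.cpp: throws whenever the
// volatile flag is nonzero.
void g_like()
{
    if (watchdog_v) throw std::exception();
    watchdog_v++;
}

// Hypothetical helper: returns true if the exception thrown by
// g_like() reaches the catch clause. If the unwinder loops forever,
// the pending SIGALRM ends the process after timeout_seconds.
bool throw_is_caught(unsigned timeout_seconds)
{
    alarm(timeout_seconds);  // arm the watchdog
    bool caught = false;
    try {
        g_like();
    } catch (std::exception&) {
        caught = true;
    }
    alarm(0);                // unwinding finished: cancel the watchdog
    return caught;
}
```

Calling throw_is_caught(5) from main would be expected to return
true where unwinding works, and to end with an alarm-clock
termination instead of an endless loop where it does not.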

The dwarfdump -v -v -F reports match exactly for the two
builds of the program, as does the code for the function
g where the problem is observed.

What is different is that /lib/libgcc_s.so.1 misinterprets
the .eh_frame information (disagreeing with the dwarfdump
report and with /usr/local/lib/gcc8/libgcc_s.so.1 behavior).

# dwarfdump -v -v -F a.out | more

.eh_frame

fde:
<    0><0x100007a0:0x10000840><><cie offset 0x00000018::cie index     0><fde offset 0x00000014 length: 0x00000010>
       <eh aug data len 0x0>
        0x100007a0: <off cfa=00(r1) >
 fde section offset 20 0x00000014 cie offset for fde: 24 0x00000018
         0 DW_CFA_nop
         1 DW_CFA_nop
         2 DW_CFA_nop
<    1><0x10000840:0x10000894><main><cie offset 0x00000024::cie index     1><fde offset 0x00000098 length: 0x00000024>
       <eh aug data len 0x8 bytes 0x00 00 00 00 00 00 00 1b >
        0x10000840: <off cfa=00(r1) >
        0x1000084c: <off cfa=112(r1) > <off r65=16(cfa) >
        0x10000854: <off cfa=00(r1) > <off r65=16(cfa) >
        0x10000860: <off cfa=00(r1) >
        0x10000864: <off cfa=112(r1) > <off r65=16(cfa) >
 fde section offset 152 0x00000098 cie offset for fde: 36 0x00000024
         0 DW_CFA_advance_loc 12  (3 * 4)
         1 DW_CFA_def_cfa_offset 112
         3 DW_CFA_offset_extended_sf r65 16  (-2 * -8)
         6 DW_CFA_advance_loc 8  (2 * 4)
         7 DW_CFA_remember_state
         8 DW_CFA_def_cfa_offset 0
        10 DW_CFA_advance_loc 12  (3 * 4)
        11 DW_CFA_restore_extended r65
        13 DW_CFA_advance_loc 4  (1 * 4)
        14 DW_CFA_restore_state
<    2><0x10000db0:0x10000ddc><f><cie offset 0x00000044::cie index     0><fde offset 0x00000040 length: 0x00000010>
       <eh aug data len 0x0>
        0x10000db0: <off cfa=00(r1) >
 fde section offset 64 0x00000040 cie offset for fde: 68 0x00000044
         0 DW_CFA_nop
         1 DW_CFA_nop
         2 DW_CFA_nop
<    3><0x10000de0:0x10000e5c><g><cie offset 0x00000058::cie index     0><fde offset 0x00000054 length: 0x00000020>
       <eh aug data len 0x0>
        0x10000de0: <off cfa=00(r1) >
        0x10000de8: <off cfa=128(r1) >
        0x10000e14: <off cfa=00(r1) >
        0x10000e18: <off cfa=128(r1) >
        0x10000e1c: <off cfa=128(r1) > <off r65=r0 >
        0x10000e24: <off cfa=128(r1) > <off r65=16(cfa) >
 fde section offset 84 0x00000054 cie offset for fde: 88 0x00000058
         0 DW_CFA_advance_loc 8  (2 * 4)
         1 DW_CFA_def_cfa_offset 128
         4 DW_CFA_advance_loc 44  (11 * 4)
         5 DW_CFA_remember_state
         6 DW_CFA_def_cfa_offset 0
         8 DW_CFA_advance_loc 4  (1 * 4)
         9 DW_CFA_restore_state
        10 DW_CFA_advance_loc 4  (1 * 4)
        11 DW_CFA_register r65 = r0
        14 DW_CFA_advance_loc 8  (2 * 4)
        15 DW_CFA_offset_extended_sf r65 16  (-2 * -8)
        18 DW_CFA_nop
<    4><0x10000ee0:0x10000f34><><cie offset 0x0000002c::cie index     0><fde offset 0x00000028 length: 0x00000014>
       <eh aug data len 0x0>
        0x10000ee0: <off cfa=00(r1) >
        0x10000ee4: <off cfa=00(r1) > <off r65=r12 >
        0x10000ef8: <off cfa=00(r1) >
 fde section offset 40 0x00000028 cie offset for fde: 44 0x0000002c
         0 DW_CFA_advance_loc 4  (1 * 4)
         1 DW_CFA_register r65 = r12
         4 DW_CFA_advance_loc 20  (5 * 4)
         5 DW_CFA_restore_extended r65

cie:
<    0> version                         1
        cie section offset              0 0x00000000
        augmentation                    zR
        code_alignment_factor           4
        data_alignment_factor           -8
        return_address_register         65
        eh aug data len 0x1 bytes 0x1b
        bytes of initial instructions   3
        cie length                      16
        initial instructions
         0 DW_CFA_def_cfa r1 0
<    1> version                         1
        cie section offset              120 0x00000078
        augmentation                    zPLR
        code_alignment_factor           4
        data_alignment_factor           -8
        return_address_register         65
        eh aug data len 0xb bytes 0x94 00 00 00 00 00 01 04 c9 14 1b
        bytes of initial instructions   3
        cie length                      28
        initial instructions
         0 DW_CFA_def_cfa r1 0
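
The parenthesized annotations in the report, such as (3 * 4) and
(-2 * -8), show an encoded operand multiplied by the CIE's
code_alignment_factor or data_alignment_factor. A minimal sketch
of that arithmetic follows. The function names are made up for
illustration and do not come from dwarfdump or libgcc.

```cpp
#include <cassert>
#include <cstdint>

// Factors taken from the CIEs listed in the report above.
constexpr std::int64_t kCodeAlignmentFactor = 4;
constexpr std::int64_t kDataAlignmentFactor = -8;

// DW_CFA_advance_loc: the encoded delta is scaled by
// code_alignment_factor to get the byte distance to the next row.
constexpr std::uint64_t advance_loc(std::uint64_t pc,
                                    std::uint64_t encoded_delta)
{
    return pc + encoded_delta *
                static_cast<std::uint64_t>(kCodeAlignmentFactor);
}

// DW_CFA_offset_extended_sf: the encoded signed offset is scaled by
// data_alignment_factor to get the save slot's offset from the CFA.
constexpr std::int64_t saved_reg_offset(std::int64_t encoded_offset)
{
    return encoded_offset * kDataAlignmentFactor;
}

// Compile-time checks against values visible in the report.
static_assert(advance_loc(0x10000840, 3) == 0x1000084c, "main, row 2");
static_assert(saved_reg_offset(-2) == 16, "r65 at 16(cfa)");
```

For example, advance_loc(0x10000de8, 11) gives 0x10000e14, the
third row reported for g, and saved_reg_offset(-2) gives 16,
matching the r65=16(cfa) entries.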



In:

<    3><0x10000de0:0x10000e5c><g><cie offset 0x00000058::cie index     0><fde offset 0x00000054 length: 0x00000020>
       <eh aug data len 0x0>
        0x10000de0: <off cfa=00(r1) >
        0x10000de8: <off cfa=128(r1) >
        0x10000e14: <off cfa=00(r1) >
        0x10000e18: <off cfa=128(r1) >
        0x10000e1c: <off cfa=128(r1) > <off r65=r0 >
        0x10000e24: <off cfa=128(r1) > <off r65=16(cfa) >

The last 3 128's are from the DW_CFA_restore_state
from the sequence:

         1 DW_CFA_def_cfa_offset 128
. . .
         5 DW_CFA_remember_state
. . .
         9 DW_CFA_restore_state

But with /lib/libgcc_s.so.1 the 128 is not saved and
restored, leaving default 0's in place instead. And
use of the wrong stack addresses results, which in
turn prevents the stack from unwinding past g()'s
frame.
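
The difference can be modeled with a small interpreter for the
CFA-related instructions of g's FDE. This is a hypothetical
sketch, not libgcc code: the types and function names are
invented here, the r65 rules are left out because they do not
change the CFA, and the restore_cfa == false branch is only an
assumption that mimics the "128 is not saved and restored"
behavior described above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Op { AdvanceLoc, DefCfaOffset, RememberState, RestoreState };

struct Insn {
    Op op;
    std::int64_t arg;  // byte delta or CFA offset; unused otherwise
};

struct Row {
    std::uint64_t pc;         // first address the row applies to
    std::int64_t cfa_offset;  // CFA = r1 + cfa_offset
};

// CFA-affecting instructions of the FDE for g(), with the
// advance_loc operands already multiplied by code_alignment_factor.
inline std::vector<Insn> fde_for_g()
{
    return {
        {Op::AdvanceLoc, 8},   {Op::DefCfaOffset, 128},
        {Op::AdvanceLoc, 44},  {Op::RememberState, 0},
        {Op::DefCfaOffset, 0}, {Op::AdvanceLoc, 4},
        {Op::RestoreState, 0}, {Op::AdvanceLoc, 4},
        {Op::AdvanceLoc, 8},
    };
}

// restore_cfa == true:  restore_state brings back the remembered CFA
//                       offset (what the dwarfdump rows show).
// restore_cfa == false: restore_state leaves the CFA offset alone
//                       (assumed model of the misbehavior).
inline std::vector<Row> cfa_table(std::uint64_t start,
                                  const std::vector<Insn>& prog,
                                  bool restore_cfa)
{
    std::vector<Row> rows;
    std::vector<std::int64_t> remembered;
    std::uint64_t pc = start;
    std::int64_t cfa_offset = 0;  // CIE: DW_CFA_def_cfa r1 0
    for (const Insn& insn : prog) {
        switch (insn.op) {
        case Op::AdvanceLoc:
            rows.push_back({pc, cfa_offset});  // close the current row
            pc += static_cast<std::uint64_t>(insn.arg);
            break;
        case Op::DefCfaOffset:
            cfa_offset = insn.arg;
            break;
        case Op::RememberState:
            remembered.push_back(cfa_offset);
            break;
        case Op::RestoreState:
            assert(!remembered.empty());
            if (restore_cfa) cfa_offset = remembered.back();
            remembered.pop_back();
            break;
        }
    }
    rows.push_back({pc, cfa_offset});  // row for the remaining range
    return rows;
}
```

With cfa_table(0x10000de0, fde_for_g(), true) the rows from
0x10000e18 onward carry 128, as in the dwarfdump report. With
false they keep 0, so the CFA comes out 128 bytes too low and
r65, expected at 16(cfa), would be looked up at r1+16 instead
of r1+144.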

[Note: For FreeBSD on powerpc64 r1 is the stack-pointer.]

The code described by the:
<    3><0x10000de0:0x10000e5c><g> . . .
entry is as follows. Note the stdu r1,-128(r1) and the
addi r1,r1,128. Also note that the code at g()+56 and later
is reached only via bne cr7,0x10000e18 <g()+56>, so it runs
with the stdu r1,-128(r1) adjustment still in effect, not
after the addi r1,r1,128:

(gdb) disass g
Dump of assembler code for function g():
   0x0000000010000de0 <+0>:	nop
   0x0000000010000de4 <+4>:	stdu    r1,-128(r1)
   0x0000000010000de8 <+8>:	lwz     r9,-32536(r2)
   0x0000000010000dec <+12>:	cmpdi   cr7,r9,0
   0x0000000010000df0 <+16>:	bne     cr7,0x10000e18 <g()+56>
   0x0000000010000df4 <+20>:	li      r9,97
   0x0000000010000df8 <+24>:	nop
   0x0000000010000dfc <+28>:	stb     r9,112(r1)
   0x0000000010000e00 <+32>:	lwz     r9,-32536(r2)
   0x0000000010000e04 <+36>:	addi    r9,r9,1
   0x0000000010000e08 <+40>:	stw     r9,-32536(r2)
   0x0000000010000e0c <+44>:	lbz     r9,112(r1)
   0x0000000010000e10 <+48>:	addi    r1,r1,128
   0x0000000010000e14 <+52>:	blr
   0x0000000010000e18 <+56>:	mflr    r0
   0x0000000010000e1c <+60>:	li      r3,8
   0x0000000010000e20 <+64>:	std     r0,144(r1)
   0x0000000010000e24 <+68>:	bl      0x100007a0 <0000004b.plt_call.__cxa_allocate_exception@@CXXABI_1.3>
   0x0000000010000e28 <+72>:	ld      r2,40(r1)
   0x0000000010000e2c <+76>:	nop
   0x0000000010000e30 <+80>:	nop
   0x0000000010000e34 <+84>:	ld      r9,-32720(r2)
   0x0000000010000e38 <+88>:	ld      r5,-32712(r2)
   0x0000000010000e3c <+92>:	nop
   0x0000000010000e40 <+96>:	ld      r4,-32704(r2)
   0x0000000010000e44 <+100>:	std     r9,0(r3)
   0x0000000010000e48 <+104>:	bl      0x10000820 <0000004b.plt_call.__cxa_throw@@CXXABI_1.3>
   0x0000000010000e4c <+108>:	ld      r2,40(r1)
   0x0000000010000e50 <+112>:	.long 0x0
   0x0000000010000e54 <+116>:	.long 0x90001
   0x0000000010000e58 <+120>:	lwz     r0,0(0)

[Note: more than the 128's might not be handled right
for more general code, but the example only shows the
128's issue (i.e., the cfa_offset mishandling issue).]



I'll note that throw_exception in libcxxrt
(contrib/libcxxrt/exception.cc) has the same sort of
machine-code structure as g relative to cfa_offset's and
that, with /lib/libgcc_s.so.1 doing the unwinding and without
a workaround to avoid that structure being generated, all
thrown C++ exceptions fail by _Unwind_RaiseException being
stuck in a loop for powerpc64.

In order to test the simple program I used the workaround:

# svnlite diff /usr/src/contrib/libcxxrt/
Index: /usr/src/contrib/libcxxrt/exception.cc
===================================================================
--- /usr/src/contrib/libcxxrt/exception.cc	(revision 339076)
+++ /usr/src/contrib/libcxxrt/exception.cc	(working copy)
@@ -772,10 +772,71 @@
	info->globals.uncaughtExceptions++;

	_Unwind_Reason_Code err = _Unwind_RaiseException(&ex->unwindHeader);
+#if !defined(__powerpc64__) && !defined(__ppc64__)
	// The _Unwind_RaiseException() function should not return, it should
	// unwind the stack past this function.  If it does return, then something
	// has gone wrong.
	report_failure(err, ex);
+#else
+// NOTE: Only tested for devel/powerpc64-gcc based buildworld
+//       because clang still silently ignores
+//       __builtin_eh_return(offset,handler) for powerpc64
+//       (and powerpc), thus not generating correct output.
+//
+// NOTE: I've no clue if other architectures might have
+//       analogous issues to powerpc64. I'm not sure
+//       about powerpc because of it still being stuck
+//       at gcc 4.2.1 . (clang problems and no devel/powerpc-gcc .)
+//
+// The above/normal code produced the following sort of structure
+// for throw_exception. r1 is the stack pointer, note its adjustments
+// via stdu r1,-128(r1) and via addi r1,r1,128 .
+//
+// <throw_exception+0>:	mflr    r0
+// <throw_exception+4>:	std     r31,-8(r1)
+// <throw_exception+8>:	mr      r31,r3
+// <throw_exception+12>:	std     r0,16(r1)
+// <throw_exception+16>:	stdu    r1,-128(r1)
+// . . .
+// <throw_exception+140>:	bl      <00000018.plt_call._Unwind_RaiseException@@GCC_3.0>
+// <throw_exception+144>:	ld      r2,40(r1)
+// <throw_exception+148>:	addi    r1,r1,128
+// <throw_exception+152>:	mr      r4,r31
+// <throw_exception+156>:	ld      r0,16(r1)
+// <throw_exception+160>:	ld      r31,-8(r1)
+// <throw_exception+164>:	mtlr    r0
+// <throw_exception+168>:	b       <report_failure>
+//
+// The loop in _Unwind_RaiseException had its "fs"
+// used with uw_frame_state_for and uw_update_context get
+// stuck with the pc field having the address for
+// throw_exception+152 (just after the stack adjustment
+// addi r1,r1,128). Effectively, throw_exception unwinds
+// its stack use before calling report_failure in a
+// way that throw_exception is no longer on the stack.
+// The exception unwinding logic did not handle this
+// correctly and got stuck looping.
+//
+// The below avoids having any such stack adjustment here
+// by avoiding the report_failure call and directly doing
+// what case _URC_END_OF_STACK in report_failure does for
+// its first couple of lines. (It is also the kind of
+// thing that src/contrib/libstdc++/libsupc++/eh_throw.cc
+// has in its __cxxabiv1::__cxa_throw after the
+// _Unwind_RaiseException call.)
+//
+// Another option could be to turn report_failure into
+// a macro so that no subroutine call could be involved.
+// That should avoid the early stack pointer adjustment.
+//
+// Also: For the other architectures that I looked at, no
+//       such stack adjustments were involved in the code
+//       generated (or the matching dwarfdump output).
+//       But I did not look at many.
+
+	__cxa_begin_catch (&(ex->unwindHeader));
+	std::terminate();
+#endif
}



===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)



