Date:      Mon, 27 Jan 2025 16:34:22 +0100
From:      Dimitry Andric <dim@FreeBSD.org>
To:        Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, freebsd-hackers@freebsd.org
Subject:   Re: gcc14 static linking ends with segfault
Message-ID:  <C09215C1-ABF0-4248-A69A-F0137BBC7E2B@FreeBSD.org>
In-Reply-To: <Z5cnYyWxcT5pk1Wf@troutmask.apl.washington.edu>
References:  <Z5aYNdVQdGdVImBG@troutmask.apl.washington.edu> <Z5cRe8tivqpgme1I@kib.kiev.ua> <Z5cnYyWxcT5pk1Wf@troutmask.apl.washington.edu>

On 27 Jan 2025, at 07:27, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
>
> On Mon, Jan 27, 2025 at 06:54:19AM +0200, Konstantin Belousov wrote:
>> On Sun, Jan 26, 2025 at 12:16:53PM -0800, Steve Kargl wrote:
>>> In replacing an ancient system with new I re-installed all ports
>>> including lang/gcc14 of FreeBSD-current.  -current is 2 day old
>>> sources.
>>>
>>> Consider,
>>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include "mpfr.h"
>>>
>>> int
>>> main(void)
>>> {
>>>   mpfr_t pi;
>>>   mpfr_inits2(512, pi, NULL);
>>>   mpfr_const_pi(pi, MPFR_RNDN);
>>>   mpfr_printf("pi = %25.20Rf\n", pi);
>>> // A conscientious programmer cleans up after themself,
>>> // but on exit the system should take care of memory.
>>> //   mpfr_clears(pi, NULL);
>>>   return (0);
>>> }
>>>
>>> % gcc14 -o z -O2 -I/usr/local/include a.c -L/usr/local/lib -lmpfr -lgmp
>>> % ./z
>>> pi =    3.14159265358979323846
>>>
>>> All seems to work with shared linking.
>>>
>>> The following used to work.
>>>
>>> % gcc14 -o z -O2 -I/usr/local/include a.c -L/usr/local/lib -lmpfr -lgmp \
>>>        -static
>>> % ./z
>>> pi =    3.14159265358979323846
>>> Segmentation fault (core dumped)
>>>
>>> % gdb151 ./z z.core
>>> ...
>>> #0  0x0000000000427ba5 in __gmpn_mul_1 ()
>>> (gdb) bt
>>> #0  0x0000000000427ba5 in __gmpn_mul_1 ()
>>> #1  0x000000000040051f in __do_global_dtors_aux ()
>>>    at /usr/src/lib/csu/common/crtbegin.c:83
>>> #2  0x00000000004bc165 in _fini ()
>>> #3  0x0000000000458a7f in __cxa_finalize (dso=dso@entry=0x0)
>>>    at /usr/src/lib/libc/stdlib/atexit.c:234
>>> #4  0x0000000000458b70 in exit (status=0) at /usr/src/lib/libc/stdlib/exit.c:89
>>> #5  0x00000000004483d9 in __libc_start1 (argc=1, argv=0x820a52900,
>>>    env=0x820a52910, cleanup=<optimized out>, mainX=0x400480 <main>)
>>>    at /usr/src/lib/libc/csu/libc_start1.c:172
>>> #6  0x00000000004004f0 in _start () at /usr/src/lib/csu/amd64/crt1_s.S:83
>>>
>>> So, did someone break the startup files?
>> Why do you think that startup (crt) files are broken?
>> Note that they are involved in the trace above, but the lowest frame is
>> from gmp destructor, i.e. the problem formally happens in the gmp code.
>>
>> Perhaps try to rebuild gmp with debug info to get more information.
>
> You're likely correct that it's a gmp problem unmasked by the
> new hardware that I have.  Rebuilding gmp with debugging
> did not help :(
>
> (gdb) run
> ...
> Program received signal SIGSEGV, Segmentation fault.
> Address not mapped to object.
> __gmpn_sqr_basecase () at tmp-sqr_basecase.s:222
> warning: 222    tmp-sqr_basecase.s: No such file or directory
> (gdb) bt
> #0  __gmpn_sqr_basecase () at tmp-sqr_basecase.s:222
> #1  0x00000000004004df in __do_global_dtors_aux ()
>    at /usr/src/lib/csu/common/crtbegin.c:83
> #2  0x00000000004c8875 in _fini ()
> #3  0x000000000046518f in __cxa_finalize (dso=dso@entry=0x0)
>    at /usr/src/lib/libc/stdlib/atexit.c:234
> #4  0x0000000000465280 in exit (status=0) at /usr/src/lib/libc/stdlib/exit.c:89
> #5  0x0000000000454ae9 in __libc_start1 (argc=1, argv=0x7fffffffe738,
>    env=0x7fffffffe748, cleanup=<optimized out>, mainX=0x400515 <main>)
>    at /usr/src/lib/libc/csu/libc_start1.c:172
> #6  0x00000000004004b0 in _start () at /usr/src/lib/csu/amd64/crt1_s.S:83
>
> It seems gmp's build infrastructure removes tmp files.

The sqr_basecase.s thing is a red herring. In fact, the whole mpfr/gmp
thing is a red herring. :) The actual problem is in the way gcc emits
the .dtors section:

  $ readelf --hex-dump=.dtors static-test-clang

  Hex dump of section '.dtors':
    0x004f3d30 ffffffff ffffffff 00000000 00000000 ................

  $ readelf --hex-dump=.dtors static-test-gcc

  Hex dump of section '.dtors':
    0x004efca8 ffffffff ffffffff                   ........

Our lib/csu/common/crtbegin.c's dtors handler starts at index 1,
however:

    69	static void
    70	__do_global_dtors_aux(void)
    71	{
    72		crt_func fn;
    73		int n;
    74
    75	#ifdef SHARED
    76		run_cxa_finalize();
    77	#endif
    78
    79		for (n = 1;; n++) {
    80			fn = __DTOR_LIST__[n];
    81			if (fn == (crt_func)0 || fn == (crt_func)-1)
    82				break;
    83			fn();
    84		}
    85	}

Because it doesn't check the section length, and expects the table to be
terminated with a 0 or -1, it goes off the rails and ends up calling
random function pointers after it!

In my static binary, compiled with gcc and linked with BFD ld, the
.dtors section is followed by an empty .jcr section (so it doesn't
matter), and then .data.rel.ro:

  Section Headers:
    [Nr] Name
         Type            Address          Off    Size   ES   Lk Inf Al
         Flags
  ...
    [14] .dtors
         PROGBITS        00000000004efca8 0eeca8 000008 00   0   0  8
         [0000000000000003]: WRITE, ALLOC
    [15] .jcr
         PROGBITS        00000000004efcb0 0eecb0 000000 00   0   0  8
         [0000000000000003]: WRITE, ALLOC
    [16] .data.rel.ro
         PROGBITS        00000000004efcb0 0eecb0 000088 00   0   0  8
         [0000000000000003]: WRITE, ALLOC

the latter of which contains:

  Hex dump of section '.data.rel.ro':
    0x004efcb0 0c9d4200 00000000 299c4200 00000000 ..B.....).B.....
    0x004efcc0 3b9c4200 00000000 869c4200 00000000 ;.B.......B.....
    0x004efcd0 e79d4200 00000000 9b9e4200 00000000 ..B.......B.....
    0x004efce0 9e9f4200 00000000 4ba04200 00000000 ..B.....K.B.....
    0x004efcf0 39354400 00000000 fb344400 00000000 95D......4D.....
    0x004efd00 2a354400 00000000 c8344400 00000000 *5D......4D.....
    0x004efd10 e8344400 00000000 17354400 00000000 .4D......5D.....
    0x004efd20 b5344400 00000000 d5344400 00000000 .4D......4D.....
    0x004efd30 04354400 00000000                   .5D.....

and indeed it is calling 0x429d0c, which happens to be a relocation into
the guts of sqr_basecase():

  Program received signal SIGSEGV, Segmentation fault.
  Address not mapped to object.
  0x0000000000429d0c in __gmpn_sqr_basecase ()

Going back to the bad .dtors table, we can see that in FreeBSD's
crtbegin.c we have:

  static crt_func __CTOR_LIST__[] __section(".ctors") __used = {
          (crt_func)-1
  };
  static crt_func __DTOR_LIST__[] __section(".dtors") __used = {
          (crt_func)-1
  };

with corresponding entries in crtend.c:

  static crt_func __CTOR_END__[] __section(".ctors") __used = {
          (crt_func)0
  };
  static crt_func __DTOR_END__[] __section(".dtors") __used = {
          (crt_func)0
  };

The linker merges these together, effectively forming the "ffffffff
ffffffff 00000000 00000000" block mentioned earlier.

But for some reason, when gcc links a static executable, it uses
FreeBSD's crtbeginT.o (which is byte-identical to crtbegin.o), but
_libgcc_'s crtend.o:

  gcc -v -static static-test.c -o static-test -lmpfr -lgmp
  ...
  /usr/local/libexec/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0/collect2 \
    -plugin /usr/local/libexec/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0/liblto_plugin.so \
    -plugin-opt=/usr/local/libexec/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0/lto-wrapper \
    -plugin-opt=-fresolution=/tmp/cc00aXNF.res \
    -plugin-opt=-pass-through=-lgcc \
    -plugin-opt=-pass-through=-lgcc_eh \
    -plugin-opt=-pass-through=-lc \
    -plugin-opt=-pass-through=-lgcc \
    -plugin-opt=-pass-through=-lgcc_eh \
    -m  elf_x86_64_fbsd \
    -V \
    -Bstatic \
    -o static-test \
    /usr/lib/crt1.o \
    /usr/lib/crti.o \
    /usr/lib/crtbeginT.o \
    -L/usr/local/lib/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0 \
    -L/usr/local/lib/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0/../../../../../x86_64-portbld-freebsd15.0/lib \
    -L/usr/local/lib/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0/../../.. \
    /tmp/ccT2nihc.o \
    -lmpfr \
    -lgmp \
    -lgcc \
    -lgcc_eh \
    -lc \
    -lgcc \
    -lgcc_eh \
    /usr/local/lib/gcc13/gcc/x86_64-portbld-freebsd15.0/13.3.0/crtend.o \
    /usr/lib/crtn.o

The problem is that libgcc's crtend.o does _not_ contain .ctors or
.dtors sections at all, resulting in the "ffffffff ffffffff" block.

During gcc's configure phase, I see:

  checking for .preinit_array/.init_array/.fini_array support... yes

so initfini_array support is then enabled.

In libgcc's crtstuff.c, which is used to generate crtbegin.o and
crtend.o, the definitions of .ctors and .dtors are all conditional on
#ifndef USE_INITFINI_ARRAY. This is why gcc's crtbegin.o and crtend.o
only have .init and .fini sections, but no .ctors or .dtors.

Summarizing, I think that it is weird that gcc doesn't use its own
crtbegin object, at least for static linking. There may be some
historical reason for it, but it should then also not use its own crtend
object!

Another issue is how our __do_global_[cd]tors_aux() functions handle
improperly terminated sections. We could try to be more robust and cope
with it, or at least abort with a proper error message.

-Dimitry



