FreeBSD Mail Archives

Date:      Fri, 14 Oct 2016 12:53:25 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Ed Maste <emaste@freebsd.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org,  svn-src-head@freebsd.org
Subject:   Re: svn commit: r307231 - head/lib/libgcc_s
Message-ID:  <20161014113603.F1039@besplex.bde.org>
In-Reply-To: <201610131918.u9DJI0bX085695@repo.freebsd.org>


On Thu, 13 Oct 2016, Ed Maste wrote:

> Log:
>  libgcc_s: add libm dependencies from div{d,s,x}c3
>
>  compiler-rt's complex division support routines contain calls to
>  compiler builtins such as `__builtin_scalbnl`.  Unfortunately Clang
>  turns these back into a call to `scalbnl`.

gcc-4.2 has the same bug.

This causes problems in the implementation of libm (and other
libraries).  The implementation can never (but sometimes does) use
__builtin_foo() to get a possibly-better optimized or MD or
CFLAGS-dependent version, because when the compiler doesn't have a
better version it usually has the bug of calling the library version,
which then just calls itself if it is misimplemented as the builtin.
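
The self-call described above can be modelled in plain C.  This is a
toy sketch only: every name in it is a hypothetical stand-in (nothing
here is libm or compiler code), and a depth guard is added so that the
model terminates instead of recursing until the stack overflows.

```c
/* Toy model only: all names here are hypothetical stand-ins. */
static int model_depth;                 /* counts entries into the "library" */

double model_scalbn(double x, int n);   /* stands in for libm's scalbn() */

/*
 * Stands in for a compiler whose __builtin_scalbn() has no better
 * expansion and is lowered to a plain call to the library function.
 */
static double model_builtin_scalbn(double x, int n)
{
    return model_scalbn(x, n);
}

/* The library function "misimplemented as the builtin". */
double model_scalbn(double x, int n)
{
    if (++model_depth > 3)              /* guard so the model terminates */
        return x;                       /* note: x was never scaled */
    return model_builtin_scalbn(x, n);
}

int model_recursion_depth(void)
{
    return model_depth;
}
```

Without the guard, each layer only forwards to the other and no code
ever does the scaling.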

The __has_builtin() macro is worse than useless for determining if the
builtin is better.  First, it doesn't really exist so is a dummy version
with value 0 on some compilers that have some better builtins.
clang has the opposite problem -- it has squillions of builtins, but
most of them just call the standard function.  Next, even if the compiler
has a real builtin, there is no way except a benchmark to tell if it is
worth using.  The ones that are always worth using are usually used
automatically, but -ffreestanding and -fno-builtin often turn this off.
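
A minimal sketch of the usual __has_builtin() idiom, to show why its
answer says so little.  sketch_abs() is a hypothetical name, and the
fallback branch is a simplification that returns -0.0 for -0.0.

```c
#ifndef __has_builtin
#define __has_builtin(x) 0      /* dummy: says 0 even if a builtin exists */
#endif

double sketch_abs(double x)
{
#if __has_builtin(__builtin_fabs)
    /*
     * The macro only says the name is recognized.  It does not say
     * whether this expands inline or becomes a call to fabs().
     */
    return __builtin_fabs(x);
#else
    return x < 0 ? -x : x;      /* simplified fallback, ignores -0.0 */
#endif
}
```

Both branches compile everywhere, but neither tells the caller which
one is faster; only a benchmark on the target does that.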

In libm, the most interesting builtin is __builtin_fma[fl](), but this
is unusable and not used.  fma*() even has a standard macro
FP_FAST_FMA[FL] to tell you if it is any good.  It is only any good if
it is in pure hardware, but libm hard-codes FP_FAST_FMAF = true and has
a special not-very-fast implementation for the float case.  On x86, it
takes later SSE and/or AVX to give fma*() in hardware, and unportable
CFLAGS to use this hardware, and a compiler that supports this and the
__builtin_fma*() spelling to use the instruction (even clang is
excessively IEEE/C conformant on x86 -- it never turns x*y+z into
fma(x,y,z)).  So fma*() is unusable for efficiency in practice.  It
gives extra accuracy in some cases and is specified to do that, but
implementing this in software makes using it just a pessimization in
most cases.
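
A minimal sketch of gating on the standard macro, assuming a C99
<math.h>.  sketch_mul_add() and sketch_have_fast_fma() are hypothetical
names, and on systems where fma() lives in libm the first branch needs
-lm at link time.

```c
#include <math.h>

/* Use fma() only when the implementation claims it is fast. */
double sketch_mul_add(double x, double y, double z)
{
#ifdef FP_FAST_FMA
    return fma(x, y, z);        /* one rounding, claimed to be fast */
#else
    return x * y + z;           /* written as two operations */
#endif
}

/* Report which branch was compiled in: 1 for fma(), 0 otherwise. */
int sketch_have_fast_fma(void)
{
#ifdef FP_FAST_FMA
    return 1;
#else
    return 0;
#endif
}
```

The macro is only as trustworthy as the header that defines it, so a
hard-coded value defeats this kind of check.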

>  For now link libm's C version of the required support routines.

Even libm doesn't use these in some cases.  i386 mostly uses asm
versions.  Hopefully the rt division routines don't need to be efficient
because they are rarely called.

>  Reviewed by:	ed
>  Sponsored by:	The FreeBSD Foundation
>  Differential Revision:	https://reviews.freebsd.org/D8190
>
> Modified:
>  head/lib/libgcc_s/Makefile
>
> Modified: head/lib/libgcc_s/Makefile
> ==============================================================================
> --- head/lib/libgcc_s/Makefile	Thu Oct 13 18:57:18 2016	(r307230)
> +++ head/lib/libgcc_s/Makefile	Thu Oct 13 19:18:00 2016	(r307231)
> @@ -11,4 +11,22 @@ VERSION_MAP=	${.CURDIR}/Version.map
> .include "../libcompiler_rt/Makefile.inc"
> .include "../libgcc_eh/Makefile.inc"
>
> +LIBCSRCDIR=	${SRCTOP}/lib/libc
> +LIBMSRCDIR=	${SRCTOP}/lib/msun/src
> +CFLAGS+=	-I${LIBCSRCDIR}/include -I${LIBCSRCDIR}/${MACHINE_CPUARCH}
> +CFLAGS+=	-I${LIBMSRCDIR}
> +.PATH:		${LIBMSRCDIR}
> +SRCS+=		s_fabs.c
> +SRCS+=		s_fabsf.c
> +SRCS+=		s_fabsl.c

The fabs functions cause a smaller set of problems for builtins:
- normally they are automatically inlined as a builtin if they are
   spelled fabs*()
- -ffreestanding turns this off, so rt might need to spell them
   __builtin_fabs*(), but on arches where they aren't real builtins
   the above is still needed
- i386 doesn't bother implementing these in asm since they are usually
   builtins
- the C implementations are good, but are often very badly optimized
   by compilers, due to problems with compilers not understanding
   load/store penalties for the current arch or the compile-time arch
   not matching the runtime arch
- the builtins have the same problem with arch mismatches.
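
For reference, a C fabs() in this style only clears the sign bit.  The
sketch below assumes IEEE 754 binary64 doubles; sketch_fabs() is a
hypothetical name and this is not the msun source.

```c
#include <stdint.h>
#include <string.h>

double sketch_fabs(double x)
{
    uint64_t bits;

    /*
     * Moving the value between the FP and integer domains is where a
     * compiler can pick a bad store/load sequence for the target.
     */
    memcpy(&bits, &x, sizeof(bits));
    bits &= ~(UINT64_C(1) << 63);       /* clear the sign bit */
    memcpy(&x, &bits, sizeof(x));
    return x;
}
```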

> +SRCS+=		s_fmax.c
> +SRCS+=		s_fmaxf.c
> +SRCS+=		s_fmaxl.c

These are exotic functions which should rarely be used, especially in
portable code that doesn't need to be efficient.  They just give subtle
behaviour for NaNs.  I checked recently that the special builtins for
comparing possible NaNs are insignificantly faster than the generic
code which starts with isnan(), on x86 (this depends on isnan() being
a fast builtin).  The implementation of these functions is basically
to start with an inline C implementation of isnan().  This is likely
to be just slower than the natural max(x, y) code using a comparison,
after adding some isnan()s to the latter.  Division code should have
classified NaNs up front and never use these functions.

i386 doesn't bother to optimize these functions.  I think it can't
do any better than the C code using a builtin relop (x86 has special
relops that behave differently for NaNs, but IIRC these functions
treat NaNs too unusually for either the normal relop or a special
relop to work directly).
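
The NaN subtlety can be sketched as follows.  Both names are
hypothetical, x != x stands in for isnan(), and the ordering of signed
zeros that a real fmax() may also handle is ignored here.

```c
/* The natural comparison: the result depends on which operand is NaN. */
#define SKETCH_NAIVE_MAX(x, y)  ((x) > (y) ? (x) : (y))

/* fmax()-like behaviour: a NaN operand is treated as missing data. */
double sketch_fmax(double x, double y)
{
    if (x != x)                 /* x is NaN */
        return y;
    if (y != y)                 /* y is NaN */
        return x;
    return x > y ? x : y;
}
```

With one NaN operand, the naive macro returns either the NaN or the
number depending on argument order, while the function always returns
the number.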

> +SRCS+=		s_logb.c
> +SRCS+=		s_logbf.c
> +SRCS+=		s_logbl.c

i386 does these in asm.  amd64 does only logbl() in asm.  These optimizations
are barely worth it, though they map directly to an x87 instruction and
this instruction is not slow.
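
For a normal finite nonzero double, logb() amounts to reading the
exponent field.  A sketch assuming IEEE 754 binary64;
sketch_logb_normal() is a hypothetical name, and zeros, subnormals,
infinities and NaNs need extra cases that are not shown.

```c
#include <stdint.h>
#include <string.h>

/* Valid only for normal, finite, nonzero x. */
double sketch_logb_normal(double x)
{
    uint64_t bits;

    memcpy(&bits, &x, sizeof(bits));
    /* 11-bit biased exponent sits above the 52-bit fraction. */
    return (double)((int)((bits >> 52) & 0x7ff) - 1023);
}
```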

> +SRCS+=		s_scalbn.c
> +SRCS+=		s_scalbnf.c
> +SRCS+=		s_scalbnl.c

Both i386 and amd64 do all of these in asm.  This is a dubious optimization.
The C versions are quite complicated and not very good.  libm knows this
and uses lots of inline expansions of core parts of these functions.  This
is much faster than calling the x86 MD versions too.
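
A sketch of the kind of inline expansion meant here: build 2**n
directly from its bit pattern and multiply.  It assumes IEEE 754
binary64 and is only valid for -1022 <= n <= 1023;
sketch_scale_normal() is a hypothetical name, not a libm function.

```c
#include <stdint.h>
#include <string.h>

/* Valid only for -1022 <= n <= 1023, so that 2**n is a normal double. */
double sketch_scale_normal(double x, int n)
{
    uint64_t bits = (uint64_t)(n + 1023) << 52; /* biased exponent, zero fraction */
    double twon;

    memcpy(&twon, &bits, sizeof(twon));         /* twon == 2**n */
    return x * twon;
}
```

A full scalbn() also has to cope with any int n and with results that
overflow or become subnormal, which is where the complication comes
from.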

> +
> .include <bsd.lib.mk>

I think there are still namespace bugs.  scalbnl() is in the application
namespace for -ffreestanding.  There are similar bugs from calling mem*()
for struct copying.
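
The mem*() case needs no library call in the source at all.  In the
sketch below (hypothetical names), the compiler is free to lower the
assignment to a call to memcpy(), and whether it does depends on the
compiler, target and flags.

```c
struct sketch_big {
    double v[32];
};

void sketch_copy(struct sketch_big *dst, const struct sketch_big *src)
{
    *dst = *src;        /* may be compiled as a call to memcpy() */
}
```

When that happens, a freestanding program that never mentions memcpy()
still ends up with an undefined reference to it.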

libcompiler_rt.a on amd64 now has the following namespace bugs:

                  U compilerrt_abort_impl
                  U fflush
                  U fprintf
                  U mprotect
                  U sysconf
                  U fmaxl
                  U logbl
                  U scalbnl
                  U logbf
                  U scalbnf
                  U logb
                  U scalbn
                  U abort

These are bugs since division must be available with -ffreestanding and
the freestanding library shouldn't have to reimplement it.

Bruce

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20161014113603.F1039>
