Date: Fri, 14 Oct 2016 12:53:25 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Ed Maste <emaste@freebsd.org>
Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r307231 - head/lib/libgcc_s
Message-ID: <20161014113603.F1039@besplex.bde.org>
In-Reply-To: <201610131918.u9DJI0bX085695@repo.freebsd.org>
References: <201610131918.u9DJI0bX085695@repo.freebsd.org>

On Thu, 13 Oct 2016, Ed Maste wrote:

> Log:
>   libgcc_s: add libm dependencies from div{d,s,x}c3
>
>   compiler-rt's complex division support routines contain calls to
>   compiler builtins such as `__builtin_scalbnl`. Unfortunately Clang
>   turns these back into a call to `scalbnl`.

gcc-4.2 has the same bug.

This causes problems in the implementation of libm (and other libraries).
The implementation can never (but sometimes does) use __builtin_foo() to
get a possibly-better optimized or MD or CFLAGS-dependent version, because
when the compiler doesn't have a better version it usually has the bug of
calling the library version, which just calls itself when it is
misimplemented as the builtin.

The __has_builtin() macro is worse than useless for determining if the
builtin is better. First, it doesn't really exist, so it is a dummy
version with value 0 on some compilers that have some better builtins.
clang has the opposite problem -- it has squillions of builtins, but most
of them just call the standard function. Next, even if the compiler has a
real builtin, there is no way except a benchmark to tell if it is worth
using. The ones that are always worth using are usually used
automatically, but -ffreestanding and -fno-builtin often turn this off.

In libm, the most interesting builtin is __builtin_fma[fl](), but this is
unusable and not used. fma*() even has a standard macro FP_FAST_FMA[FL]
to tell you if it is any good. It is only any good if it is in pure
hardware, but libm hard-codes FP_FAST_FMAF = true and has a special
not-very-fast implementation for the float case. On x86, it takes later
SSE and/or AVX to give fma*() in hardware, and unportable CFLAGS to use
this hardware, and a compiler that supports this and the __builtin_fma*()
spelling to use the instruction (even clang is excessively IEEE/C
conformant on x86 -- it never turns x*y+z into fma(x,y,z)). So *fma*() is
unusable for efficiency in practice.
It gives extra accuracy in some cases and is specified to do that, but
implementing this in software makes using it just a pessimization in most
cases.

>   For now link libm's C version of the required support routines.

Even libm doesn't use these in some cases. i386 mostly uses asm versions.
Hopefully the rt division routines don't need to be efficient because they
are rarely called.

>   Reviewed by: ed
>   Sponsored by: The FreeBSD Foundation
>   Differential Revision: https://reviews.freebsd.org/D8190
>
> Modified:
>   head/lib/libgcc_s/Makefile
>
> Modified: head/lib/libgcc_s/Makefile
> ==============================================================================
> --- head/lib/libgcc_s/Makefile Thu Oct 13 18:57:18 2016 (r307230)
> +++ head/lib/libgcc_s/Makefile Thu Oct 13 19:18:00 2016 (r307231)
> @@ -11,4 +11,22 @@ VERSION_MAP= ${.CURDIR}/Version.map
> .include "../libcompiler_rt/Makefile.inc"
> .include "../libgcc_eh/Makefile.inc"
>
> +LIBCSRCDIR= ${SRCTOP}/lib/libc
> +LIBMSRCDIR= ${SRCTOP}/lib/msun/src
> +CFLAGS+= -I${LIBCSRCDIR}/include -I${LIBCSRCDIR}/${MACHINE_CPUARCH}
> +CFLAGS+= -I${LIBMSRCDIR}
> +.PATH: ${LIBMSRCDIR}
> +SRCS+= s_fabs.c
> +SRCS+= s_fabsf.c
> +SRCS+= s_fabsl.c

The fabs functions cause a smaller set of problems for builtins:
- normally they are automatically inlined as a builtin if they are spelled
  fabs*()
- -ffreestanding turns this off, so rt might need to spell them
  __builtin_fabs*(), but on arches where they aren't real builtins the
  above is still needed
- i386 doesn't bother implementing these in asm since they are usually
  builtins
- the C implementations are good, but are often very badly optimized by
  compilers, due to problems with compilers not understanding load/store
  penalties for the current arch or the compile-time arch not matching the
  runtime arch
- the builtins have the same problem with arch mismatches.
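
For illustration, a C fabs() of the kind referred to above usually just
clears the sign bit. This is a minimal sketch, assuming double is IEEE 754
binary64 and that doubles and 64-bit integers share a byte order;
sketch_fabs is a hypothetical name and this is not the msun source:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the msun s_fabs.c code.
 * Assumes double is IEEE 754 binary64.
 */
static double
sketch_fabs(double x)
{
	union { double d; uint64_t u; } p;

	p.d = x;			/* store as floating point */
	p.u &= ~(UINT64_C(1) << 63);	/* clear the sign bit as an integer */
	return (p.d);			/* reload as floating point */
}
```

The store to p.d followed by an integer access to p.u is the sort of
sequence where a compiler that misjudges load/store penalties for the
target can generate slow code.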

> +SRCS+= s_fmax.c
> +SRCS+= s_fmaxf.c
> +SRCS+= s_fmaxl.c

These are exotic functions which should rarely be used, especially in
portable code that doesn't need to be efficient. They just give subtle
behaviour for NaNs. I checked recently that the special builtins for
comparing possible NaNs are insignificantly faster than the generic code
which starts with isnan(), on x86 (this depends on isnan() being a fast
builtin). The implementation of these functions is basically to start
with an inline C implementation of isnan(). This is likely to be just
slower than the natural max(x, y) code using a comparison, after adding
some isnan()s to the latter. Division code should have classified NaNs
up front and never use these functions.

i386 doesn't bother to optimize these functions. I think it can't do any
better than the C code using a builtin relop (x86 has special relops that
behave differently for NaNs, but IIRC these functions treat NaNs too
unusually for either the normal relop or a special relop to work
directly).

> +SRCS+= s_logb.c
> +SRCS+= s_logbf.c
> +SRCS+= s_logbl.c

i386 does these in asm. amd64 does only logbl() in asm. These
optimizations are barely worth it, though they map directly to an x87
instruction and this instruction is not slow.

> +SRCS+= s_scalbn.c
> +SRCS+= s_scalbnf.c
> +SRCS+= s_scalbnl.c

Both i386 and amd64 do all of these in asm. This is a dubious
optimization. The C versions are quite complicated and not very good.
libm knows this and uses lots of inline expansions of core parts of these
functions. This is much faster than calling the x86 MD versions too.

> +
> .include <bsd.lib.mk>

I think there are still namespace bugs. scalbnl() is in the application
namespace for -ffreestanding. There are similar bugs from calling mem*()
for struct copying.
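
One way a freestanding runtime could avoid referencing scalbn*() at all is
a private static helper that expands the core of the operation inline.
What follows is only a rough sketch under stated assumptions: double is
IEEE 754 binary64, the name sketch_scale2 is hypothetical, and the helper
clamps n instead of handling the full exponent range the way a real
scalbn() must. It is not what compiler-rt or libm actually does:

```c
#include <stdint.h>

/*
 * Hypothetical private helper.  Being static, it adds no undefined or
 * exported symbol such as scalbn to the library.
 * Assumes double is IEEE 754 binary64.
 */
static double
sketch_scale2(double x, int n)
{
	union { double d; uint64_t u; } p;

	/*
	 * Clamp n so that 2**n is itself a normal double.  A complete
	 * scalbn() would split larger shifts into several multiplies.
	 */
	if (n > 1023)
		n = 1023;
	if (n < -1022)
		n = -1022;

	/* Build 2**n directly: biased exponent, zero mantissa. */
	p.u = (uint64_t)(n + 1023) << 52;
	return (x * p.d);
}
```

The multiply is exact unless the result overflows or becomes subnormal,
so for in-range arguments this gives the same value as scalbn(x, n)
without any call into the application namespace.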

libcompiler_rt.a on amd64 now has the following namespace bugs:

         U compilerrt_abort_impl
         U fflush
         U fprintf
         U mprotect
         U sysconf
         U fmaxl
         U logbl
         U scalbnl
         U logbf
         U scalbnf
         U logb
         U scalbn
         U abort

These are bugs since division must be available with -ffreestanding and
the freestanding library shouldn't have to reimplement it.

Bruce