Date: Mon, 30 Dec 1996 20:20:43 +1100 (EST)
From: Julian Assange <proff@iq.org>
To: hackers@freebsd.org
Cc: rms@gnu.ai.mit.edu
Subject: libmsun
Message-ID: <199612300920.UAA18245@profane.iq.org>
Having recently started a project that consumes libm() functions in
great quantity, I thought I'd see what I could do to speed up FreeBSD's
libm performance. Hope that there was room for improvement came from
the posting of one researcher who had benchmarked the linux libm code
(on a pentium) as twice as fast as FreeBSD's.

Looking at the ieee libm (/usr/src/lib/libm/*) I discovered that there
was no FPU support, and no asm fast-pathing. Thinking, naturally
enough, that this would more than easily account for all differences,
I, somewhat painfully, ported the linux/glibc/dj libm to
/usr/src/contrib and /usr/lib/libmdj.

Only then did I find out that, contrary to what one would intuit,
/usr/src/lib/libm is not used for /usr/lib/libm*; the SunPro fdlibm
from /usr/src/lib/msun is (perhaps it is time to nuke the old Berkeley
one?). fdlibm does have i386/i387 support, though not as many
functions are asm/fpu coded compared to the linux libm. If I notice a
speed difference betwixt it and libmdj, then this is the obvious place
for optimisation. The sun code is certainly much more pleasant to look
at than the glibc/linux/dj mess.

Now, in terms of compile-time optimisation of fdlibm, we have:

    # There are two options in making libm at fdlibm compile time:
    #       _IEEE_LIBM      --- IEEE libm; smaller, and somewhat faster
    #       _MULTI_LIBM     --- Support multi-standard at runtime by
    #                           imposing wrapper functions defined in
    #                           fdlibm.h:
    #                               _IEEE_MODE      -- IEEE
    #                               _XOPEN_MODE     -- X/OPEN
    #                               _POSIX_MODE     -- POSIX/ANSI
    #                               _SVID3_MODE     -- SVID
    #
    # Here is how to set up CFLAGS to create the desired libm at
    # compile time:
    #
    #       CFLAGS = -D_IEEE_LIBM   ... IEEE libm (recommended)
    #       CFLAGS = -D_SVID3_MODE  ... Multi-standard supported
    #                                   libm with SVID as the
    #                                   default standard
    #       CFLAGS = -D_XOPEN_MODE  ... Multi-standard supported
    #                                   libm with XOPEN as the
    #                                   default standard
    #       CFLAGS = -D_POSIX_MODE  ... Multi-standard supported
    #                                   libm with POSIX as the
    #                                   default standard
    #       CFLAGS =                ... Multi-standard supported
    #                                   libm with IEEE as the
    #                                   default standard
    #
    CFLAGS+= -D_MULTI_LIBM -D_POSIX_MODE -D_IEEE_LIBM

Do we really need MULTI and POSIX?

The linux libm has the following specific optimisations applied:

    -ffast-math -mieee-fp

Examining the GCC documentation, we find:

    `-ffast-math'
        This option allows GCC to violate some ANSI or IEEE rules
        and/or specifications in the interest of optimising code for
        speed. For example, it allows the compiler to assume
        arguments to the `sqrt' function are non-negative numbers and
        that no floating-point values are NaNs.

        This option should never be turned on by any `-O' option since
        it can result in incorrect output for programs which depend on
        an exact implementation of IEEE or ANSI rules/specifications
        for math functions.

If the flow path in msun functions is such that NaNs/other invalid
parameters are not used, then this optimisation would be a small win.
However, we find that -ffast-math does more than this snippet of
documentation would have us believe.

    `-mno-fancy-math-387'
        Some 387 emulators do not support the `sin', `cos' and `sqrt'
        instructions for the 387. Specify this option to avoid
        generating those instructions. This option is the default on
        FreeBSD. As of revision 2.6.1, these instructions are not
        generated unless you also use the `-ffast-math' switch.

This is silly. The only way we can have gcc use its builtins for sin,
cos and sqrt (possibly the three most used high-level fpu functions)
is to ignore several error conditions. Clearly this isn't acceptable
as a default for user code. However, within libm we can take
precautions in the functions that use the gcc builtins, and activate
-mfancy-math-387 and -ffast-math for the compilation of libm only.

    `-mno-ieee-fp'
    `-mieee-fp'
        Control whether or not the compiler uses IEEE floating point
        comparisons. These handle correctly the case where the result
        of a comparison is unordered.

This looks appropriate enough. I'm just trying to recall what an
"unordered comparison" is. a>b : a=b?
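On the "unordered comparison" question: under IEEE 754 a comparison is
unordered when at least one operand is a NaN, and in that case <, > and
== all evaluate to false. A minimal C sketch of that behaviour follows.
The function names ieee_relation and naive_compare are made up for
illustration, and the sketch assumes a C99 <math.h> (for isunordered and
NAN) and a build without -ffast-math, since that flag lets the compiler
assume NaNs never occur.

```c
#include <math.h>

/* Classify how two doubles relate under IEEE 754 rules.
 * Returns '<', '=' or '>' for ordered operands, and '?' when at
 * least one operand is a NaN (the "unordered" case).
 * Hypothetical helper, not part of any libm. */
char ieee_relation(double a, double b)
{
    if (isunordered(a, b))   /* C99 macro: true if a or b is NaN */
        return '?';
    if (a < b)
        return '<';
    if (a > b)
        return '>';
    return '=';
}

/* A naive three-way compare that assumes exactly one of <, ==, >
 * must hold. With a NaN operand both tests below are false, so the
 * function falls through and wrongly reports "greater". */
int naive_compare(double a, double b)
{
    if (a < b)
        return -1;
    if (a == b)
        return 0;
    return 1;
}
```

Called as ieee_relation(NAN, 1.0) the first function returns '?', while
naive_compare(NAN, 1.0) returns 1 even though a NaN is not greater than
1.0. That fourth, unordered outcome is presumably the case the
-mieee-fp text above is referring to.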

It is also possible the documentation is out of date. Someone care to
check this?

-Julian Assange (proff@iq.org)