Date: Tue, 24 Jul 2018 16:19:11 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: enh via freebsd-numerics <freebsd-numerics@freebsd.org>
Subject: Re: fmod nan_mix usage
Message-ID: <20180724155513.A819@besplex.bde.org>
In-Reply-To: <20180723215928.GA98418@troutmask.apl.washington.edu>
References: <CAJgzZopb_0fxM9jbVjUEZ0JPOfcrgeQo_Ki-afZ5aRNr38tKVg@mail.gmail.com>
 <20180723193418.GA66380@troutmask.apl.washington.edu>
 <20180724071036.O868@besplex.bde.org>
 <20180723215928.GA98418@troutmask.apl.washington.edu>

On Mon, 23 Jul 2018, Steve Kargl wrote:

> On Tue, Jul 24, 2018 at 07:41:17AM +1000, Bruce Evans wrote:
>> ...
>> clang normally evaluates this at compile time, so it doesn't test the
>> library.  This is arguably a bug in clang, since it doesn't set the
>> exception flags.  #pragma FENV_ACCESS should control this, but it is
>> hard to use and rarely works.

["This" is fmod*(3, 0).]

>> The test data needs to be non-literal and perhaps even volatile to prevent
>> the compiler evaluating it at compile time.
>
> Whoops.  I should know better!  I have -fno-builtin hardcoded
> in my development trees and completely forgot about constant
> folding.

I just realised that testing should be done with all combinations of
builtin flags, or at least with global -fbuiltin and -fno-builtin.  clang
might inline all fmod calls.  clang recently started inlining all fmin
and fmax calls, and the result is different from the library's: the
library is careful to order -0.0 before +0.0, but clang doesn't
distinguish between these values, so it produces one or the other
depending on the order of the args and other details.  C99 footnote 192
explicitly says that the sloppy comparison is allowed, so this is only a
quality of implementation bug.

Both gcc and clang have always inlined fabs calls and have almost always
inlined sqrt calls.

For efficiency testing, I rename functions by copying their file and
editing the copy, and rebuild them with the CFLAGS being tested, so that
the main part of the function is independent of the library (including
the CFLAGS that the library was built with) and of builtins.  fabs and
sqrt are only renamed when they are themselves the functions being
tested.  The function call overhead for small functions like fabs is
about 10 cycles on modern x86, except that for long double precision it
is about 30 cycles.

For accuracy testing, it is the function that will normally be used that
should usually be tested.  This is the builtin if there is one, otherwise
the library function.
However, the builtins should be turned off sometimes, to get an idea of
what the non-builtin function will do with other compilers/arches where
it is not a builtin.

Similarly for optimized MD versions.  My efficiency tests usually turn
off the x86-optimized versions.  Non-i386 arches don't have so many
optimized MD versions, so just testing the library functions on them
finds most differences that don't show up for the optimized MD versions.

Bruce