Date: Sun, 14 May 2017 02:05:33 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: numerics@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: catrig[fl].c and inexact
Message-ID: <20170514011600.D1038@besplex.bde.org>
In-Reply-To: <20170513060803.GA84399@troutmask.apl.washington.edu>
References: <20170512215654.GA82545@troutmask.apl.washington.edu> <20170513103208.M845@besplex.bde.org> <20170513060803.GA84399@troutmask.apl.washington.edu>
On Fri, 12 May 2017, Steve Kargl wrote:

> On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
>> On Fri, 12 May 2017, Steve Kargl wrote:
>>
>>> ...
>>> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from
>>> macro 'raise_inexact'
>>> #define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
>>>                                             ^
>>> Grepping catrig.o for the variable 'junk' suggests that 'junk' is
>>> optimized out (with at least -O2).

It is a local variable, so it should be and is allocated on the stack, and you will never find it using grep.  The problem seems to be that all compilers generate the intended code, but clang warns anyway.

>> Just another bug in clang.  Volatile variables cannot be optimized out
>> (if they are accessed).
>
> Does this depend on scope?  'junk' is local to the do {...} while(0);
> construct.  Can a compiler completely eliminate a do-nothing scoping
> unit?  I don't know C well enough to know.  I do know what I have
> observed in clang.

It shouldn't depend on scope; that is the semantics of volatile.  But as a practical matter, standards shouldn't specify much here, and compilers should be very conservative.

BTW, I recently noticed that volatile doesn't work right in bus space macros.  Some reduce to *(volatile int *)var = val, where var is for memory-mapped i/o that takes 10000 times as long as normal memory to access.  Compilers still unroll loops setting such variables.  This is only a pessimization for space.

>>> ...
>>> @@ -315,7 +315,7 @@ casinh(double complex z)
>>> 		return (z);
>>>
>>> 	/* All remaining cases are inexact. */
>>> -	raise_inexact();
>>> +	raise_inexact(new_y);
>>>
>>> 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
>>> 		return (z);
>>
>> Now it doesn't take compiler bugs to optimize it out, since new_y is not
>> volatile, and a good compiler would optimize it out in all cases.
>
> I've yet to find a good compiler.  They all seem to have bugs.
>
>> new_y
>> is obviously unused before the early returns, so it doesn't need to be
>> evaluated before the returns as far as the compiler can see.  Later,
>> new_y is initialized indirectly, and the compiler can see that too (not
>> so easily, but it can see that raise_inexact() has no effect except possibly
>> for its side effect of raising inexact for 1 + tiny).
>
> The later call passes the address of new_y to the routine.  How
> can the compiler, short of inlining the called routine, know that
> the value assigned to new_y isn't used?

The compiler does full inlining even when you don't want it.  Full analysis of the whole source file is fundamental for generating useful warnings with -Wunused.  Without full analysis, the compiler would have to assume that new_y is used uninitialized and either suppress warnings for all variables that might be initialized indirectly (including via aliased pointers), or generate many bogus warnings that variables "might be" used uninitialized.  Old compilers mostly did the latter, and we still see occasional spurious warnings from gcc-4.2.1.

Old compilers also have man pages in which this is partly documented.  gcc-3.3.3(1) says that:
- -Wuninitialized is null without -O
- -Wuninitialized is never generated for volatile variables
- -Wuninitialized is not the default since gcc is not smart enough to handle it well.

gcc-4.2.1(1) says much the same, plus that -Wall implies -Wuninitialized.  It still says that the compiler is not smart, and doesn't seem to document the improvements that make this warning reasonable as the default with -Wall.  This is mostly because -O now implies -funit-at-a-time, which I usually don't want, but which gives the full analysis needed for -Wuninitialized and -Wunused.  I usually don't want this because:
- it slows down compilation
- it allows unwanted inlining
- it allows unportable code.

clang doesn't support -funit-at-a-time.

>> The change might defeat the intent of the original code in another way.
>> 'junk' is intentionally independent of other variables, so that there are
>> no dependencies on it.  If the compiler doesn't optimize away the assignment
>> to new_y, then it is probably because it doesn't see that the assignment is
>> dead, so there is a dependency.
>
> It may defeat the intent of the original code, but it seems that
> the original code provokes undefined behavior.

The behavior is defined, but perhaps not what is wanted.  It is the use of -W flags that gives undefined behaviour: the warnings are undefined by the C standard, and also undefined by compilers with stub man pages.

>> Actually, we want the variable 'junk' to be optimized away.  We only want
>> the side effect of evaluating 1 + tiny.  Compilers have bugs evaluating
>> expressions like 1 + tiny, tiny*tiny and huge*huge, and we use assignments
>> of the result to volatile variables in tens if not hundreds of places to
>> try to work around compiler bugs.  If that doesn't work here, then all the
>> other places are probably broken too.  The other places mostly use a static
>> volatile, while this uses an auto volatile.  'tiny' is also volatile, as
>> required for the standard magic.  I planned to fix all this magic using
>> macros like raise_inexact().
>
> If you plan to fix the magic with raise_inexact, then please
> test with a suite of compilers.  AFAICT, clang is optimizing
> out the code.  I haven't written a testcase to demonstrate this
> as I have other irons in the fire.

I only tested with 4 compilers when I wrote it.  Actually, we agreed not to worry about compiler bugs for setting fenv, especially for compilers with even more of them than gcc.  libm only has the volatile hack needed to fix huge*huge for clang in some places (gcc evaluates huge*huge at run time but tiny*tiny at compile time, so libm has more volatile hacks for the latter).  Not to mention hacks to remove extra precision for huge*huge and tiny*tiny.  On i386 with i387, huge*huge doesn't overflow since it is evaluated in extra precision.
The wrong result is returned, and the wrong result is used if it is assigned to a variable that can hold the extra precision.  Overflow only occurs when the value is converted to float or double, and STRICT_ASSIGN() or a volatile hack must be used for this conversion to work around other compiler bugs (which are actually features, but not allowed by C standards).

C11 and compiler non-support for C11 break this further.  C11 adds the extra pessimization of requiring extra precision (and range) to be destroyed on function return.  clang ignores this requirement.  Newer gcc supports it under certain pessimal CFLAGS including -std=c11.

Bruce.