From owner-freebsd-numerics@freebsd.org Fri May 12 21:56:58 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 58BD6D69E31 for ; Fri, 12 May 2017 21:56:58 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 45D31890 for ; Fri, 12 May 2017 21:56:58 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: by mailman.ysv.freebsd.org (Postfix) id 452B2D69E30; Fri, 12 May 2017 21:56:58 +0000 (UTC) Delivered-To: numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 42F02D69E2F; Fri, 12 May 2017 21:56:58 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 2FAE088E; Fri, 12 May 2017 21:56:55 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4CLusWo082601 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Fri, 12 May 2017 14:56:54 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4CLusWG082600; Fri, 12 May 2017 14:56:54 -0700 (PDT) (envelope-from sgk) Date: Fri, 12 May 2017 14:56:54 -0700 From: Steve Kargl To: numerics@freebsd.org, freebsd-hackers@freebsd.org Subject: catrig[fl].c and inexact Message-ID: <20170512215654.GA82545@troutmask.apl.washington.edu> 
Reply-To: sgk@troutmask.apl.washington.edu
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
X-List-Received-Date: Fri, 12 May 2017 21:56:58 -0000

So, I've been making improvements to my implementations of
the half-cycle trig functions. In doing so, I decided to
add WARNS=2 to msun/Makefile. clang 4.0.0 dies with an
error about an unused variable in raise_inexact() from
catrig[fl].c.

/usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:195:2: error: unused variable
      'junk' [-Werror,-Wunused-variable]
        raise_inexact();
        ^
/usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from
      macro 'raise_inexact'
#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
                                            ^

Grepping catrig.o for the variable 'junk' suggests that 'junk' is
optimized out (with at least -O2).

A quick and dirty patch to achieve the intent of the original
code follows. It would be nice if someone would like to commit
the patch. Of course, you may want to wait for Bruce to
review the diff.

Index: src/catrig.c
===================================================================
--- src/catrig.c	(revision 1935)
+++ src/catrig.c	(working copy)
@@ -37,7 +37,7 @@ __FBSDID("$FreeBSD: head/lib/msun/src/catrig.c 313863
 #define isinf(x) (fabs(x) == INFINITY)
 #undef isnan
 #define isnan(x) ((x) != (x))
-#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
+#define raise_inexact(x) do { (x) = 1 + tiny; } while(0)
 #undef signbit
 #define signbit(x) (__builtin_signbit(x))

@@ -315,7 +315,7 @@ casinh(double complex z)
 		return (z);

 	/* All remaining cases are inexact. */
-	raise_inexact();
+	raise_inexact(new_y);

 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
 		return (z);
@@ -400,7 +400,7 @@ cacos(double complex z)
 		return (CMPLX(0, -y));

 	/* All remaining cases are inexact. */
-	raise_inexact();
+	raise_inexact(new_x);

 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
 		return (CMPLX(pio2_hi - (x - pio2_lo), -y));
@@ -607,7 +607,7 @@ catanh(double complex z)
 		 * inexact, but this is the only only that needs to do it
 		 * explicitly.
 		 */
-		raise_inexact();
+		raise_inexact(ax);
 		return (z);
 	}

Index: src/catrigf.c
===================================================================
--- src/catrigf.c	(revision 1935)
+++ src/catrigf.c	(working copy)
@@ -51,7 +51,7 @@ __FBSDID("$FreeBSD: head/lib/msun/src/catrigf.c 275819
 #define isinf(x) (fabsf(x) == INFINITY)
 #undef isnan
 #define isnan(x) ((x) != (x))
-#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
+#define raise_inexact(x) do { (x) = 1 + tiny; } while(0)
 #undef signbit
 #define signbit(x) (__builtin_signbitf(x))

@@ -176,7 +176,7 @@ casinhf(float complex z)
 	if (x == 0 && y == 0)
 		return (z);

-	raise_inexact();
+	raise_inexact(new_y);

 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
 		return (z);
@@ -234,7 +234,7 @@ cacosf(float complex z)
 	if (x == 1 && y == 0)
 		return (CMPLXF(0, -y));

-	raise_inexact();
+	raise_inexact(new_x);

 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
 		return (CMPLXF(pio2_hi - (x - pio2_lo), -y));
@@ -365,7 +365,7 @@ catanhf(float complex z)
 		    copysignf(pio2_hi + pio2_lo, y)));

 	if (ax < SQRT_3_EPSILON / 2 && ay < SQRT_3_EPSILON / 2) {
-		raise_inexact();
+		raise_inexact(ax);
 		return (z);
 	}

Index: src/catrigl.c
===================================================================
--- src/catrigl.c	(revision 1935)
+++ src/catrigl.c	(working copy)
@@ -53,7 +53,7 @@ __FBSDID("$FreeBSD: head/lib/msun/src/catrigl.c 313761
 #define isinf(x) (fabsl(x) == INFINITY)
 #undef isnan
 #define isnan(x) ((x) != (x))
-#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
+#define raise_inexact(x) do { (x) = 1 + tiny; } while(0)
 #undef signbit
 #define signbit(x) (__builtin_signbitl(x))

@@ -192,7 +192,7 @@ casinhl(long double complex z)
 	if (x == 0 && y == 0)
 		return (z);

-	raise_inexact();
+	raise_inexact(new_y);

 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
 		return (z);
@@ -251,7 +251,7 @@ cacosl(long double complex z)
 	if (x == 1 && y == 0)
 		return (CMPLXL(0, -y));

-	raise_inexact();
+	raise_inexact(new_x);

 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
 		return (CMPLXL(pio2_hi - (x - pio2_lo), -y));
@@ -383,7 +383,7 @@ catanhl(long double complex z)
 		    copysignl(pio2_hi + pio2_lo, y)));

 	if (ax < SQRT_3_EPSILON / 2 && ay < SQRT_3_EPSILON / 2) {
-		raise_inexact();
+		raise_inexact(ax);
 		return (z);
 	}

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org Sat May 13 02:02:32 2017
Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id 5D3A11F23; Sat, 13 May 2017
02:02:31 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 80712D654C9; Sat, 13 May 2017 11:35:54 +1000 (AEST) Date: Sat, 13 May 2017 11:35:49 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: numerics@freebsd.org, freebsd-hackers@freebsd.org Subject: Re: catrig[fl].c and inexact In-Reply-To: <20170512215654.GA82545@troutmask.apl.washington.edu> Message-ID: <20170513103208.M845@besplex.bde.org> References: <20170512215654.GA82545@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=WvBbCZXv c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=0K0djoc-qRL17_fbz0IA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 May 2017 02:02:32 -0000 On Fri, 12 May 2017, Steve Kargl wrote: > So, I've been making improvements to my implementations of > the half-cycle trig functions. In doing so, I decide to > add WARNS=2 to msun/Makefile. clang 4.0.0 dies with an > error about an unused variable in raise_inexact() from > catrig[fl].c. > > /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:195:2: error: unused variable > 'junk' [-Werror,-Wunused-variable] > raise_inexact(); > ^ > /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from > macro 'raise_inexact' > #define raise_inexact() do { volatile float junk = 1 + tiny; } while(0) > ^ > Grepping catrig.o for the variable 'junk' suggests that 'junk' is > optimized out (with at least -O2). Just another bug in clang. 
Volatile variables cannot be optimized out (if they are accessed).

> A quick and dirty patch to achieve the intent of the original
> code follows. It would be nice if some would like to commit
> the patch. Of course, you may want to wait for Bruce to
> review the diff.
>
> Index: src/catrig.c
> ===================================================================
> --- src/catrig.c	(revision 1935)
> +++ src/catrig.c	(working copy)
> @@ -37,7 +37,7 @@ __FBSDID("$FreeBSD: head/lib/msun/src/catrig.c 313863
> #define isinf(x) (fabs(x) == INFINITY)
> #undef isnan
> #define isnan(x) ((x) != (x))
> -#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
> +#define raise_inexact(x) do { (x) = 1 + tiny; } while(0)
> #undef signbit
> #define signbit(x) (__builtin_signbit(x))
>
> @@ -315,7 +315,7 @@ casinh(double complex z)
> 		return (z);
>
> 	/* All remaining cases are inexact. */
> -	raise_inexact();
> +	raise_inexact(new_y);
>
> 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
> 		return (z);

Now it doesn't take compiler bugs to optimize it out, since new_y is not
volatile, and a good compiler would optimize it out in all cases. new_y
is obviously unused before the early returns, so it doesn't need to be
evaluated before the returns as far as the compiler can see. Later,
new_y is initialized indirectly, and the compiler can see that too (not
so easily), so it can see that raise_inexact() has no effect except possibly
for its side effect of raising inexact for 1 + tiny.

The change might defeat the intent of the original code in another way.
'junk' is intentionally independent of other variables, so that there are
no dependencies on it. If the compiler doesn't optimize away the assignment
to new_y, then it is probably because it doesn't see that the assignment is
dead, so there is a dependency.

Actually, we want the variable 'junk' to be optimized away. We only want
the side effect of evaluating 1 + tiny.
Compilers have bugs evaluating
expressions like 1 + tiny, tiny*tiny and huge*huge, and we use assignments
of the result to volatile variables in tens if not hundreds of places to
try to work around compiler bugs. If that doesn't work here, then all the
other places are probably broken too. The other places mostly use a static
volatile, while this uses an auto volatile. 'tiny' is also volatile, as
required for the standard magic. I planned to fix all this magic using
macros like raise_inexact().

Another subtlety in the macro is that the variable is float instead of double
to possibly allow optimizations. Since the variable shouldn't be optimized
away, it will waste sizeof(var) for each use of the macro. A file scope
variable would work better here, but the macro is written to be
self-contained to make it easier to use. The change also defeats that.

Whether not evaluating 1 + tiny at compile time is a compiler bug is
delicate. We don't have any C99 compilers yet, since gcc and clang don't
support #pragma STDC FENV_ACCESS ON/OFF. The pragma should be set to ON
before the magic accesses, but we don't do that because it would be a lot
of churn and we know that the pragma doesn't work. We more or less depend
on the default state of the pragma being ON, but in gcc-4.2.1 it is
documented as being OFF unless compiled with -frounding-math when it is
documented as being ON, and for clang it is undocumented. -frounding-math
is too inefficient to use by default, and another bug in clang is that it
is not even supported. With macro or at least inline wrappers, the #pragma
should only be needed in a few places.

clang-3.9.0 seems to be only partly broken here. Volatile works correctly
for v = huge*huge and also for v = 1+tiny provided v is static instead of
auto. It also works to declare 'junk' as __unused.

The following don't work with either clang-3.9.0 or gcc-4.2.1:
- declaring 'junk' as __used (syntax error)
- the expression 1+tiny not assigned to anything, or 1+tiny assigned to
  an __unused non-volatile variable. This gives the weird code of
  loading 'tiny' (because the compiler handles read accesses to volatile
  variables correctly), but not adding 1 (because the compiler doesn't
  know that adding 1 has a side effect, or is optimizing for FENV_ACCESS
  OFF).

The following is documented to not work with gcc-4.2.1:
- #pragma STDC FENV_ACCESS ON. clang handles this correctly by warning
  that this is unsupported, but this makes it even more unusable.
  gcc-4.2.1 doesn't warn, so it is hard to tell if it worked.

Bruce

From owner-freebsd-numerics@freebsd.org Sat May 13 02:05:53 2017
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS
id 847803CB4B9; Sat, 13 May 2017 11:44:45 +1000 (AEST)
Date: Sat, 13 May 2017 11:44:41 +1000 (EST)
From: Bruce Evans
To: Bruce Evans
cc: Steve Kargl, numerics@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: catrig[fl].c and inexact
In-Reply-To: <20170513103208.M845@besplex.bde.org>
Message-ID: <20170513113852.M1045@besplex.bde.org>
References: <20170512215654.GA82545@troutmask.apl.washington.edu> <20170513103208.M845@besplex.bde.org>

On Sat, 13 May 2017, Bruce Evans wrote:

> clang-3.9.0 seems to be only partly broken here. Volatile works correctly
> for v = huge*huge and also for v = 1+tiny provided v is static instead of
> auto. It also works to declare 'junk' as __unused.

PS: only __unused on an auto volatile variable gives the intended but not
quite wanted behaviour, by reminding the compiler that assignments to
volatile variables are used, by spelling 'used' as __unused. This results
in assigning to a variable on the stack in most cases, so there is no
wastage of static space.

Normal FP operations like this are usually the fastest way to set FP
exception flags (50-100 times faster than an fenv access on i386). The
only sub-optimal part is assigning the result to memory.
Bruce From owner-freebsd-numerics@freebsd.org Sat May 13 06:08:05 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2BB8DD6A5A5 for ; Sat, 13 May 2017 06:08:05 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 1693818B3 for ; Sat, 13 May 2017 06:08:05 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: by mailman.ysv.freebsd.org (Postfix) id 15E7DD6A5A4; Sat, 13 May 2017 06:08:05 +0000 (UTC) Delivered-To: numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 13B43D6A5A3; Sat, 13 May 2017 06:08:05 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id F2C3218B2; Sat, 13 May 2017 06:08:04 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4D6839N084468 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Fri, 12 May 2017 23:08:03 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4D683TW084467; Fri, 12 May 2017 23:08:03 -0700 (PDT) (envelope-from sgk) Date: Fri, 12 May 2017 23:08:03 -0700 From: Steve Kargl To: Bruce Evans Cc: numerics@freebsd.org, freebsd-hackers@freebsd.org Subject: Re: catrig[fl].c and inexact 
Message-ID: <20170513060803.GA84399@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170512215654.GA82545@troutmask.apl.washington.edu> <20170513103208.M845@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170513103208.M845@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 May 2017 06:08:05 -0000 On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote: > On Fri, 12 May 2017, Steve Kargl wrote: > > > So, I've been making improvements to my implementations of > > the half-cycle trig functions. In doing so, I decide to > > add WARNS=2 to msun/Makefile. clang 4.0.0 dies with an > > error about an unused variable in raise_inexact() from > > catrig[fl].c. > > > > /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:195:2: error: unused variable > > 'junk' [-Werror,-Wunused-variable] > > raise_inexact(); > > ^ > > /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from > > macro 'raise_inexact' > > #define raise_inexact() do { volatile float junk = 1 + tiny; } while(0) > > ^ > > Grepping catrig.o for the variable 'junk' suggests that 'junk' is > > optimized out (with at least -O2). > > Just another bug in clang. Volatile variables cannot be optimized out > (if they are accessed). Does this depend on scope? 'junk' is local to the do {...} while(0); construct. Can a compiler completely eliminate a do-nothing scoping unit? I don't know C well enough to know. I do know what I have observed in clang. > > A quick and dirty patch to achieve the intent of the original > > code follows. It would be nice if some would like to commit > > the patch. 
Of course, you may want to wait for Bruce to > > review the diff. > > > > Index: src/catrig.c > > =================================================================== > > --- src/catrig.c (revision 1935) > > +++ src/catrig.c (working copy) > > @@ -37,7 +37,7 @@ __FBSDID("$FreeBSD: head/lib/msun/src/catrig.c 313863 > > #define isinf(x) (fabs(x) == INFINITY) > > #undef isnan > > #define isnan(x) ((x) != (x)) > > -#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0) > > +#define raise_inexact(x) do { (x) = 1 + tiny; } while(0) > > #undef signbit > > #define signbit(x) (__builtin_signbit(x)) > > > > @@ -315,7 +315,7 @@ casinh(double complex z) > > return (z); > > > > /* All remaining cases are inexact. */ > > - raise_inexact(); > > + raise_inexact(new_y); > > > > if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4) > > return (z); > > Now it doesn't take compiler bugs to optimize it out, since new_y is not > volatile, and a good compiler would optimize it out in all cases. I've yet to find a good compiler. They all seem to have bugs. > new_y > is obviously unused before the early returns, so it doesn't need to be > evalated before the returns as far as the compiler can see. Later, > new_y is initialized indirectly, and the compiler can see that too (not > so easily, so it can see that raise_inexact() has no effect except possibly > for its side effect of raising inexact for 1 + tiny. The later call passes the address of new_y to the routine. How can the compiler short of inlining the called routine know that the value assigned to new_y isn't used? > The change might defeat the intent of the original code in another way. > 'junk' is intentionally independent of other variables, so that there are > no dependencies on it. If the compiler doesn't optimize away the assignment > to new_y, then it is probably because it doesn't see that the assignment is > dead, so there is a dependency. 
It may defeat the intent of the original code, but it seems that the original code provokes undefined behavior. > Actually, we want the variable 'junk' to be optimized away. We only want > the side effect of evaluating 1 + tiny. Compilers have bugs evaluating > expressions like 1 + tiny, tiny*tiny and huge*huge, and we use assignments > of the result to volatile variables in tens if not hundreds of places to > try to work around compiler bugs. If that doesn't work here, then all the > other places are probably broken too. The other places mostly use a static > volatile, while this uses an auto volatile. 'tiny' is also volatile, as > required for the standard magic. I planned to fix all this magic using > macros like raise_inexact(). If you plan to fix the magic with raise_inexact, then please test with a suite of compilers. AFAICT, clang is optimizing out the code. I haven't written a testcase to demonstrate this as I have other irons in the fire. -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Sat May 13 10:40:43 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 814C5D696D9 for ; Sat, 13 May 2017 10:40:43 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 68BAD11B1 for ; Sat, 13 May 2017 10:40:43 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: by mailman.ysv.freebsd.org (Postfix) id 682CAD696D7; Sat, 13 May 2017 10:40:43 +0000 (UTC) Delivered-To: numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 67D78D696D6 for ; Sat, 13 May 2017 10:40:43 +0000 (UTC) (envelope-from 
markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-8.reflexion.net [208.70.210.8]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2CB5311AE for ; Sat, 13 May 2017 10:40:42 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 4544 invoked from network); 13 May 2017 10:40:41 -0000 Received: from unknown (HELO mail-cs-02.app.dca.reflexion.local) (10.81.19.2) by 0 (rfx-qmail) with SMTP; 13 May 2017 10:40:41 -0000 Received: by mail-cs-02.app.dca.reflexion.local (Reflexion email security v8.40.0) with SMTP; Sat, 13 May 2017 06:40:41 -0400 (EDT) Received: (qmail 18264 invoked from network); 13 May 2017 10:40:40 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 13 May 2017 10:40:40 -0000 Received: from [192.168.1.106] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id 29F9BEC8697; Sat, 13 May 2017 03:40:40 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: catrig[fl].c and inexact From: Mark Millard In-Reply-To: <20170513060803.GA84399@troutmask.apl.washington.edu> Date: Sat, 13 May 2017 03:40:39 -0700 Cc: Bruce Evans , freebsd-hackers@freebsd.org, numerics@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <20170512215654.GA82545@troutmask.apl.washington.edu> <20170513103208.M845@besplex.bde.org> <20170513060803.GA84399@troutmask.apl.washington.edu> To: sgk@troutmask.apl.washington.edu X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." 

On 2017-May-12, at 11:08 PM, Steve Kargl wrote:

> On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
>> On Fri, 12 May 2017, Steve Kargl wrote:
>>
>>> So, I've been making improvements to my implementations of
>>> the half-cycle trig functions. In doing so, I decide to
>>> add WARNS=2 to msun/Makefile. clang 4.0.0 dies with an
>>> error about an unused variable in raise_inexact() from
>>> catrig[fl].c.
>>>
>>> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:195:2: error: unused variable
>>> 'junk' [-Werror,-Wunused-variable]
>>> raise_inexact();
>>> ^
>>> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from
>>> macro 'raise_inexact'
>>> #define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
>>> ^
>>> Grepping catrig.o for the variable 'junk' suggests that 'junk' is
>>> optimized out (with at least -O2).
>>
>> Just another bug in clang. Volatile variables cannot be optimized out
>> (if they are accessed).
>
> Does this depend on scope? 'junk' is local to the do {...} while(0);
> construct. Can a compiler completely eliminate a do-nothing scoping
> unit? I don't know C well enough to know. I do know what I have
> observed in clang.

[This note ignores other standards than C99/C11 that might place
other constraints. And I've done no checking of compiler results,
I've just looked at a couple of the C standards.]

Note: I've not looked at tiny's declaration. It may contribute in a
way not covered below.

Unfortunately the declarator in an init-declarator that has an
initializer is not part of an expression. The rules for volatile
are tied to uses in expressions, not to the declarator. (Which is
a hole in the language definition as far as I can tell.)

There is one part of the wording that might mitigate this,
tied to a full declarator having a sequence point at its
end despite the declarator itself not being an expression,
even if its initializer is one. There is another wording
detail that might as well.

Still, overall it would seem safer to be sure there is an
expression that references the volatile object, not having
only its declarator. But I would not take even that as a
guarantee under the C standards.

It may seem a silly difference but:

do { volatile float junk=1; junk+=tiny; } while(0)

may well be a better way of writing the "must evaluate"
part of the intent simply because junk is used in an
expression. Also it has both read and write access, so is
a little more "used". The sequence point before the
assignment can help avoid compile-time evaluation as well.

Details if you care. . .

I used the C99 and C11 definitions here; I reference C11
section numbering but C99 agrees as I remember.

5.1.2.3 Program execution says:

"Accessing a volatile object, modifying an object, modifying
a file, or calling a function that does any of those operations
are all side effects, which are changes in the state of the
execution environment. Evaluation of an expression may produce
side effects."

Note that raising inexact does not fit in the definition of
side effect as far as I can tell. So a compiler need not
consider such a thing for side-effect issues if I understand
right.

[C11 specific wording:]
"The presence of a sequence point between the evaluations of
expressions A and B implies that every value computation and
side effect associated with A is sequenced before every value
computation and side effect associated with B."
[C99 is similar but is before the detailed "sequenced before"
definition.]

"An actual implementation need not evaluate part of an expression
if it can deduce that its value is not used and that no needed
side effects are produced (including any caused by calling a
function or accessing a volatile object)."

Can accessing a volatile object ever be classified as having
"no needed side effects"? More on this later.

[Remember what "side effect" excludes, as noted earlier. So some
consequences need not be considered by the compiler, all in the
name of optimizations.]

6.7.3 Type Qualifiers says:

"An object that has volatile-qualified type . . . Therefore any
expression referring to such an object shall be evaluated strictly
according to the rules of the abstract machine, as described in
5.1.2.3. Furthermore, at every sequence point the value last stored
in the object shall agree with that prescribed by the abstract
machine, except as modified by the unknown factors mentioned
previously. What constitutes an access to an object that has
volatile-qualified type is implementation-defined."

This part is mixed: what the sequence point wording giveth the
last sentence taketh away. (More later.)

It also says in a note (134):

"A volatile declaration may be used to describe an object
corresponding to a memory-mapped input/output port or an object
accessed by an asynchronously interrupting function. Actions on
objects so declared shall not be "optimized out" by an
implementation or reordered except as permitted by the rules for
evaluating expressions."

Since rules for evaluating expressions are not rules for
declarators (vs. initializers), this could be read as not
allowing the "optimize out". (But the abstract machine's
description is not explicit about declarators for such issues.)

The C99 Rationale:

The C99 Rationale was explicit about static volatile for a memory
mapped I/O register, static const volatile for a memory mapped
input port, const volatile and volatile for variables shared
across processes.
To some extent this identifies examples of contexts with "needed side
effects" that have hardware details to take into account.

For taking into account hardware details:

". . . Whatever decisions are adopted on such issues must be
documented, as volatile access is implementation-defined".

For volatile use with no explicitly identified hardware details:
volatile would appear to be no more than a potential hint for such a
context, not an effective requirement. The implementation-defined
status could allow lack of access.

Overall, based on what I see in the C99 and C11 language definitions,
I'd not be willing to declare clang wrong (if it did optimize out
junk), even with my alternative formulation. C does not have an
explicit Principle of Least Astonishment as an official guideline to
its interpretation and the rules are very biased to allowing so-called
optimizations.

"junk" does not fit with being shared across processes (for example
its address is not handed to anything) and is not static or even
global. There is no known type of potential context for specific
hardware details that would need to be taken into account for junk.
That in turn leaves open not accessing it at all as far as I can tell.
===
Mark Millard
markmi at dsl-only.net

From owner-freebsd-numerics@freebsd.org Sat May 13 11:01:15 2017
From: Dimitry Andric
Message-Id: <42D3F536-42D7-4097-A500-0EF939584592@andric.com>
Subject: Re: catrig[fl].c and inexact
Date: Sat, 13 May 2017 13:00:59 +0200
In-Reply-To: <20170512215654.GA82545@troutmask.apl.washington.edu>
Cc: numerics@freebsd.org, freebsd-hackers@freebsd.org
To: sgk@troutmask.apl.washington.edu
References: <20170512215654.GA82545@troutmask.apl.washington.edu>

On 12 May 2017, at 23:56, Steve Kargl wrote:
>
> So, I've been making improvements to my implementations of
> the half-cycle trig functions. In doing so, I decide to
> add WARNS=2 to msun/Makefile. clang 4.0.0 dies with an
> error about an unused variable in raise_inexact() from
> catrig[fl].c.
>
> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:195:2: error: unused variable
> 'junk' [-Werror,-Wunused-variable]
> raise_inexact();
> ^
> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from
> macro 'raise_inexact'
> #define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
> ^
> Grepping catrig.o for the variable 'junk' suggests that 'junk' is
> optimized out (with at least -O2).

As far as I can see, this is not the case. The simplest reduction is
this:

    static const volatile float tiny = 0x1p-100;

    void f(void)
    {
            volatile float junk = 1 + tiny;
    }

For i386-freebsd, this results in the following (boilerplate left out):

$ clang-4.0.0 -target i386-freebsd -O2 -S vol1.c -o -
[...]
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %eax
        fld1
        fadds   tiny
        fstps   -4(%ebp)
        addl    $4, %esp
        popl    %ebp
        retl
[...]
tiny:
        .long   226492416               # float 7.88860905E-31

For amd64-freebsd:

$ clang-4.0.0 -target amd64-freebsd -O2 -S vol1.c -o -
[...]
.LCPI0_0:
        .long   1065353216              # float 1
[...]
        pushq   %rbp
        movq    %rsp, %rbp
        movss   tiny(%rip), %xmm0       # xmm0 = mem[0],zero,zero,zero
        addss   .LCPI0_0(%rip), %xmm0
        movss   %xmm0, -4(%rbp)
        popq    %rbp
        retq
[...]
tiny:
        .long   226492416               # float 7.88860905E-31

I also tried -O3, but it doesn't change the result.

-Dimitry

From owner-freebsd-numerics@freebsd.org Sat May 13 13:08:37 2017
From: Dimitry Andric
Subject: Re: catrig[fl].c and inexact
Date: Sat, 13 May 2017 15:08:26 +0200
In-Reply-To: <20170513060803.GA84399@troutmask.apl.washington.edu>
Cc: Bruce Evans, freebsd-hackers@freebsd.org, numerics@freebsd.org
To: sgk@troutmask.apl.washington.edu
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
 <20170513103208.M845@besplex.bde.org>
 <20170513060803.GA84399@troutmask.apl.washington.edu>

On 13 May 2017, at 08:08, Steve Kargl wrote:
>
> On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
>> On Fri, 12 May 2017, Steve Kargl wrote:
...
>> required for the standard magic. I planned to fix all this magic using
>> macros like raise_inexact().
>
> If you plan to fix the magic with raise_inexact, then please
> test with a suite of compilers. AFAICT, clang is optimizing
> out the code.
> I haven't written a testcase to demonstrate this
> as I have other irons in the fire.

Using the full catrig.c and -O3, I tried gcc 4.2.1, 4.7.4, 4.8.5, 4.9.4,
5.4.0, 6.3.0 and 7.0.1, in addition to clang 3.4.1, 3.8.0, 3.9.1, 4.0.0
and 5.0.0. All versions of gcc produced something similar to the
following for i386:

# /usr/src/lib/msun/src/catrig.c:314:   if (x == 0 && y == 0)
        .loc 1 314 0
        fldz
        fucom   %st(3)          #
        fnstsw  %ax             # tmp262
        sahf
        setne   %al             #, tmp270
        setnp   %dl             #, tmp259
        subl    $1, %eax        #, tmp270
        testb   %al, %dl        # tmp270, tmp259
        je      .L176           #,
        fucomp  %st(1)          #
        fnstsw  %ax             # tmp281
        sahf
        setne   %al             #, tmp289
        setnp   %dl             #, tmp278
        subl    $1, %eax        #, tmp289
        testb   %al, %dl        # tmp289, tmp278
        je      .L37            #,
        fstp    %st(3)          #
        fstp    %st(0)          #
        jmp     .L153           #
[...]
.L176:
        fstp    %st(0)          #
.L37:
.LBB25:
# /usr/src/lib/msun/src/catrig.c:318:   raise_inexact();
        flds    tiny            # tiny
        fadds   .LC2            #
        fstps   120(%esp)       # junk

and for amd64:

# /usr/src/lib/msun/src/catrig.c:314:   if (x == 0 && y == 0)
        .loc 1 314 0
        pxor    %xmm7, %xmm7    # tmp386
        ucomisd %xmm7, %xmm3    # tmp386, z
        setnp   %dl             #, tmp258
        cmovne  %eax, %edx      # tmp258,, tmp207, tmp254
        testb   %dl, %dl        # tmp254
        je      .L34            #,
        ucomisd %xmm7, %xmm1    # tmp386, z
        setnp   %dl             #, tmp266
        cmove   %edx, %eax      # tmp266,, tmp262
        testb   %al, %al        # tmp262
        je      .L34            #,
[...]
.L34:
.LBB33:
# /usr/src/lib/msun/src/catrig.c:318:   raise_inexact();
        movss   tiny(%rip), %xmm0       # tiny, tiny.0_28
        addss   .LC13(%rip), %xmm0      #, _29
        movss   %xmm0, 188(%rsp)        # _29, junk

All versions of clang produced something similar to the following for
i386:

        .loc    1 314 8 is_stmt 1       # /usr/src/lib/msun/src/catrig.c:314:8
        fldz
        .loc    1 314 13 is_stmt 0      # /usr/src/lib/msun/src/catrig.c:314:13
        fxch    %st(1)
        fucom   %st(1)
        fnstsw  %ax
        sahf
        jne     .LBB0_19
        jp      .LBB0_19
        .loc    1 0 13                  # /usr/src/lib/msun/src/catrig.c:0:13
        fxch    %st(3)
        fucom   %st(1)
        fstp    %st(1)
        fnstsw  %ax
        sahf
        fldz
        fxch    %st(1)
        fxch    %st(3)
        jne     .LBB0_19
        jp      .LBB0_19
[...]
.LBB0_19:                               # %do.body
        .loc    1 0 8 is_stmt 0         # /usr/src/lib/msun/src/catrig.c:0:8
        fstp    %st(1)
        .loc    1 318 2 is_stmt 1       # /usr/src/lib/msun/src/catrig.c:318:2
        fld1
        fadds   tiny
        fstps   168(%esp)

and for amd64:

        .loc    1 314 8 is_stmt 1       # /usr/src/lib/msun/src/catrig.c:314:8
        pxor    %xmm2, %xmm2
        .loc    1 314 13 is_stmt 0      # /usr/src/lib/msun/src/catrig.c:314:13
        ucomisd %xmm2, %xmm4
        jne     .LBB0_15
        jp      .LBB0_15
        .loc    1 0 13                  # /usr/src/lib/msun/src/catrig.c:0:13
        ucomisd %xmm2, %xmm3
        jne     .LBB0_15
        jnp     .LBB0_21
.LBB0_15:                               # %do.body
        .loc    1 318 2 is_stmt 1       # /usr/src/lib/msun/src/catrig.c:318:2
        movss   tiny(%rip), %xmm2       # xmm2 = mem[0],zero,zero,zero
        addss   .LCPI0_2(%rip), %xmm2
.Ltmp11:
        movss   %xmm2, -16(%rbp)

E.g., these all look good, at least with regards to not optimizing out
the desired addition.

The only compiler I could find that does optimize everything away (at
least in the simplified test case), is the Intel compiler:

https://godbolt.org/g/g1UT2m

-Dimitry

From owner-freebsd-numerics@freebsd.org Sat May 13 16:05:38 2017
Date: Sun, 14 May 2017 02:05:33 +1000 (EST)
From: Bruce Evans
To: Steve Kargl
cc: numerics@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: catrig[fl].c and inexact
In-Reply-To: <20170513060803.GA84399@troutmask.apl.washington.edu>
Message-ID: <20170514011600.D1038@besplex.bde.org>
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
 <20170513103208.M845@besplex.bde.org>
 <20170513060803.GA84399@troutmask.apl.washington.edu>

On Fri, 12 May 2017, Steve Kargl wrote:

> On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
>> On Fri, 12 May 2017, Steve Kargl wrote:
>>
>>> ...
>>> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from
>>> macro 'raise_inexact'
>>> #define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
>>> ^
>>> Grepping catrig.o for the variable 'junk' suggests that 'junk' is
>>> optimized out (with at least -O2).

It is a local variable, so should be and is allocated on the stack, so
you will never find it using grep. The problem seems to be that all
compilers generated the intended code, but clang warns anyway.

>> Just another bug in clang. Volatile variables cannot be optimized out
>> (if they are accessed).
>
> Does this depend on scope? 'junk' is local to the do {...} while(0);
> construct. Can a compiler completely eliminate a do-nothing scoping
> unit? I don't know C well enough to know. I do know what I have
> observed in clang.

The semantics of volatile are unclear, but as a practical matter
standards shouldn't specify much and compilers should be very
conservative.

BTW, I recently noticed that volatile doesn't work right in bus space
macros. Some reduce to *(volatile int *)var = val, where var is for
memory-mapped i/o that takes 10000 times as long as normal memory to
access. Compilers still unroll loops setting such variables. This is
only a pessimization for space.

>>> ...
>>> @@ -315,7 +315,7 @@ casinh(double complex z)
>>>             return (z);
>>>
>>>     /* All remaining cases are inexact. */
>>> -   raise_inexact();
>>> +   raise_inexact(new_y);
>>>
>>>     if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
>>>             return (z);
>>
>> Now it doesn't take compiler bugs to optimize it out, since new_y is not
>> volatile, and a good compiler would optimize it out in all cases.
>
> I've yet to find a good compiler.
> They all seem to have bugs.
>
>> new_y
>> is obviously unused before the early returns, so it doesn't need to be
>> evaluated before the returns as far as the compiler can see. Later,
>> new_y is initialized indirectly, and the compiler can see that too (not
>> so easily), so it can see that raise_inexact() has no effect except possibly
>> for its side effect of raising inexact for 1 + tiny.
>
> The later call passes the address of new_y to the routine. How
> can the compiler short of inlining the called routine know that
> the value assigned to new_y isn't used?

The compiler does full inlining even when you don't want it. Full
analysis of the whole source file is fundamental for generating useful
warnings with -Wunused. Without full analysis, the compiler would have
to assume that new_y is used uninitialized and either suppress warnings
for all variables that might be initialized indirectly (including via
aliased pointers), or generate many bogus warnings that variables
"might be" used uninitialized. Old compilers mostly did the latter, and
we still see occasional spurious warnings from gcc-4.2.1.

Old compilers also have man pages in which this is partly documented.
gcc-3.3.3(1) says that:
- -Wuninitialized is null without -O
- -Wuninitialized is never generated for volatile variables
- -Wuninitialized is not the default since gcc is not smart enough to
  handle it well

gcc-4.2.1(1) says much the same, plus that -Wall implies
-Wuninitialized. It still says that the compiler is not smart, and
doesn't seem to document improvements that make this warning reasonable
as the default with -Wall. This is mostly because -O now implies
-funit-at-a-time, which I usually don't want, but which gives the full
analysis needed for -Wuninitialized and -Wunused. I usually don't want
this because:
- it slows down compilation
- it allows unwanted inlining
- it allows unportable code.
clang doesn't support -funit-at-a-time.

>> The change might defeat the intent of the original code in another way.
>> 'junk' is intentionally independent of other variables, so that there are
>> no dependencies on it. If the compiler doesn't optimize away the assignment
>> to new_y, then it is probably because it doesn't see that the assignment is
>> dead, so there is a dependency.
>
> It may defeat the intent of the original code, but it seems that
> the original code provokes undefined behavior.

Defined, but perhaps not what is wanted.

It is using -W flags that gives undefined behaviour. They are undefined
by the C standard, and also undefined by compilers with stub man pages.

>> Actually, we want the variable 'junk' to be optimized away. We only want
>> the side effect of evaluating 1 + tiny. Compilers have bugs evaluating
>> expressions like 1 + tiny, tiny*tiny and huge*huge, and we use assignments
>> of the result to volatile variables in tens if not hundreds of places to
>> try to work around compiler bugs. If that doesn't work here, then all the
>> other places are probably broken too. The other places mostly use a static
>> volatile, while this uses an auto volatile. 'tiny' is also volatile, as
>> required for the standard magic. I planned to fix all this magic using
>> macros like raise_inexact().
>
> If you plan to fix the magic with raise_inexact, then please
> test with a suite of compilers. AFAICT, clang is optimizing
> out the code. I haven't written a testcase to demonstrate this
> as I have other irons in the fire.

I only tested with 4 compilers when I wrote it.

Actually, we agreed not to worry about compiler bugs for setting fenv,
especially for compilers with even more of them than gcc. libm only has
the volatile hack needed to fix huge*huge for clang in some places (gcc
evaluates huge*huge at run time but tiny*tiny at compile time, so libm
has more volatile hacks for the latter).

Not to mention hacks to remove extra precision for huge*huge and
tiny*tiny.
On i386 with i387, huge*huge doesn't overflow since it is evaluated in
extra precision. The wrong result is returned and the wrong result is
used if it is assigned to a variable that can hold the extra precision.
Overflow only occurs if the variable is converted to float or double,
and STRICT_ASSIGN() or a volatile hack must be used for this to work
around other compiler bugs (which are actually features, but not
allowed by C standards).

C11 and compiler non-support for C11 breaks this further. C11 adds the
extra pessimization and subtraction of value of requiring extra
precision (and range) to be destroyed on function return. clang ignores
this requirement. Newer gcc supports it under certain pessimal CFLAGS
including -std=c11.

Bruce

From owner-freebsd-numerics@freebsd.org Sat May 13 16:19:34 2017
Date: Sun, 14 May 2017 02:19:24 +1000 (EST)
From: Bruce Evans
To: Dimitry Andric
cc: sgk@troutmask.apl.washington.edu, freebsd-hackers@freebsd.org,
 numerics@freebsd.org
Subject: Re: catrig[fl].c and inexact
Message-ID: <20170514020559.F1038@besplex.bde.org>
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
 <20170513103208.M845@besplex.bde.org>
 <20170513060803.GA84399@troutmask.apl.washington.edu>

On Sat, 13 May 2017, Dimitry Andric wrote:

> On 13 May 2017, at 08:08, Steve Kargl wrote:
>>
>> On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
>>> On Fri, 12 May 2017, Steve Kargl wrote:
> ...
>>> required for the standard magic. I planned to fix all this magic using
>>> macros like raise_inexact().
>>
>> If you plan to fix the magic with raise_inexact, then please
>> test with a suite of compilers. AFAICT, clang is optimizing
>> out the code. I haven't written a testcase to demonstrate this
>> as I have other irons in the fire.
>
> Using the full catrig.c and -O3, I tried gcc 4.2.1, 4.7.4, 4.8.5, 4.9.4,
> 5.4.0, 6.3.0 and 7.0.1, in addition to clang 3.4.1, 3.8.0, 3.9.1, 4.0.0
> and 5.0.0.
> All versions of gcc produced something similar to the
> following for i386:

Yes, all compilers I tried (only gcc-3.3.3, gcc-4.2.1 and clang-3.9.0)
generate the intended code, but clang-3.9.0 also generates a -Wunused
warning about the variable that it has just used to generate the
intended code!

> # /usr/src/lib/msun/src/catrig.c:318: raise_inexact();
>         flds    tiny            # tiny
>         fadds   .LC2            #
>         fstps   120(%esp)       # junk

I don't know how to ask for the best code, which is more like

        flds    tiny
        fadds   one
        ffree   %st(0)          # or fstp %st(0) -- MD optimization

but the best code runs insignificantly faster in practice.

> and for amd64:
> [...]
> .L34:
> .LBB33:
> # /usr/src/lib/msun/src/catrig.c:318: raise_inexact();
>         movss   tiny(%rip), %xmm0       # tiny, tiny.0_28
>         addss   .LC13(%rip), %xmm0      #, _29
>         movss   %xmm0, 188(%rsp)        # _29, junk

Discarding the result is easier for amd64 (just omit the store). The
volatile hack forces the store.

> E.g., these all look good, at least with regards to not optimizing out
> the desired addition.
>
> The only compiler I could find that does optimize everything away (at
> least in the simplified test case), is the Intel compiler:
>
> https://godbolt.org/g/g1UT2m

Urk.

Bruce

From owner-freebsd-numerics@freebsd.org Sat May 13 16:21:58 2017
Date: Sat, 13 May 2017 09:21:53 -0700
From: Steve Kargl
To: Dimitry Andric
Cc: Bruce Evans, freebsd-hackers@freebsd.org, numerics@freebsd.org
Subject: Re: catrig[fl].c
 and inexact
Message-ID: <20170513162153.GB88653@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
 <20170513103208.M845@besplex.bde.org>
 <20170513060803.GA84399@troutmask.apl.washington.edu>

On Sat, May 13, 2017 at 03:08:26PM +0200, Dimitry Andric wrote:
> On 13 May 2017, at 08:08, Steve Kargl wrote:
> >
> > On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
> >> On Fri, 12 May 2017, Steve Kargl wrote:
> ...
> >> required for the standard magic. I planned to fix all this magic using
> >> macros like raise_inexact().
> >
> > If you plan to fix the magic with raise_inexact, then please
> > test with a suite of compilers. AFAICT, clang is optimizing
> > out the code. I haven't written a testcase to demonstrate this
> > as I have other irons in the fire.
>
> Using the full catrig.c and -O3, I tried gcc 4.2.1, 4.7.4, 4.8.5, 4.9.4,
> 5.4.0, 6.3.0 and 7.0.1, in addition to clang 3.4.1, 3.8.0, 3.9.1, 4.0.0
> and 5.0.0.

Thanks for checking. I reduced catrig.c to a small self-contained
program and indeed I was getting the desired addition of 1 + tiny
to raise FE_INEXACT. I suppose that I'll need to add an appropriate
-Wno-foo to my CFLAGS line to suppress the spurious warning, which
might be tricky because -Wunused is one option I'd like to have.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org Sat May 13 16:55:38 2017
From: Dimitry Andric
Subject: Re: catrig[fl].c and inexact
Date: Sat, 13 May
 2017 18:55:27 +0200
In-Reply-To: <20170513162153.GB88653@troutmask.apl.washington.edu>
Cc: freebsd-hackers@freebsd.org, numerics@freebsd.org, Bruce Evans
To: sgk@troutmask.apl.washington.edu
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
 <20170513103208.M845@besplex.bde.org>
 <20170513060803.GA84399@troutmask.apl.washington.edu>
 <20170513162153.GB88653@troutmask.apl.washington.edu>

On 13 May 2017, at 18:21, Steve Kargl wrote:
>
> On Sat, May 13, 2017 at 03:08:26PM +0200, Dimitry Andric wrote:
...
>
>> Using the full catrig.c and -O3, I tried gcc 4.2.1, 4.7.4, 4.8.5, 4.9.4,
>> 5.4.0, 6.3.0 and 7.0.1, in addition to clang 3.4.1, 3.8.0, 3.9.1, 4.0.0
>> and 5.0.0.
>
> Thanks for checking. I reduced catrig.c to a small self-contained
> program and indeed I was getting the desired addition of 1 + tiny
> to raise FE_INEXACT. I suppose that I'll need to add an appropriate
> -Wno-foo to my CFLAGS line to suppress the spurious warning, which
> might be tricky because -Wunused is one option I'd like to have.
The following also gets rid of the warnings:

Index: lib/msun/src/catrig.c
===================================================================
--- lib/msun/src/catrig.c (revision 318032)
+++ lib/msun/src/catrig.c (working copy)
@@ -37,7 +37,7 @@
 #define isinf(x) (fabs(x) == INFINITY)
 #undef isnan
 #define isnan(x) ((x) != (x))
-#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
+#define raise_inexact() do { volatile float junk __unused = 1 + tiny; } while(0)
 #undef signbit
 #define signbit(x) (__builtin_signbit(x))

Index: lib/msun/src/catrigf.c
===================================================================
--- lib/msun/src/catrigf.c (revision 318032)
+++ lib/msun/src/catrigf.c (working copy)
@@ -51,7 +51,7 @@
 #define isinf(x) (fabsf(x) == INFINITY)
 #undef isnan
 #define isnan(x) ((x) != (x))
-#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
+#define raise_inexact() do { volatile float junk __unused = 1 + tiny; } while(0)
 #undef signbit
 #define signbit(x) (__builtin_signbitf(x))

Index: lib/msun/src/catrigl.c
===================================================================
--- lib/msun/src/catrigl.c (revision 318032)
+++ lib/msun/src/catrigl.c (working copy)
@@ -53,7 +53,7 @@
 #define isinf(x) (fabsl(x) == INFINITY)
 #undef isnan
 #define isnan(x) ((x) != (x))
-#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
+#define raise_inexact() do { volatile float junk __unused = 1 + tiny; } while(0)
 #undef signbit
 #define signbit(x) (__builtin_signbitl(x))

If you
are OK with that, I will commit it later today. -Dimitry --Apple-Mail=_1A165ECB-BD13-4967-A0C3-5C9609FF1B6F-- From owner-freebsd-numerics@freebsd.org Sat May 13 17:12:13 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 68227D6B2A6 for ; Sat, 13 May 2017 17:12:13 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3]) by mx1.freebsd.org (Postfix) with ESMTP id 50FA7155 for ; Sat, 13 May 2017 17:12:13 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: by mailman.ysv.freebsd.org (Postfix) id 50556D6B2A5; Sat, 13 May 2017 17:12:13 +0000 (UTC) Delivered-To: numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4FD07D6B2A4; Sat, 13 May 2017 17:12:13 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 2B29E154; Sat, 13 May 2017 17:12:13 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS
id v4DHC8hu089182 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 13 May 2017 10:12:08 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4DHC8qr089181; Sat, 13 May 2017 10:12:08 -0700 (PDT) (envelope-from sgk) Date: Sat, 13 May 2017 10:12:08 -0700 From: Steve Kargl To: Dimitry Andric Cc: freebsd-hackers@freebsd.org, numerics@freebsd.org, Bruce Evans Subject: Re: catrig[fl].c and inexact Message-ID: <20170513171208.GA89162@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170512215654.GA82545@troutmask.apl.washington.edu> <20170513103208.M845@besplex.bde.org> <20170513060803.GA84399@troutmask.apl.washington.edu> <20170513162153.GB88653@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 May 2017 17:12:13 -0000 On Sat, May 13, 2017 at 06:55:27PM +0200, Dimitry Andric wrote: > On 13 May 2017, at 18:21, Steve Kargl wrote: > > > > On Sat, May 13, 2017 at 03:08:26PM +0200, Dimitry Andric wrote: > ... > > > >> Using the full catrig.c and -O3, I tried gcc 4.2.1, 4.7.4, 4.8.5, 4.9.4, > >> 5.4.0, 6.3.0 and 7.0.1, in addition to clang 3.4.1, 3.8.0, 3.9.1, 4.0.0 > >> and 5.0.0. > > > > Thanks for checking. I reduced catrig.c to a small self-contained > > program and indeed I was getting the desired addition of 1 + tiny > > to raise FE_INEXACT. I suppose that I'll need to add an appropriate > > -Wno-foo to my CFLAGS line to suppress the spurious warning, which > > might be tricky because -Wunused is one option I'ld like to have. 
> > The following also gets rid of the warnings: > > Index: lib/msun/src/catrig.c > =================================================================== > --- lib/msun/src/catrig.c (revision 318032) > +++ lib/msun/src/catrig.c (working copy) > @@ -37,7 +37,7 @@ > #define isinf(x) (fabs(x) == INFINITY) > #undef isnan > #define isnan(x) ((x) != (x)) > -#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0) > +#define raise_inexact() do { volatile float junk __unused = 1 + tiny; } while(0) > #undef signbit > #define signbit(x) (__builtin_signbit(x)) > > Index: lib/msun/src/catrigf.c > =================================================================== > --- lib/msun/src/catrigf.c (revision 318032) > +++ lib/msun/src/catrigf.c (working copy) > @@ -51,7 +51,7 @@ > #define isinf(x) (fabsf(x) == INFINITY) > #undef isnan > #define isnan(x) ((x) != (x)) > -#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0) > +#define raise_inexact() do { volatile float junk __unused = 1 + tiny; } while(0) > #undef signbit > #define signbit(x) (__builtin_signbitf(x)) > > Index: lib/msun/src/catrigl.c > =================================================================== > --- lib/msun/src/catrigl.c (revision 318032) > +++ lib/msun/src/catrigl.c (working copy) > @@ -53,7 +53,7 @@ > #define isinf(x) (fabsl(x) == INFINITY) > #undef isnan > #define isnan(x) ((x) != (x)) > -#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0) > +#define raise_inexact() do { volatile float junk __unused = 1 + tiny; } while(0) > #undef signbit > #define signbit(x) (__builtin_signbitl(x)) > > If you are OK with that, I will commit it later today. > I'm OK with this change, but I typically defer to Bruce as he knows much more about C and floating point. 
-- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow From owner-freebsd-numerics@freebsd.org Sat May 13 18:21:41 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3C520D6B011 for ; Sat, 13 May 2017 18:21:41 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 27D9BEE4 for ; Sat, 13 May 2017 18:21:41 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 271C1D6B010; Sat, 13 May 2017 18:21:41 +0000 (UTC) Delivered-To: numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 26A47D6B00F; Sat, 13 May 2017 18:21:41 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail108.syd.optusnet.com.au (mail108.syd.optusnet.com.au [211.29.132.59]) by mx1.freebsd.org (Postfix) with ESMTP id 9B43EEE1; Sat, 13 May 2017 18:21:39 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail108.syd.optusnet.com.au (Postfix) with ESMTPS id D78651A400E; Sun, 14 May 2017 04:21:31 +1000 (AEST) Date: Sun, 14 May 2017 04:21:30 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Mark Millard cc: sgk@troutmask.apl.washington.edu, Bruce Evans , freebsd-hackers@freebsd.org, numerics@freebsd.org Subject: Re: catrig[fl].c and inexact In-Reply-To: Message-ID: <20170514023721.O1230@besplex.bde.org> References: <20170512215654.GA82545@troutmask.apl.washington.edu> <20170513103208.M845@besplex.bde.org> <20170513060803.GA84399@troutmask.apl.washington.edu> MIME-Version: 1.0 
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=VbSHBBh9 c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=iROBt-5bZgHvOzUyjZ0A:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 May 2017 18:21:41 -0000 On Sat, 13 May 2017, Mark Millard wrote: > > On 2017-May-12, at 11:08 PM, Steve Kargl wrote: > >> On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote: >>> On Fri, 12 May 2017, Steve Kargl wrote: >>>> ... >>>> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from >>>> macro 'raise_inexact' >>>> #define raise_inexact() do { volatile float junk = 1 + tiny; } while(0) >>>> ^ >>>> Grepping catrig.o for the variable 'junk' suggests that 'junk' is >>>> optimized out (with at least -O2). It is easy to write unportable code that works perfectly. On i386(i387): #define use(x) __asm("" : : "t" (x)) #define raise_inexact() use(1 + tiny) looks cleaner except for the asm, and generates perfect code with fstp %st(0) and no store of the result to memory. Unfortunately, the "t" (top of i387 stack) is too unportable. "g" might be portable enough, but generates worse code than the volatile variable. >>> Just another bug in clang. Volatile variables cannot be optimized out >>> (if they are accessed). >> >> Does this depend on scope? 'junk' is local to the do {...} while(0); >> construct. Can a compiler completely eliminate a do-nothing scoping >> unit? I don't know C well enough to know. I do know what I have >> observed in clang. > > [This note ignores other standards than C99/C11 > that might place other constraints. And I've done > no checking of compiler results, I've just looked > at a couple of the C standards.]
> > Note: I've not looking to tiny's declaration. It > may contribute in a way not covered below. > > Unfortunately the declarator in an init-declarator > that has an initializer is not part of an > expression. The rules for volatile are tied to uses > in expressions, not to the declarator. (Which is a > hole in the language definition as far as I can > tell.) But the very first mention of volatile in C99 (5.1.2.3 Program Execution #1) says that "Accessing a volatile object ... [is a side effect]. ... [All previous side effects shall be complete at certain sequence points.]" It doesn't make any exceptions for auto objects. Also, #3 explicitly says for side effects in expressions that the implementation may optimize away the evaluation if it can determine that the evaluation has no side effects, including by calling a function or accessing a volatile object. But here the compiler can't do that for 1 + tiny, since this expression does have side effects (perhaps modulo pragma FENV_ACCESS). This rule is redundant if not wrong. The implementation can always use the "as if" rule to avoid doing work to produce nothing. And according to #1, any access to a volatile variable has side effects, so the compiler can never determine that an evaluation involving volatile variables has no side effects. So the correctness of the compiler using #3 to avoid the assignment reduces to the standard breaking its own definition of volatile, and then the compiler using the broken definition. > There is one part of the wording that might mitigate > this, tied to a full declarator having a sequence > point at its end despite the declarator itself not > being an expression, even if its initializer is > one. There is another wording detail that might > as well. Surely the assignment gives a sequence point for initializers? Actually, this is not too clear. 
I don't even like initialization in declarations, partly because it obscures the order, and only wrote the code with an initializer to get a 1-line macro. It could be written as "volatile float junk; junk = 1 + tiny;". Also, the use() macro can be written in C, with similar problems to the asm version, as "#define use(x) do { volatile float junk; junk = x; } while (0)" or better in GNU C as "#define use(x) do { volatile __typeof(x) junk; junk = x; } while (0)". This allows keeping the volatile hack and variations to make it work (maybe just __unused) in 1 place. #9 (Example 1) says that an implementation may make the volatile keyword redundant, essentially by making volatile-memory non-magic. I don't like this. It reduces the side effects of volatile to just the ordering of accesses to volatiles relative to sequence points, but practical implementations need much more than that. This clause just says that impractical implementations are allowed, but so does the "as if" rule. #10 is much more of the same. 6.7.3 #6 says that accesses to a volatile-qualified object "may" have side effects unknown to the implementation. Misimplementations may still apply the "as if" rule and conform to this clause weaselishly by knowing their own badness. They just have to do what is allowed in Example 1 to make volatile have no useful effect. Then this clause is null. > Still, overall it would seem safer to be sure there > is an expression that references the volatile object, > not having only its declarator. But I would not take > even that as a guarantee under the C standards. The standard seems a bit too weighted towards read accesses. We could try writing to a non-volatile variable and reading back the result as volatile using a *(volatile type_t *)&var hack. But that would give an unwanted extra memory access.
> It may seem a silly difference but: > > do { volatile float junk=1; junk+=tiny; } while(0) > > may well be a better way of writing the "must > evaluate" part of the intent simply because > junk is used in an expression. Also it has both read > and write access, so is a little more "used". The > sequence point before the assignment can help avoid > compile-time evaluation as well. That would give 1 more unwanted memory access (if it works normally): - write 1 to junk - read 1 from junk; add tiny (usually) in a register - write result to junk. > Details if you care. . . > > I used the C99 and C11 definitions here, I > reference C11 section numbering but C99 agrees > as I remember. > > 5.1.2.3 Program execution says: > > "Accessing a volatile object, modifying an object, > modifying a file, or calling a function that does > any of those operations are all side effects, > which are changes in the state of the execution > environment. Evaluation of an expression may > produce side effects." > > Note that raising inexact does not fit in the > definition of side effect as far as I can tell. > So a compiler need not consider such a thing > for side-effect issues if I understand right. I think it does, modulo #pragma FENV_ACCESS. Indeed, F.7.1 says it does explicitly (and without Annex F, floating point can do almost anything). It says that when FENV_ACCESS is "on" (should be "ON"), for FP operations that implicitly raise exception flags, these changes to the FP state are treated as side effects which respect sequence points [footnote 291]. The footnote wastes space to remind the reader that optimizations are allowed when FENV_ACCESS is "off". > [C11 specific wording:] "The presence of a > sequence point between the evaluations of > expressions A and B implies that every value > computation and side effect associated with A > is sequenced before every value compuation and > side effect associated with B." 
> > [C99 is similar but is before the detailed > "sequenced before" definition.] > > "An actual implementation need not evaluate part > of an expression if it can deduce that its value > is not used and that no needed side effects are > produced (including any caused by calling a > function or accessing a volatile object)." I didn't expect any problems with volatile or sequence points. With FENV_ACCESS OFF, the compiler is free to ignore the side effect for 1+tiny, but with FENV_ACCESS broken in all available compilers, we have to assume that the compiler doesn't ignore this side effect. In practice, compilers do ignore it for (void)(1+tiny) with tiny non-volatile, so we use several volatile hacks. Volatile for tiny alone isn't enough... > Can accessing a volatile object ever be > classified as having "no needed side effects"? > More on this later. [Remember what "side effect" > excludes, as noted earlier. So some consequences > need not be considered by the compiler, all in > the name of optimizations.] ...we need the write access to junk to have side effects. Since tiny is volatile, 1+tiny has an unknown value even with FENV_ACCESS OFF. Then we want the side effects for accessing junk to depend on the value, so that the value must be calculated even though it is unused except for its effects on the side effects. This is fragile. > 6.7.3 Type Qualifiers says: > > "An object that has volatile-qualified type . . . > Therefore any expression referring to such an object > shall be evaluated strictly according to the rules > of the abstract machine, as described in 5.1.2.3. > Furthermore, at every sequence point the value last > stored in the object shall agree with that prescribed > by the abstract machine, except as modified by the > unknown factors mentioned previously. What constitutes > an access to an object that has volatile-qualified > type is implementation-defined."
> > This part is mixed: what the sequence point wording > giveth the last sentence taketh away. (More later.) The implementation must work for memory-mapped devices since that is the most important case for us. Anything that reads or writes a value to a memory-mapped device has lots of side effects that depend on the value. So junk = 1 + tiny must load tiny if tiny is for a memory-mapped device, evaluate 1+tiny to get a value to store, and do the store if junk is for a memory-mapped device. The compiler is doing too much optimization if it "knows" that junk is not for a memory-mapped device because the compiler allocated it on the stack. The compiler allocated the static tiny in ordinary memory too. If volatile is broken for tiny, and FENV_ACCESS is OFF or broken (unsupported) then the compiler is free to evaluate 1+tiny as 1 at compile time, and similarly for later expressions involving the result. extern volatile usually prevents the compiler from knowing that the variable is not for device memory. > It also says in a note (134): > > "A volatile declaration may be used to describe an > object corresponding to a memory-mapped input/output > port or an object accessed by an asynchronously > interrupting function. Actions on objects so declared > shall not be "optimized out" by an implementation > or reordered except as permitted by the rules for > evaluating expressions." "so declared" must be read as simply "volatile", since there is no declaration like "volatile memory mapped ..." though such declarations would be very useful for kernels. > Since rules for evaluating expressions are not rules > for declarators (vs. initializers), this could be > read as not allowing the "optimize out". (But the > abstract machine's description is not explicit about > declarators for such issues.) It just allows all optimizations which the compiler can tell are safe. But compilers can never tell. Maybe the programmer mapped the stack memory-mapped...
This is well outside the scope of the C abstract machine, but would be just another hack for kernels. > The C99 Rationale: > > The C99 Rationale was explicit about static > volatile for a memory mapped I/O register, > static const volatile for a memory mapped > input port, const volatile and volatile > for variables shared across processes. To > some extent this identifies examples of > contexts with "needed side effects" that > have hardware details to take into account. > > For taking into account hardware details: > ". . . Whatever decisions are adopted on such > issues must be documented, as volatile access > is implementation-defined". > > For volatile use with no explicitly identified > hardware details: volatile would appear to be > no more than a potential hint for such a > context, not an effective requirement. The > implementation-defined status could allow lack > of access. > > Overall, based on what I see in the C99 and > C11 language definitions, I'd not be willing to > declare clang wrong (if it did optimize out junk), > even with my alternative formulation. > > C does not have an explicit Principle of Least > Astonishment as an official guideline to its > interpretation and the rules are very biased to > allowing so-called optimizations. "junk" does not > fit with being shared across processes (for > example its address is not handed to anything) > and is not static or even global. There is no > known type of potential context for specific > hardware details that would need to be taken > into account for junk. That in turn leaves open > not accessing it at all as far as I can tell. Yes, it is only a hint, and the C standard would be improved by saying just that, or requiring the strong meaning that is needed in practice. The strong meaning is that accesses to volatile variables always have side effects even if the implementation "knows" that they don't.
Bruce From owner-freebsd-numerics@freebsd.org Sat May 13 19:14:57 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6E1E0D6BFF6 for ; Sat, 13 May 2017 19:14:57 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 599CBA9C for ; Sat, 13 May 2017 19:14:57 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: by mailman.ysv.freebsd.org (Postfix) id 56163D6BFF4; Sat, 13 May 2017 19:14:57 +0000 (UTC) Delivered-To: numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 52341D6BFF3; Sat, 13 May 2017 19:14:57 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 10ACDA9B; Sat, 13 May 2017 19:14:56 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 6F0963C6D9F; Sun, 14 May 2017 05:14:54 +1000 (AEST) Date: Sun, 14 May 2017 05:14:53 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Dimitry Andric cc: sgk@troutmask.apl.washington.edu, freebsd-hackers@freebsd.org, numerics@freebsd.org Subject: Re: catrig[fl].c and inexact In-Reply-To: Message-ID: <20170514043645.G2059@besplex.bde.org> References: <20170512215654.GA82545@troutmask.apl.washington.edu> <20170513103208.M845@besplex.bde.org> <20170513060803.GA84399@troutmask.apl.washington.edu> <20170513162153.GB88653@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 
X-Optus-CM-Analysis: v=2.2 cv=VbSHBBh9 c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=Y88NXTGeRKpz0WgPRmcA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 May 2017 19:14:57 -0000 On Sat, 13 May 2017, Dimitry Andric wrote: > On 13 May 2017, at 18:21, Steve Kargl wrote: >> >> On Sat, May 13, 2017 at 03:08:26PM +0200, Dimitry Andric wrote: > ... >> >>> Using the full catrig.c and -O3, I tried gcc 4.2.1, 4.7.4, 4.8.5, 4.9.4, >>> 5.4.0, 6.3.0 and 7.0.1, in addition to clang 3.4.1, 3.8.0, 3.9.1, 4.0.0 >>> and 5.0.0. >> >> Thanks for checking. I reduced catrig.c to a small self-contained >> program and indeed I was getting the desired addition of 1 + tiny >> to raise FE_INEXACT. I suppose that I'll need to add an appropriate >> -Wno-foo to my CFLAGS line to suppress the spurious warning, which >> might be tricky because -Wunused is one option I'ld like to have. > > The following also gets rid of the warnings: > > Index: lib/msun/src/catrig.c > =================================================================== > --- lib/msun/src/catrig.c (revision 318032) > +++ lib/msun/src/catrig.c (working copy) > @@ -37,7 +37,7 @@ > #define isinf(x) (fabs(x) == INFINITY) > #undef isnan > #define isnan(x) ((x) != (x)) > -#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0) > +#define raise_inexact() do { volatile float junk __unused = 1 + tiny; } while(0) > #undef signbit > #define signbit(x) (__builtin_signbit(x)) > ... > > If you are OK with that, I will commit it later today. It is what I said was best yesterday :-). Except, __unused is an obfuscation meaning __used. I couldn't get __used to work today either.
It works with static variables, but for auto variables it generates "'__used__' attribute ignored" for both clang-3.9.0 and gcc-4.2.1, even without any -W flags to ask for excessive warnings. Today I looked at the macro used(expr), which would be used like used(1 + tiny) for inexact, used(huge * huge) for overflow, and used(tiny * tiny) for underflow. The difficulty is to declare the variable to hold the result, especially since we don't want this variable to be in memory. Also in some cases, we would like to return the result. For overflow, we can do either: ({ volatile float junk __unused = huge * huge; INFINITY; }) or ({ __typeof(huge) r; STRICT_ASSIGN(..., huge * huge); r; }) with different tradeoffs (the second is broken if r is not used and there is no volatile hidden in STRICT_ASSIGN()), or better, only load huge once (float t = huge; junk = t * t;). Bruce From owner-freebsd-numerics@freebsd.org Sat May 13 20:55:19 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 78514D6B1DC; Sat, 13 May 2017 20:55:19 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 535621C2C; Sat, 13 May 2017 20:55:19 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4DKtHpT091964 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 13 May 2017 13:55:18 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id
v4DKtH98091963; Sat, 13 May 2017 13:55:17 -0700 (PDT) (envelope-from sgk) Date: Sat, 13 May 2017 13:55:17 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-hackers@freebsd.org, freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions Message-ID: <20170513205517.GA91911@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170428010122.GA12814@troutmask.apl.washington.edu> <20170428183733.V1497@besplex.bde.org> <20170428165658.GA17560@troutmask.apl.washington.edu> <20170429035131.E3406@besplex.bde.org> <20170428201522.GA32785@troutmask.apl.washington.edu> <20170429070036.A4005@besplex.bde.org> <20170428233552.GA34580@troutmask.apl.washington.edu> <20170429005924.GA37947@troutmask.apl.washington.edu> <20170429151457.F809@besplex.bde.org> <20170429194239.P3294@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170429194239.P3294@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 13 May 2017 20:55:19 -0000 On Sat, Apr 29, 2017 at 08:19:23PM +1000, Bruce Evans wrote: > On Sat, 29 Apr 2017, Bruce Evans wrote: > > On Fri, 28 Apr 2017, Steve Kargl wrote: > >> On Fri, Apr 28, 2017 at 04:35:52PM -0700, Steve Kargl wrote: > >>> > >>> I was just backtracking with __kernel_sinpi. This gets a max ULP < 0.61. > > > > Comments on this below. > > > > This is all rather over-engineered. Optimizing these functions is > > unimportant comparing with finishing cosl() and sinl() and optimizing > > all of the standard trig functions better, but we need correctness. > > But I now see many simplifications and improvements: > > > > (1) There is no need for new kernels. 
The standard kernels already handle
> > extra precision using approximations like:
> >
> > sin(x+y) ~= sin(x) + (1-x*x/2)*y.
> >
> > Simply reduce x and write Pi*x = hi+lo. Then
> >
> > sin(Pi*x) = __kernel_sin(hi, lo, 1).
> >
> > I now see how to do the extra-precision calculations without any
> > multiplications.
>
> But that is over-engineered too.
>
> Using the standard kernels is easy and works well:

Maybe works well. See below.

> Efficiency is very good in some cases, but anomalous in others: all
> times in cycles, on i386, on the range [0, 0.25]
>
>          athlon-xp, gcc-3.3     Haswell, gcc-3.3    Haswell, gcc-4.2.1
> cos:     61-62                  44                  43
> cospi:   69-71 (8-9 extra)      78 (anomalous...)   42 (faster to do more!)
> sin:     59-60                  51                  37
> sinpi:   67-68 (8 extra)        80                  42
> tan:     136-172                93-195              67-94
> tanpi:   144-187 (8-15 extra)   145-176             61-189
>
> That was a throughput test. Latency is not so good. My latency test
> doesn't use serializing instructions, but uses random args and the
> partial serialization of making each result depend on the previous
> one.
>
>          athlon-xp, gcc-3.3     Haswell, gcc-3.3    Haswell, gcc-4.2.1
> cos:     84-85                  69                  79
> cospi:   103-104 (19-21 extra)  117                 94
> sin:     75-76                  89                  77
> sinpi:   105-106 (30 extra)     116                 90
> tan:     168-170                167-168             147
> tanpi:   191-194 (23-24 extra)  191                 154
>
> This also indicates that the longest times for tan in the throughput
> test are what happens when the function doesn't run in parallel with
> itself. The high-degree polynomial and other complications in tan()
> are too complicated for much cross-function parallelism.
>
> Anyway, it looks like the cost of using the kernel is at most 8-9
> in the parallel case and at most 30 in the serial case. The extra-
> precision code has about 10 dependent instructions, so it is
> doing OK to take 30.

Based on other replies in this email exchange, I have gone back and looked at improvements to my __kernel_{cos|sin|tan}pi[fl] routines. The improvements were for both accuracy and speed.

I have tested on i686 and x86_64 systems with libm built with
-O2 -march=native -mtune=native. My timing loop is of the
form

   float dx, f, x;
   long i, k;

   f = 0;
   k = 1 << 23;
   dx = (xmax - xmin) / (k - 1);
   time_start();
   for (i = 0; i < k; i++) {
      x = xmin + i * dx;
      f += cospif(x);
   };
   time_end();

   x = (time_cpu() / k) * 1.e6;
   printf("cospif time: %.4f usec per call\n", x);

   if (f == 0)
      printf("Can't happen!\n");

The assumption here is that loop overhead is the same for
all tested kernels.

Test intervals for kernels.

  float: [0x1p-14, 0.25]
 double: [0x1p-29, 0.25]
   ld80: [0x1p-34, 0.25]

    Core2 Duo T7250 @ 2.00GHz     || AMD FX8350 Eight-Core CPU
     (1995.05-MHz 686-class)      ||  (4018.34-MHz K8-class)
----------------------------------++--------------------------
       | Horner | Estrin | Fdlibm || Horner | Estrin | Fdlibm
-------+--------+--------+--------++--------+--------+--------
cospif | 0.0223 |        | 0.0325 || 0.0112 |        | 0.0085
sinpif | 0.0233 | Note 1 | 0.0309 || 0.0125 |        | 0.0085
tanpif | 0.0340 |        | Note 2 || 0.0222 |        |
-------+--------+--------+--------++--------+--------+--------
cospi  | 0.0641 | 0.0571 | 0.0604 || 0.0157 | 0.0142 | 0.0149
sinpi  | 0.0722 | 0.0626 | 0.0712 || 0.0178 | 0.0161 | 0.0166
tanpi  | 0.1049 | 0.0801 |        || 0.0323 | 0.0238 |
-------+--------+--------+--------++--------+--------+--------
cospil | 0.0817 | 0.0716 | 0.0921 || 0.0558 | 0.0560 | 0.0755
sinpil | 0.0951 | 0.0847 | 0.0994 || 0.0627 | 0.0568 | 0.0768
tanpil | 0.1310 | 0.1004 |        || 0.1005 | 0.0827 |
-------+--------+--------+--------++--------+--------+--------

Time in usec/call.

Note 1. In re-arranging the polynomials for Estrin's method and
float, I found appreciable benefit.

Note 2. I have been unable to use the tan[fl] kernels to implement
satisfactory kernels for tanpi[fl]. In particular, for x in [0.25,0.5],
using the tanf kernel leads to 6 digit ULPs in 0.5 whereas my kernel
is near 2 ULP.
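
For readers comparing the Horner and Estrin columns in the table above, the two evaluation orders can be sketched as follows. This is only an illustration under stated assumptions: the function names `poly_horner` and `poly_estrin` and the degree-3 polynomial are hypothetical, and this is not the code of the kernels that were timed.

```c
#include <math.h>

/*
 * Horner's method: c[0] + x*(c[1] + x*(c[2] + x*c[3])).
 * Fewest operations, but every step depends on the previous one,
 * so the dependency chain is as long as the polynomial degree.
 */
double
poly_horner(const double c[4], double x)
{
	return (c[0] + x * (c[1] + x * (c[2] + x * c[3])));
}

/*
 * Estrin's method: (c[0] + x*c[1]) + x^2*(c[2] + x*c[3]).
 * One extra multiplication for x*x, but the two inner sums are
 * independent of each other and can be evaluated in parallel.
 */
double
poly_estrin(const double c[4], double x)
{
	double x2 = x * x;

	return ((c[0] + x * c[1]) + x2 * (c[2] + x * c[3]));
}
```

Both functions compute the same polynomial in exact arithmetic; in floating point they round differently, which is one reason accuracy has to be rechecked after such a re-arrangement.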

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org Sat May 13 22:30:39 2017
Return-Path:
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8FD54D66D9E;
 Sat, 13 May 2017 22:30:39 +0000 (UTC) (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id 33B216EC;
 Sat, 13 May 2017 22:30:38 +0000 (UTC) (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id AEF9E4296A3;
 Sun, 14 May 2017 08:30:35 +1000 (AEST)
Date: Sun, 14 May 2017 08:30:34 +1000 (EST)
From: Bruce Evans
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl
cc: freebsd-hackers@freebsd.org, freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
In-Reply-To: <20170513205517.GA91911@troutmask.apl.washington.edu>
Message-ID: <20170514071942.T1084@besplex.bde.org>
References: <20170428010122.GA12814@troutmask.apl.washington.edu>
 <20170428183733.V1497@besplex.bde.org>
 <20170428165658.GA17560@troutmask.apl.washington.edu>
 <20170429035131.E3406@besplex.bde.org>
 <20170428201522.GA32785@troutmask.apl.washington.edu>
 <20170429070036.A4005@besplex.bde.org>
 <20170428233552.GA34580@troutmask.apl.washington.edu>
 <20170429005924.GA37947@troutmask.apl.washington.edu>
 <20170429151457.F809@besplex.bde.org>
 <20170429194239.P3294@besplex.bde.org>
 <20170513205517.GA91911@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=KeqiiUQD c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=YHl6NKQVYIZzuSoSgCMA:9 a=viboGBD9vYLM4oiE:21 a=fNZ8f2z6azax7mVy:21 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 13 May 2017 22:30:39 -0000

On Sat, 13 May 2017, Steve Kargl wrote:

> On Sat, Apr 29, 2017 at 08:19:23PM +1000, Bruce Evans wrote:
>> ...
>> Using the standard kernels is easy and works well:
>
> Maybe works well. See below.
>> ...
>> Anyway, it looks like the cost of using the kernel is at most 8-9
>> in the parallel case and at most 30 in the serial case. The extra-
>> precision code has about 10 dependent instructions, so it is
>> doing OK to take 30.

Probably a few more than 8. I got nowhere using inline versions for
double precision. Apparently the only large win for inlining is when
it avoids repeating the classification, as it does for e_rem_pio2*.
The kernels don't repeat anything, so the only cost is a function call,
plus a few cycles for testing iy for __kernel_sin() only.

> Based on other replies in this email exchange, I have gone back
> and looked at improvements to my __kernel_{cos|sin|tan}pi[fl]
> routines. The improvements were for both accuracy and speed.

I really don't want another set of kernels (or more sets for degrees
instead of radians, and sincos). Improvements to the existing kernels
are welcome, but difficult except for long double precision. I got
nowhere tweaking the polynomial in __kernel_sin(). Every change that
I tried just moved the tradeoff between accuracy and efficiency.
The one best for efficiency is only about 4 cycles faster, and increases
the error by 0.1 to 0.2 ulps. This change involves adding up the terms
in a different order.

> I have tested on i686 and x86_64 systems with libm built with
> -O2 -march=native -mtune=native.
> My timing loop is of the
> form
>
>    float dx, f, x;
>    long i, k;
>
>    f = 0;
>    k = 1 << 23;
>    dx = (xmax - xmin) / (k - 1);
>    time_start();
>    for (i = 0; i < k; i++) {
>       x = xmin + i * dx;

This asks for a conversion from long to double, which tends to be slow,
and a multiplication in the inner loop. The compiler shouldn't optimize
it to x += dx since this has different inaccuracy.

My test loop does x += dx with FP and a test that x < limit. This
sometimes has problems when dx is so small that x + dx == x. Also, x,
dx and limit are double precision for testing all precisions, so that
the loop overhead is the same for all precisions. This works best on
i386/i387. Otherwise, there are larger conversion overheads. This
usually prevents x + dx == x in float precision, but in long double
precision it results in x + dx == x more often. Double precision just
can't handle a large limit like LDBL_MAX or even small steps up to
DBL_MAX.

>       f += cospif(x);
>    };
>    time_end();
>
>    x = (time_cpu() / k) * 1.e6;
>    printf("cospif time: %.4f usec per call\n", x);
>
>    if (f == 0)
>       printf("Can't happen!\n");

Otherwise, this is a reasonable throughput test. But please count times
in cycles if possible. rdtsc() is very easy to use on x86.

>
> The assumption here is that loop overhead is the same for
> all tested kernels.

It is probably much larger for long double precision. I get minimal
times like 9 cycles for float and double precision, but more like 30
for long double on x86.

> Test intervals for kernels.
>
>   float: [0x1p-14, 0.25]
>  double: [0x1p-29, 0.25]
>    ld80: [0x1p-34, 0.25]
>
>     Core2 Duo T7250 @ 2.00GHz     || AMD FX8350 Eight-Core CPU
>      (1995.05-MHz 686-class)      ||  (4018.34-MHz K8-class)
> ----------------------------------++--------------------------
>        | Horner | Estrin | Fdlibm || Horner | Estrin | Fdlibm
> -------+--------+--------+--------++--------+--------+--------
> cospif | 0.0223 |        | 0.0325 || 0.0112 |        | 0.0085
> sinpif | 0.0233 | Note 1 | 0.0309 || 0.0125 |        | 0.0085
> tanpif | 0.0340 |        | Note 2 || 0.0222 |        |

The fdlibm kernels are almost impossible to beat in float precision,
since they use double precision, so the correct way to use them is for
example 'cospif: return __kernel_cosdf(M_PI * x);' after reduction to
|x| ~< 0.25. Any pure float precision method is going to take 10-20
cycles longer.

It is interesting that you measured fdlibm to be faster on the newer
system but much slower on the older system. The latter must be a bug
somewhere.

> -------+--------+--------+--------++--------+--------+--------
> cospi  | 0.0641 | 0.0571 | 0.0604 || 0.0157 | 0.0142 | 0.0149
> sinpi  | 0.0722 | 0.0626 | 0.0712 || 0.0178 | 0.0161 | 0.0166
> tanpi  | 0.1049 | 0.0801 |        || 0.0323 | 0.0238 |
> -------+--------+--------+--------++--------+--------+--------

Now the differences are almost small enough to be noise.

> cospil | 0.0817 | 0.0716 | 0.0921 || 0.0558 | 0.0560 | 0.0755
> sinpil | 0.0951 | 0.0847 | 0.0994 || 0.0627 | 0.0568 | 0.0768
> tanpil | 0.1310 | 0.1004 |        || 0.1005 | 0.0827 |
> -------+--------+--------+--------++--------+--------+--------

Now the differences arise because the kernels for long double precision
are unoptimized. They use Horner. Actually, they do use the optimization
of using double precision constants if possible (but not the larger
optimization for sparc64 of calculating higher terms in double
precision).

> Time in usec/call.
>
> Note 1. In re-arranging the polynomials for Estrin's method and
> float, I found appreciable benefit.

Do you mean "no appreciable benefit"? No times are shown. Short
polynomials benefit less. There is also the problem that measuring
throughput vs latency is hard. If the CPU can execute several functions
in parallel, it is best (iff the load has candidates for such functions,
as simple tests do) to use something like Horner's method to minimise
the number of operations. Horner's method is only very bad for latency,
and on in-order CPUs. Some of the timing anomalies are probably
explained by this -- newer CPUs have fewer bottlenecks so do better at
executing functions in parallel; this is also easier in float precision.

> Note 2. I have been unable to use the tan[fl] kernels to implement
> satisfactory kernels for tanpi[fl]. In particular, for x in [0.25,0.5],
> using the tanf kernel leads to 6 digit ULPs in 0.5 whereas my kernel
> is near 2 ULP.

The tanf kernel should be very accurate since it is in double precision.
But its polynomial is chosen to only give an accuracy of 0.7999 ulps,
while the polys for cosf and sinf are chosen to give an accuracy of
0.5009 ulps, since the high accuracy is only almost free for the latter.
Any extra error on 0.7999 might be too much. But multiplication by M_PI
in double precision shouldn't change the error by more than 0.0001 ulps.

The tanl kernel has to struggle to get even sub-ulp precision. Its
degree is too high for efficiency, and I don't trust it to give even
sub-ulp precision, especially for ld128.

I didn't manage to get cospi(x) and sinpi(x) using the kernels as fast
as cos(x) and sin(x), even with |x| restricted to < 0.25 so that the
range reduction step is null. The extra precision operations just take
longer than the range reduction even when the latter is not simplified
for the reduced range.

Conversion of degrees to multiples of Pi is interesting. E.g.,
cosd(x) = cos(x * Pi / 180) = cospi(x / 180) in infinite precision.
The natural way to implement it is to convert to cospi() first.
This is only easy using a remainder operation. Remainder operations
work for this, unlike for converting radians to a quadrant plus a
remainder, because 180 is exactly representable but Pi isn't. But
exact remainder operations are slow too. They are just not as slow or
inexact as ones for 18000+ digit approximations to Pi. So cosd(x) can
only be implemented much more efficiently than cos(x) for the
unimportant case of large |x|.

Bruce