From owner-freebsd-numerics@freebsd.org  Fri May 12 21:56:58 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 58BD6D69E31
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Fri, 12 May 2017 21:56:58 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3])
 by mx1.freebsd.org (Postfix) with ESMTP id 45D31890
 for <freebsd-numerics@freebsd.org>; Fri, 12 May 2017 21:56:58 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 452B2D69E30; Fri, 12 May 2017 21:56:58 +0000 (UTC)
Delivered-To: numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 42F02D69E2F;
 Fri, 12 May 2017 21:56:58 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 2FAE088E;
 Fri, 12 May 2017 21:56:55 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4CLusWo082601
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Fri, 12 May 2017 14:56:54 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4CLusWG082600;
 Fri, 12 May 2017 14:56:54 -0700 (PDT) (envelope-from sgk)
Date: Fri, 12 May 2017 14:56:54 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: numerics@freebsd.org, freebsd-hackers@freebsd.org
Subject: catrig[fl].c and inexact
Message-ID: <20170512215654.GA82545@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 12 May 2017 21:56:58 -0000

So, I've been making improvements to my implementations of
the half-cycle trig functions.  In doing so, I decided to
add WARNS=2 to msun/Makefile.  clang 4.0.0 dies with an
error about an unused variable in raise_inexact() from 
catrig[fl].c.

/usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:195:2: error: unused variable
      'junk' [-Werror,-Wunused-variable]
        raise_inexact();
        ^
/usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from
      macro 'raise_inexact'
#define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
                                            ^
Grepping catrig.o for the variable 'junk' suggests that 'junk' is
optimized out (with at least -O2).

A quick and dirty patch to achieve the intent of the original
code follows.  It would be nice if someone would like to commit
the patch.  Of course, you may want to wait for Bruce to 
review the diff.


Index: src/catrig.c
===================================================================
--- src/catrig.c	(revision 1935)
+++ src/catrig.c	(working copy)
@@ -37,7 +37,7 @@ __FBSDID("$FreeBSD: head/lib/msun/src/catrig.c 313863 
 #define isinf(x)	(fabs(x) == INFINITY)
 #undef isnan
 #define isnan(x)	((x) != (x))
-#define	raise_inexact()	do { volatile float junk = 1 + tiny; } while(0)
+#define	raise_inexact(x)	do { (x) = 1 + tiny; } while(0)
 #undef signbit
 #define signbit(x)	(__builtin_signbit(x))
 
@@ -315,7 +315,7 @@ casinh(double complex z)
 		return (z);
 
 	/* All remaining cases are inexact. */
-	raise_inexact();
+	raise_inexact(new_y);
 
 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
 		return (z);
@@ -400,7 +400,7 @@ cacos(double complex z)
 		return (CMPLX(0, -y));
 
 	/* All remaining cases are inexact. */
-	raise_inexact();
+	raise_inexact(new_x);
 
 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
 		return (CMPLX(pio2_hi - (x - pio2_lo), -y));
@@ -607,7 +607,7 @@ catanh(double complex z)
 		 * inexact, but this is the only only that needs to do it
 		 * explicitly.
 		 */
-		raise_inexact();
+		raise_inexact(ax);
 		return (z);
 	}
 
Index: src/catrigf.c
===================================================================
--- src/catrigf.c	(revision 1935)
+++ src/catrigf.c	(working copy)
@@ -51,7 +51,7 @@ __FBSDID("$FreeBSD: head/lib/msun/src/catrigf.c 275819
 #define isinf(x)	(fabsf(x) == INFINITY)
 #undef isnan
 #define isnan(x)	((x) != (x))
-#define	raise_inexact()	do { volatile float junk = 1 + tiny; } while(0)
+#define	raise_inexact(x)	do { (x) = 1 + tiny; } while(0)
 #undef signbit
 #define signbit(x)	(__builtin_signbitf(x))
 
@@ -176,7 +176,7 @@ casinhf(float complex z)
 	if (x == 0 && y == 0)
 		return (z);
 
-	raise_inexact();
+	raise_inexact(new_y);
 
 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
 		return (z);
@@ -234,7 +234,7 @@ cacosf(float complex z)
 	if (x == 1 && y == 0)
 		return (CMPLXF(0, -y));
 
-	raise_inexact();
+	raise_inexact(new_x);
 
 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
 		return (CMPLXF(pio2_hi - (x - pio2_lo), -y));
@@ -365,7 +365,7 @@ catanhf(float complex z)
 		    copysignf(pio2_hi + pio2_lo, y)));
 
 	if (ax < SQRT_3_EPSILON / 2 && ay < SQRT_3_EPSILON / 2) {
-		raise_inexact();
+		raise_inexact(ax);
 		return (z);
 	}
 
Index: src/catrigl.c
===================================================================
--- src/catrigl.c	(revision 1935)
+++ src/catrigl.c	(working copy)
@@ -53,7 +53,7 @@ __FBSDID("$FreeBSD: head/lib/msun/src/catrigl.c 313761
 #define isinf(x)	(fabsl(x) == INFINITY)
 #undef isnan
 #define isnan(x)	((x) != (x))
-#define	raise_inexact()	do { volatile float junk = 1 + tiny; } while(0)
+#define	raise_inexact(x)	do { (x) = 1 + tiny; } while(0)
 #undef signbit
 #define signbit(x)	(__builtin_signbitl(x))
 
@@ -192,7 +192,7 @@ casinhl(long double complex z)
 	if (x == 0 && y == 0)
 		return (z);
 
-	raise_inexact();
+	raise_inexact(new_y);
 
 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
 		return (z);
@@ -251,7 +251,7 @@ cacosl(long double complex z)
 	if (x == 1 && y == 0)
 		return (CMPLXL(0, -y));
 
-	raise_inexact();
+	raise_inexact(new_x);
 
 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
 		return (CMPLXL(pio2_hi - (x - pio2_lo), -y));
@@ -383,7 +383,7 @@ catanhl(long double complex z)
 		    copysignl(pio2_hi + pio2_lo, y)));
 
 	if (ax < SQRT_3_EPSILON / 2 && ay < SQRT_3_EPSILON / 2) {
-		raise_inexact();
+		raise_inexact(ax);
 		return (z);
 	}
 

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Sat May 13 02:02:32 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id C3E49D6AD76
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sat, 13 May 2017 02:02:32 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3])
 by mx1.freebsd.org (Postfix) with ESMTP id B1C091F27
 for <freebsd-numerics@freebsd.org>; Sat, 13 May 2017 02:02:32 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: by mailman.ysv.freebsd.org (Postfix)
 id B1231D6AD75; Sat, 13 May 2017 02:02:32 +0000 (UTC)
Delivered-To: numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id AE746D6AD74;
 Sat, 13 May 2017 02:02:32 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au
 [211.29.132.80])
 by mx1.freebsd.org (Postfix) with ESMTP id 5D3A11F23;
 Sat, 13 May 2017 02:02:31 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 80712D654C9;
 Sat, 13 May 2017 11:35:54 +1000 (AEST)
Date: Sat, 13 May 2017 11:35:49 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: numerics@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: catrig[fl].c and inexact
In-Reply-To: <20170512215654.GA82545@troutmask.apl.washington.edu>
Message-ID: <20170513103208.M845@besplex.bde.org>
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=WvBbCZXv c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=0K0djoc-qRL17_fbz0IA:9 a=CjuIK1q_8ugA:10

On Fri, 12 May 2017, Steve Kargl wrote:

> So, I've been making improvements to my implementations of
> the half-cycle trig functions.  In doing so, I decide to
> add WARNS=2 to msun/Makefile.  clang 4.0.0 dies with an
> error about an unused variable in raise_inexact() from
> catrig[fl].c.
>
> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:195:2: error: unused variable
>      'junk' [-Werror,-Wunused-variable]
>        raise_inexact();
>        ^
> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from
>      macro 'raise_inexact'
> #define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
>                                            ^
> Grepping catrig.o for the variable 'junk' suggests that 'junk' is
> optimized out (with at least -O2).

Just another bug in clang.  Volatile variables cannot be optimized out
(if they are accessed).

> A quick and dirty patch to achieve the intent of the original
> code follows.  It would be nice if some would like to commit
> the patch.  Of course, you may want to wait for Bruce to
> review the diff.
>
> Index: src/catrig.c
> ===================================================================
> --- src/catrig.c	(revision 1935)
> +++ src/catrig.c	(working copy)
> @@ -37,7 +37,7 @@ __FBSDID("$FreeBSD: head/lib/msun/src/catrig.c 313863
> #define isinf(x)	(fabs(x) == INFINITY)
> #undef isnan
> #define isnan(x)	((x) != (x))
> -#define	raise_inexact()	do { volatile float junk = 1 + tiny; } while(0)
> +#define	raise_inexact(x)	do { (x) = 1 + tiny; } while(0)
> #undef signbit
> #define signbit(x)	(__builtin_signbit(x))
>
> @@ -315,7 +315,7 @@ casinh(double complex z)
> 		return (z);
>
> 	/* All remaining cases are inexact. */
> -	raise_inexact();
> +	raise_inexact(new_y);
>
> 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
> 		return (z);

Now it doesn't take compiler bugs to optimize it out, since new_y is not
volatile, and a good compiler would optimize it out in all cases.  new_y
is obviously unused before the early returns, so it doesn't need to be
evaluated before the returns as far as the compiler can see.  Later,
new_y is initialized indirectly, and the compiler can see that too (not
so easily), so it can see that raise_inexact() has no effect except possibly
for its side effect of raising inexact for 1 + tiny.

The change might defeat the intent of the original code in another way.
'junk' is intentionally independent of other variables, so that there are
no dependencies on it.  If the compiler doesn't optimize away the assignment
to new_y, then it is probably because it doesn't see that the assignment is
dead, so there is a dependency.

Actually, we want the variable 'junk' to be optimized away.  We only want
the side effect of evaluating 1 + tiny.  Compilers have bugs evaluating
expressions like 1 + tiny, tiny*tiny and huge*huge, and we use assignments
of the result to volatile variables in tens if not hundreds of places to
try to work around compiler bugs.  If that doesn't work here, then all the
other places are probably broken too.  The other places mostly use a static
volatile, while this uses an auto volatile.  'tiny' is also volatile, as
required for the standard magic.  I planned to fix all this magic using
macros like raise_inexact().

Another subtlety in the macro is that the variable is float instead of double
to possibly allow optimizations.  Since the variable shouldn't be
optimized away, it will waste sizeof(var) for each use of the macro.  A
file scope variable would work better here, but the macro is written to
be self-contained to make it easier to use.  The change also defeats that.

Whether not evaluating 1 + tiny at compile time is a compiler bug is
delicate.  We don't have any C99 compilers yet, since gcc and clang
don't support #pragma FENV_ACCESS ON/OFF.  The pragma should be set
to ON before the magic accesses, but we don't do that because it would
be a lot of churn and we know that the pragma doesn't work.  We more
or less depend on the default state of the pragma being ON, but in
gcc-4.2.1 it is documented as being OFF unless compiled with
-frounding-math when it is documented as being ON, and for clang it
is undocumented.  -frounding-math is too inefficient to use by default,
and another bug in clang is that it is not even supported.

With macro or at least inline wrappers, the #pragma should only be
needed in a few places.

clang-3.9.0 seems to be only partly broken here.  Volatile works correctly
for v = huge*huge and also for v = 1+tiny provided v is static instead of
auto.  It also works to declare 'junk' as __unused.

The following don't work with either clang-3.9.0 or gcc-4.2.1:
- declaring 'junk' as __used (syntax error)
- the expression 1+tiny not assigned to anything, or 1+tiny assigned to
   an __unused non-volatile variable.  This gives the weird code of loading
   'tiny' (because the compiler handles read accesses to volatile variables
   correctly), but not adding 1 (because the compiler doesn't know that
   adding 1 has a side effect, or is optimizing for FENV_ACCESS OFF).

The following is documented to not work with gcc-4.2.1:
- #pragma FENV_ACCESS ON.  clang handles this correctly by warning that
   this is unsupported, but this makes it even more unusable.  gcc-4.2.1
   doesn't warn, so it is hard to tell if it worked.

Bruce

From owner-freebsd-numerics@freebsd.org  Sat May 13 02:05:53 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id C56D4D6AF8A
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sat, 13 May 2017 02:05:53 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3])
 by mx1.freebsd.org (Postfix) with ESMTP id B33BAB7
 for <freebsd-numerics@freebsd.org>; Sat, 13 May 2017 02:05:53 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: by mailman.ysv.freebsd.org (Postfix)
 id B2ACCD6AF89; Sat, 13 May 2017 02:05:53 +0000 (UTC)
Delivered-To: numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id B23B5D6AF88;
 Sat, 13 May 2017 02:05:53 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 7E5D7B6;
 Sat, 13 May 2017 02:05:53 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 847803CB4B9;
 Sat, 13 May 2017 11:44:45 +1000 (AEST)
Date: Sat, 13 May 2017 11:44:41 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
cc: Steve Kargl <sgk@troutmask.apl.washington.edu>, numerics@freebsd.org, 
 freebsd-hackers@freebsd.org
Subject: Re: catrig[fl].c and inexact
In-Reply-To: <20170513103208.M845@besplex.bde.org>
Message-ID: <20170513113852.M1045@besplex.bde.org>
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
 <20170513103208.M845@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=KeqiiUQD c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=UZVNq-k9JjFpydrfkmMA:9 a=CjuIK1q_8ugA:10

On Sat, 13 May 2017, Bruce Evans wrote:

> clang-3.9.0 seems to be only partly broken here.  Volatile works correctly
> for v = huge*huge and also for v = 1+tiny provided v is static instead of
> auto.  It also works to declare 'junk' as __unused.

PS: only __unused on an auto volatile variable gives the intended but not
quite wanted behaviour, by reminding the compiler that assignments to
volatile variables are used, by spelling 'used' as __unused.  This results
in assigning to a variable on the stack in most cases, so there is no
wastage of static space.  Normal FP operations like this are usually the
fastest way to set FP exception flags (50-100 times faster than an fenv
access on i386).  The only sub-optimal part is assigning the result to
memory.

Bruce

From owner-freebsd-numerics@freebsd.org  Sat May 13 06:08:05 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2BB8DD6A5A5
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sat, 13 May 2017 06:08:05 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org
 [IPv6:2001:1900:2254:206a::50:5])
 by mx1.freebsd.org (Postfix) with ESMTP id 1693818B3
 for <freebsd-numerics@freebsd.org>; Sat, 13 May 2017 06:08:05 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 15E7DD6A5A4; Sat, 13 May 2017 06:08:05 +0000 (UTC)
Delivered-To: numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 13B43D6A5A3;
 Sat, 13 May 2017 06:08:05 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id F2C3218B2;
 Sat, 13 May 2017 06:08:04 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4D6839N084468
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Fri, 12 May 2017 23:08:03 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4D683TW084467;
 Fri, 12 May 2017 23:08:03 -0700 (PDT) (envelope-from sgk)
Date: Fri, 12 May 2017 23:08:03 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: numerics@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: catrig[fl].c and inexact
Message-ID: <20170513060803.GA84399@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
 <20170513103208.M845@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170513103208.M845@besplex.bde.org>
User-Agent: Mutt/1.7.2 (2016-11-26)

On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
> On Fri, 12 May 2017, Steve Kargl wrote:
> 
> > So, I've been making improvements to my implementations of
> > the half-cycle trig functions.  In doing so, I decide to
> > add WARNS=2 to msun/Makefile.  clang 4.0.0 dies with an
> > error about an unused variable in raise_inexact() from
> > catrig[fl].c.
> >
> > /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:195:2: error: unused variable
> >      'junk' [-Werror,-Wunused-variable]
> >        raise_inexact();
> >        ^
> > /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from
> >      macro 'raise_inexact'
> > #define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
> >                                            ^
> > Grepping catrig.o for the variable 'junk' suggests that 'junk' is
> > optimized out (with at least -O2).
> 
> Just another bug in clang.  Volatile variables cannot be optimized out
> (if they are accessed).

Does this depend on scope?  'junk' is local to the do {...} while(0);
construct.  Can a compiler completely eliminate a do-nothing scoping
unit?  I don't know C well enough to know.  I do know what I have
observed in clang.

> > A quick and dirty patch to achieve the intent of the original
> > code follows.  It would be nice if some would like to commit
> > the patch.  Of course, you may want to wait for Bruce to
> > review the diff.
> >
> > Index: src/catrig.c
> > ===================================================================
> > --- src/catrig.c	(revision 1935)
> > +++ src/catrig.c	(working copy)
> > @@ -37,7 +37,7 @@ __FBSDID("$FreeBSD: head/lib/msun/src/catrig.c 313863
> > #define isinf(x)	(fabs(x) == INFINITY)
> > #undef isnan
> > #define isnan(x)	((x) != (x))
> > -#define	raise_inexact()	do { volatile float junk = 1 + tiny; } while(0)
> > +#define	raise_inexact(x)	do { (x) = 1 + tiny; } while(0)
> > #undef signbit
> > #define signbit(x)	(__builtin_signbit(x))
> >
> > @@ -315,7 +315,7 @@ casinh(double complex z)
> > 		return (z);
> >
> > 	/* All remaining cases are inexact. */
> > -	raise_inexact();
> > +	raise_inexact(new_y);
> >
> > 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
> > 		return (z);
> 
> Now it doesn't take compiler bugs to optimize it out, since new_y is not
> volatile, and a good compiler would optimize it out in all cases.

I've yet to find a good compiler.  They all seem to have bugs.

> new_y
> is obviously unused before the early returns, so it doesn't need to be
> evalated before the returns as far as the compiler can see.  Later,
> new_y is initialized indirectly, and the compiler can see that too (not
> so easily, so it can see that raise_inexact() has no effect except possibly
> for its side effect of raising inexact for 1 + tiny.

The later call passes the address of new_y to the routine.  How
can the compiler short of inlining the called routine know that
the value assigned to new_y isn't used?

> The change might defeat the intent of the original code in another way.
> 'junk' is intentionally independent of other variables, so that there are
> no dependencies on it.  If the compiler doesn't optimize away the assignment
> to new_y, then it is probably because it doesn't see that the assignment is
> dead, so there is a dependency.

It may defeat the intent of the original code, but it seems that
the original code provokes undefined behavior.

> Actually, we want the variable 'junk' to be optimized away.  We only want
> the side effect of evaluating 1 + tiny.  Compilers have bugs evaluating
> expressions like 1 + tiny, tiny*tiny and huge*huge, and we use assignments
> of the result to volatile variables in tens if not hundreds of places to
> try to work around compiler bugs.  If that doesn't work here, then all the
> other places are probably broken too.  The other places mostly use a static
> volatile, while this uses an auto volatile.  'tiny' is also volatile, as
> required for the standard magic.  I planned to fix all this magic using
> macros like raise_inexact().

If you plan to fix the magic with raise_inexact, then please
test with a suite of compilers.  AFAICT, clang is optimizing
out the code.  I haven't written a testcase to demonstrate this
as I have other irons in the fire.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Sat May 13 10:40:43 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 814C5D696D9
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sat, 13 May 2017 10:40:43 +0000 (UTC)
 (envelope-from markmi@dsl-only.net)
Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3])
 by mx1.freebsd.org (Postfix) with ESMTP id 68BAD11B1
 for <freebsd-numerics@freebsd.org>; Sat, 13 May 2017 10:40:43 +0000 (UTC)
 (envelope-from markmi@dsl-only.net)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 682CAD696D7; Sat, 13 May 2017 10:40:43 +0000 (UTC)
Delivered-To: numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 67D78D696D6
 for <numerics@mailman.ysv.freebsd.org>; Sat, 13 May 2017 10:40:43 +0000 (UTC)
 (envelope-from markmi@dsl-only.net)
Received: from asp.reflexion.net (outbound-mail-210-8.reflexion.net
 [208.70.210.8])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 2CB5311AE
 for <numerics@freebsd.org>; Sat, 13 May 2017 10:40:42 +0000 (UTC)
 (envelope-from markmi@dsl-only.net)
Received: (qmail 4544 invoked from network); 13 May 2017 10:40:41 -0000
Received: from unknown (HELO mail-cs-02.app.dca.reflexion.local) (10.81.19.2)
 by 0 (rfx-qmail) with SMTP; 13 May 2017 10:40:41 -0000
Received: by mail-cs-02.app.dca.reflexion.local
 (Reflexion email security v8.40.0) with SMTP;
 Sat, 13 May 2017 06:40:41 -0400 (EDT)
Received: (qmail 18264 invoked from network); 13 May 2017 10:40:40 -0000
Received: from unknown (HELO iron2.pdx.net) (69.64.224.71)
 by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 13 May 2017 10:40:40 -0000
Received: from [192.168.1.106] (c-76-115-7-162.hsd1.or.comcast.net
 [76.115.7.162])
 by iron2.pdx.net (Postfix) with ESMTPSA id 29F9BEC8697;
 Sat, 13 May 2017 03:40:40 -0700 (PDT)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Subject: Re: catrig[fl].c and inexact
From: Mark Millard <markmi@dsl-only.net>
In-Reply-To: <20170513060803.GA84399@troutmask.apl.washington.edu>
Date: Sat, 13 May 2017 03:40:39 -0700
Cc: Bruce Evans <brde@optusnet.com.au>, freebsd-hackers@freebsd.org,
 numerics@freebsd.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <DC2DA938-6A07-4CB0-AFB6-038368971B77@dsl-only.net>
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
 <20170513103208.M845@besplex.bde.org>
 <20170513060803.GA84399@troutmask.apl.washington.edu>
To: sgk@troutmask.apl.washington.edu
X-Mailer: Apple Mail (2.3273)


On 2017-May-12, at 11:08 PM, Steve Kargl <sgk at troutmask.apl.washington.edu> wrote:

> On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
>> On Fri, 12 May 2017, Steve Kargl wrote:
>>
>>> So, I've been making improvements to my implementations of
>>> the half-cycle trig functions.  In doing so, I decide to
>>> add WARNS=2 to msun/Makefile.  clang 4.0.0 dies with an
>>> error about an unused variable in raise_inexact() from
>>> catrig[fl].c.
>>>
>>> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:195:2: error: unused variable
>>>     'junk' [-Werror,-Wunused-variable]
>>>       raise_inexact();
>>>       ^
>>> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from
>>>     macro 'raise_inexact'
>>> #define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
>>>                                           ^
>>> Grepping catrig.o for the variable 'junk' suggests that 'junk' is
>>> optimized out (with at least -O2).
>>
>> Just another bug in clang.  Volatile variables cannot be optimized out
>> (if they are accessed).
>
> Does this depend on scope?  'junk' is local to the do {...} while(0);
> construct.  Can a compiler completely eliminate a do-nothing scoping
> unit?  I don't know C well enough to know.  I do know what I have
> observed in clang.

[This note ignores other standards than C99/C11
that might place other constraints. And I've done
no checking of compiler results, I've just looked
at a couple of the C standards.]

Note: I've not looked into tiny's declaration. It
may contribute in a way not covered below.

Unfortunately the declarator in an init-declarator
that has an initializer is not part of an
expression. The rules for volatile are tied to uses
in expressions, not to the declarator. (Which is a
hole in the language definition as far as I can
tell.)

There is one part of the wording that might mitigate
this, tied to a full declarator having a sequence
point at its end despite the declarator itself not
being an expression, even if its initializer is
one. There is another wording detail that might
as well.

Still, overall it would seem safer to be sure there
is an expression that references the volatile object,
not having only its declarator. But I would not take
even that as a guarantee under the C standards.

It may seem a silly difference but:

do { volatile float junk=1; junk+=tiny; } while(0)

may well be a better way of writing the "must
evaluate" part of the intent simply because
junk is used in an expression. Also it has both read
and write access, so is a little more "used". The
sequence point before the assignment can help avoid
compile-time evaluation as well.
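
For illustration, the two formulations can be put side by side.
This is only a sketch: tiny is assumed here to be a static const
volatile float (I have not checked its declaration in catrig.c),
and the function names are made up. Each function returns the
value left in junk, which rounds to 1 because 1 + 2^-100 is not
representable in float; that rounding is what makes the addition
inexact. Neither form is something I would call guaranteed by
the C standards.

```c
/* Assumption: tiny is volatile, so 1 + tiny cannot be folded at compile time. */
static const volatile float tiny = 0x1p-100F;

/* Original form: junk appears only in its own declarator. */
float
inexact_by_initializer(void)
{
	volatile float junk = 1 + tiny;

	return (junk);	/* this read is added for the demo only */
}

/* Alternative form: junk is read and written in an expression. */
float
inexact_by_expression(void)
{
	volatile float junk = 1;

	junk += tiny;
	return (junk);	/* also added for the demo only */
}
```

Both functions should return exactly 1; whether the inexact flag
is actually raised on the way still depends on the compiler doing
the addition at run time.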


Details if you care. . .

I used the C99 and C11 definitions here, I
reference C11 section numbering but C99 agrees
as I remember.

5.1.2.3 Program execution says:

"Accessing a volatile object, modifying an object,
modifying a file, or calling a function that does
any of those operations are all side effects,
which are changes in the state of the execution
environment. Evaluation of an expression may
produce side effects."

Note that raising inexact does not fit in the
definition of side effect as far as I can tell.
So a compiler need not consider such a thing
for side-effect issues if I understand right.

[C11 specific wording:] "The presence of a
sequence point between the evaluations of
expressions A and B implies that every value
computation and side effect associated with A
is sequenced before every value computation and
side effect associated with B."

[C99 is similar but is before the detailed
"sequenced before" definition.]

"An actual implementation need not evaluate part
of an expression if it can deduce that its value
is not used and that no needed side effects are
produced (including any caused by calling a
function or accessing a volatile object)."

Can accessing a volatile object ever be
classified as having "no needed side effects"?
More on this later. [Remember what "side effect"
excludes, as noted earlier. So some consequences
need not be considered by the compiler, all in
the name of optimizations.]

6.7.3 Type Qualifiers says:

"An object that has volatile-qualified type . . .
Therefore any expression referring to such an object
shall be evaluated strictly according to the rules
of the abstract machine, as described in 5.1.2.3.
Furthermore, at every sequence point the value last
stored in the object shall agree with that prescribed
by the abstract machine, except as modified by the
unknown factors mentioned previously. What constitutes
an access to an object that has volatile-qualified
type is implementation-defined."

This part is mixed: what the sequence point wording
giveth the last sentence taketh away. (More later.)

It also says in a note (134):

"A volatile declaration may be used to describe an
object corresponding to a memory-mapped input/output
port or an object accessed by an asynchronously
interrupting function. Actions on objects so declared
shall not be "optimized out" by an implementation
or reordered except as permitted by the rules for
evaluating expressions."

Since rules for evaluating expressions are not rules
for declarators (vs. initializers), this could be
read as not allowing the "optimize out". (But the
abstract machine's description is not explicit about
declarators for such issues.)

The C99 Rationale:

The C99 Rationale was explicit about static
volatile for a memory mapped I/O register,
static const volatile for a memory mapped
input port, const volatile and volatile
for variables shared across processes. To
some extent this identifies examples of
contexts with "needed side effects" that
have hardware details to take into account.

For taking into account hardware details:
". . . Whatever decisions are adopted on such
issues must be documented, as volatile access
is implementation-defined".

For volatile use with no explicitly identified
hardware details: volatile would appear to be
no more than a potential hint for such a
context, not an effective requirement. The
implementation-defined status could allow lack
of access.

Overall, based on what I see in the C99 and
C11 language definitions, I'd not be willing to
declare clang wrong (if it did optimize out junk),
even with my alternative formulation.

C does not have an explicit Principle of Least
Astonishment as an official guideline to its
interpretation and the rules are very biased to
allowing so-called optimizations. "junk" does not
fit with being shared across processes (for
example its address is not handed to anything)
and is not static or even global. There is no
known type of potential context for specific
hardware details that would need to be taken
into account for junk. That in turn leaves open
not accessing it at all as far as I can tell.


===
Mark Millard
markmi at dsl-only.net


From owner-freebsd-numerics@freebsd.org  Sat May 13 11:01:15 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 18869D6A250
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sat, 13 May 2017 11:01:15 +0000 (UTC)
 (envelope-from dimitry@andric.com)
Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3])
 by mx1.freebsd.org (Postfix) with ESMTP id 02B12A97
 for <freebsd-numerics@freebsd.org>; Sat, 13 May 2017 11:01:15 +0000 (UTC)
 (envelope-from dimitry@andric.com)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 02113D6A24F; Sat, 13 May 2017 11:01:15 +0000 (UTC)
Delivered-To: numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0192CD6A24E;
 Sat, 13 May 2017 11:01:15 +0000 (UTC)
 (envelope-from dimitry@andric.com)
Received: from tensor.andric.com (tensor.andric.com [87.251.56.140])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "tensor.andric.com",
 Issuer "COMODO RSA Domain Validation Secure Server CA" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id BD736A95;
 Sat, 13 May 2017 11:01:14 +0000 (UTC)
 (envelope-from dimitry@andric.com)
Received: from [IPv6:2001:470:7a58::a8c1:a7f4:edbc:3331] (unknown
 [IPv6:2001:470:7a58:0:a8c1:a7f4:edbc:3331])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by tensor.andric.com (Postfix) with ESMTPSA id 5AF543FD73;
 Sat, 13 May 2017 13:01:12 +0200 (CEST)
From: Dimitry Andric <dimitry@andric.com>
Message-Id: <42D3F536-42D7-4097-A500-0EF939584592@andric.com>
Content-Type: multipart/signed;
 boundary="Apple-Mail=_3FD2EBD2-6E86-4877-858B-D4C2722775DB";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Subject: Re: catrig[fl].c and inexact
Date: Sat, 13 May 2017 13:00:59 +0200
In-Reply-To: <20170512215654.GA82545@troutmask.apl.washington.edu>
Cc: numerics@freebsd.org,
 freebsd-hackers@freebsd.org
To: sgk@troutmask.apl.washington.edu
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
X-Mailer: Apple Mail (2.3273)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 13 May 2017 11:01:15 -0000


--Apple-Mail=_3FD2EBD2-6E86-4877-858B-D4C2722775DB
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

On 12 May 2017, at 23:56, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
>
> So, I've been making improvements to my implementations of
> the half-cycle trig functions.  In doing so, I decide to
> add WARNS=2 to msun/Makefile.  clang 4.0.0 dies with an
> error about an unused variable in raise_inexact() from
> catrig[fl].c.
>
> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:195:2: error: unused variable
>      'junk' [-Werror,-Wunused-variable]
>        raise_inexact();
>        ^
> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from
>      macro 'raise_inexact'
> #define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
>                                            ^
> Grepping catrig.o for the variable 'junk' suggests that 'junk' is
> optimized out (with at least -O2).

As far as I can see, this is not the case.  The simplest reduction is
this:

static const volatile float tiny = 0x1p-100;

void f(void)
{
  volatile float junk = 1 + tiny;
}

For i386-freebsd, this results in the following (boilerplate left out):

$ clang-4.0.0 -target i386-freebsd -O2 -S vol1.c -o -
[...]
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%eax
	fld1
	fadds	tiny
	fstps	-4(%ebp)
	addl	$4, %esp
	popl	%ebp
	retl
[...]
tiny:
	.long	226492416               # float 7.88860905E-31

For amd64-freebsd:

$ clang-4.0.0 -target amd64-freebsd -O2 -S vol1.c -o -
[...]
.LCPI0_0:
	.long	1065353216              # float 1
[...]
	pushq	%rbp
	movq	%rsp, %rbp
	movss	tiny(%rip), %xmm0       # xmm0 = mem[0],zero,zero,zero
	addss	.LCPI0_0(%rip), %xmm0
	movss	%xmm0, -4(%rbp)
	popq	%rbp
	retq
[...]
tiny:
	.long	226492416               # float 7.88860905E-31

I also tried -O3, but it doesn't change the result.

-Dimitry


--Apple-Mail=_3FD2EBD2-6E86-4877-858B-D4C2722775DB
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.30

iEYEARECAAYFAlkW53gACgkQsF6jCi4glqPkNACfTDp+YbDQinSkExo64JsidEmj
bWMAnA3VM6qYzUFY/5BpESn9zX3x2nxk
=NqYy
-----END PGP SIGNATURE-----

--Apple-Mail=_3FD2EBD2-6E86-4877-858B-D4C2722775DB--

From owner-freebsd-numerics@freebsd.org  Sat May 13 13:08:37 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 71B6FD6A37C
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sat, 13 May 2017 13:08:37 +0000 (UTC)
 (envelope-from dimitry@andric.com)
Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3])
 by mx1.freebsd.org (Postfix) with ESMTP id 5A6F31BE2
 for <freebsd-numerics@freebsd.org>; Sat, 13 May 2017 13:08:37 +0000 (UTC)
 (envelope-from dimitry@andric.com)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 59C42D6A37B; Sat, 13 May 2017 13:08:37 +0000 (UTC)
Delivered-To: numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 577DAD6A37A;
 Sat, 13 May 2017 13:08:37 +0000 (UTC)
 (envelope-from dimitry@andric.com)
Received: from tensor.andric.com (tensor.andric.com [IPv6:2001:470:7a58:1::1])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256
 bits)) (Client CN "tensor.andric.com",
 Issuer "COMODO RSA Domain Validation Secure Server CA" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id E16B51BE1;
 Sat, 13 May 2017 13:08:36 +0000 (UTC)
 (envelope-from dimitry@andric.com)
Received: from [IPv6:2001:470:7a58::a8c1:a7f4:edbc:3331] (unknown
 [IPv6:2001:470:7a58:0:a8c1:a7f4:edbc:3331])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by tensor.andric.com (Postfix) with ESMTPSA id 8EF363FD81;
 Sat, 13 May 2017 15:08:33 +0200 (CEST)
From: Dimitry Andric <dimitry@andric.com>
Message-Id: <F5F8736B-D7E1-48AD-BC6C-8C74AF0A3272@andric.com>
Content-Type: multipart/signed;
 boundary="Apple-Mail=_4FBC88C3-4C7E-4D97-8BD0-773DBE95BCD3";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Subject: Re: catrig[fl].c and inexact
Date: Sat, 13 May 2017 15:08:26 +0200
In-Reply-To: <20170513060803.GA84399@troutmask.apl.washington.edu>
Cc: Bruce Evans <brde@optusnet.com.au>, freebsd-hackers@freebsd.org,
 numerics@freebsd.org
To: sgk@troutmask.apl.washington.edu
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
 <20170513103208.M845@besplex.bde.org>
 <20170513060803.GA84399@troutmask.apl.washington.edu>
X-Mailer: Apple Mail (2.3273)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 13 May 2017 13:08:37 -0000


--Apple-Mail=_4FBC88C3-4C7E-4D97-8BD0-773DBE95BCD3
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

On 13 May 2017, at 08:08, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
>
> On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
>> On Fri, 12 May 2017, Steve Kargl wrote:
...
>> required for the standard magic.  I planned to fix all this magic using
>> macros like raise_inexact().
>
> If you plan to fix the magic with raise_inexact, then please
> test with a suite of compilers.  AFAICT, clang is optimizing
> out the code.  I haven't written a testcase to demonstrate this
> as I have other irons in the fire.

Using the full catrig.c and -O3, I tried gcc 4.2.1, 4.7.4, 4.8.5, 4.9.4,
5.4.0, 6.3.0 and 7.0.1, in addition to clang 3.4.1, 3.8.0, 3.9.1, 4.0.0
and 5.0.0.  All versions of gcc produced something similar to the
following for i386:

# /usr/src/lib/msun/src/catrig.c:314:   if (x == 0 && y == 0)
        .loc 1 314 0
        fldz
        fucom   %st(3)  #
        fnstsw  %ax     # tmp262
        sahf
        setne   %al     #, tmp270
        setnp   %dl     #, tmp259
        subl    $1, %eax        #, tmp270
        testb   %al, %dl        # tmp270, tmp259
        je      .L176   #,
        fucomp  %st(1)  #
        fnstsw  %ax     # tmp281
        sahf
        setne   %al     #, tmp289
        setnp   %dl     #, tmp278
        subl    $1, %eax        #, tmp289
        testb   %al, %dl        # tmp289, tmp278
        je      .L37    #,
        fstp    %st(3)  #
        fstp    %st(0)  #
        jmp     .L153   #
[...]
.L176:
        fstp    %st(0)  #
.L37:
.LBB25:
# /usr/src/lib/msun/src/catrig.c:318:   raise_inexact();
        flds    tiny    # tiny
        fadds   .LC2    #
        fstps   120(%esp)       # junk

and for amd64:

# /usr/src/lib/msun/src/catrig.c:314:   if (x == 0 && y == 0)
        .loc 1 314 0
        pxor    %xmm7, %xmm7    # tmp386
        ucomisd %xmm7, %xmm3    # tmp386, z
        setnp   %dl     #, tmp258
        cmovne  %eax, %edx      # tmp258,, tmp207, tmp254
        testb   %dl, %dl        # tmp254
        je      .L34    #,
        ucomisd %xmm7, %xmm1    # tmp386, z
        setnp   %dl     #, tmp266
        cmove   %edx, %eax      # tmp266,, tmp262
        testb   %al, %al        # tmp262
        je      .L34    #,
[...]
.L34:
.LBB33:
# /usr/src/lib/msun/src/catrig.c:318:   raise_inexact();
        movss   tiny(%rip), %xmm0       # tiny, tiny.0_28
        addss   .LC13(%rip), %xmm0      #, _29
        movss   %xmm0, 188(%rsp)        # _29, junk

All versions of clang produced something similar to the following for
i386:

        .loc    1 314 8 is_stmt 1       # /usr/src/lib/msun/src/catrig.c:314:8
        fldz
        .loc    1 314 13 is_stmt 0      # /usr/src/lib/msun/src/catrig.c:314:13
        fxch    %st(1)
        fucom   %st(1)
        fnstsw  %ax
        sahf
        jne     .LBB0_19
        jp      .LBB0_19
        .loc    1 0 13                  # /usr/src/lib/msun/src/catrig.c:0:13
        fxch    %st(3)
        fucom   %st(1)
        fstp    %st(1)
        fnstsw  %ax
        sahf
        fldz
        fxch    %st(1)
        fxch    %st(3)
        jne     .LBB0_19
        jp      .LBB0_19
[...]
.LBB0_19:                               # %do.body
        .loc    1 0 8 is_stmt 0         # /usr/src/lib/msun/src/catrig.c:0:8
        fstp    %st(1)
        .loc    1 318 2 is_stmt 1       # /usr/src/lib/msun/src/catrig.c:318:2
        fld1
        fadds   tiny
        fstps   168(%esp)

and for amd64:

        .loc    1 314 8 is_stmt 1       # /usr/src/lib/msun/src/catrig.c:314:8
        pxor    %xmm2, %xmm2
        .loc    1 314 13 is_stmt 0      # /usr/src/lib/msun/src/catrig.c:314:13
        ucomisd %xmm2, %xmm4
        jne     .LBB0_15
        jp      .LBB0_15
        .loc    1 0 13                  # /usr/src/lib/msun/src/catrig.c:0:13
        ucomisd %xmm2, %xmm3
        jne     .LBB0_15
        jnp     .LBB0_21
.LBB0_15:                               # %do.body
        .loc    1 318 2 is_stmt 1       # /usr/src/lib/msun/src/catrig.c:318:2
        movss   tiny(%rip), %xmm2       # xmm2 = mem[0],zero,zero,zero
        addss   .LCPI0_2(%rip), %xmm2
.Ltmp11:
        movss   %xmm2, -16(%rbp)

I.e., these all look good, at least with regard to not optimizing out
the desired addition.

The only compiler I could find that does optimize everything away (at
least in the simplified test case), is the Intel compiler:

https://godbolt.org/g/g1UT2m

-Dimitry


--Apple-Mail=_4FBC88C3-4C7E-4D97-8BD0-773DBE95BCD3
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.30

iEYEARECAAYFAlkXBVEACgkQsF6jCi4glqP6KQCg2xk6WB11svnu92R6Rr2NtmO5
9TIAoK00DaX+gGpjflMpSreyQ5iVCdy0
=FHkh
-----END PGP SIGNATURE-----

--Apple-Mail=_4FBC88C3-4C7E-4D97-8BD0-773DBE95BCD3--

From owner-freebsd-numerics@freebsd.org  Sat May 13 16:05:38 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 151A2D6BA9F
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sat, 13 May 2017 16:05:38 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org
 [IPv6:2001:1900:2254:206a::50:5])
 by mx1.freebsd.org (Postfix) with ESMTP id 0201C18F9
 for <freebsd-numerics@freebsd.org>; Sat, 13 May 2017 16:05:38 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 01463D6BA9E; Sat, 13 May 2017 16:05:38 +0000 (UTC)
Delivered-To: numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id F338FD6BA9D;
 Sat, 13 May 2017 16:05:37 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au
 [211.29.132.80])
 by mx1.freebsd.org (Postfix) with ESMTP id 9E66718F7;
 Sat, 13 May 2017 16:05:37 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id 44982D6B4B0;
 Sun, 14 May 2017 02:05:33 +1000 (AEST)
Date: Sun, 14 May 2017 02:05:33 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: numerics@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: catrig[fl].c and inexact
In-Reply-To: <20170513060803.GA84399@troutmask.apl.washington.edu>
Message-ID: <20170514011600.D1038@besplex.bde.org>
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
 <20170513103208.M845@besplex.bde.org>
 <20170513060803.GA84399@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=KeqiiUQD c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=NOglxHdSkPoQZBT6KtcA:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 13 May 2017 16:05:38 -0000

On Fri, 12 May 2017, Steve Kargl wrote:

> On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
>> On Fri, 12 May 2017, Steve Kargl wrote:
>>
>>> ...
>>> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from
>>>      macro 'raise_inexact'
>>> #define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
>>>                                            ^
>>> Grepping catrig.o for the variable 'junk' suggests that 'junk' is
>>> optimized out (with at least -O2).

It is a local variable, so should be and is allocated on the stack, so
you will never find it using grep.  The problem seems to be that all
compilers generated the intended code, but clang warns anyway.

>> Just another bug in clang.  Volatile variables cannot be optimized out
>> (if they are accessed).
>
> Does this depend on scope?  'junk' is local to the do {...} while(0);
> construct.  Can a compiler completely eliminate a do-nothing scoping
> unit?  I don't know C well enough to know.  I do know what I have
> observed in clang.

That depends on the semantics of volatile, but as a practical matter
standards shouldn't specify much and compilers should be very conservative.

BTW, I recently noticed that volatile doesn't work right in bus space
macros.  Some reduce to *(volatile int *)var = val, where var is for
memory-mapped i/o that takes 10000 times as long as normal memory to
access.  Compilers still unroll loops setting such variables.  This is
only a pessimization for space.
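
A made-up example of the pattern (REG_WRITE is a hypothetical stand-in,
not the real bus space macro, and reg_write_sequence is just a name for
the demo).  Every store in the loop is a volatile access, so none of them
should be dropped or merged, but nothing stops the compiler from unrolling
the loop into a long run of separate stores:

```c
/* Hypothetical stand-in for a bus space write macro; not the real one. */
#define	REG_WRITE(var, val)	(*(volatile int *)(var) = (val))

/*
 * Stores 0, 1, ..., n - 1 to the same location.  Each store is a volatile
 * access, so the compiler should emit all n of them, but it is still free
 * to unroll the loop.
 */
void
reg_write_sequence(int *reg, int n)
{
	int i;

	for (i = 0; i < n; i++)
		REG_WRITE(reg, i);
}
```

Pointing it at an ordinary int instead of a device register, the last
value stored (n - 1) is what remains afterwards.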

>>> ...
>>> @@ -315,7 +315,7 @@ casinh(double complex z)
>>> 		return (z);
>>>
>>> 	/* All remaining cases are inexact. */
>>> -	raise_inexact();
>>> +	raise_inexact(new_y);
>>>
>>> 	if (ax < SQRT_6_EPSILON / 4 && ay < SQRT_6_EPSILON / 4)
>>> 		return (z);
>>
>> Now it doesn't take compiler bugs to optimize it out, since new_y is not
>> volatile, and a good compiler would optimize it out in all cases.
>
> I've yet to find a good compiler.  They all seem to have bugs.
>
>> new_y
>> is obviously unused before the early returns, so it doesn't need to be
>> evaluated before the returns as far as the compiler can see.  Later,
>> new_y is initialized indirectly, and the compiler can see that too (not
>> so easily, so it can see that raise_inexact() has no effect except possibly
>> for its side effect of raising inexact for 1 + tiny.
>
> The later call passes the address of new_y to the routine.  How
> can the compiler short of inlining the called routine know that
> the value assigned to new_y isn't used?

The compiler does full inlining even when you don't want it.  Full
analysis of the whole source file is fundamental for generating useful
warnings with -Wunused.  Without full analysis, the compiler would
have to assume that new_y is used uninitialized and either suppress
warnings for all variables that might be initialized indirectly
(including via aliased pointers), or generate many bogus warnings
that variables "might be" used uninitialized.  Old compilers mostly
did the latter, and we still see occasional spurious warnings from
gcc-4.2.1.

Old compilers also have man pages in which this is partly documented.
gcc-3.3.3(1) says that:
- Wuninitialized is null without -O
- Wuninitialized is never generated for volatile variables
- Wuninitialized is not the default since gcc is not smart enough to
   handle it well
gcc-4.2.1(1) says much the same, plus that -Wall implies -Wuninitialized.
It still says that the compiler is not smart, and doesn't seem to document
improvements that make this warning reasonable as the default with -Wall.
This is mostly because -O now implies -funit-at-a-time, which I usually
don't want, but which gives the full analysis needed for -Wuninitialized
and -Wunused.  I usually don't want this because:
- it slows down compilation
- it allows unwanted inlining
- it allows unportable code.
clang doesn't support -funit-at-a-time.
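
A small made-up example of the indirect initialization case (the function
names are hypothetical; new_y is borrowed from the patch above only as a
name).  Without looking at the body of init_indirect(), a compiler can't
tell whether new_y is initialized when it is returned.  With the whole
unit analyzed it can, and I would expect it to stay quiet under
-Wuninitialized:

```c
/* Initializes *p; the caller's variable is never assigned directly. */
static void
init_indirect(int *p)
{
	*p = 42;
}

int
use_after_indirect_init(void)
{
	int new_y;		/* no initializer here */

	init_indirect(&new_y);	/* initialized only through the pointer */
	return (new_y);
}
```

The same analysis is what lets the compiler inline init_indirect() whether
you want that or not.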

>> The change might defeat the intent of the original code in another way.
>> 'junk' is intentionally independent of other variables, so that there are
>> no dependencies on it.  If the compiler doesn't optimize away the assignment
>> to new_y, then it is probably because it doesn't see that the assignment is
>> dead, so there is a dependency.
>
> It may defeat the intent of the original code, but it seems that
> the original code provokes undefined behavior.

Defined, but perhaps not what is wanted.  It is using -W flags that gives
undefined behaviour.  They are undefined by the C standard, and also
undefined by compilers with stub man pages.

>> Actually, we want the variable 'junk' to be optimized away.  We only want
>> the side effect of evaluating 1 + tiny.  Compilers have bugs evaluating
>> expressions like 1 + tiny, tiny*tiny and huge*huge, and we use assignments
>> of the result to volatile variables in tens if not hundreds of places to
>> try to work around compiler bugs.  If that doesn't work here, then all the
>> other places are probably broken too.  The other places mostly use a static
>> volatile, while this uses an auto volatile.  'tiny' is also volatile, as
>> required for the standard magic.  I planned to fix all this magic using
>> macros like raise_inexact().
>
> If you plan to fix the magic with raise_inexact, then please
> test with a suite of compilers.  AFAICT, clang is optimizing
> out the code.  I haven't written a testcase to demonstrate this
> as I have other irons in the fire.

I only tested with 4 compilers when I wrote it.  Actually, we agreed
not to worry about compiler bugs for setting fenv, especially for
compilers with even more of them than gcc. libm only has the volatile
hack needed to fix huge*huge for clang in some places (gcc evaluates
huge*huge at run time but tiny*tiny at compile time, so libm has more
volatile hacks for the latter).  Not to mention hacks to remove extra
precision for huge*huge and tiny*tiny.  On i386 with i387, huge*huge
doesn't overflow since it is evaluated in extra precision.   The
wrong result is returned and the wrong result is used if it is assigned
to a variable that can hold the extra precision.  Overflow only occurs
if the variable is converted to float or double, and STRICT_ASSIGN() or
a volatile hack must be used for this to work around other compiler
bugs (which are actually features, but not allowed by C standards).
C11 and compiler non-support for C11 breaks this further.  C11 adds
the extra pessimization and subtraction of value of requiring extra
precision (and range) to be destroyed on function return.  clang ignores
this requirement.  Newer gcc supports it under certain pessimal
CFLAGS including -std=c11.
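
A sketch of the volatile hack for huge*huge, not the actual libm code:
huge is assumed to be 0x1p100F and the function name is made up.  With
extra precision the product 2^200 may not overflow when it is computed;
the store to a volatile float is what is supposed to force the conversion
to float, where it has to overflow.  This is not STRICT_ASSIGN(), only
the volatile variant of the same idea.

```c
#include <float.h>

/* Assumption: volatile, so huge * huge is not evaluated at compile time. */
static const volatile float huge = 0x1p100F;

/*
 * huge * huge is 2^200, far above FLT_MAX.  The product itself may be
 * held in extra precision; storing it to a volatile float forces the
 * conversion to float, and the value read back is the overflowed one.
 */
float
square_huge_narrowed(void)
{
	volatile float narrowed = huge * huge;

	return (narrowed);
}
```

In the default rounding mode the value read back should compare greater
than FLT_MAX.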

Bruce.

From owner-freebsd-numerics@freebsd.org  Sat May 13 16:19:34 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3A273D6BDCB
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sat, 13 May 2017 16:19:34 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org
 [IPv6:2001:1900:2254:206a::50:5])
 by mx1.freebsd.org (Postfix) with ESMTP id 263391DA0
 for <freebsd-numerics@freebsd.org>; Sat, 13 May 2017 16:19:34 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 257B9D6BDC6; Sat, 13 May 2017 16:19:34 +0000 (UTC)
Delivered-To: numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 232CFD6BDC3;
 Sat, 13 May 2017 16:19:34 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id DF7351D9F;
 Sat, 13 May 2017 16:19:32 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id B18B842C3B3;
 Sun, 14 May 2017 02:19:24 +1000 (AEST)
Date: Sun, 14 May 2017 02:19:24 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Dimitry Andric <dimitry@andric.com>
cc: sgk@troutmask.apl.washington.edu, freebsd-hackers@freebsd.org, 
 numerics@freebsd.org
Subject: Re: catrig[fl].c and inexact
In-Reply-To: <F5F8736B-D7E1-48AD-BC6C-8C74AF0A3272@andric.com>
Message-ID: <20170514020559.F1038@besplex.bde.org>
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
 <20170513103208.M845@besplex.bde.org>
 <20170513060803.GA84399@troutmask.apl.washington.edu>
 <F5F8736B-D7E1-48AD-BC6C-8C74AF0A3272@andric.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=KeqiiUQD c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=PeOOapuUAAAA:8 a=Wnqw8I5xCDkGpBuh6r0A:9
 a=CjuIK1q_8ugA:10 a=0BaqRfgCL6CLbWgV2pdm:22
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 13 May 2017 16:19:34 -0000

On Sat, 13 May 2017, Dimitry Andric wrote:

> On 13 May 2017, at 08:08, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
>>
>> On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
>>> On Fri, 12 May 2017, Steve Kargl wrote:
> ...
>>> required for the standard magic.  I planned to fix all this magic using
>>> macros like raise_inexact().
>>
>> If you plan to fix the magic with raise_inexact, then please
>> test with a suite of compilers.  AFAICT, clang is optimizing
>> out the code.  I haven't written a testcase to demonstrate this
>> as I have other irons in the fire.
>
> Using the full catrig.c and -O3, I tried gcc 4.2.1, 4.7.4, 4.8.5, 4.9.4,
> 5.4.0, 6.3.0 and 7.0.1, in addition to clang 3.4.1, 3.8.0, 3.9.1, 4.0.0
> and 5.0.0.  All versions of gcc produced something similar to the
> following for i386:

Yes, all compilers I tried (only gcc-3.3.3, gcc-4.2.1 and clang-3.9.0)
generate the intended code, but clang-3.9.0 also generates a -Wunused
warning about the variable that it has just used to generate the intended
code!

> # /usr/src/lib/msun/src/catrig.c:318:   raise_inexact();
>        flds    tiny    # tiny
>        fadds   .LC2    #
>        fstps   120(%esp)       # junk

I don't know how to ask for the best code, which is more like

 	flds	tiny
 	fadds	one
 	ffree	%st(0)		# or fstp %st(0) -- MD optimization

but the best code runs insignificantly faster in practice.

> and for amd64:
> [...]
> .L34:
> .LBB33:
> # /usr/src/lib/msun/src/catrig.c:318:   raise_inexact();
>        movss   tiny(%rip), %xmm0       # tiny, tiny.0_28
>        addss   .LC13(%rip), %xmm0      #, _29
>        movss   %xmm0, 188(%rsp)        # _29, junk

Discarding the result is easier for amd64 (just omit the store).  The
volatile hack forces the store.

> E.g., these all look good, at least with regards to not optimizing out
> the desired addition.
>
> The only compiler I could find that does optimize everything away (at
> least in the simplified test case), is the Intel compiler:
>
> https://godbolt.org/g/g1UT2m

Urk.

Bruce

From owner-freebsd-numerics@freebsd.org  Sat May 13 16:21:58 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5F9A3D6BF30
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sat, 13 May 2017 16:21:58 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org
 [IPv6:2001:1900:2254:206a::50:5])
 by mx1.freebsd.org (Postfix) with ESMTP id 492411FD0
 for <freebsd-numerics@freebsd.org>; Sat, 13 May 2017 16:21:58 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 48860D6BF2F; Sat, 13 May 2017 16:21:58 +0000 (UTC)
Delivered-To: numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 46565D6BF2E;
 Sat, 13 May 2017 16:21:58 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 2C32D1FCF;
 Sat, 13 May 2017 16:21:58 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4DGLrVL088880
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Sat, 13 May 2017 09:21:53 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4DGLrgG088879;
 Sat, 13 May 2017 09:21:53 -0700 (PDT) (envelope-from sgk)
Date: Sat, 13 May 2017 09:21:53 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Dimitry Andric <dimitry@andric.com>
Cc: Bruce Evans <brde@optusnet.com.au>, freebsd-hackers@freebsd.org,
 numerics@freebsd.org
Subject: Re: catrig[fl].c and inexact
Message-ID: <20170513162153.GB88653@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
 <20170513103208.M845@besplex.bde.org>
 <20170513060803.GA84399@troutmask.apl.washington.edu>
 <F5F8736B-D7E1-48AD-BC6C-8C74AF0A3272@andric.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <F5F8736B-D7E1-48AD-BC6C-8C74AF0A3272@andric.com>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 13 May 2017 16:21:58 -0000

On Sat, May 13, 2017 at 03:08:26PM +0200, Dimitry Andric wrote:
> On 13 May 2017, at 08:08, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
> > 
> > On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
> >> On Fri, 12 May 2017, Steve Kargl wrote:
> ...
> >> required for the standard magic.  I planned to fix all this magic using
> >> macros like raise_inexact().
> > 
> > If you plan to fix the magic with raise_inexact, then please
> > test with a suite of compilers.  AFAICT, clang is optimizing
> > out the code.  I haven't written a testcase to demonstrate this
> > as I have other irons in the fire.
> 
> Using the full catrig.c and -O3, I tried gcc 4.2.1, 4.7.4, 4.8.5, 4.9.4,
> 5.4.0, 6.3.0 and 7.0.1, in addition to clang 3.4.1, 3.8.0, 3.9.1, 4.0.0
> and 5.0.0.

Thanks for checking.  I reduced catrig.c to a small self-contained
program and indeed I was getting the desired addition of 1 + tiny
to raise FE_INEXACT.  I suppose that I'll need to add an appropriate
-Wno-foo to my CFLAGS line to suppress the spurious warning, which
might be tricky because -Wunused is one option I'd like to have.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Sat May 13 16:55:38 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9AD0AD6B872
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sat, 13 May 2017 16:55:38 +0000 (UTC)
 (envelope-from dimitry@andric.com)
Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3])
 by mx1.freebsd.org (Postfix) with ESMTP id 82C661613
 for <freebsd-numerics@freebsd.org>; Sat, 13 May 2017 16:55:38 +0000 (UTC)
 (envelope-from dimitry@andric.com)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 7EF6AD6B870; Sat, 13 May 2017 16:55:38 +0000 (UTC)
Delivered-To: numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7B18CD6B86F;
 Sat, 13 May 2017 16:55:38 +0000 (UTC)
 (envelope-from dimitry@andric.com)
Received: from tensor.andric.com (tensor.andric.com [87.251.56.140])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "tensor.andric.com",
 Issuer "COMODO RSA Domain Validation Secure Server CA" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 004711612;
 Sat, 13 May 2017 16:55:37 +0000 (UTC)
 (envelope-from dimitry@andric.com)
Received: from [IPv6:2001:470:7a58::a8c1:a7f4:edbc:3331] (unknown
 [IPv6:2001:470:7a58:0:a8c1:a7f4:edbc:3331])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by tensor.andric.com (Postfix) with ESMTPSA id 09D253FD9B;
 Sat, 13 May 2017 18:55:34 +0200 (CEST)
From: Dimitry Andric <dimitry@andric.com>
Message-Id: <FB138623-DF5B-4DBD-94FE-29E21FF7FDC6@andric.com>
Content-Type: multipart/signed;
 boundary="Apple-Mail=_1A165ECB-BD13-4967-A0C3-5C9609FF1B6F";
 protocol="application/pgp-signature"; micalg=pgp-sha1
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Subject: Re: catrig[fl].c and inexact
Date: Sat, 13 May 2017 18:55:27 +0200
In-Reply-To: <20170513162153.GB88653@troutmask.apl.washington.edu>
Cc: freebsd-hackers@freebsd.org, numerics@freebsd.org,
 Bruce Evans <brde@optusnet.com.au>
To: sgk@troutmask.apl.washington.edu
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
 <20170513103208.M845@besplex.bde.org>
 <20170513060803.GA84399@troutmask.apl.washington.edu>
 <F5F8736B-D7E1-48AD-BC6C-8C74AF0A3272@andric.com>
 <20170513162153.GB88653@troutmask.apl.washington.edu>
X-Mailer: Apple Mail (2.3273)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 13 May 2017 16:55:38 -0000


--Apple-Mail=_1A165ECB-BD13-4967-A0C3-5C9609FF1B6F
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

On 13 May 2017, at 18:21, Steve Kargl <sgk@troutmask.apl.washington.edu> =
wrote:
>=20
> On Sat, May 13, 2017 at 03:08:26PM +0200, Dimitry Andric wrote:
...
>=20
>> Using the full catrig.c and -O3, I tried gcc 4.2.1, 4.7.4, 4.8.5, =
4.9.4,
>> 5.4.0, 6.3.0 and 7.0.1, in addition to clang 3.4.1, 3.8.0, 3.9.1, =
4.0.0
>> and 5.0.0.
>=20
> Thanks for checking.  I reduced catrig.c to a small self-contained
> program and indeed I was getting the desired addition of 1 + tiny
> to raise FE_INEXACT.  I suppose that I'll need to add an appropriate
> -Wno-foo to my CFLAGS line to suppress the spurious warning, which
> might be tricky because -Wunused is one option I'ld like to have.

The following also gets rid of the warnings:

Index: lib/msun/src/catrig.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- lib/msun/src/catrig.c	(revision 318032)
+++ lib/msun/src/catrig.c	(working copy)
@@ -37,7 +37,7 @@
 #define isinf(x)	(fabs(x) =3D=3D INFINITY)
 #undef isnan
 #define isnan(x)	((x) !=3D (x))
-#define	raise_inexact()	do { volatile float junk =3D 1 + tiny; } =
while(0)
+#define	raise_inexact()	do { volatile float junk __unused =3D 1 =
+ tiny; } while(0)
 #undef signbit
 #define signbit(x)	(__builtin_signbit(x))

Index: lib/msun/src/catrigf.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- lib/msun/src/catrigf.c	(revision 318032)
+++ lib/msun/src/catrigf.c	(working copy)
@@ -51,7 +51,7 @@
 #define isinf(x)	(fabsf(x) =3D=3D INFINITY)
 #undef isnan
 #define isnan(x)	((x) !=3D (x))
-#define	raise_inexact()	do { volatile float junk =3D 1 + tiny; } =
while(0)
+#define	raise_inexact()	do { volatile float junk __unused =3D 1 =
+ tiny; } while(0)
 #undef signbit
 #define signbit(x)	(__builtin_signbitf(x))

Index: lib/msun/src/catrigl.c
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
--- lib/msun/src/catrigl.c	(revision 318032)
+++ lib/msun/src/catrigl.c	(working copy)
@@ -53,7 +53,7 @@
 #define isinf(x)	(fabsl(x) =3D=3D INFINITY)
 #undef isnan
 #define isnan(x)	((x) !=3D (x))
-#define	raise_inexact()	do { volatile float junk =3D 1 + tiny; } =
while(0)
+#define	raise_inexact()	do { volatile float junk __unused =3D 1 =
+ tiny; } while(0)
 #undef signbit
 #define signbit(x)	(__builtin_signbitl(x))

If you are OK with that, I will commit it later today.

-Dimitry


--Apple-Mail=_1A165ECB-BD13-4967-A0C3-5C9609FF1B6F
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename=signature.asc
Content-Type: application/pgp-signature;
	name=signature.asc
Content-Description: Message signed with OpenPGP

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.30

iEYEARECAAYFAlkXOoUACgkQsF6jCi4glqOjeQCgrp2JTdTaC/b3j/+gqf56C3AV
GT0AoO+KGbDi+qxoOxNrez97cSEMi/Vv
=zJHP
-----END PGP SIGNATURE-----

--Apple-Mail=_1A165ECB-BD13-4967-A0C3-5C9609FF1B6F--

From owner-freebsd-numerics@freebsd.org  Sat May 13 17:12:13 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 68227D6B2A6
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sat, 13 May 2017 17:12:13 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from mailman.ysv.freebsd.org (unknown [127.0.1.3])
 by mx1.freebsd.org (Postfix) with ESMTP id 50FA7155
 for <freebsd-numerics@freebsd.org>; Sat, 13 May 2017 17:12:13 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 50556D6B2A5; Sat, 13 May 2017 17:12:13 +0000 (UTC)
Delivered-To: numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4FD07D6B2A4;
 Sat, 13 May 2017 17:12:13 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 2B29E154;
 Sat, 13 May 2017 17:12:13 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4DHC8hu089182
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Sat, 13 May 2017 10:12:08 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4DHC8qr089181;
 Sat, 13 May 2017 10:12:08 -0700 (PDT) (envelope-from sgk)
Date: Sat, 13 May 2017 10:12:08 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Dimitry Andric <dimitry@andric.com>
Cc: freebsd-hackers@freebsd.org, numerics@freebsd.org,
 Bruce Evans <brde@optusnet.com.au>
Subject: Re: catrig[fl].c and inexact
Message-ID: <20170513171208.GA89162@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
 <20170513103208.M845@besplex.bde.org>
 <20170513060803.GA84399@troutmask.apl.washington.edu>
 <F5F8736B-D7E1-48AD-BC6C-8C74AF0A3272@andric.com>
 <20170513162153.GB88653@troutmask.apl.washington.edu>
 <FB138623-DF5B-4DBD-94FE-29E21FF7FDC6@andric.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <FB138623-DF5B-4DBD-94FE-29E21FF7FDC6@andric.com>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 13 May 2017 17:12:13 -0000

On Sat, May 13, 2017 at 06:55:27PM +0200, Dimitry Andric wrote:
> On 13 May 2017, at 18:21, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
> > 
> > On Sat, May 13, 2017 at 03:08:26PM +0200, Dimitry Andric wrote:
> ...
> > 
> >> Using the full catrig.c and -O3, I tried gcc 4.2.1, 4.7.4, 4.8.5, 4.9.4,
> >> 5.4.0, 6.3.0 and 7.0.1, in addition to clang 3.4.1, 3.8.0, 3.9.1, 4.0.0
> >> and 5.0.0.
> > 
> > Thanks for checking.  I reduced catrig.c to a small self-contained
> > program and indeed I was getting the desired addition of 1 + tiny
> > to raise FE_INEXACT.  I suppose that I'll need to add an appropriate
> > -Wno-foo to my CFLAGS line to suppress the spurious warning, which
> > might be tricky because -Wunused is one option I'ld like to have.
> 
> The following also gets rid of the warnings:
> 
> Index: lib/msun/src/catrig.c
> ===================================================================
> --- lib/msun/src/catrig.c	(revision 318032)
> +++ lib/msun/src/catrig.c	(working copy)
> @@ -37,7 +37,7 @@
>  #define isinf(x)	(fabs(x) == INFINITY)
>  #undef isnan
>  #define isnan(x)	((x) != (x))
> -#define	raise_inexact()	do { volatile float junk = 1 + tiny; } while(0)
> +#define	raise_inexact()	do { volatile float junk __unused = 1 + tiny; } while(0)
>  #undef signbit
>  #define signbit(x)	(__builtin_signbit(x))
> 
> Index: lib/msun/src/catrigf.c
> ===================================================================
> --- lib/msun/src/catrigf.c	(revision 318032)
> +++ lib/msun/src/catrigf.c	(working copy)
> @@ -51,7 +51,7 @@
>  #define isinf(x)	(fabsf(x) == INFINITY)
>  #undef isnan
>  #define isnan(x)	((x) != (x))
> -#define	raise_inexact()	do { volatile float junk = 1 + tiny; } while(0)
> +#define	raise_inexact()	do { volatile float junk __unused = 1 + tiny; } while(0)
>  #undef signbit
>  #define signbit(x)	(__builtin_signbitf(x))
> 
> Index: lib/msun/src/catrigl.c
> ===================================================================
> --- lib/msun/src/catrigl.c	(revision 318032)
> +++ lib/msun/src/catrigl.c	(working copy)
> @@ -53,7 +53,7 @@
>  #define isinf(x)	(fabsl(x) == INFINITY)
>  #undef isnan
>  #define isnan(x)	((x) != (x))
> -#define	raise_inexact()	do { volatile float junk = 1 + tiny; } while(0)
> +#define	raise_inexact()	do { volatile float junk __unused = 1 + tiny; } while(0)
>  #undef signbit
>  #define signbit(x)	(__builtin_signbitl(x))
> 
> If you are OK with that, I will commit it later today.
> 

I'm OK with this change, but I typically defer to Bruce 
as he knows much more about C and floating point.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Sat May 13 18:21:41 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3C520D6B011
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sat, 13 May 2017 18:21:41 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org
 [IPv6:2001:1900:2254:206a::50:5])
 by mx1.freebsd.org (Postfix) with ESMTP id 27D9BEE4
 for <freebsd-numerics@freebsd.org>; Sat, 13 May 2017 18:21:41 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 271C1D6B010; Sat, 13 May 2017 18:21:41 +0000 (UTC)
Delivered-To: numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 26A47D6B00F;
 Sat, 13 May 2017 18:21:41 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail108.syd.optusnet.com.au (mail108.syd.optusnet.com.au
 [211.29.132.59]) by mx1.freebsd.org (Postfix) with ESMTP id 9B43EEE1;
 Sat, 13 May 2017 18:21:39 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail108.syd.optusnet.com.au (Postfix) with ESMTPS id D78651A400E;
 Sun, 14 May 2017 04:21:31 +1000 (AEST)
Date: Sun, 14 May 2017 04:21:30 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Mark Millard <markmi@dsl-only.net>
cc: sgk@troutmask.apl.washington.edu, Bruce Evans <brde@optusnet.com.au>, 
 freebsd-hackers@freebsd.org, numerics@freebsd.org
Subject: Re: catrig[fl].c and inexact
In-Reply-To: <DC2DA938-6A07-4CB0-AFB6-038368971B77@dsl-only.net>
Message-ID: <20170514023721.O1230@besplex.bde.org>
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
 <20170513103208.M845@besplex.bde.org>
 <20170513060803.GA84399@troutmask.apl.washington.edu>
 <DC2DA938-6A07-4CB0-AFB6-038368971B77@dsl-only.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=VbSHBBh9 c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=iROBt-5bZgHvOzUyjZ0A:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 13 May 2017 18:21:41 -0000

On Sat, 13 May 2017, Mark Millard wrote:

>
> On 2017-May-12, at 11:08 PM, Steve Kargl <sgk at troutmask.apl.washington.edu> wrote:
>
>> On Sat, May 13, 2017 at 11:35:49AM +1000, Bruce Evans wrote:
>>> On Fri, 12 May 2017, Steve Kargl wrote:
>>>> ...
>>>> /usr/home/kargl/trunk/math/libm/msun/src/catrigl.c:56:45: note: expanded from
>>>>     macro 'raise_inexact'
>>>> #define raise_inexact() do { volatile float junk = 1 + tiny; } while(0)
>>>>                                           ^
>>>> Grepping catrig.o for the variable 'junk' suggests that 'junk' is
>>>> optimized out (with at least -O2).

It is easy to write unportable code that works perfectly.  On i386(i387):

#define	use(x)	__asm("" : : "t" (x))
#define	raise_inexact() use(1 + tiny)

looks cleaner except for the asm, and generates perfect code with fstp
%st(0) and no store of the result to memory.  Unfortunately, the "t"
(top of i387 stack) is too unportable.  "g" might be portable enough,
but generates worse code than the volatile variable.

>>> Just another bug in clang.  Volatile variables cannot be optimized out
>>> (if they are accessed).
>>
>> Does this depend on scope?  'junk' is local to the do {...} while(0);
>> construct.  Can a compiler completely eliminate a do-nothing scoping
>> unit?  I don't know C well enough to know.  I do know what I have
>> observed in clang.
>
> [This note ignores other standards than C99/C11
> that might place other constraints. And I've done
> no checking of compiler results, I've just looked
> at a couple of the C standards.]
>
> Note: I've not looking to tiny's declaration. It
> may contribute in a way not covered below.
>
> Unfortunately the declarator in an init-declarator
> that has an initializer is not part of an
> expression. The rules for volatile are tied to uses
> in expressions, not to the declarator. (Which is a
> hole in the language definition as far as I can
> tell.)

But the very first mention of volatile in C99 (5.1.2.3 Program Execution
#1) says that "Accessing a volatile object ... [is a side effect]. ...
[All previous side effects shall be complete at certain sequence points.]"

It doesn't make any exceptions for auto objects.

Also, #3 explicitly says for side effects in expressions that the
implementation may optimize away the evaluation if it can determine
that the evaluation has no side effects, including by calling a function
or accessing a volatile object.  But here the compiler can't do that for
1 + tiny, since this expression does have side effects (perhaps modulo
pragma FENV_ACCESS).  This rule is redundant if not wrong.  The
implementation can always use the "as if" rule to avoid doing work
to produce nothing.  And according to #1, any access to a volatile
variable has side effects, so the compiler can never determine that an
evaluation involving volatile variables has no side effects.

So the correctness of the compiler using #3 to avoid the assignment
reduces to the standard breaking its own definition of volatile, and
then the compiler using the broken definition.

> There is one part of the wording that might mitigate
> this, tied to a full declarator having a sequence
> point at its end despite the declarator itself not
> being an expression, even if its initializer is
> one. There is another wording detail that might
> as well.

Surely the assignment gives a sequence point for initializers?  Actually,
this is not too clear.  I don't even like initialization in declarations,
partly because it obscures the order, and only wrote the code with an
initializer to get a 1-line macro.  It could be written as
"volatile float junk; junk = 1 + tiny;".  Also, the use() macro can
be written in C, with similar problems to the asm version, as
"#define use(x) do { volatile float junk; junk = x; } while (0)" or better
in gnuC as
"#define use(x) do { volatile __typeof(x) junk; junk = x; } while (0)".
This allows keeping the volatile hack and variations to make it work
(maybe just __unused) in 1 place.
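
As a rough sketch of that last variant, with the attribute spelled out
as __attribute__((__unused__)) rather than FreeBSD's __unused so that
it builds outside the tree: the declaration of tiny and the probe()
counter are assumptions added for illustration.  The counter is only
there to show that use() evaluates its argument exactly once.

```c
/* gnuC sketch: __typeof and __attribute__ are compiler extensions. */
#define	use(x) do {						\
	volatile __typeof(x) junk __attribute__((__unused__));	\
	junk = (x);						\
} while (0)

/* Assumed declaration: tiny's definition is not shown in this thread. */
static const volatile float tiny = 0x1p-100F;

/* All of the volatile magic now lives in use(). */
#define	raise_inexact()	use(1 + tiny)

/* Hypothetical probe: counts how often a call to it is evaluated. */
static int probe_calls;

static int
probe(void)
{
	return (++probe_calls);
}
```

Note that the argument should be an rvalue such as 1 + tiny.  Passing
a const-qualified lvalue directly, as in use(tiny), would make
__typeof yield a const type and the assignment to junk would then fail
to compile.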

#9 (Example 1) says that an implementation may make the volatile keyword
redundant, essentially by making volatile-memory non-magic.  I don't
like this.  It reduces the side effects of volatile to just the ordering
of accesses to volatiles relative to sequence points, but practical
implementations need much more than that.  This clause just says that
impractical implementations are allowed, but so does the "as if" rule.

#10 is much more of the same.

6.7.3 #6 says that accesses to a volatile-qualified object "may" have
side effects unknown to the implementation.

Misimplementations may still apply the "as if" rule and conform to this
clause weaselishly by knowing their own badness.  They just have to do
what is allowed in Example 1 to make volatile have no useful effect.
Then this clause is null.

> Still, overall it would seem safer to be sure there
> is an expression that references the volatile object,
> not having only its declarator. But I would not take
> even that as a guarantee under the C standards.

The standard seems a bit too weighted towards read accesses.  We
could try writing to a non-volatile variable and reading back the
result as volatile using a *(volatile type_t *)&var hack.  But that
would give an unwanted extra memory access.
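
For concreteness, a sketch of that hack might look like this.  The
helper name volatile_readback() and the declaration of tiny are
assumptions for illustration, and whether a read through a
volatile-qualified lvalue of an ordinary object counts as a volatile
access is left to the implementation, so this is no more guaranteed
than the other variants.

```c
/*
 * Hypothetical helper: store to an ordinary object, then read it back
 * through a volatile-qualified lvalue so that the read is an access
 * the compiler is expected to keep.
 */
static inline float
volatile_readback(float value)
{
	float var = value;			/* plain write */

	return (*(volatile float *)&var);	/* read back as volatile */
}

/* Assumed declaration: tiny's definition is not shown in this thread. */
static const volatile float tiny = 0x1p-100F;

#define	raise_inexact()	((void)volatile_readback(1 + tiny))
```

The store and the reload are the extra memory traffic mentioned above.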

> It may seem a silly difference but:
>
> do { volatile float junk=1; junk+=tiny; } while(0)
>
> may well be a better way of writing the "must
> evaluate" part of the intent simply because
> junk is used in an expression. Also it has both read
> and write access, so is a little more "used". The
> sequence point before the assignment can help avoid
> compile-time evaluation as well.

That would give 1 more unwanted memory access (if it works normally):
- write 1 to junk
- read 1 from junk; add tiny (usually) in a register
- write result to junk.

> Details if you care. . .
>
> I used the C99 and C11 definitions here, I
> reference C11 section numbering but C99 agrees
> as I remember.
>
> 5.1.2.3 Program execution says:
>
> "Accessing a volatile object, modifying an object,
> modifying a file, or calling a function that does
> any of those operations are all side effects,
> which are changes in the state of the execution
> environment. Evaluation of an expression may
> produce side effects."
>
> Note that raising inexact does not fit in the
> definition of side effect as far as I can tell.
> So a compiler need not consider such a thing
> for side-effect issues if I understand right.

I think it does, modulo #pragma FENV_ACCESS.  Indeed, F.7.1 says it
does explicitly (and without Annex F, floating point can do almost
anything).  It says that when FENV_ACCESS is "on" (should be "ON"),
for FP operations that implicitly raise exception flags, these
changes to the FP state are treated as side effects which respect
sequence points [footnote 291].  The footnote wastes space to remind
the reader that optimizations are allowed when FENV_ACCESS is "off".

> [C11 specific wording:] "The presence of a
> sequence point between the evaluations of
> expressions A and B implies that every value
> computation and side effect associated with A
> is sequenced before every value compuation and
> side effect associated with B."
>
> [C99 is similar but is before the detailed
> "sequenced before" definition.]
>
> "An actual implementation need not evaluate part
> of an expression if it can deduce that its value
> is not used and that no needed side effects are
> produced (including any caused by calling a
> function or accessing a volatile object)."

I didn't expect any problems with volatile or sequence points.  With
FENV_ACCESS OFF, the compiler is free to ignore the side effect for
1+tiny, but with FENV_ACCESS broken in all available compilers, we
have to assume that the compiler doesn't ignore this side effect.
In practice, compilers do ignore it for (void)(1+tiny) with tiny
non-volatile, so we use several volatile hacks.  Volatile for
tiny alone isn't enough...

> Can a accessing a volatile object ever be
> classified as having "no needed side effects"?
> More on this later. [Remember what "side effect"
> excludes, as noted earlier. So some consequences
> need not be considered by the compiler, all in
> the name of optimizations.]

...we need the write access to junk to have side effects.  Since
tiny is volatile, 1+tiny has an unknown value even with FENV_ACCESS OFF.
Then we want the side effects for accessing junk to depend on the value,
so that the value must be calculated even though it is unused except for
its effects on the side effects.  This is fragile.

> 6.7.3 Type Qualifiers says:
>
> "An object that has volatile-qualified type . . .
> Therefore any expression referring to such as object
> shall be evaluated strictly according to the rules
> of the abstract machine, as described in 5.1.2.3.
> Furthermore, at every sequence point the value last
> stored in the object shall agree with that prescribed
> by the abstract machine, except as modified by the
> unknown factors mentioned previously. What constitutes
> an access to an object that has volatile-qualified
> type is implementation-defined."
>
> This part is mixed: what the sequence point wording
> giveth the last sentence taketh away. (More later.)

The implementation must work for memory-mapped devices since that is the
most important case for us.  Anything that reads or writes a value to a
memory-mapped device has lots of side effects that depend on the value.
So junk = 1 + tiny must load tiny if tiny is for a memory-mapped device,
evaluate 1+tiny to get a value to store, and do the store if junk is for
a memory-mapped device.  The compiler is doing too much optimization if
it "knows" that junk is not for a memory-mapped device because the compiler
allocated it on the stack.  The compiler allocated the static tiny in
ordinary memory too.  If volatile is broken for tiny, and FENV_ACCESS is
OFF or broken (unsupported) then the compiler is free to evaluate 1+tiny
as 1 at compile time, and similarly for later expressions involving the
result.  extern volatile usually prevents the compiler from knowing that
the variable is not for device memory.

> It also says in a note (134):
>
> "A volatile declaration may be used to describe an
> object corresponding to a memory-mapped input/output
> port or an object accessed by an asynchronously
> interrupting function. Actions on objects so declared
> shall not be "optimized out" by an implementation
> or reordered except as permitted by the rules for
> evaluating expressions."

"so declared" must be read as simply "volatile", since there is no
declaration like "volatile memory mapped ..." though such declarations
would be very useful for kernels.

> Since rules for evaluating expressions are not rules
> for declarators (vs. initializers), this could be
> read as not allowing the "optimize out". (But the
> abstract machine's description is not explicit about
> declarators for such issues.)

It just allows all optimizations which the compiler can tell are safe.
But compilers can never tell.  Maybe the programmer memory-mapped the
stack...  This is well outside the scope of the C abstract
machine, but would be just another hack for kernels.

> The C99 Rationale:
>
> The C99 Rationale was explicit about static
> volatile for a memory mapped I/O register,
> static const volatile for a memory mapped
> input port, const volatile and volatile
> for variables shared across processes. To
> some extent this identifies examples of
> contexts with "needed side effects" that
> have hardware details to take into account.
>
> For taking into account hardware details:
> ". . . Whatever decisions are adopted on such
> issues must be documented, as volatile access
> is implementation-defined".
>
> For volatile use with no explicitly identified
> hardware details: volatile would appear to be
> no more than a potential hint for such a
> context, not an effective requirement. The
> implementation-defined status could allow lack
> of access.
>
> Overall, based on what I see in the C99 and
> C11 language definitions, I'd not be willing to
> declare clang wrong (if it did optimize out junk),
> even with my alternative formulation.
>
> C does not have an explicit Principle of Least
> Astonishment as an official guideline to its
> interpretation and the rules are very biased to
> allowing so-called optimizations. "junk" does not
> fit with being shared across processes (for
> example its address is not handed to anything)
> and is not static or even global. There is no
> known type of potential context for specific
> hardware details that would need to be taken
> into account for junk. That in turn leaves open
> not accessing it at all as far as I can tell.

Yes, it is only a hint, and the C standard would be improved by saying
just that, or requiring the strong meaning that is needed in practice.
The strong meaning is that accesses to volatile variables always have
side effects even if the implementation "knows" that they don't.

Bruce

From owner-freebsd-numerics@freebsd.org  Sat May 13 19:14:57 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6E1E0D6BFF6
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Sat, 13 May 2017 19:14:57 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org
 [IPv6:2001:1900:2254:206a::50:5])
 by mx1.freebsd.org (Postfix) with ESMTP id 599CBA9C
 for <freebsd-numerics@freebsd.org>; Sat, 13 May 2017 19:14:57 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: by mailman.ysv.freebsd.org (Postfix)
 id 56163D6BFF4; Sat, 13 May 2017 19:14:57 +0000 (UTC)
Delivered-To: numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 52341D6BFF3;
 Sat, 13 May 2017 19:14:57 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 10ACDA9B;
 Sat, 13 May 2017 19:14:56 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 6F0963C6D9F;
 Sun, 14 May 2017 05:14:54 +1000 (AEST)
Date: Sun, 14 May 2017 05:14:53 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Dimitry Andric <dimitry@andric.com>
cc: sgk@troutmask.apl.washington.edu, freebsd-hackers@freebsd.org, 
 numerics@freebsd.org
Subject: Re: catrig[fl].c and inexact
In-Reply-To: <FB138623-DF5B-4DBD-94FE-29E21FF7FDC6@andric.com>
Message-ID: <20170514043645.G2059@besplex.bde.org>
References: <20170512215654.GA82545@troutmask.apl.washington.edu>
 <20170513103208.M845@besplex.bde.org>
 <20170513060803.GA84399@troutmask.apl.washington.edu>
 <F5F8736B-D7E1-48AD-BC6C-8C74AF0A3272@andric.com>
 <20170513162153.GB88653@troutmask.apl.washington.edu>
 <FB138623-DF5B-4DBD-94FE-29E21FF7FDC6@andric.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=VbSHBBh9 c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=Y88NXTGeRKpz0WgPRmcA:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 13 May 2017 19:14:57 -0000

On Sat, 13 May 2017, Dimitry Andric wrote:

> On 13 May 2017, at 18:21, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
>>
>> On Sat, May 13, 2017 at 03:08:26PM +0200, Dimitry Andric wrote:
> ...
>>
>>> Using the full catrig.c and -O3, I tried gcc 4.2.1, 4.7.4, 4.8.5, 4.9.4,
>>> 5.4.0, 6.3.0 and 7.0.1, in addition to clang 3.4.1, 3.8.0, 3.9.1, 4.0.0
>>> and 5.0.0.
>>
>> Thanks for checking.  I reduced catrig.c to a small self-contained
>> program and indeed I was getting the desired addition of 1 + tiny
>> to raise FE_INEXACT.  I suppose that I'll need to add an appropriate
>> -Wno-foo to my CFLAGS line to suppress the spurious warning, which
>> might be tricky because -Wunused is one option I'd like to have.
>
> The following also gets rid of the warnings:
>
> Index: lib/msun/src/catrig.c
> ===================================================================
> --- lib/msun/src/catrig.c	(revision 318032)
> +++ lib/msun/src/catrig.c	(working copy)
> @@ -37,7 +37,7 @@
> #define isinf(x)	(fabs(x) == INFINITY)
> #undef isnan
> #define isnan(x)	((x) != (x))
> -#define	raise_inexact()	do { volatile float junk = 1 + tiny; } while(0)
> +#define	raise_inexact()	do { volatile float junk __unused = 1 + tiny; } while(0)
> #undef signbit
> #define signbit(x)	(__builtin_signbit(x))
> ...
>
> If you are OK with that, I will commit it later today.

It is what I said was best yesterday :-).

Except, __unused is an obfuscation meaning __used.  I couldn't get __used
to work today either.  It works with static variables, but for auto variables
it generates "'__used__' attribute ignored" for both clang-3.9.0 and
gcc-4.2.1, even without any -W flags to ask for excessive warnings.

Today I looked at the macro used(expr), which would be used like used(1
+ tiny) for inexact, used(huge * huge) for overflow, and used(tiny *
tiny) for underflow.  The difficulty is to declare the variable to
hold the result, especially since we don't want this variable to be in
memory.  Also in some cases, we would like to return the result.  For
overflow, we can do either:

 	({ volatile float junk __unused = huge * huge; INFINITY; })

or

 	({ __typeof(huge) r;  STRICT_ASSIGN(..., huge * huge); r; })

with different tradeoffs (the second is broken if r is not used and there
is no volatile hidden in STRICT_ASSIGN()),

or better, only load huge once (float t = huge; junk = t * t;).

Bruce

From owner-freebsd-numerics@freebsd.org  Sat May 13 20:55:19 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 78514D6B1DC;
 Sat, 13 May 2017 20:55:19 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 535621C2C;
 Sat, 13 May 2017 20:55:19 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4DKtHpT091964
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Sat, 13 May 2017 13:55:18 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4DKtH98091963;
 Sat, 13 May 2017 13:55:17 -0700 (PDT) (envelope-from sgk)
Date: Sat, 13 May 2017 13:55:17 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-hackers@freebsd.org, freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
Message-ID: <20170513205517.GA91911@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20170428010122.GA12814@troutmask.apl.washington.edu>
 <20170428183733.V1497@besplex.bde.org>
 <20170428165658.GA17560@troutmask.apl.washington.edu>
 <20170429035131.E3406@besplex.bde.org>
 <20170428201522.GA32785@troutmask.apl.washington.edu>
 <20170429070036.A4005@besplex.bde.org>
 <20170428233552.GA34580@troutmask.apl.washington.edu>
 <20170429005924.GA37947@troutmask.apl.washington.edu>
 <20170429151457.F809@besplex.bde.org>
 <20170429194239.P3294@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170429194239.P3294@besplex.bde.org>
User-Agent: Mutt/1.7.2 (2016-11-26)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 13 May 2017 20:55:19 -0000

On Sat, Apr 29, 2017 at 08:19:23PM +1000, Bruce Evans wrote:
> On Sat, 29 Apr 2017, Bruce Evans wrote:
> > On Fri, 28 Apr 2017, Steve Kargl wrote:
> >> On Fri, Apr 28, 2017 at 04:35:52PM -0700, Steve Kargl wrote:
> >>> 
> >>> I was just backtracking with __kernel_sinpi.  This gets a max ULP < 0.61.
> >
> > Comments on this below.
> >
> > This is all rather over-engineered.  Optimizing these functions is
> > unimportant comparing with finishing cosl() and sinl() and optimizing
> > all of the standard trig functions better, but we need correctness.
> > But I now see many simplifications and improvements:
> >
> > (1) There is no need for new kernels.  The standard kernels already handle
> > extra precision using approximations like:
> >
> >    sin(x+y) ~= sin(x) + (1-x*x/2)*y.
> >
> > Simply reduce x and write Pi*x = hi+lo.  Then
> >
> >    sin(Pi*x) = __kernel_sin(hi, lo, 1).
> >
> > I now see how to do the extra-precision calculations without any
> > multiplications.
> 
> But that is over-engineered too.
> 
> Using the standard kernels is easy and works well:

Maybe works well.  See below.

> Efficiency is very good in some cases, but anomalous in others: all
> times in cycles, on i386, on the range [0, 0.25]
> 
> athlon-xp, gcc-3.3           Haswell, gcc-3.3   Haswell, gcc-4.2.1
> cos:   61-62                 44                 43
> cospi: 69-71 (8-9 extra)     78 (anomalous...)  42 (faster to do more!)
> sin:   59-60                 51                 37
> sinpi: 67-68 (8 extra)       80                 42
> tan:   136-172               93-195             67-94
> tanpi: 144-187 (8-15 extra)  145-176            61-189
> 
> That was a throughput test.  Latency is not so good.  My latency test
> doesn't use serializing instructions, but uses random args and the
> partial serialization of making each result depend on the previous
> one.
> 
> athlon-xp, gcc-3.3           Haswell, gcc-3.3   Haswell, gcc-4.2.1
> cos:   84-85                 69                 79
> cospi: 103-104 (19-21 extra) 117                94
> sin:   75-76                 89                 77
> sinpi: 105-106 (30 extra)    116                90
> tan:   168-170               167-168            147
> tanpi: 191-194 (23-24 extra) 191                154
> 
> This also indicates that the longest times for tan in the throughput
> test are what happens when the function doesn't run in parallel with
> itself.  The high-degree polynomial and other complications in tan()
> are too complicated for much cross-function parallelism.
> 
> Anyway, it looks like the cost of using the kernel is at most 8-9
> in the parallel case and at most 30 in the serial case.  The extra-
> precision code has about 10 dependent instructions, so it is
> doing OK to take 30.

Based on other replies in this email exchange, I have gone back
and looked at improvements to my __kernel_{cos|sin|tan}pi[fl]
routines.  The improvements were for both accuracy and speed.
I have tested on i686 and x86_64 systems with libm built with
-O2 -march=native -mtune=native.  My timing loop is of the
form

        float dx, f, x;
        long i, k;

        f = 0;
        k = 1 << 23;
        dx = (xmax - xmin) / (k - 1);
        time_start();
        for (i = 0; i < k; i++) {
                x = xmin + i * dx;
                f += cospif(x);
        };
        time_end();

        x = (time_cpu() / k) * 1.e6;
        printf("cospif time: %.4f usec per call\n", x);

        if (f == 0)
                printf("Can't happen!\n");

The assumption here is that loop overhead is the same for
all tested kernels.

Test intervals for kernels.

 float: [0x1p-14, 0.25]
double: [0x1p-29, 0.25]
  ld80: [0x1p-34, 0.25] 

   Core2 Duo T7250 @ 2.00GHz      || AMD FX8350 Eight-Core CPU
    (1995.05-MHz 686-class)       ||  (4018.34-MHz K8-class)
----------------------------------++--------------------------
       | Horner | Estrin | Fdlibm || Horner | Estrin | Fdlibm 
-------+--------+--------+--------++--------+--------+--------
cospif | 0.0223 |        | 0.0325 || 0.0112 |        | 0.0085
sinpif | 0.0233 | Note 1 | 0.0309 || 0.0125 |        | 0.0085
tanpif | 0.0340 |        | Note 2 || 0.0222 |        |
-------+--------+--------+--------++--------+--------+--------
cospi  | 0.0641 | 0.0571 | 0.0604 || 0.0157 | 0.0142 | 0.0149
sinpi  | 0.0722 | 0.0626 | 0.0712 || 0.0178 | 0.0161 | 0.0166
tanpi  | 0.1049 | 0.0801 |        || 0.0323 | 0.0238 |
-------+--------+--------+--------++--------+--------+--------
cospil | 0.0817 | 0.0716 | 0.0921 || 0.0558 | 0.0560 | 0.0755
sinpil | 0.0951 | 0.0847 | 0.0994 || 0.0627 | 0.0568 | 0.0768
tanpil | 0.1310 | 0.1004 |        || 0.1005 | 0.0827 |
-------+--------+--------+--------++--------+--------+--------

Time in usec/call.

Note 1.  In re-arranging the polynomials for Estrin's method and
float, I found appreciable benefit.

Note 2.  I have been unable to use the tan[fl] kernels to implement
satisfactory kernels for tanpi[fl].  In particular, for x in [0.25,0.5]
and using the tanf kernel leads to 6 digit ULPs in 0.5, whereas my kernel
is near 2 ULP.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow

From owner-freebsd-numerics@freebsd.org  Sat May 13 22:30:39 2017
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8FD54D66D9E;
 Sat, 13 May 2017 22:30:39 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au
 [211.29.132.246])
 by mx1.freebsd.org (Postfix) with ESMTP id 33B216EC;
 Sat, 13 May 2017 22:30:38 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au
 [122.106.153.191])
 by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id AEF9E4296A3;
 Sun, 14 May 2017 08:30:35 +1000 (AEST)
Date: Sun, 14 May 2017 08:30:34 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
cc: freebsd-hackers@freebsd.org, freebsd-numerics@freebsd.org
Subject: Re: Implementation of half-cycle trignometric functions
In-Reply-To: <20170513205517.GA91911@troutmask.apl.washington.edu>
Message-ID: <20170514071942.T1084@besplex.bde.org>
References: <20170428010122.GA12814@troutmask.apl.washington.edu>
 <20170428183733.V1497@besplex.bde.org>
 <20170428165658.GA17560@troutmask.apl.washington.edu>
 <20170429035131.E3406@besplex.bde.org>
 <20170428201522.GA32785@troutmask.apl.washington.edu>
 <20170429070036.A4005@besplex.bde.org>
 <20170428233552.GA34580@troutmask.apl.washington.edu>
 <20170429005924.GA37947@troutmask.apl.washington.edu>
 <20170429151457.F809@besplex.bde.org>
 <20170429194239.P3294@besplex.bde.org>
 <20170513205517.GA91911@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.2 cv=KeqiiUQD c=1 sm=1 tr=0
 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17
 a=kj9zAlcOel0A:10 a=YHl6NKQVYIZzuSoSgCMA:9 a=viboGBD9vYLM4oiE:21
 a=fNZ8f2z6azax7mVy:21 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 13 May 2017 22:30:39 -0000

On Sat, 13 May 2017, Steve Kargl wrote:

> On Sat, Apr 29, 2017 at 08:19:23PM +1000, Bruce Evans wrote:
>> ...
>> Using the standard kernels is easy and works well:
>
> Maybe works well.  See below.
>> ...
>> Anyway, it looks like the cost of using the kernel is at most 8-9
>> in the parallel case and at most 30 in the serial case.  The extra-
>> precision code has about 10 dependent instructions, so it is
>> doing OK to take 30.

Probably a few more than 8.

I got nowhere using inline versions for double precision.  Apparently
the only large win for inlining is when it avoids repeating the
classification, as it does for e_rem_pio2*.  The kernels don't repeat
anything, so the only cost is a function call, plus a few cycles for
testing iy for __kernel_sin() only.

> Based on other replies in this email exchange, I have gone back
> and looked at improvements to my __kernel_{cos|sin|tan}pi[fl]
> routines.  The improvements were for both accuracy and speed.

I really don't want another set of kernels (or more sets for degrees
instead of radians, and sincos).  Improvements to the existing kernels
are welcome, but difficult except for long double precision.  I got
nowhere tweaking the polynomial in __kernel_sin().  Every change that
I tried just moved the tradeoff between accuracy and efficiency.
The one best for efficiency is only about 4 cycles faster, and increases
the error by 0.1 to 0.2 ulps.  This change involves adding up the terms
in a different order.

> I have tested on i686 and x86_64 systems with libm built with
> -O2 -march=native -mtune=native.  My timing loop is of the
> form
>
>        float dx, f, x;
>        long i, k;
>
>        f = 0;
>        k = 1 << 23;
>        dx = (xmax - xmin) / (k - 1);
>        time_start();
>        for (i = 0; i < k; i++) {
>                x = xmin + i * dx;

This asks for a conversion from long to double which tends to be slow, and
a multiplication in the inner loop.  The compiler shouldn't optimize it to
x += dx since this has different inaccuracy.

My test loop does x += dx in FP and tests that x < limit.  This sometimes
has problems when dx is so small that x + dx == x.  Also, x, dx and limit
are double precision for testing all precisions, so that the loop overhead
is the same for all precisions.  This works best on i386/i387.  Otherwise,
there are larger conversion overheads.  This usually prevents x + dx == x
in float precision, but in long double precision it results in x + dx == x
more often.  Double precision just can't handle a large limit like
LDBL_MAX or even small steps up to DBL_MAX.
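
The stepping scheme and its x + dx == x hazard can be sketched as follows.
count_steps() is a hypothetical helper, not the actual test program; it
only counts iterations and reports a stalled loop instead of spinning
forever.

```c
/*
 * Count the steps of a loop that advances x by dx while x < limit.
 * Returns -1 if x stops advancing because x + dx rounds back to x.
 */
static long
count_steps(double x, double limit, double dx)
{
	volatile double next;	/* volatile store forces rounding to double */
	long n;

	for (n = 0; x < limit; n++) {
		next = x + dx;
		if (next == x)
			return (-1);	/* step lost to rounding */
		x = next;
	}
	return (n);
}
```

For example, count_steps(0.0, 0.25, 0x1p-10) returns 256 since every
partial sum is exact, while count_steps(1e300, 2e300, 1.0) returns -1
because the step is far below half an ulp of x.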

>                f += cospif(x);
>        };
>        time_end();
>
>        x = (time_cpu() / k) * 1.e6;
>        printf("cospif time: %.4f usec per call\n", x);
>
>        if (f == 0)
>                printf("Can't happen!\n");

Otherwise, this is a reasonable throughput test.  But please count times
in cycles if possible.  rdtsc() is very easy to use on x86.

>
> The assumption here is that loop overhead is the same for
> all tested kernels.

It is probably much larger for long double precision.  I get minimal times
like 9 cycles for float and double precision, but more like 30 for long
double on x86.

> Test intervals for kernels.
>
> float: [0x1p-14, 0.25]
> double: [0x1p-29, 0.25]
>  ld80: [0x1p-34, 0.25]
>
>   Core2 Duo T7250 @ 2.00GHz      || AMD FX8350 Eight-Core CPU
>    (1995.05-MHz 686-class)       ||  (4018.34-MHz K8-class)
> ----------------------------------++--------------------------
>       | Horner | Estrin | Fdlibm || Horner | Estrin | Fdlibm
> -------+--------+--------+--------++--------+--------+--------
> cospif | 0.0223 |        | 0.0325 || 0.0112 |        | 0.0085
> sinpif | 0.0233 | Note 1 | 0.0309 || 0.0125 |        | 0.0085
> tanpif | 0.0340 |        | Note 2 || 0.0222 |        |

The fdlibm kernels are almost impossible to beat in float precision,
since they use double precision so the correct way to use them is
for example 'cospif: return __kernel_cosdf(M_PI * x);' after reduction
to |x| ~< 0.25.  Any pure float precision method is going to take 10-20
cycles longer.

It is interesting that you measured fdlibm to be faster on the newer
system but much slower on the older system.  The latter must be a bug
somewhere.

> -------+--------+--------+--------++--------+--------+--------
> cospi  | 0.0641 | 0.0571 | 0.0604 || 0.0157 | 0.0142 | 0.0149
> sinpi  | 0.0722 | 0.0626 | 0.0712 || 0.0178 | 0.0161 | 0.0166
> tanpi  | 0.1049 | 0.0801 |        || 0.0323 | 0.0238 |
> -------+--------+--------+--------++--------+--------+--------

Now the differences are almost small enough to be noise.

> cospil | 0.0817 | 0.0716 | 0.0921 || 0.0558 | 0.0560 | 0.0755
> sinpil | 0.0951 | 0.0847 | 0.0994 || 0.0627 | 0.0568 | 0.0768
> tanpil | 0.1310 | 0.1004 |        || 0.1005 | 0.0827 |
> -------+--------+--------+--------++--------+--------+--------

Now the differences are that the kernels for long double precision
are unoptimized.  They use Horner.  Actually, they do use the optimization
of using double precision constants if possible (but not the larger
optimization for sparc64 of calculating higher terms in double precision).

> Time in usec/call.
>
> Note 1.  In re-arranging the polynomials for Estrin's method and
> float, I found appreciable benefit.

Do you mean "no appreciable benefit"?  No times are shown.  Short polynomials
benefit less.  There is also the problem that measuring throughput vs
latency is hard.  If the CPU can execute several functions in parallel, it
is best (iff the load has candidates for such functions, as simple tests
do) to use something like Horner's method to minimise the number of
operations.  Horner's method is only very bad for latency, and on in-order
CPUs.  Some of the timing anomalies are probably explained by this -- newer
CPUs have fewer bottlenecks so do better at executing functions in parallel;
this is also easier in float precision.
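
For reference, the two evaluation orders can be sketched for a generic
degree-7 polynomial.  The function names and the coefficient array are
illustrative only; these are not the polynomials used in any of the
kernels.

```c
/*
 * Horner: c[0] + c[1]*x + ... + c[7]*x^7 with the minimum number of
 * operations, but every step depends on the previous one.
 */
static double
poly7_horner(const double c[8], double x)
{
	double r = c[7];
	int i;

	for (i = 6; i >= 0; i--)
		r = r * x + c[i];
	return (r);
}

/*
 * Estrin: the same polynomial split into pairs that can be evaluated
 * independently, at the cost of extra multiplications for x^2 and x^4.
 */
static double
poly7_estrin(const double c[8], double x)
{
	double x2 = x * x;
	double x4 = x2 * x2;
	double p01 = c[0] + c[1] * x;
	double p23 = c[2] + c[3] * x;
	double p45 = c[4] + c[5] * x;
	double p67 = c[6] + c[7] * x;

	return ((p01 + p23 * x2) + (p45 + p67 * x2) * x4);
}
```

The two orders round differently in general, so switching between them
moves the error as well as the timing.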

> Note 2.  I have been unable to use the tan[fl] kernels to implement
> satisfactory kernels for tanpi[fl].  In particular, for x in [0.25,0.5]
> and using the tanf kernel leads to 6 digit ULPs in 0.5, whereas my kernel
> is near 2 ULP.

The tanf kernel should be very accurate since it is in double precision.
But its polynomial is chosen to only give an accuracy of 0.7999 ulps,
while the polys for cosf and sinf are chosen to give an accuracy of
0.5009 ulps, since the high accuracy is only almost free for the latter.
Any extra error on 0.7999 might be too much.  But multiplication by M_PI
in double precision shouldn't change the error by more than 0.0001 ulps.

The tanl kernel has to struggle to get even sub-ulp precision.  Its
degree is too high for efficiency, and I don't trust it to give even
sub-ulp precision, especially for ld128.

I didn't manage to get cospi(x) and sinpi(x) using the kernels as fast
as cos(x) and sin(x), even with |x| restricted to < 0.25 so that the
range reduction step is null.  The extra precision operations just take
longer than the range reduction even when the latter is not simplified
for the reduced range.

Conversion of degrees to multiples of Pi is interesting.  E.g.,
cosd(x) = cos(x * Pi / 180) = cospi(x / 180) in infinite precision.
The natural way to implement it is to convert to cospi() first.
This is only easy using a remainder operation.  Remainder operations
work for this, unlike for converting radians to a quadrant plus a
remainder, because 180 is exactly representable but Pi isn't.  But
exact remainder operations are slow too.  They are just not as slow
or inexact as ones for 18000+ digit approximations to Pi.  So cosd(x)
can only be implemented much more efficiently than cos(x) for the
unimportant case of large |x|.

Bruce