From owner-freebsd-numerics@freebsd.org Fri Apr 28 08:55:18 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 76190D5279F; Fri, 28 Apr 2017 08:55:18 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail107.syd.optusnet.com.au (mail107.syd.optusnet.com.au [211.29.132.53]) by mx1.freebsd.org (Postfix) with ESMTP id 15EAFEA0; Fri, 28 Apr 2017 08:55:17 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c122-106-153-191.carlnfd1.nsw.optusnet.com.au [122.106.153.191]) by mail107.syd.optusnet.com.au (Postfix) with ESMTPS id ED08AD4A8E5; Fri, 28 Apr 2017 18:33:57 +1000 (AEST) Date: Fri, 28 Apr 2017 18:33:54 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl cc: freebsd-numerics@freebsd.org, freebsd-hackers@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions In-Reply-To: <20170428041849.GA13544@troutmask.apl.washington.edu> Message-ID: <20170428180046.P1386@besplex.bde.org> References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428041849.GA13544@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=AYLBJzfG c=1 sm=1 tr=0 a=Tj3pCpwHnMupdyZSltBt7Q==:117 a=Tj3pCpwHnMupdyZSltBt7Q==:17 a=kj9zAlcOel0A:10 a=6I5d2MoRAAAA:8 a=uopPRt1wpuTJ01loTfUA:9 a=44QarpAT_w9gG2nY:21 a=cCMfiCuOYRyRPczO:21 a=CjuIK1q_8ugA:10 a=IjZwj45LgO3ly-622nXo:22 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Apr 2017 08:55:18 -0000 On Thu, 27 Apr 2017, Steve Kargl wrote: > On Thu, Apr 27, 2017 at 04:14:11PM -0700, Steve Kargl wrote: >>> >>> The code is attached the bug reportr. >>> >>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218514 >> >> I have attached a new diff to the bugzilla report. The >> diff is 3090 lines and won't be broadcast the mailing list. > ... > Grrrr. Find a sloppy theshold can be fun. > > Index: src/s_cospif.c > =================================================================== > --- src/s_cospif.c (revision 1916) > +++ src/s_cospif.c (working copy) > @@ -61,7 +61,7 @@ > SET_FLOAT_WORD(ax, ix); > > if (ix < 0x3f800000) { /* |x| < 1 */ > - if (ix < 0x39000000) { /* |x| < 0x1p-13 */ > + if (ix < 0x38800000) { /* |x| < 0x1p-14 */ > if (huge + ax > 0) /* Raise inexact iff != 0. */ > return (1); > } This looks too different from s_cosf.c: XX if(ix <= 0x3f490fda) { /* |x| ~<= pi/4 */ XX if(ix<0x39800000) /* |x| < 2**-12 */ XX if(((int)x)==0) return 1.0; /* 1 with inexact if x != 0 */ XX return __kernel_cosdf(x); XX } The (int)x == 0 method is perhaps not best (especially not the excessive parentheses used in it), but it was verified to be faster than other methods. This might be a matter of tricking the compiler into not optimizing the unusual case. (__predict ugliness doesn't work any better.) Non-formatting style bugs include: - arguably worse spelling of powers of 2 in comments - less information in the comment about inexact. I try to change all similar comments to be consistent whenever I change their code. s_cos.c still has a variant with less detail "/* generate inexact". s_sinl.c is still broken by your not copying the fdlibm code: XX /* If x = +-0 or x is a subnormal number, then sin(x) = x */ XX if (z.bits.exp == 0) XX return (x); This is missing not only the comment about setting inexact, but also the code. This is also poorly optimized. XX /* Optimize the case where x is already within range. */ XX if (z.e < M_PI_4) { XX hi = __kernel_sinl(z.e, 0, 0); XX RETURNI(s ? -hi : hi); XX } This is is considerably more broken. It underflows for small x. Evaluation of polynomials always depends on not doing it for tiny x, since evaluation of higher terms always underflows for tiny x even when x is not denormal (except when the poly is linear, the T1*x term doesn't underflow if either T1 is 0 or T1 is larger than about 1 and x is not denormal). Filtering out tiny x is just an optimization, but prevents this underflow. It sometimes also gives free tests for 0 and denormals (I don't see how to fold this test into the others), and free optimizations for tiny x. It does do this for sin(x), provided we don't support non-standard rounding modes. The correct value is just x with inexact if x != 0. s_cosl.c has the same bug. I sent you preliminary patches in 2008 when the long double trig functions were committed, but left fixing these bugs as an exercise. Actually, I thought I fixed this in cosl() and only left sinl() as an exercise. XX diff -u2 s_cosl.c~ s_cosl.c XX --- s_cosl.c~ Thu May 30 18:17:21 2013 XX +++ s_cosl.c Tue Sep 6 03:43:57 2016 XX @@ -55,21 +55,17 @@ XX long double y[2]; XX long double hi, lo; XX + int ex; XX XX z.e = x; XX - z.bits.sign = 0; XX - XX - /* If x = +-0 or x is a subnormal number, then cos(x) = 1 */ XX - if (z.bits.exp == 0) XX - return (1.0); XX - XX - /* If x = NaN or Inf, then cos(x) = NaN. */ XX - if (z.bits.exp == 32767) XX - return ((x - x) / (x - x)); XX + ex = z.bits.exp; Mostly style and minor efficiency fixes. XX XX ENTERI(); XX XX - /* Optimize the case where x is already within range. */ XX - if (z.e < M_PI_4) XX - RETURNI(__kernel_cosl(z.e, 0)); XX + if (ex < 0x3ffe || (ex == 0x3ffe && z.bits.manh < 0xc90fdaa2)) XX + RETURNI(__kernel_cosl(x, 0)); Efficiency fix for the comparison. But no fix for underflow, etc. XX + XX + /* If x = NaN or Inf, then cos(x) = NaN. */ XX + if (ex == 32767) XX + RETURNI(x - x); XX XX e0 = __ieee754_rem_pio2l(x, y); ISTR breaking s_sinf.c and/or s_cosf.c with the invalid optimization of just evaluating the polynomial, to avoid the branch for the micro-optimization. But it is not juste for an optimization. Bruce