From owner-freebsd-numerics@freebsd.org Thu May 18 17:54:37 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9070DD70642 for ; Thu, 18 May 2017 17:54:37 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 6455FCF7 for ; Thu, 18 May 2017 17:54:37 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v4IHsZVu074686 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 18 May 2017 10:54:35 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v4IHsZFV074685; Thu, 18 May 2017 10:54:35 -0700 (PDT) (envelope-from sgk) Date: Thu, 18 May 2017 10:54:34 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions Message-ID: <20170518175434.GA74453@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428175851.A1386@besplex.bde.org> <20170516224618.GA40855@troutmask.apl.washington.edu> <20170517094848.A52247@besplex.bde.org> <20170517180927.GA54431@troutmask.apl.washington.edu> <20170518072636.E951@besplex.bde.org> <20170518014226.GA63819@troutmask.apl.washington.edu> <20170518154820.A8280@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170518154820.A8280@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 May 2017 17:54:37 -0000 On Thu, May 18, 2017 at 04:34:57PM +1000, Bruce Evans wrote: > On Wed, 17 May 2017, Steve Kargl wrote: > > > On Thu, May 18, 2017 at 09:25:58AM +1000, Bruce Evans wrote: > >> On Wed, 17 May 2017, Steve Kargl wrote: > > ... > >>> As such, I've added a comment > >>> > >>> /* > >>> * pi_hi contains the leading 56 bits of a 169 bit approximation for pi. > >>> */ > >> > >> Why 56? 53 + 56 = 109. > > > > This is ld128. > > ld128 has precision 113. Yes. I know. > > static const long double > > pi_hi = 3.14159265358979322702026593105983920e+00L, > > pi_lo = 1.14423774522196636802434264184180742e-17L; > > > > pi_hi has 113/2 = 56 bits. > > pi_lo has 113 bit. > > > > 56 + 113 = 169 > > But hi is set intentionally sloppily by converting to double, so it has > 53 bits instead of the 57 that 56 would go with (57 = 113/2 rounded up). A 53-bit entity times a 56-bit entity needs 109-bits for the result. A 113-bit entity can hold a 109-bit entity. Or am I missing something? > >> My improvements to lgamma()'s sin_pi*() are buggy. I forgot the original > >> target of avoiding inexact better: > > > > I'll need to wait until the weekend to study your improved sin_pi. > > I now have closer to production version. sinpi() runs more than 2 > times faster that your version, without doing much different except > not having so much EXTRACT/INSERT or special cases. (66 cycles instead > of 150 on Haswell-i386, and the 66 can be reduced to 58 by removing > the STRICT_ASSIGN() which breaks only i386 with non-default extra > precision. The difference would be smaller on amd64 or better compilers > , but Haswell-i386 has much the same slownesses as old Athlon from the > EXTRACTs and INSERTs. > > However, both versions fail tests. Min has a max error of 2048 ulps > and an everage error of 928 ulps and yours has a maximum error of many > giga-ups for the sign bit wrong and an average error of 561 ulps. The > enormous errors are probably for denormals or NaNs. You're correct that I did not give much consideration to subnormal number. For float this, can be fixed by if (ix < 0x38800000) { /* |x| < 0x1p-14 */ if (x == 0) return (x); SET_FLOAT_WORD(hi, hx & 0xffff0000); if (ix <= 0x0b7f0001) { /* Avoid spurios underflow. */ hi *= 0x1p23F; lo = x * 0x1p23F - hi; s = (pi_lo + pi_hi) * lo + pi_lo * hi + pi_hi * hi; return (s * 0x1p-23F); } lo = x - hi; s = (pi_lo + pi_hi) * lo + pi_lo * hi + pi_hi * hi; return (s); } or if you want to avoid the extra if-statement. Simply scale if (ix < 0x38800000) { /* |x| < 0x1p-14 */ if (x == 0) return (x); SET_FLOAT_WORD(hi, hx & 0xffff0000); hi *= 0x1p23F; lo = x * 0x1p23F - hi; s = (pi_lo + pi_hi) * lo + pi_lo * hi + pi_hi * hi; return (s * 0x1p-23F); } Starting with x = 0x1p-14 and using nextafterf(x,-1.f), the first value where underflow is signalled is % ./testf -S -D sinpif(3.74171353e-39) = 1.17549407e-38 x is subnormal sinpif(x) is subnormal Exhaustive testing of numbers in the domain [FLT_MIN, 0x1p-14] yields % ./testf -S -m 1.17549435E-38F -M 0x1p-14F -ed MAX ULP: 0.58366567 Total tested: 939524096 -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow