From owner-freebsd-numerics@freebsd.org Sat Apr 29 18:10:24 2017 Return-Path: Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8E065D55BE9; Sat, 29 Apr 2017 18:10:24 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 76320D7A; Sat, 29 Apr 2017 18:10:24 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v3TIAN63041484 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 29 Apr 2017 11:10:23 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v3TIAMGF041483; Sat, 29 Apr 2017 11:10:22 -0700 (PDT) (envelope-from sgk) Date: Sat, 29 Apr 2017 11:10:22 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-hackers@freebsd.org, freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions Message-ID: <20170429181022.GA41420@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428010122.GA12814@troutmask.apl.washington.edu> <20170428183733.V1497@besplex.bde.org> <20170428165658.GA17560@troutmask.apl.washington.edu> <20170429035131.E3406@besplex.bde.org> <20170428201522.GA32785@troutmask.apl.washington.edu> <20170429070036.A4005@besplex.bde.org> <20170428233552.GA34580@troutmask.apl.washington.edu> <20170429005924.GA37947@troutmask.apl.washington.edu> <20170429151457.F809@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170429151457.F809@besplex.bde.org> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Apr 2017 18:10:24 -0000 On Sat, Apr 29, 2017 at 05:54:21PM +1000, Bruce Evans wrote: > On Fri, 28 Apr 2017, Steve Kargl wrote: > > > On Fri, Apr 28, 2017 at 04:35:52PM -0700, Steve Kargl wrote: > >> > >> I was just backtracking with __kernel_sinpi. This gets a max ULP < 0.61. > > Comments on this below. Way too much information to digest in one reading. I'll answer the things that are easy ... > >> x2 = x * x; > >> p = x2 * (x2 * (x2 * s4 + s3) + s2) + s1; > >> s = x * (x2 * (double)p + s0hi + s0lo); > >> return ((float)s); > >> } > > I don't like the style being completely different. Why lower-case constants? > ... Having teethed on musty FORTRAN code, I have an adversion to uppercase. I also do not like camel case, which seems to plague modern languages. > > Well, starting with above and playing the splitting game > > with x, x2, and p. I've manage to reduce the max ULP in > > the region testd. Testing against MPFR with sin(pi*x) > > computed in 5*24-bit precision gives > > It already cheats by using (double)p, but is limited by using float to > caclulate p. Yes, the above is cheating to the extent that the above code started out as completely written in double. This gives max ULP < 0.51. I then started reducing the double to float with the intent of finding when max ULP exceeds some chosen threshold. The above gave something like max ULP < 0.55. I then moved the (double) cast to different terms/factors, observed how the max ULP changed. This then gives information about what requires extra precision. > > static inline float > > __kernel_sinpif(float x) > > { > > float p, phi, x2, x2hi, x2lo, xhi, xlo; > > uint32_t ix; > > > > x2 = x * x; > > p = x2 * (x2 * (x2 * s4 + s3) + s2) + s1; > > > > GET_FLOAT_WORD(ix, p); > > SET_FLOAT_WORD(phi, (ix >> 14) << 14); > > > > GET_FLOAT_WORD(ix, x2); > > SET_FLOAT_WORD(x2hi, (ix >> 14) << 14); > > I expect that these GET/SET's are the slowest part. They are quite fast > in float prec, but in double prec on old i386 CPUs compilers generate bad > code which can have penalties of 20 cycles per GET/SET. > > Why the strange reduction? The double shift is just a manual optimization > or pssimization (usually the latter) for clearing low bits. Here it is > used to clear 14 low bits instead of the usual 12. This is normally > written using just a mask of 0xffff0000, unless you want a different > number of bits in the hi terms for technical reasons. Double precision > can benefit more from asymmetric splitting of terms since 53 is not > divisible by 2; 1 hi term must have less than 26.5 bits and the other term > can hold an extra bit. Because I didn't think about using a mask. :-) It's easy to change 14 to 13 or 11 or ..., while I would need to write out zeros and one to come up with 0xffff8000, etc. -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow