From owner-freebsd-hackers@freebsd.org Sat Apr 29 00:59:26 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B511FD52424; Sat, 29 Apr 2017 00:59:26 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "troutmask", Issuer "troutmask" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 7F5581ABD; Sat, 29 Apr 2017 00:59:26 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost [127.0.0.1]) by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id v3T0xPk4037996 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Fri, 28 Apr 2017 17:59:25 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id v3T0xOBe037995; Fri, 28 Apr 2017 17:59:24 -0700 (PDT) (envelope-from sgk) Date: Fri, 28 Apr 2017 17:59:24 -0700 From: Steve Kargl To: Bruce Evans Cc: freebsd-hackers@freebsd.org, freebsd-numerics@freebsd.org Subject: Re: Implementation of half-cycle trignometric functions Message-ID: <20170429005924.GA37947@troutmask.apl.washington.edu> Reply-To: sgk@troutmask.apl.washington.edu References: <20170409220809.GA25076@troutmask.apl.washington.edu> <20170427231411.GA11346@troutmask.apl.washington.edu> <20170428010122.GA12814@troutmask.apl.washington.edu> <20170428183733.V1497@besplex.bde.org> <20170428165658.GA17560@troutmask.apl.washington.edu> <20170429035131.E3406@besplex.bde.org> <20170428201522.GA32785@troutmask.apl.washington.edu> <20170429070036.A4005@besplex.bde.org> <20170428233552.GA34580@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170428233552.GA34580@troutmask.apl.washington.edu> User-Agent: Mutt/1.7.2 (2016-11-26) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 29 Apr 2017 00:59:26 -0000 On Fri, Apr 28, 2017 at 04:35:52PM -0700, Steve Kargl wrote: > > I was just backtracking with __kernel_sinpi. This gets a max ULP < 0.61. > > static const float > s0hi = 3.14062500e+00f, /* 0x40490000 */ > s0lo = 9.67653585e-04f, /* 0x3a7daa22 */ > s1 = -5.16771269e+00f, /* 0xc0a55de7 */ > s2 = 2.55016255e+00f, /* 0x402335dd */ > s3 = -5.99202096e-01f, /* 0xbf19654f */ > s4 = 8.10018554e-02f; /* 0x3da5e44d */ > > static inline float > __kernel_sinpif(float x) > { > double s; > float p, x2; > x2 = x * x; > p = x2 * (x2 * (x2 * s4 + s3) + s2) + s1; > s = x * (x2 * (double)p + s0hi + s0lo); > return ((float)s); > } > Well, starting with above and playing the splitting game with x, x2, and p. I've manage to reduce the max ULP in the region testd. Testing against MPFR with sin(pi*x) computed in 5*24-bit precision gives MAX ULP: 0.73345101 Total tested: 33554427 0.7 < ULP <= 0.8: 90 0.6 < ULP <= 0.7: 23948 Exhaustive testing with my older sinpi(x) as the reference gives ./testf -S -m 0x1p-14 -M 0.25 -d -e MAX ULP: 0.73345101 Total tested: 100663296 0.7 < ULP <= 0.8: 45 0.6 < ULP <= 0.7: 11977 The code is slightly slower than my current best kernel. sinpif time is 0.0147 usec per call (60 cycles). static inline float __kernel_sinpif(float x) { float p, phi, x2, x2hi, x2lo, xhi, xlo; uint32_t ix; x2 = x * x; p = x2 * (x2 * (x2 * s4 + s3) + s2) + s1; GET_FLOAT_WORD(ix, p); SET_FLOAT_WORD(phi, (ix >> 14) << 14); GET_FLOAT_WORD(ix, x2); SET_FLOAT_WORD(x2hi, (ix >> 14) << 14); x2lo = s0lo + x2 * (p - phi) + (x2 - x2hi) * phi; x2hi *= phi; GET_FLOAT_WORD(ix, x); SET_FLOAT_WORD(xhi, (ix >> 14) << 14); xlo = x - xhi; xlo = xlo * (x2lo + x2hi) + (xlo * s0hi + xhi * x2lo); return (xlo + xhi * x2hi + xhi * s0hi); } -- Steve 20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4 20161221 https://www.youtube.com/watch?v=IbCHE-hONow