From owner-freebsd-standards@FreeBSD.ORG Wed Oct 10 20:47:02 2007 Return-Path: Delivered-To: freebsd-standards@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CEFD016A418 for ; Wed, 10 Oct 2007 20:47:02 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.208.78.105]) by mx1.freebsd.org (Postfix) with ESMTP id 9D30E13C47E for ; Wed, 10 Oct 2007 20:47:02 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.1/8.14.1) with ESMTP id l9AKgn3L007510; Wed, 10 Oct 2007 13:42:49 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.1/8.14.1/Submit) id l9AKgnqg007509; Wed, 10 Oct 2007 13:42:49 -0700 (PDT) (envelope-from sgk) Date: Wed, 10 Oct 2007 13:42:49 -0700 From: Steve Kargl To: Bruce Evans Message-ID: <20071010204249.GA7446@troutmask.apl.washington.edu> References: <20070928152227.GA39233@troutmask.apl.washington.edu> <20071001173736.U1985@besplex.bde.org> <20071002001154.GA3782@troutmask.apl.washington.edu> <20071002172317.GA95181@VARK.MIT.EDU> <20071002173237.GA12586@troutmask.apl.washington.edu> <20071003103519.X14175@delplex.bde.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="k+w/mQv8wyuph6w0" Content-Disposition: inline In-Reply-To: <20071003103519.X14175@delplex.bde.org> User-Agent: Mutt/1.4.2.3i Cc: freebsd-standards@FreeBSD.org Subject: Re: long double broken on i386? X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 10 Oct 2007 20:47:02 -0000 --k+w/mQv8wyuph6w0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Wed, Oct 03, 2007 at 11:13:38AM +1000, Bruce Evans wrote: > On Tue, 2 Oct 2007, Steve Kargl wrote: > > >On Tue, Oct 02, 2007 at 01:23:18PM -0400, David Schultz wrote: > >>... > >>Anyway, my point is that if you have something that works > >>reasonably well and doesn't have egregious errors, my suggestion > >>is to just commit it and not kill yourself over whether the > >>argument reduction is correct in the last ulp. > > >There are a few problems: 1) I don't have commit access. 2) I > >need to do style(9) clean-up pass. 3) I've only tested these > >on i386 and amd64, and I know these fail for sparc64. 4) Most > >importantly, I don't know how the project wants to handle the > >53 vs 64 bit default fpu setting on i386. > > (3) is most important. The C versions will never even be used for > i386 or amd64, since these arches have sin and cos in hardware and (I > believe) it is impossible to beat the hardware on these arches. The > asm versions have always been trivial to implement (change 2 bytes in > s_sin.S (compare e_sqrt.S with e_sqrtf.S; the long double changes go > the other way). amd64 actually uses the C versions for double precision > becase the C versions are competitive with the hardware for speed and > beat it for accuracy on large args and on small args aear a multiple > of pi/2. I've given up on i386. All of my attempts to make my C implementations work on both amd64 and i386 have failed. The code when run on i386 results in ULP's that exceed 10^5. Yes, that is 1E5. Results for amd64 are reported below. As far as assembly, someone else will need to write those routines because I do not know assembly. > (4): These functions are useless unless the default is changed. Users > have always been able to get equivalent _in_accuracy on i386 by just > using the double precision versions. Hacks like #define sinl sin > almost work for just getting things to compile. AFAIK, C99 says the prototype in math.h is "long double sinl(long double);". The #define hack simply won't cut it. > In fact, on i386, due > to the bugfeatures outlined in my previous mail, users can even get > full long double accuracy using hacks like that, even if the rounding > precision is 53 bits: (code removed) > On i386, this stores the full long double result for sinl(1) in x, due > to the bugfeatures -- sin(1) returns extra precision in a register, and > gcc doesn't understand the extra precision so it just stores all bits > of the register in x. The behaviour would be similar if sin(1) were used > in an expression. Given GCC's history, relying on the accidental full long double accuracy because FP registers are 80 bits seems to be folly. It also isn't clear to me what sin(x) will do with x > DBL_MAX. Will 1e400L be interpreted as +Inf or the long double value 1e400? Yes, I'm fully aware that such a large argument may not be the greatest idea due to argument reduction. > >PS: There is also the minor possibility of giving bde either > >a heartache or death via laughter when he finally sees what I > >have done. :) > > I would be unhappy with any versions that aren't either: > - as identical as possible in algorithm and style to the double precision > versions, in the same way that the float versions are as identical as > possible, or > - much better in algorithm and style and efficiency and accuracy than the > double precision versions. I don't want to maintain different versions. I'll go with your second option see comments below and attached code. BTW, if I add #ifdef sinl #undef sinl #endif #define sinl sin to the top of my sin_test.c code, I get ULPs that exceed 10^9. > I tried to turn your original efforts for logl() into the latter for > logf(), log() and logl(), but so far they are only a little better. I lost my copy of logl() due to fat fingering a rm command and various hard drive/file system issues. It probably is contained on one of my several back up tapes, but I've been too lazy to do the restore. Back to sinl, cosl, tanl. The attached diff contains everything that one needs for these functions on an architecture with 64-bit significand long double. Note that portions of the diff for Makefile and math.h contain irrelevant changes that are only relevant for my ccosh, sqrtl, and cbrtl changes. Also note that the Makefile only includes my long double routines when LDBL_PREC == 64. My testing on FreeBSD-amd64 shows troutmask:sgk[219] ./sin_test 4000000 10 400000000 Number of test loops to run: 10 Number of PRN in [0,4.000000e+08] per run: 4000000 ULP 1.0000 (5) 1.0000 (4) 1.0000 (6) 1.0000 (4) 1.0000 (3) 1.0000 (7) 1.0000 (3) 1.0000 (7) 1.0000 (5) 1.0000 (5) --- 49 The number in parenthesis is the count of values that give a sinl(x) that exceeds 0.5 ULP. On the reduced range of [0,pi/4] where __kernel_cosl and __kernel_sinl are used, in all my testing no value of x in [0,pi/4] generated a sinl(x) that exceeded 0.5 ULP. Thus, I suspect (but haven't tried to confirm), the 1.0 ULP observed above is due to the argument reduction. My timings on the basic routines using gprof are: % cumulative self self total time seconds seconds calls ms/call ms/call name 0.6 804.42 5.67 40000000 0.00 0.00 sinl [18] 0.1 934.41 0.74 19998751 0.00 0.00 __kernel_sinl [95] 0.1 935.82 0.66 20001249 0.00 0.00 __kernel_cosl [96] where the system contains a dual-cpu opteron tyan motherboard running at 2.2 GHz. The results for cosl() are troutmask:sgk[224] ./cos_test 4000000 10 400000000 Number of test loops to run: 10 Number of PRN in [0,4.000000e+08] per run: 4000000 ULP 1.0000 (3) 1.0000 (4) 1.0000 (2) 1.0000 (1) 1.0000 (7) 1.0000 (5) 0.5000 (0) 1.0000 (3) 1.0000 (9) 1.0000 (5) --- 39 Note, the 40 million PRN are not the same as those used in the the testing of sinl(). % cumulative self self total time seconds seconds calls ms/call ms/call name 0.6 829.38 5.73 40000000 0.00 0.00 cosl [18] 0.1 936.85 0.70 20000178 0.00 0.00 __kernel_sinl [80] 0.1 938.20 0.67 19999822 0.00 0.00 __kernel_cosl [81] The testing of tanl() is a little more interesting. troutmask:sgk[227] ./tan_test 4000000 10 400000000 Number of test loops to run: 10 Number of PRN in [0,4.000000e+08] per run: 4000000 ULP 1.0000 (24788) 1.0000 (24633) 1.5000 (24626) 1.0000 (24820) 1.0000 (24652) 1.5000 (24526) 1.0000 (24749) 1.0000 (24873) 1.0000 (24656) 1.0000 (24701) ------- 247024 0.5 911.88 5.80 40000000 0.00 0.00 tanl [16] 0.5 933.18 5.01 40000000 0.00 0.00 __kernel_tanl [53] At first glance, the 1.5 ULP and the total count of values of x that generate a result with a ULP that exceeds 0.5 seems disconcerting. However, testing of __kernel_tanl() shows that for all tested values of x in [0,pi/4] the maximum ULP did exceed 0.5. This suggests that tanl() is more sensitive to errors in the argument reduction than sinl() and cosl(). Do with the code as you see fit. I'll be moving on to hypothl and other functions as I have time. -- Steve --k+w/mQv8wyuph6w0 Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="msun.diff" Index: msun/Makefile =================================================================== RCS file: /home/ncvs/src/lib/msun/Makefile,v retrieving revision 1.78 diff -u -p -r1.78 Makefile --- msun/Makefile 21 May 2007 02:49:08 -0000 1.78 +++ msun/Makefile 9 Oct 2007 21:52:32 -0000 @@ -54,7 +54,8 @@ COMMON_SRCS= b_exp.c b_log.c b_tgamma.c s_nexttowardf.c s_remquo.c s_remquof.c \ s_rint.c s_rintf.c s_round.c s_roundf.c s_roundl.c \ s_scalbln.c s_scalbn.c s_scalbnf.c s_signbit.c \ - s_signgam.c s_significand.c s_significandf.c s_sin.c s_sinf.c s_tan.c \ + s_signgam.c s_significand.c s_significandf.c s_sin.c s_sinf.c \ + s_tan.c \ s_tanf.c s_tanh.c s_tanhf.c s_trunc.c s_truncf.c s_truncl.c \ w_cabs.c w_cabsf.c w_drem.c w_dremf.c @@ -68,13 +69,20 @@ SYMBOL_MAPS= ${SYM_MAPS} # C99 long double functions COMMON_SRCS+= s_copysignl.c s_fabsl.c s_modfl.c + .if ${LDBL_PREC} != 53 # If long double != double use these; otherwise, we alias the double versions. COMMON_SRCS+= s_fmal.c s_frexpl.c s_nextafterl.c s_nexttoward.c s_scalbnl.c .endif +.if ${LDBL_PREC} == 64 +COMMON_SRCS+= e_sqrtl.c k_cosl.c k_sinl.c k_tanl.c s_cbrtl.c \ + s_cosl.c s_sinl.c s_tanl.c +.endif + # C99 complex functions -COMMON_SRCS+= s_cimag.c s_cimagf.c s_cimagl.c s_conj.c s_conjf.c s_conjl.c \ +COMMON_SRCS+= s_ccosh.c \ + s_cimag.c s_cimagf.c s_cimagl.c s_conj.c s_conjf.c s_conjl.c \ s_creal.c s_crealf.c s_creall.c # FreeBSD's C library supplies these functions: Index: msun/Symbol.map =================================================================== RCS file: /home/ncvs/src/lib/msun/Symbol.map,v retrieving revision 1.4 diff -u -p -r1.4 Symbol.map --- msun/Symbol.map 29 Apr 2007 14:05:20 -0000 1.4 +++ msun/Symbol.map 9 Oct 2007 21:52:32 -0000 @@ -76,6 +79,7 @@ FBSD_1.0 { copysignl; cos; cosf; + cosl; creal; crealf; creall; @@ -168,8 +172,10 @@ FBSD_1.0 { significandf; sin; sinf; + sinl; tan; tanf; + tanl; tanh; tanhf; trunc; Index: msun/man/cos.3 =================================================================== RCS file: /home/ncvs/src/lib/msun/man/cos.3,v retrieving revision 1.12 diff -u -p -r1.12 cos.3 --- msun/man/cos.3 9 Jan 2007 01:02:05 -0000 1.12 +++ msun/man/cos.3 9 Oct 2007 21:52:32 -0000 @@ -33,7 +33,8 @@ .Os .Sh NAME .Nm cos , -.Nm cosf +.Nm cosf , +.Nm cosl .Nd cosine functions .Sh LIBRARY .Lb libm @@ -43,11 +44,14 @@ .Fn cos "double x" .Ft float .Fn cosf "float x" +.Ft "long double" +.Fn cosl "long double x" .Sh DESCRIPTION The -.Fn cos -and the +.Fn cos , .Fn cosf +and the +.Fn cosl functions compute the cosine of .Fa x (measured in radians). @@ -57,9 +61,10 @@ For a discussion of error due to roundof .Xr math 3 . .Sh RETURN VALUES The -.Fn cos -and the +.Fn cos , .Fn cosf +and the +.Fn cosl functions return the cosine value. .Sh SEE ALSO .Xr acos 3 , Index: msun/man/sin.3 =================================================================== RCS file: /home/ncvs/src/lib/msun/man/sin.3,v retrieving revision 1.10 diff -u -p -r1.10 sin.3 --- msun/man/sin.3 9 Jan 2007 01:02:06 -0000 1.10 +++ msun/man/sin.3 9 Oct 2007 21:52:32 -0000 @@ -34,7 +34,8 @@ .Os .Sh NAME .Nm sin , -.Nm sinf +.Nm sinf , +.Nm sinl .Nd sine functions .Sh LIBRARY .Lb libm @@ -44,11 +45,14 @@ .Fn sin "double x" .Ft float .Fn sinf "float x" +.Ft "long double" +.Fn sinf "long double x" .Sh DESCRIPTION The -.Fn sin -and the +.Fn sin , .Fn sinf +and the +.Fn sinl functions compute the sine of .Fa x (measured in radians). @@ -56,9 +60,10 @@ A large magnitude argument may yield a r or no significance. .Sh RETURN VALUES The -.Fn sin -and the +.Fn sin , .Fn sinf +and the +.Fn sinl functions return the sine value. .Sh SEE ALSO .Xr acos 3 , Index: msun/man/tan.3 =================================================================== RCS file: /home/ncvs/src/lib/msun/man/tan.3,v retrieving revision 1.10 diff -u -p -r1.10 tan.3 --- msun/man/tan.3 9 Jan 2007 01:02:06 -0000 1.10 +++ msun/man/tan.3 9 Oct 2007 21:52:32 -0000 @@ -33,7 +33,8 @@ .Os .Sh NAME .Nm tan , -.Nm tanf +.Nm tanf , +.Nm tanl .Nd tangent functions .Sh LIBRARY .Lb libm @@ -43,11 +44,14 @@ .Fn tan "double x" .Ft float .Fn tanf "float x" +.Ft "long double" +.Fn tanf "long double x" .Sh DESCRIPTION The -.Fn tan -and the +.Fn tan , .Fn tanf +and the +.Fn tanl functions compute the tangent of .Fa x (measured in radians). @@ -57,8 +61,11 @@ For a discussion of error due to roundof .Xr math 3 . .Sh RETURN VALUES The -.Fn tan -function returns the tangent value. +.Fn tan , +.Fn tanf +and +.Fn tanl +functions return the tangent value. .Sh SEE ALSO .Xr acos 3 , .Xr asin 3 , Index: msun/src/k_cosl.c =================================================================== RCS file: msun/src/k_cosl.c diff -N msun/src/k_cosl.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ msun/src/k_cosl.c 9 Oct 2007 21:52:32 -0000 @@ -0,0 +1,62 @@ +/*- + * Copyright (c) 2007 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * Compute cos(x) on the interval [-1:1] with a polynomial approximation. + * The soefficients are determined from a Chebyshev interpolation scheme, + * which claims to provide a nearly minimax polynomial approximation. + * The scheme was implemented with GMP/MPFR at 512 bits of precision. + * + * cos(x) = c0 + c1 * x**2 + c2 * x**4 + c3 * x**6 + ... + * = c0 + x**2 * (c1 + x**2 * (c2 + x**2 * (c3 + ... + * + * In practice, this routine is called only for x in the interval [0:pi/4]. + * Limited testing on pseudorandom numbers drawn within [0:pi/4] shows + * an accuracy of <= 0.5 ULP. + */ +#define C0 (long double) 0x1.000000000000000p0L +#define C1 (long double) -0x8.000000000000000p-4L +#define C2 (long double) 0xa.aaaaaaaaaaaaaabp-8L +#define C3 (long double) -0x5.b05b05b05b05b05p-12L +#define C4 (long double) 0x1.a01a01a01a019dbp-16L +#define C5 (long double) -0x4.9f93edde27cbed7p-24L +#define C6 (long double) 0x8.f76c77fc4bbff37p-32L +#define C7 (long double) -0xc.9cba54236b4e97fp-40L +#define C8 (long double) 0xd.73f9aa92060766ap-48L +#define C9 (long double) -0xb.4105ba69deeed1ap-56L + +long double +__kernel_cosl(long double x) +{ + long double ys, fs; + + ys = x * x; + + fs = C0 + ys * (C1 + ys * (C2 + ys * (C3 + ys * (C4 + + ys * (C5 + ys * (C6 + ys * (C7 + ys * (C8 + ys * C9)))))))); + + return (fs); +} Index: msun/src/k_sinl.c =================================================================== RCS file: msun/src/k_sinl.c diff -N msun/src/k_sinl.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ msun/src/k_sinl.c 9 Oct 2007 21:52:32 -0000 @@ -0,0 +1,64 @@ +/*- + * Copyright (c) 2007 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * Compute sin(x) on the interval [-1:1] with a polynomial approximation + * for sin(x)/x. Coefficients are determined from a Chebyshev interpolation + * scheme, which claims to provide a nearly minimax polynomial approximation. + * The scheme was implemented with GMP/MPFR at 512 bits of precision. + * + * sin(x) + * ------ = c0 + c1 * x**2 + c2 * x**4 + c3 * x**6 + ... + * x + * = c0 + x**2 * (c1 + x**2 * (c2 + x**2 * (c3 + ... + * + * In practice, this routine is called only for x in the interval [0:pi/4]. + * Limited testing on pseudorandom numbers drawn within [0:pi/4] shows + * an accuracy of <= 0.5 ULP. + */ +#define C0 (long double) 0x1.000000000000000p0L +#define C1 (long double) -0x2.aaaaaaaaaaaaaabp-4L +#define C2 (long double) 0x2.222222222222222p-8L +#define C3 (long double) -0xd.00d00d00d00cf21p-16L +#define C4 (long double) 0x2.e3bc74aad8e0742p-20L +#define C5 (long double) -0x6.b99159fd3ad8f13p-28L +#define C6 (long double) 0xb.092309a1577e374p-36L +#define C7 (long double) -0xd.73f9ac01491043cp-44L +#define C8 (long double) 0xc.a926cdf5d22710ep-52L +#define C9 (long double) -0x9.5d953b89a4d49b1p-60L + +long double +__kernel_sinl(long double x) +{ + long double ys, fs; + + ys = x * x; + + fs = C0 + ys * (C1 + ys * (C2 + ys * (C3 + ys * (C4 + ys * (C5 + + ys * (C6 + ys * (C7 + ys * (C8 + ys * C9)))))))); + + return (x * fs); +} Index: msun/src/k_tanl.c =================================================================== RCS file: msun/src/k_tanl.c diff -N msun/src/k_tanl.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ msun/src/k_tanl.c 9 Oct 2007 21:52:32 -0000 @@ -0,0 +1,88 @@ +/*- + * Copyright (c) 2007 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * Compute tan(x) for x in the interval [-1:1]. In practice, this function + * is only called for x in the interval [0:pi/4]. The coefficients in the + * nearly minimax polynomial approximation were determined from a Chebyshev + * interpolation scheme. The scheme was implemented with GMP/MPFR at 512 + * bits of precision. Limited testing on pseudorandom numbers drawn within + * [0:pi/4] shows an accuracy of <= 0.5 ULP. + */ + +#define C0 (long double) 0x1.000000000000000p0L +#define C1 (long double) 0x5.555555555555555p-4L +#define C2 (long double) 0x2.222222222222222p-4L +#define C3 (long double) 0xd.d0dd0dd0dd0dd0ep-8L +#define C4 (long double) 0x5.993d220b043e7c9p-8L +#define C5 (long double) 0x2.44dc6abcd8479d4p-8L +#define C6 (long double) 0xe.b69e870abed7bf1p-12L +#define C7 (long double) 0x5.f68d914adfc47a3p-12L +#define C8 (long double) 0x2.6ab0490042f9ab5p-12L +#define C9 (long double) 0xf.abebb9cb8cc2036p-16L +#define C10 (long double) 0x6.59f8612142fae16p-16L +#define C11 (long double) 0x2.92fb2c6db573835p-16L +#define C12 (long double) 0x1.0b12c13038ec343p-16L +#define C13 (long double) 0x6.c40627af9caa355p-20L +#define C14 (long double) 0x2.bd109c09597d8adp-20L +#define C15 (long double) 0x1.2014a0592e18316p-20L +#define C16 (long double) 0x6.5fa57993c4c813ap-24L +#define C17 (long double) 0x5.896339c2dabba90p-24L +#define C18 (long double) -0x5.dc121de49c2d260p-24L +#define C19 (long double) 0x1.0bf745f0cc96f03p-20L +#define C20 (long double) -0x2.0141b9a2c7a07fap-20L +#define C21 (long double) 0x3.70ed6f452e482fcp-20L +#define C22 (long double) -0x5.04879bfb1589212p-20L +#define C23 (long double) 0x6.456251926b172eap-20L +#define C24 (long double) -0x6.aac0ab6dc0588a1p-20L +#define C25 (long double) 0x5.ff08654751186f8p-20L +#define C26 (long double) -0x4.84ff6f8e7414ddap-20L +#define C27 (long double) 0x2.d1d7cdc3082c3b2p-20L +#define C28 (long double) -0x1.6e486511b1fb457p-20L +#define C29 (long double) 0x9.36f96f36ec8aee5p-24L +#define C30 (long double) -0x2.d564e445a53c9c8p-24L +#define C31 (long double) 0xa.054b3ccb4a63326p-28L +#define C32 (long double) -0x1.6bbcae821a3de9fp-28L +#define C33 (long double) 0x1.8f939ed1883f595p-32L + +long double +__kernel_tanl(long double x) +{ + long double t, xs; + + xs = x * x; + + t = C0 + xs * (C1 + xs * (C2 + xs * (C3 + xs * (C4 + xs * + (C5 + xs * (C6 + xs * (C7 + xs * (C8 + xs * (C9 + xs * + (C10 + xs * (C11 + xs * (C12 + xs * (C13 + xs * (C14 + xs * + (C15 + xs * (C16 + xs * (C17 + xs * (C18 + xs * (C19 + xs * + (C20 + xs * (C21 + xs * (C22 + xs * (C23 + xs * (C24 + xs * + (C25 + xs * (C26 + xs * (C27 + xs * (C28 + xs * (C29 + xs * + (C30 + xs * (C31 + xs * + (C32 + xs * C33)))))))))))))))))))))))))))))))); + + return (x * t); +} Index: msun/src/math.h =================================================================== RCS file: /home/ncvs/src/lib/msun/src/math.h,v retrieving revision 1.62 diff -u -p -r1.62 math.h --- msun/src/math.h 7 Jan 2007 07:54:21 -0000 1.62 +++ msun/src/math.h 9 Oct 2007 21:52:32 -0000 @@ -397,13 +397,15 @@ long double asinl(long double); long double atan2l(long double, long double); long double atanhl(long double); long double atanl(long double); -long double cbrtl(long double); #endif +long double cbrtl(long double); long double ceill(long double); long double copysignl(long double, long double) __pure2; #if 0 long double coshl(long double); +#endif long double cosl(long double); +#if 0 long double erfcl(long double); long double erfl(long double); long double exp2l(long double); @@ -459,10 +461,14 @@ long double scalblnl(long double, long); long double scalbnl(long double, int); #if 0 long double sinhl(long double); +#endif long double sinl(long double); long double sqrtl(long double); +#if 0 long double tanhl(long double); +#endif long double tanl(long double); +#if 0 long double tgammal(long double); #endif long double truncl(long double); Index: msun/src/math_private.h =================================================================== RCS file: /home/ncvs/src/lib/msun/src/math_private.h,v retrieving revision 1.20 diff -u -p -r1.20 math_private.h --- msun/src/math_private.h 28 Nov 2005 04:58:57 -0000 1.20 +++ msun/src/math_private.h 9 Oct 2007 21:52:32 -0000 @@ -269,4 +269,9 @@ float __kernel_cosdf(double); float __kernel_tandf(double,int); int __kernel_rem_pio2f(float*,float*,int,int,int,const int*); +/* long double versions of fdlibm kernel functions */ +long double __kernel_cosl(long double); +long double __kernel_sinl(long double); +long double __kernel_tanl(long double); + #endif /* !_MATH_PRIVATE_H_ */ Index: msun/src/s_cosl.c =================================================================== RCS file: msun/src/s_cosl.c diff -N msun/src/s_cosl.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ msun/src/s_cosl.c 9 Oct 2007 21:52:32 -0000 @@ -0,0 +1,118 @@ +/*- + * Copyright (c) 2007 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * Compute cos(x) for x where x is reduced to y = x - k * pi / 2. + * Limited testing on pseudorandom numbers drawn within [0:4e8] shows + * an accuracy of <= 1.0 ULP where 39 values of x out of 40 million + * possibles resulted in cos(x) that exceeded 0.5 ULP (ie., 0.0000975%). + */ + +#include "math.h" +#include "math_private.h" +#include "fpmath.h" + +/* + * Table of constants for 2/pi, 396 Hex digits (476 decimal) of 2/pi. + * The table was taken from e_rem_pio2.c. + */ +static const int32_t two_over_pi[] = { +0xA2F983, 0x6E4E44, 0x1529FC, 0x2757D1, 0xF534DD, 0xC0DB62, +0x95993C, 0x439041, 0xFE5163, 0xABDEBB, 0xC561B7, 0x246E3A, +0x424DD2, 0xE00649, 0x2EEA09, 0xD1921C, 0xFE1DEB, 0x1CB129, +0xA73EE8, 0x8235F5, 0x2EBB44, 0x84E99C, 0x7026B4, 0x5F7E41, +0x3991D6, 0x398353, 0x39F49C, 0x845F8B, 0xBDF928, 0x3B1FF8, +0x97FFDE, 0x05980F, 0xEF2F11, 0x8B5A0A, 0x6D1F6D, 0x367ECF, +0x27CB09, 0xB74F46, 0x3F669E, 0x5FEA2D, 0x7527BA, 0xC7EBE5, +0xF17B3D, 0x0739F7, 0x8A5292, 0xEA6BFB, 0x5FB11F, 0x8D5D08, +0x560330, 0x46FC7B, 0x6BABF0, 0xCFBC20, 0x9AF436, 0x1DA9E3, +0x91615E, 0xE61B08, 0x659985, 0x5F14A0, 0x68408D, 0xFFD880, +0x4D7327, 0x310606, 0x1556CA, 0x73A8C9, 0x60E27B, 0xC08C6B, +}; + +#define PIO4 (long double) 0xc.90fdaa22168c235p-4L /* pi/4 */ +#define TWO24 (long double) 1.67772160000000000000e+07L /* 2**24 */ + +long double +cosl(long double x) +{ + union IEEEl2bits z; + int i, e0; + int nx = 3, prec = 2; + double xd[3], yd[2]; + long double arg; + + z.e = x; + z.bits.sign = 0; + + /* If x = +-0, then cos(x) = 1. */ + if ((z.bits.manh | z.bits.manl) == 0) + return (x); + + /* If x = NaN or Inf, then cos(x) = NaN. */ + if (z.bits.exp == 32767) + return ((x - x) / (x - x)); + + /* If x is a subnormal number, then cos(x) = 1 */ + if (z.bits.exp == 0) + return (1); + + /* Optimize the case where x is already within range. */ + if (z.e < PIO4) + return __kernel_cosl(x); + + /* Split z.e into a 24-bit representation. */ + e0 = ilogbl(z.e) - 23; + z.e = scalbnl(z.e, -e0); + for (i = 0; i < nx; i++) { + xd[i] = (double) ((int32_t) z.e); + z.e = (z.e - xd[i]) * TWO24; + } + xd[i] = (double) ((int32_t) z.e); + + /* yd contains the pieces of xd rem pi/2 such that |yd| < pi/4. */ + e0 = __kernel_rem_pio2(xd, yd, e0, nx, prec, two_over_pi); + + for (z.e = 0, i = nx - 1; i >= 0; i--) + z.e += (long double) yd[i]; + + switch (e0 & 3) { + case 0: + z.e = __kernel_cosl(z.e); + break; + case 1: + z.e = - __kernel_sinl(z.e); + break; + case 2: + z.e = - __kernel_cosl(z.e); + break; + case 3: + z.e = __kernel_sinl(z.e); + break; + } + + return (z.e); +} Index: msun/src/s_sinl.c =================================================================== RCS file: msun/src/s_sinl.c diff -N msun/src/s_sinl.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ msun/src/s_sinl.c 9 Oct 2007 21:52:32 -0000 @@ -0,0 +1,118 @@ +/*- + * Copyright (c) 2007 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * Compute sin(x) for x where x is reduced to y = x - k * pi / 2. + * Limited testing on pseudorandom numbers drawn within [0:4e8] shows + * an accuracy of <= 1.0 ULP where 49 values of x out of 40 million + * possibles resulted in cos(x) that exceeded 0.5 ULP (ie., 0.000123%). + */ +#include "math.h" +#include "math_private.h" +#include "fpmath.h" + +/* + * Table of constants for 2/pi, 396 Hex digits (476 decimal) of 2/pi. + * The table was taken from e_rem_pio2.c. + */ +static const int32_t two_over_pi[] = { +0xA2F983, 0x6E4E44, 0x1529FC, 0x2757D1, 0xF534DD, 0xC0DB62, +0x95993C, 0x439041, 0xFE5163, 0xABDEBB, 0xC561B7, 0x246E3A, +0x424DD2, 0xE00649, 0x2EEA09, 0xD1921C, 0xFE1DEB, 0x1CB129, +0xA73EE8, 0x8235F5, 0x2EBB44, 0x84E99C, 0x7026B4, 0x5F7E41, +0x3991D6, 0x398353, 0x39F49C, 0x845F8B, 0xBDF928, 0x3B1FF8, +0x97FFDE, 0x05980F, 0xEF2F11, 0x8B5A0A, 0x6D1F6D, 0x367ECF, +0x27CB09, 0xB74F46, 0x3F669E, 0x5FEA2D, 0x7527BA, 0xC7EBE5, +0xF17B3D, 0x0739F7, 0x8A5292, 0xEA6BFB, 0x5FB11F, 0x8D5D08, +0x560330, 0x46FC7B, 0x6BABF0, 0xCFBC20, 0x9AF436, 0x1DA9E3, +0x91615E, 0xE61B08, 0x659985, 0x5F14A0, 0x68408D, 0xFFD880, +0x4D7327, 0x310606, 0x1556CA, 0x73A8C9, 0x60E27B, 0xC08C6B, +}; + +#define PIO4 (long double) 0xc.90fdaa22168c235p-4L /* pi/4 */ +#define TWO24 (long double) 1.67772160000000000000e+07L /* 2**24 */ + +long double +sinl(long double x) +{ + union IEEEl2bits z; + int i, e0, s; + int nx = 3, prec = 2; + double xd[3], yd[2]; + long double arg; + + z.e = x; + s = z.bits.sign; + z.bits.sign = 0; + + /* If x = +-0, then sin(x) = +-0. */ + if ((z.bits.manh | z.bits.manl) == 0) + return (x); + + /* If x = NaN or Inf, then sin(x) = NaN. */ + if (z.bits.exp == 32767) + return ((x - x) / (x - x)); + + /* If x is a subnormal number, then sin(x) = x */ + if (z.bits.exp == 0) + return (x); + + /* Optimize the case where x is already within range. */ + if (z.e < PIO4) + return __kernel_sinl(x); + + /* Split z.e into a 24-bit representation. */ + e0 = ilogbl(z.e) - 23; + z.e = scalbnl(z.e, -e0); + for (i = 0; i < nx; i++) { + xd[i] = (double) ((int32_t) z.e); + z.e = (z.e - xd[i]) * TWO24; + } + xd[i] = (double) ((int32_t) z.e); + + /* yd contains the pieces of xd rem pi/2 such that |yd| < pi/4. */ + e0 = __kernel_rem_pio2(xd, yd, e0, nx, prec, two_over_pi); + + for (z.e = 0, i = nx - 1; i >= 0; i--) + z.e += (long double) yd[i]; + + switch (e0 & 3) { + case 0: + z.e = __kernel_sinl(z.e); + break; + case 1: + z.e = __kernel_cosl(z.e); + break; + case 2: + z.e = - __kernel_sinl(z.e); + break; + case 3: + z.e = - __kernel_cosl(z.e); + break; + } + + return (s ? - z.e : z.e); +} Index: msun/src/s_tanl.c =================================================================== RCS file: msun/src/s_tanl.c diff -N msun/src/s_tanl.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ msun/src/s_tanl.c 9 Oct 2007 21:52:32 -0000 @@ -0,0 +1,115 @@ +/*- + * Copyright (c) 2007 Steven G. Kargl + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice unmodified, this list of conditions, and the following + * disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR + * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. + * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT + * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + * Compute tan(x) for x where x is reduced to y = x - k * pi / 2. + * Limited testing on pseudorandom numbers drawn within [0:4e8] shows + * an accuracy of <= 1.5 ULP where 247024 values of x out of 40 million + * possibles resulted in tan(x) that exceeded 0.5 ULP (ie., 0.6%). + */ + +#include "math.h" +#include "math_private.h" +#include "fpmath.h" + +/* + * Table of constants for 2/pi, 396 Hex digits (476 decimal) of 2/pi. + * The table was taken from e_rem_pio2.c. + */ +static const int32_t two_over_pi[] = { +0xA2F983, 0x6E4E44, 0x1529FC, 0x2757D1, 0xF534DD, 0xC0DB62, +0x95993C, 0x439041, 0xFE5163, 0xABDEBB, 0xC561B7, 0x246E3A, +0x424DD2, 0xE00649, 0x2EEA09, 0xD1921C, 0xFE1DEB, 0x1CB129, +0xA73EE8, 0x8235F5, 0x2EBB44, 0x84E99C, 0x7026B4, 0x5F7E41, +0x3991D6, 0x398353, 0x39F49C, 0x845F8B, 0xBDF928, 0x3B1FF8, +0x97FFDE, 0x05980F, 0xEF2F11, 0x8B5A0A, 0x6D1F6D, 0x367ECF, +0x27CB09, 0xB74F46, 0x3F669E, 0x5FEA2D, 0x7527BA, 0xC7EBE5, +0xF17B3D, 0x0739F7, 0x8A5292, 0xEA6BFB, 0x5FB11F, 0x8D5D08, +0x560330, 0x46FC7B, 0x6BABF0, 0xCFBC20, 0x9AF436, 0x1DA9E3, +0x91615E, 0xE61B08, 0x659985, 0x5F14A0, 0x68408D, 0xFFD880, +0x4D7327, 0x310606, 0x1556CA, 0x73A8C9, 0x60E27B, 0xC08C6B, +}; + +#define PIO4 (long double) 0xc.90fdaa22168c235p-4L /* pi/4 */ +#define TWO24 (long double) 1.67772160000000000000e+07L /* 2**24 */ + +long double +tanl(long double x) +{ + union IEEEl2bits z; + int i, e0, s; + int nx = 3, prec = 2; + double xd[3], yd[2]; + long double arg; + + z.e = x; + s = z.bits.sign; + z.bits.sign = 0; + + /* If x = +-0, then tan(x) = +-0. */ + if ((z.bits.manh | z.bits.manl) == 0) + return (x); + + /* If x = NaN or Inf, then tan(x) = NaN. */ + if (z.bits.exp == 32767) + return ((x - x) / (x - x)); + + /* If x is a subnormal number, then tan(x) = x */ + if (z.bits.exp == 0) + return (x); + + /* Optimize the case where x is already within range. */ + if (z.e < PIO4) + return __kernel_tanl(x); + + /* Split z.e into a 24-bit representation. */ + e0 = ilogbl(z.e) - 23; + z.e = scalbnl(z.e, -e0); + for (i = 0; i < nx; i++) { + xd[i] = (double) ((int32_t) z.e); + z.e = (z.e - xd[i]) * TWO24; + } + xd[i] = (double) ((int32_t) z.e); + + /* yd contains the pieces of xd rem pi/2 such that |yd| < pi/4. */ + e0 = __kernel_rem_pio2(xd, yd, e0, nx, prec, two_over_pi); + + for (z.e = 0, i = nx - 1; i >= 0; i--) + z.e += (long double) yd[i]; + + switch (e0 & 3) { + case 0: + case 2: + z.e = __kernel_tanl(z.e); + break; + case 1: + case 3: + z.e = - 1.L / __kernel_tanl(z.e); + break; + } + + return (s ? - z.e : z.e); +} --k+w/mQv8wyuph6w0--