From owner-freebsd-standards@FreeBSD.ORG Tue Oct 2 02:25:06 2007 Return-Path: Delivered-To: freebsd-standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 74D5B16A418 for ; Tue, 2 Oct 2007 02:25:06 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail12.syd.optusnet.com.au (mail12.syd.optusnet.com.au [211.29.132.193]) by mx1.freebsd.org (Postfix) with ESMTP id 2102213C467 for ; Tue, 2 Oct 2007 02:25:05 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from besplex.bde.org (c220-239-235-248.carlnfd3.nsw.optusnet.com.au [220.239.235.248]) by mail12.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id l922P1cR000332 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 2 Oct 2007 12:25:03 +1000 Date: Tue, 2 Oct 2007 12:25:01 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Steve Kargl In-Reply-To: <20071002001154.GA3782@troutmask.apl.washington.edu> Message-ID: <20071002105817.V7990@besplex.bde.org> References: <20070928152227.GA39233@troutmask.apl.washington.edu> <20071001173736.U1985@besplex.bde.org> <20071002001154.GA3782@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-standards@freebsd.org Subject: Re: long double broken on i386? X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 02 Oct 2007 02:25:06 -0000 On Mon, 1 Oct 2007, Steve Kargl wrote: > On Mon, Oct 01, 2007 at 07:56:28PM +1000, Bruce Evans wrote: >> On Fri, 28 Sep 2007, Steve Kargl wrote: >>> On amd64, these routines give <= 1/2 ULP for the various >>> values I've tested in the reduced range. Now, if I test >> >> I doubt this :-). Getting <= 1 ULP is hard; getting <= 1/2 + 0.001 >> ULPs is harder, and getting <= 1/2 ULPs without using an enormous >> amount of extra precision is a research problem. > > It appears that I was not clear in what I had intended to write. > Let's try again. I've implemented the necessary argument > reduction using k_rem_pio2.c for a 64-bit long double. I'm > making no claims about its accuracy and the ULP. The comments > in k_rem_pio2.c do not discuss the accuracy. It is noted > that giving the reduced argument to one of my algorithms > does not sudden recover any lost precision. I think it is accurate to within one bit provided it is set up correctly. Setup includes passing it a large table of bits of 2/pi. I don't know if the existing tables with 1584 bits are large enough for long double precision. If long doubles have 64 bits, then the recipe in k_rem_pio_2 returns 106 mantissa bits. I would expect a max relative error of 2^-105. > However, given a reduced value in the range [0,pi/4], the > algorithms that I have for sinl(), cosl(), and tanl() over > this range give <= 1/2 ULP "for the various values I've > tested." Note, I did not claim to test *all* values in > [0,pi/4]. I've only tested the algorithms with several 10 > of millions of randomly generated values. If you start with a relative error of 2^-105, then it is not very hard to get a relative error of 2^-104 in the result, but it is not very easy to get this either without having very slow, large and/or tricky code. k_cos.c only wants a relative error of about 2^-60, so it only needs a small amount of tricky code to be fast (it uses essentially cos(x+y) ~= cos(x) - x*y; getting a smaller relative error would require many more terms; everything should really be evaluated in double-double precision, but k_cos.c uses tricks so ordinary double precision error can be used to get a relative error of strictly less than 2^-54 (half an ulp) if the final addition were in infinite precision. Rounding then gives a final relative error of strictly less than 1 ulp. If you can limit the relative error to < 2^-104 before rounding to long double precision, then (with i386 long doubles but of course not with sparc64 long doubles) the final result has a very large chance of being perfectly rounded, but since the result space is huge there is also a very large chance of wrong rounding in a huge number of cases. (Chance of wrong rounding ~= 2^-40 in an arg space of size 2^64 gives about 2^20 cases incorrectly rounded. The arg space is so large that you will probably never find any incorrectly rounded cases by blind searching.) > It should also be noted that I'm using Goldberg's definition > of ULP where the "exact" result is computed via MPFR with > 512 bits of precision and then the mpfr value is converted > to long double. If you can limit the relative error to < 2^-512 before the final rounding step, then you probably have perfect rounding in all cases but no one can prove it because the arg space is too large. (Chance of wrong rounding ~= 2^-448 in an arg space of size 2^64 gives about 2^-384 cases incorrectly rounded). I think doing any calculations in 512-bit precision would be too slow to be useful for long double precision, except for verifying results. > /* > * y is the approximate value. z is the long double result > * of converting the 512 bit MPFR number in round-to-nearest > * mode. > */ > long double > ulpl(long double y, long double z) > { > int n; > long double f; > f = frexpl(y, &n); > f = fabsl(f - ldexpl(z, -n)); > f = ldexpl(f, LDBL_MANT_DIG - 1); > return (f); > } You should try to use all the bits in the MFPR number, to verify that when the rounded results differ, it is because the result is near a half-way case (as indicated by lots of 1's bits in the MFPR number after the rounding point). >> [rounding precision hack] > Sorry about the lack of trimming the above passage, but I was afraid > I might lose context if I deleted text. All gone now :-). > Anyway, the above looks like a catch-22 for i386. I can wrap In 1988 I thought it would be fixed by now. Now It looks like i386 will be obsolete first. >> Note that this bug makes it more difficult to implement functions in >> long double precision -- you cannot use tables of long double constants. > > Ah ha! This explains why my Chebyshev polynomial approximations > (ie nearly mimimax approximations) can have between 10 and 100 > ULP of errors. And not all 11 bits (2048 ulps) are normally lost since the constants are usually exact for leading terms. >> OTOH, loads of long double constants are so slow on i386 that such >> constants should almost never be used. Instead represent long doubles >> as the sum of 2 doubles if possible. Loading 2 doubles and adding them >> is faster than loading 1 long double, at least if the 2 extra operations >> can be pipelined effectively. > > Looks like I spend some time this week splitting the coefficients > into high and low parts. I use pari scripts to generate coeffiencts, and they handle the splitting with minor adjustements. Bruce