Date: Fri, 7 Mar 2008 10:05:11 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Colin Percival <cperciva@freebsd.org>
Cc: Peter Jeremy <peterjeremy@optushome.com.au>, src-committers@freebsd.org,
    Bruce Evans <bde@freebsd.org>, cvs-all@freebsd.org, cvs-src@freebsd.org
Subject: Re: cvs commit: src/sys/i386/include _types.h
Message-ID: <20080307085031.P10724@delplex.bde.org>
In-Reply-To: <47CF9586.70707@freebsd.org>
References: <200803051121.m25BLE03035426@repoman.freebsd.org>
    <20080305182531.GS68971@server.vk2pj.dyndns.org>
    <20080306021222.GA46783@zim.MIT.EDU> <47CF5D19.3090100@freebsd.org>
    <20080306033246.GA47280@zim.MIT.EDU> <47CF7EBF.6000009@freebsd.org>
    <20080306063452.GB48339@zim.MIT.EDU> <47CF9586.70707@freebsd.org>
Eek, my mailbox filled up overnight.  Trying to reply to only some details
in LIFO order...

On Wed, 5 Mar 2008, Colin Percival wrote:

> David Schultz wrote:
>> On Wed, Mar 05, 2008, Colin Percival wrote:
>>> Setting the i387 FPU to 53-bit precision gives standards-compliant
>>> behaviour whether people are using "double" or "long double".
>>
>> Not quite.
>
> How is it not standards-compliant?
>
>>> Yes and no.  Double rounding isn't allowed;
>>
>> Yes, it is.
>
> No, it isn't.  The following code

Yes it is :-).  Extra precision for intermediate values implies double
rounding for the final results, and extra precision for intermediate
values is a fundamental part of C (it used to be only that float
expressions were (de facto standard required to be) evaluated in double
precision, but now it affects double expressions too, and C99 has vast
complications to support i387-ish extra precision).

> double x, y, z;
>
> x = 1.0;
> y = 0x1.0p-53 + 0x1.0p-75;
> z = x + y;
> z = z - 1.0;
> if (z == 0.0)
>         printf("BUGGY MATH\n");
> else
>         printf("IEEE754-COMPLIANT MATH\n");
>
> should never output "BUGGY MATH", since x + y should round up to
> 1 + 2^(-52); but if you set the i387 to use extended precision, it
> gets the wrong answer.

In C, with FLT_EVAL_METHOD = 2 (evaluate all FP expressions in the range
and precision of the long double type) and long doubles actually longer
than doubles, this always gives double rounding:

    x + y is evaluated (and presumably rounded) in long double precision
    z = x + y rounds to double precision
        (but with gcc-i386, it doesn't round)
        (-O0 happens to work to fix this)
        (the -ffloat-store hack happens to work to fix this)
    z is now 1.0 (1 + 0x1.0p-53L is halfway between adjacent doubles and
        rounds to even, i.e., down to 1), except with gcc-i386 -O
        -fno-float-store, when it is 1.0L + 0x1.0p-53L
    z - 1.0 is evaluated in long double precision
    z = z - 1.0 rounds to double precision (except with gcc-i386...)
    z is now 0.0, except with gcc-i386 -O -fno-float-store, when it is
        0x1.0p-53L
    if (z == 0.0) fails to detect the bug for gcc-i386 -O -fno-float-store,
        since z is still long double and nonzero

Another advantage of using double_t is that you can control when the
double rounding occurs and not have to understand so many sub-cases from
the compiler bugs.  The result is then always 0x1.0p-53L (if double_t is
long double, etc.), and if you want a double result then you can round
as the last step.  The result is then 0x1.0p-53.  You don't get the
double precision result of 0x1.0p-52, but you get a more accurate result.
Extra precision is of course intended to deliver a more accurate result
in this way.

This assumes 64-bit precision.  With 113-bit precision for long doubles,
all intermediate values and the result would be exact; the result would
be y.  It seems to be impossible to choose y to demonstrate the problem
for 113-bit long doubles.  We see a similar phenomenon for floats vs
doubles -- since the extra precision provided by doubles is actually
(more than) double, double rounding never (?) occurs.  Extended precision
should have had more than 64 bits.  I think it is mainly the result of
Intel's transistor budget being limited 30 years ago.
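To see the two behaviours side by side, here is an untested sketch (mine,
not Colin's test and not anything in libm).  It assumes an i386 with the
FPU in extended precision (something like fpsetprec(FP_PE) from
<ieeefp.h>) and a <math.h> where FLT_EVAL_METHOD is 2 so that double_t
is long double; the volatile forces the narrowing store, much like
-ffloat-store:

#include <math.h>
#include <stdio.h>

int
main(void)
{
	double x = 1.0;
	double y = 0x1.0p-53 + 0x1.0p-75;
	volatile double zd;	/* volatile forces the store to a double */
	double_t ze;		/* long double here, by assumption */

	zd = x + y;	/* extended result 1 + 0x1.0p-53L rounds again on the store */
	zd = zd - 1.0;
	printf("narrowed: %a\n", zd);		/* expect 0x0p+0: double rounding */

	ze = x + y;	/* one rounding, in extended precision */
	ze = ze - 1.0;	/* exact: 0x1.0p-53L */
	printf("double_t: %a\n", (double)ze);	/* expect 0x1p-53 */
	return (0);
}

With the FPU in 53-bit precision, both paths should print 0x1p-52, which
is Colin's point.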
>>>> The downside is that this breaks long doubles.
>>> Except that it doesn't.  Using 53-bit precision for long doubles is
>>> a perfectly valid and standards-compliant option.
>>
>> It's impossible to implement efficient library functions that
>> produce correct long double results
>
> You could have stopped this sentence here -- for all practical purposes,
> correctly rounded trigonometric functions are not feasible.

Nah, it's quite feasible, especially for transcendental functions, since
transcendental numbers in general can be approximated more closely by
rationals than algebraic numbers can.  The evaluation just needs to be
done in extra precision, and at least for trig functions, "in general"
happens in practice, so relatively few cases require significant extra
precision.  With extra precision in hardware, all except the relatively
few cases can even be implemented efficiently.

>> when the FPU is set to 64-bit
>> mode and avoid double rounding and related problems when the FPU
>> is set to 53-bit mode.
>
> Fortunately, library functions aren't required to have any particular
> error bounds.  It's always nice if they are correct to within 0.6ulp
> instead of 0.8ulp or suchlike -- but nobody should be proving code
> correctness based on assumed properties of library transcendental
> functions.  People should, and do, prove code correct based on the
> assumption that double precision arithmetic behaves like double precision
> arithmetic, however (myself included).

We require error bounds of < 1 ulp and aim for < 0.5[0]1 ulps.  Error
bounds are required for everything.  E.g., libm uses expm1() to implement
hyperbolic functions, and it needs good error bounds for expm1() and a
nontrivial error analysis to justify using it (the error can blow up, and
would blow up if exp(x)-1 were used instead of expm1(x)), yet the result
still has an error of > 1 ulp, since expm1() and other things aren't
accurate enough.

Bruce
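P.S.  For concreteness, the kind of thing expm1() buys: a bare, untested
sketch of the usual identity, not the libm code, ignoring overflow near
the top of the range and the other details that the real thing handles.

#include <math.h>

/*
 * sinh(x) = (exp(x) - exp(-x)) / 2.  With t = expm1(x) = exp(x) - 1,
 * exp(x) = t + 1 and exp(-x) = 1 / (t + 1), so
 * sinh(x) = (t + t / (t + 1)) / 2.
 * Using expm1() instead of exp(x) - 1 avoids the cancellation that
 * blows up the relative error of the exp(x) - 1 term for small |x|.
 */
static double
sinh_sketch(double x)
{
	double t;

	if (x < 0)
		return (-sinh_sketch(-x));
	t = expm1(x);
	return ((t + t / (t + 1)) / 2);
}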