Date: Fri, 7 Dec 2007 17:58:39 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc: freebsd-standards@freebsd.org
Subject: Re: [PATCH] hypotl, cabsl, and code removal in cabs
Message-ID: <20071207173222.D702@delplex.bde.org>
In-Reply-To: <20071206231143.GA63969@troutmask.apl.washington.edu>
References: <20071012180959.GA36345@troutmask.apl.washington.edu> <20071206090833.GA95428@VARK.MIT.EDU> <20071206231143.GA63969@troutmask.apl.washington.edu>

On Thu, 6 Dec 2007, Steve Kargl wrote:

> On Thu, Dec 06, 2007 at 04:08:33AM -0500, David Schultz wrote:
>> Also, umm, I've been busy and unable to pay attention for a while,
>> so forgive me if I'm missing something, but isn't it the case that
>> we don't have a sqrtl(), except for the gcc builtin on some
>> architectures?
>
> bde pointed me to the right file in src/libm/ieee that explains
> the rounding issues with hypotl.  I haven't had a chance to
> update my implementation to use extra care in the evaluation of
> a*a+b*b.

I fixed it in your mailbox for the float precision case.  (It is
useful to test algorithms for the float precision case, since only
that case can be tested reasonably exhaustively (not actually
exhaustively for 2-arg functions like hypotf()).  But after a lot of
work, the debugged version reduces to almost the fdlibm version
except for different style bugs.)

> As to the sqrtl question, I have an implementation that supposedly
> does correct rounding in all rounding modes.  It is restricted to
> 64-bit significand long doubles.  The code does not use bit twiddle;
> instead, it uses fenv.

This I haven't looked at closely.  I fear extreme slowness.  On
athlon-xp, fenv accesses take about 100 cycles each (129 for fldenv
and 89 for fstenv; thus > 200 for fldenv+fstenv in a C-level fenv
access), while bit twiddling instructions can be executed at up to 3
per cycle.  mxcsr accesses are much faster, but mxcsr gives just more
environment to handle for general C-level access functions, since the
i387 and the SSE environments must be maintained in parallel, even on
amd64 in case someone actually uses long doubles (SSE would suffice
without long doubles).

Anyway, the software version of sqrtl is irrelevant on athlon-xp,
since athlon-xp has sqrtl in hardware (takes 35 cycles).  Similarly
for amd64, ia64 and possibly sparc64 (sparc64 has sqrt in hardware so
it hopefully has sqrtl in hardware).
arm and powerpc apparently have long double == double, so the
software version of sqrtl is apparently only needed on ia64.

When gcc actually supports C99+IEC-mumble floating point, rounding
and setting exception flags will have to continue to be handled using
bit fiddling integer instructions or ordinary FP instructions,
possibly moved to the C fenv access functions, since i387 fenv
accesses are too slow to use for anything except initialization.

Bruce