Date: Wed, 08 Aug 2012 22:19:06 -0500 From: Stephen Montgomery-Smith <stephen@missouri.edu> To: freebsd-numerics@freebsd.org Subject: Re: cpow(3) implementations. Message-ID: <50232C2A.8040704@missouri.edu> In-Reply-To: <5022C2B2.2030803@missouri.edu> References: <20120808000357.GA11128@server.rulingia.com> <5021D5DF.7010709@missouri.edu> <20120809033123.V4514@besplex.bde.org> <5022C2B2.2030803@missouri.edu>
next in thread | previous in thread | raw e-mail | index | archive | help
On 08/08/2012 02:49 PM, Stephen Montgomery-Smith wrote: > On 08/08/2012 01:11 PM, Bruce Evans wrote: >> On Tue, 7 Aug 2012, Stephen Montgomery-Smith wrote: >> >>> On 08/07/2012 07:03 PM, Peter Jeremy wrote: >>>> The C99 standard (at least WG14/N1256) is quite liberal as regards >>>> cpow() and specifically allows a conforming implementation to just do: >>>> cpow(z, w) = cexp(w * clog(z)) >>>> >>>> The downside of this approach is that log() inherently loses precision >>>> by pushing (most of) the exponent bits into the fraction, displacing >>>> original fraction bits. I've therefore been looking at how to >>>> implement cpow(3) with a precision similar to pow(3). The following >>>> are some thoughts, together with questions. >>>> >>>> In the following: >>>> w = a + I*b >>>> z = c + I*d >>>> cis(r) = cos(r) + I*sin(r) >>>> t = u + I*v = clog(c + I*d) >>>> = log(hypot(c, d)) + I*atan2(d, c) >>>> >>>> cpow(z, w) = cexp(w * clog(z)) >>>> = cpow(c + I*d, a + I*b) >>>> = cexp((a + I*b) * clog(c + I*d)) >>>> = cexp((a + I*b) * (u + I*v)) >>>> = cexp((a*u - b*v) + I*(a*v + b*u)) >>>> = exp(a*u - b*v) * cis(a*v + b*u) >>> >>> I wouldn't regard errors in a*u-b*v as catastrophic cancellation. >>> This is because exp(0) = 1. So if the error in computing a*u-b*v is >>> approx DBL_EPSILON, and a*u-b*v approx zero, even though the relative >>> error in computing a*u-b*v is going to large (perhaps even infinite), >>> nevertheless the error in exp(a*u-b*v) may still be bounded by 1 or 2 >>> ulp. >> >> Indeed. It's sort of the opposite of catastrophic cancelation. We >> might have a*u-b*v off by a factor of 2, with value DBL_EPSILON when >> it should be 2*DBL_EPSILON. That is about 2**52 ulps in double >> precision. But then adding 1 turns it into an error of just 1 ulp. >> >> Even more so for the cos() part of cis(). Now the small term with the >> large error gets squared before adding 1. >> >> But for the sin() part of cis(), the final error is the one that the >> cancelation gave. It may be small absolute for cis(), but on >> multiplication by large exp() it will become large absolute (though >> no larger relative, and always small relative to the cos() part which >> is about 1). >> >>> More generally, as a mathematician, I would be far more concerned that >>> cpow(z,w) return accurate answers when w is real, and especially when >>> w is a small integer. Real life applications of cpow(z,y) when w is >>> not real are very few and far between. >> >> This is easy when z is also real, by using pow(). Then there is the >> issue of continuity -- how close is cpow(x + I*eps, y) to pow(x, y)? >> Similarly for cpow(x, y + I*eps). >> >>> I would be pleased if cpow(x,y) made special provisions for when y is >>> a small integer, and so, for example, cpow(z,2) was computed as z*z = >>> (x+y)*(x-y) + 2x*y*I. >> >> x^2 - y^2 is easier than x^2 + y^2 - 1 for clog(). >> >>> For cpow(z,3), you are going to have a hard time avoiding large >>> relative errors when x^2 = 3y^2 (i.e. z is parallel to a cube root of >>> 1). Frankly I see that as somewhat unavoidable. >> >> x^2 - 3y^2 is still easier than for clog(). But go to z^4, and the real >> part is x^4 - 6 x^2 y^2 + y^4, so it's much harder than clog(). Seems >> to need quadrupled precision for the sum of the 4th powers, then a trick >> to avoid maybe octupled precision :-). > > How about doing the real part of z^4 as > ((x-y)*(x+y)-2*x*y)*((x-y)*(x+y)+2*x*y) ? > That is what my suggested process would do. I tested it. It performs very poorly. I wrote a cpow function, just for integers. cpow(x+I*y,n) does very poorly when x/y approx tan(Pi/(2n)), that is, x+I*y is parallel to an nth root of -1. Only the cases n=0,1,2,-1,-2 do well. As a mathematician, I wouldn't expect any better. In fact I would be pleasantly surprised if it got the case n=2 correct. Just to point out: tan(Pi/4) = 1 tan(Pi/8) = sqrt(2) - 1 tan(Pi/16) = (1 + Sqrt(2)) (sqrt(4 - 2 sqrt(2)) - 1) if a_n = tan(Pi/(2n)), then a_{n+1} = (sqrt(a_n^2+1) - 1) / a_n
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?50232C2A.8040704>