From owner-freebsd-numerics@FreeBSD.ORG Sun Aug 5 10:06:02 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0F7DF106564A for ; Sun, 5 Aug 2012 10:06:02 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail26.syd.optusnet.com.au (mail26.syd.optusnet.com.au [211.29.133.167]) by mx1.freebsd.org (Postfix) with ESMTP id 682C88FC1D for ; Sun, 5 Aug 2012 10:05:59 +0000 (UTC) Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au (c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246]) by mail26.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q75A5ruA001658 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 5 Aug 2012 20:05:54 +1000 Date: Sun, 5 Aug 2012 20:05:53 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <501D9C36.2040207@missouri.edu> Message-ID: <20120805175106.X3574@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu> <20120805030609.R3101@besplex.bde.org> <501D9C36.2040207@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@FreeBSD.org, Bruce Evans Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Aug 2012 10:06:02 -0000 On Sat, 4 Aug 2012, Stephen Montgomery-Smith wrote: > On 08/04/2012 03:45 PM, Bruce Evans wrote: >>> >>>> I made good progress with clog[fl](). The accuracy problems near 0 >>>> are long solved and proved to be solved. > > I am starting to go through your code. > > What is the purpose of the macros norm and spadd? It is to provide easy-to-use maximally efficient primitives for multi-precision arithmetic. Full multi-precision is too slow, so old fdlibm code uses subsets of it tailored to the context. This is about as efficient as possible, but every use looks different so it is hard to read, and context-sensitive tricks are hard to write too. norm() and spadd() are the only primitives that I have found generally useful and efficient. I used to have a primitive for hi+lo splitting, but found it not general enough to use. For splitting, you often have a suitable subset of the bits in a register and don't want to use a general interface that would have to copy all the bits from another variable. Splitting is inherently inefficient, and my primitive originally focused on putting MD code in it to optimize it. I eventually found ways to avoid it or to optimize the MI code, and the MI code turned out to be quite good but is still not optimized well by compilers. Here are my full versions of these interfaces. Documenting them makes them unreadable :-). The following changes are needed: - make everything type-generic. Already done for the cut-down versions for clog*(). I don't know how to put the debugging code in type-generic macros. - deverbosify the comments. - due to the compiler bugfeatures whose description forms the main part of the comments, type-generic macros won't actually work if the arg type is float or double and float differs from float_t or double differs from double_t. These non-type-generic functions tend to cause compile-time warnings when there is a mismatch. How can this even be described without incrementing the verbosity instead of descreasing it? % /*- % * "Normalize" the terms in the infinite-precision expression (*a + *b) % * for the sum of 2 doubles so that *b is as small as possible relative % * to *a. (The resulting *a is the value of the expression in double % * precision and the resulting *b is the rounding error.) |*a| must % * be >= |*b|, and exponent overflow or underflow must not occur. This % * uses a Theorem of Dekker. See Knuth (1981) 4.2.2 Theorem C. % * % * The doubles must actually be double_t's (spelled as __double_t for % * namespace reasons) to avoid the following problems with plain % * doubles: % * % * - If expressions are evaluated in extra precision, then Theorem C % * doesn't apply unless the extra precision is discarded after every % * addition and subtraction. % * - C specifies that casts and assignments discard any extra precision. % * However, this is very broken in many compilers. % * - If we work around the compiler bugs using STRICT_ASSIGN(), then we % * get not only ugly code but also very inefficient code and % * unnecessarily inaccurate code. The conversions forced by the % * strict assignments are not just inefficient (since they involve % * extra operations); they are usually very inefficient since they % * give extra dependencies. Discarding the extra precision that we % * got for "free" then reduces the maximum precision "free"ly % * available in final values. % * - If we don't work around the compiler bugs, then we get code that % * usually works and is efficient and accurate as possible (all 3!), % * but this is very fragile. It depends on the compiler bugs being % * so perfect that extra precision is never lost before it is needed % * and callers not having complicated code or even function calls that % * would force loss of the extra precision. % * % * Using double_t's throughout works essentially by producing the same % * code as the "usually works" fragile case, but intentionally. % * Callers must pass us double_t's, and should also use double_t's for % * almost all their variables (and thus intermediate values), to get % * consistent extra precision and to avoid conversions. If the % * values cannot live throughout in registers, then spilling them will % * cause much the same inefficiencies as conversions, but the spills % * should now preserve extra precision even in the presence of the % * normal compiler bugs in this area, so the final result remains as % * accurate as possible. % */ % static inline void % onorm(__double_t *a, __double_t *b) % { % __double_t w; % % #ifdef DEBUG % volatile __double_t ia, ib, r, vw; % % ia = *a; % ib = *b; % assert(ia == 0 || fabs(ia) >= fabs(ib)); % #endif % w = *a + *b; % *b = (*a - w) + *b; % *a = w; % % #ifdef DEBUG % assert((long double)ia + ib == (long double)*a + *b); % vw = ia + ib; % r = ia - vw; % r += ib; % assert(vw == *a && r == *b); % #endif % } % % /* % * We now use the inline version again. This macro version has been % * reduced to a wrapper that hides the messy detail of taking the addresses % * (`a' and b must be lvalues, and the macro obfuscates this but keeps the % * API stable). Perhaps taking the addresses should be put back in the % * API. The results should be identical, but the inline version version % * is a cycle or 2 faster in more cases than it is a cycle slower. % * Apparently, more assignments help the register allocator. % */ % #define norm(a, b) onorm(&(a), &(b)) % /* Maybe back to macro version. */ % #define Norm(a, b) do { \ % __double_t w; \ % \ % w = (a) + (b); \ % (b) = ((a) - w) + (b); \ % (a) = w; \ % } while (0) % % /* % * Set x += c, where x is represented in extra precision as a + b. % * x must be sufficiently normalized and sufficiently larger than c, % * and the result is then sufficiently normalized. % * % * The details of ordering are that |a| must be >= |c| (so that (a, c) % * can be normalized without extra work to swap 'a' with c). The details of % * the normalization are that b must be small relative to the normalized a. % * Normalization of (a, c) makes the normalized c tiny relative to the % * normalized a, so b remains small relative to 'a' in the result. However, % * b need not ever be tiny relative to a. For example, b might be about % * 2**20 times smaller than 'a' to give about 20 extra bits of precision. % * That is usually enough, and adding c (which by normalization is about % * 2**53 times smaller than a) cannot change b significantly. However, % * cancellation of 'a' with c in normalization of (a, c) may reduce a % * significantly relative to b. The caller must ensure that significant % * cancellation doesn't occur, either by having c of the same sign as 'a', % * or by having |c| a few percent smaller than |a|. Pre-normalization of % * (a, b) may help. % * % * This is is a variant of an algorithm of Kahan (see Knuth (1981) 4.2.2 % * exercise 19). We gain considerable efficiency by requiring the terms to % * be sufficiently normalized and sufficiently increasing. % * % * `sp' means that the added term is in "single precision", i.e., not % * in multiple precision. % */ % #define spadd(a, b, c) do { \ % __double_t __tmp; \ % \ % __tmp = (c); \ % onorm(&__tmp, &(a)); \ % (b) += (a); \ % (a) = __tmp; \ % } while (0) % % /* norm() for float precision. */ % static inline void % onormf(__float_t *a, __float_t *b) % { % __float_t w; % % #ifdef DEBUG % volatile __float_t ia, ib, r, vw; % % ia = *a; % ib = *b; % assert(ia == 0 || fabsf(ia) >= fabsf(ib)); % #endif % w = *a + *b; % *b = (*a - w) + *b; % *a = w; % % #ifdef DEBUG % assert((long double)ia + ib == (long double)*a + *b); % vw = ia + ib; % r = ia - vw; % r += ib; % assert(vw == *a && r == *b); % #endif % } % % #define normf(a, b) onormf(&(a), &(b)) % % /* spadd() for float precision. */ % #define spaddf(a, b, c) do { \ % __float_t __tmp; \ % \ % __tmp = (c); \ % onormf(&__tmp, &(a)); \ % (b) += (a); \ % (a) = __tmp; \ % } while (0) % % /* % * norm() for long double precision. This is simpler since there is no % * extra precision to worry about (or if there is, then we have larger % * problems), but without extra precision the last 2 assertions in the % * debugging code are even weaker. % */ % static inline void % onorml(long double *a, long double *b) % { % long double w; % % #ifdef DEBUG % volatile long double ia, ib, r, vw; % % ia = *a; % ib = *b; % assert(ia == 0 || fabsl(ia) >= fabsl(ib)); % #endif % w = *a + *b; % *b = (*a - w) + *b; % *a = w; % % #ifdef DEBUG % assert(ia + ib == *a + *b); % vw = ia + ib; % r = ia - vw; % r += ib; % assert(vw == *a && r == *b); % #endif % } % % #define norml(a, b) onorml(&(a), &(b)) % % /* spadd() for long double precision. */ % #define spaddl(a, b, c) do { \ % long double __tmp; \ % \ % __tmp = (c); \ % onorml(&__tmp, &(a)); \ % (b) += (a); \ % (a) = __tmp; \ % } while (0) > Also, the proof you provided (which I haven't gone through in detail yet) - > do you intend to put them as comments in the code? No. It is far too long. > For example, when you write: > > % EXTRACT_WORDS(hx, lx, ax); > % bits = ((uint64_t)hx << 32) | lx; > % bits += 0x04000000; > % hx = bits >> 32; > % lx = bits & 0xf8000000; > % INSERT_WORDS(xh, hx, lx); > % xl = ax - xh; > > it would be very helpful to include a comment like > > % /* > * xh is ax rounded to nearest 26 bit floating point. > * xl is such that x = xh + xl. > * Because xh is rounded and not truncated to 26 bits, > * only the top 26 bits (not 27=53-26) of the mantissa of > * xl can be non-zero. > */ > > because while that can be deduced from reading the code, it is by no means > obvious. Again, far too long. This is much more routine than the above. Splitting is done without comment all over old fdlibm code. Only the point about the last bit is not routine. > By the way, that rounding to the nearest 26 bits is a very smart idea. But it is the natural way. It is just easier and more efficient (saves 1 integer instruction) to lop off the low bits, so lopping should be used if it works (I need to try it for float and ld80 precisions where it should work because MANT_DIG is even). e_hypot.c has hi+lo decomposition that is not as delicate as the above: % /* medium size a and b */ % w = a-b; % if (w>b) { % t1 = 0; % SET_HIGH_WORD(t1,ha); % t2 = a-t1; % w = sqrt(t1*t1-(b*(-b)-t2*(a+t1))); % } else { % a = a+a; % y1 = 0; % SET_HIGH_WORD(y1,hb); This lops off the low 32 bits (y1 = b lopped). The previous 2 lines can be written as INSERT_WORDS(y1, hb, 0), but old fdlibm code likes to write it as above. Both methods have serious efficiency problems on modern CPUs unless the compiler optimizes them, since they asks for partial object accesses (2 * 32-bit words instead of 1 64-bit access). Old fdlibm code accidentally reduces these problems by asking for a full initialization of y. % y2 = b - y1; Remaining 32 bits of b. % t1 = 0; % SET_HIGH_WORD(t1,ha+0x00100000); % t2 = a - t1; Similarly for a, except a was doubled just above, so ha is out of date and we fix it up by adding 1 to its exponent bit-field. This is a good example of a context-sensitive splitting. (However, better code maintain ha next to a and let the compiler optimize away the change to ha in paths where the change is not used.) % w = sqrt(t1*y1-(w*(-w)-(t1*y2+t2*b))); % } The arg for sqrt() must be a*a + b*b rearranged for accurcy. Since w*w is a 32x32 -> 53 bit operation, this is obviously not a full doubled precision operation. I noticed a full doubled precision multiplication most recently in das's s_fma.c: % /* % * Compute a*b exactly, returning the exact result in a struct dd. We assume % * that both a and b are normalized, so no underflow or overflow will occur. % * The current rounding mode must be round-to-nearest. % */ I think the requirement on the rounding mode is only so that the splitting rounds to nearest. % static inline struct dd % dd_mul(double a, double b) % { % static const double split = 0x1p27 + 1.0; % struct dd ret; % double ha, hb, la, lb, p, q; % % p = a * split; % ha = a - p; % ha += p; I prefer splitting by addition. This method seems to be just slower (1 extra multiplication and giving 1 extra dependence than for ha = ha + redux - redux). % la = a - ha; This and the additive method are fragile, and the above is broken on i386 with rounding precision FP_PE. clogl() uses the additive method, but clog() and clogf() use more direct methods to avoid problems with extra precision. The assignments to ha are also missing STRICT_ASSIGN(). Corresponding float precision code would be broken on i386 with the default rounding precision FP_PD. We currently assume than the rounding precision on i386 is never changed from its default of FP_PD, except when calling long double functions it should always be changed to FP_PE, so that code like above code works. We are more careful in float functions. My tests can be configured to test non-default precisions. This only shows a few double functions that don't work with extra precision. But many float functions don't work in only float precision, because they have been optimized to depend on double precision. % % p = b * split; % hb = b - p; % hb += p; % lb = b - hb; % % p = ha * hb; % q = ha * lb + la * hb; I don't see why this is always exact. For squaring, it is exact since it is 2*ha*la. % % ret.hi = p + q; % ret.lo = p - ret.hi + q + la * lb; I don't see why the final addition is exact. Except for that, (ret.hi, ret.lo) is norm(p, q), which is exact by Knuth Theorem C. Note that Theorem C applies because |p| >= |q|. norm() in the above depends on this too. norm() in clog*() uses a the larger code justified by a nearby Theorem (B?). s_fma.c also has a function dd_add() which uses the larger code. I copied the larger code in clog*() from there. This is an example of why the smaller norm() is a good primitive -- the larger one does more so it is less primitive, and things should be organized so that the larger one is rarely needed. % return (ret); % } Doubled precision addition is harder than multiplication, and I couldn't find a general enough version in s_fma.c to copy. There is a related problem with naming all of these functions. norm() isn't a very descriptive name, but it is the best that I could think of. spadd() is closer to sesquied precision than singled or doubled (it gives doubled + singled -> doubled). The above dd_mul() is also only sesquied and closer to singled than is spadd() (it gives singled x singled -> doubled). dd_add() is just the larger norm(), so it gives singled + singled -> doubled. s_fma.c gets by without anything like spadd() mainly because it only returns a singled result. After a dp_add() or 2, it adds the 2 components and then has to do a delicate rounding step, but no addition of doubled precision results. There are some more delicate and less routine roundings (not for splitting) at a critical bit in the middle. Some of these deserve and have comments. See s_cbrt.c (my verbose comment), s_exp2.c (no comment for redux magic; s_exp2l.c might say more) and e_rem_pio2.c (redux magic obtained from s_exp2.c). After seeing a few of these 0x1.MpN redux constants, using them becomes routine except for the problems with extra precision. Bruce From owner-freebsd-numerics@FreeBSD.ORG Sun Aug 5 18:48:56 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id CAE281065670 for ; Sun, 5 Aug 2012 18:48:56 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 90DE78FC12 for ; Sun, 5 Aug 2012 18:48:56 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q75ImrIv006552; Sun, 5 Aug 2012 13:48:54 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <501EC015.3000808@missouri.edu> Date: Sun, 05 Aug 2012 13:48:53 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu> <20120805030609.R3101@besplex.bde.org> <501D9C36.2040207@missouri.edu> <20120805175106.X3574@besplex.bde.org> In-Reply-To: <20120805175106.X3574@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Aug 2012 18:48:56 -0000 On 08/05/2012 05:05 AM, Bruce Evans wrote: > On Sat, 4 Aug 2012, Stephen Montgomery-Smith wrote: >> >> What is the purpose of the macros norm and spadd? > > [Answer snipped for brevity] Thank you. Your answers are very helpful. I had no idea what norm and spadd did, and now just even glancing through your answers tells me exactly what they do (even if I haven't yet fully understood why they work). If you don't want lengthy comments in the code, perhaps you could simply include the following link in a comment: http://lists.freebsd.org/pipermail/freebsd-numerics/2012-August/000020.html Let me say that in my proposed code for casinh, etc, that I do have essay length comments. http://people.freebsd.org/~stephen/catrig.c I plan to respect your style (keep out very lengthy comments), but I hope whomever commits my code will respect my style and keep the lengthy comments. From owner-freebsd-numerics@FreeBSD.ORG Sun Aug 5 19:19:55 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 770D8106566B for ; Sun, 5 Aug 2012 19:19:55 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 505A58FC14 for ; Sun, 5 Aug 2012 19:19:55 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.5/8.14.5) with ESMTP id q75JJsPd050391; Sun, 5 Aug 2012 12:19:54 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.5/8.14.5/Submit) id q75JJsVv050390; Sun, 5 Aug 2012 12:19:54 -0700 (PDT) (envelope-from sgk) Date: Sun, 5 Aug 2012 12:19:54 -0700 From: Steve Kargl To: Stephen Montgomery-Smith Message-ID: <20120805191954.GA50379@troutmask.apl.washington.edu> References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu> <20120805030609.R3101@besplex.bde.org> <501D9C36.2040207@missouri.edu> <20120805175106.X3574@besplex.bde.org> <501EC015.3000808@missouri.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <501EC015.3000808@missouri.edu> User-Agent: Mutt/1.4.2.3i Cc: freebsd-numerics@freebsd.org, Bruce Evans Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Aug 2012 19:19:55 -0000 On Sun, Aug 05, 2012 at 01:48:53PM -0500, Stephen Montgomery-Smith wrote: > On 08/05/2012 05:05 AM, Bruce Evans wrote: > >On Sat, 4 Aug 2012, Stephen Montgomery-Smith wrote: > > >> > >>What is the purpose of the macros norm and spadd? > > > >[Answer snipped for brevity] > > Thank you. Your answers are very helpful. I had no idea what norm and > spadd did, and now just even glancing through your answers tells me > exactly what they do (even if I haven't yet fully understood why they work). > > If you don't want lengthy comments in the code, perhaps you could simply > include the following link in a comment: > http://lists.freebsd.org/pipermail/freebsd-numerics/2012-August/000020.html > > Let me say that in my proposed code for casinh, etc, that I do have > essay length comments. > http://people.freebsd.org/~stephen/catrig.c > > I plan to respect your style (keep out very lengthy comments), but I > hope whomever commits my code will respect my style and keep the lengthy > comments. Essay long comments probably belong in the man page if the essay is important to the details of implementation. Bruce may disagree with me. -- Steve From owner-freebsd-numerics@FreeBSD.ORG Sun Aug 5 19:40:57 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E7ED4106566B for ; Sun, 5 Aug 2012 19:40:57 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 90AEF8FC1A for ; Sun, 5 Aug 2012 19:40:57 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q75JeuRj009888; Sun, 5 Aug 2012 14:40:56 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <501ECC48.40001@missouri.edu> Date: Sun, 05 Aug 2012 14:40:56 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: Steve Kargl References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu> <20120805030609.R3101@besplex.bde.org> <501D9C36.2040207@missouri.edu> <20120805175106.X3574@besplex.bde.org> <501EC015.3000808@missouri.edu> <20120805191954.GA50379@troutmask.apl.washington.edu> In-Reply-To: <20120805191954.GA50379@troutmask.apl.washington.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@freebsd.org, Bruce Evans Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Aug 2012 19:40:58 -0000 On 08/05/2012 02:19 PM, Steve Kargl wrote: > On Sun, Aug 05, 2012 at 01:48:53PM -0500, Stephen Montgomery-Smith wrote: >> Let me say that in my proposed code for casinh, etc, that I do have >> essay length comments. >> http://people.freebsd.org/~stephen/catrig.c > > Essay long comments probably belong in the man page if the essay is > important to the details of implementation. Bruce may disagree with > me. Except that the comments cannot really be understood without the code, and vice versa. For example, there are places where I compute things like X+Y, and computing Y would cause an underflow, but I happen to know that Y is so much smaller than X that I only need to compute X. But the reason why I can be sure that Y is so much smaller than X is quite subtle - you have to read the code that generates X and the parts that make up Y to see why this is true. Look in my code for the phrase "This cannot happen" and look at the comment this phrase appears in, and the comment below it. It simply doesn't belong in a man page. But the arguments are of the nature of a mathematical proof, and if the arguments aren't presented somewhere, in my opinion the code would be incomprehensible. You could argue that the reader of the code could simply read the paper (*) I got the algorithm from. But I made significant deviations from the algorithm, not least because that paper used exception handling to handle underflows and overflows, whereas I detect them before they happen. Also, I defined the function, which I called f(a,b) = 0.5 * (hypot(a,b) - b) = 0.5*a*a/(hypot(a,b) + b) which was implicitly used several times in that paper, but never presented as a single concept. In my opinion the algorithm is not obvious, but it is also necessary, because numeric instability can occur in rather awkward ways. It isn't immediately obvious how to work around them. I thought the ideas in the paper were brilliant. I feel very strongly that my essay long comments should stay inside the code. I think I do a good job explaining a rather subtle algorithm, and I would hope that a reader could completely understand the algorithm by reading my code + comments. (*) "Implementing the complex arcsine and arccosine functions using exception handling" by T. E. Hull, Thomas F. Fairgrieve, and Ping Tak Peter Tang, published in ACM Transactions on Mathematical Software, Volume 23 Issue 3, 1997, Pages 299-335, http://dl.acm.org/citation.cfm?id=275324 From owner-freebsd-numerics@FreeBSD.ORG Tue Aug 7 20:57:43 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5FB581065670 for ; Tue, 7 Aug 2012 20:57:43 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id C587C8FC17 for ; Tue, 7 Aug 2012 20:57:41 +0000 (UTC) Received: from server.rulingia.com (c220-239-247-45.belrs5.nsw.optusnet.com.au [220.239.247.45]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q77KveRu048932 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Wed, 8 Aug 2012 06:57:40 +1000 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q77KvPDr010806 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Wed, 8 Aug 2012 06:57:26 +1000 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id q77KvPA6010805 for freebsd-numerics@freebsd.org; Wed, 8 Aug 2012 06:57:25 +1000 (EST) (envelope-from peter) Date: Wed, 8 Aug 2012 06:57:25 +1000 From: Peter Jeremy To: freebsd-numerics@freebsd.org Message-ID: <20120807205725.GA10572@server.rulingia.com> References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu> <20120805030609.R3101@besplex.bde.org> <501D9C36.2040207@missouri.edu> <20120805175106.X3574@besplex.bde.org> <501EC015.3000808@missouri.edu> <20120805191954.GA50379@troutmask.apl.washington.edu> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="qMm9M+Fa2AknHoGS" Content-Disposition: inline In-Reply-To: <20120805191954.GA50379@troutmask.apl.washington.edu> X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Aug 2012 20:57:43 -0000 --qMm9M+Fa2AknHoGS Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2012-Aug-05 12:19:54 -0700, Steve Kargl wrote: >On Sun, Aug 05, 2012 at 01:48:53PM -0500, Stephen Montgomery-Smith wrote: >> I plan to respect your style (keep out very lengthy comments), but I=20 >> hope whomever commits my code will respect my style and keep the lengthy= =20 >> comments. > >Essay long comments probably belong in the man page if the essay is >important to the details of implementation. Bruce may disagree with >me. I also believe that implementation details belong in the code and not the man page. The man page should contain information necessary to use the function: Domain, range, behaviour with exceptional inputs, accuracy and similar. Any "unexpected" behaviours as a side-effect of the implementation can be mentioned under BUGS. The user shouldn't need to know that it's implemented by range-reduction to [0 .. e/7] and then evaluated with an 11th order truncated Chebyshev polynomial that's had coefficients tweaked by a demon I summoned one evening. OTOH, the code itself should contain comments explaining what it is doing - particularly where it is relying on details of the floating point implementation or the difference between an operation on real (or complex) numbers and the same operation on float/double/... numbers. It's also useful to document why alternative (particularly "obviously better" alternative) approaches won't work. --=20 Peter Jeremy --qMm9M+Fa2AknHoGS Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlAhgTUACgkQ/opHv/APuIfeuQCfasUDBNGffA4ztejWbAEPLBCc WeAAniSA2z7iZZFlr47CYHlYonMtEvBk =4suN -----END PGP SIGNATURE----- --qMm9M+Fa2AknHoGS-- From owner-freebsd-numerics@FreeBSD.ORG Tue Aug 7 21:19:21 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 925A6106564A for ; Tue, 7 Aug 2012 21:19:21 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id 4C7D28FC17 for ; Tue, 7 Aug 2012 21:19:21 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.5/8.14.5) with ESMTP id q77LJExI061807; Tue, 7 Aug 2012 14:19:14 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.5/8.14.5/Submit) id q77LJEg0061806; Tue, 7 Aug 2012 14:19:14 -0700 (PDT) (envelope-from sgk) Date: Tue, 7 Aug 2012 14:19:14 -0700 From: Steve Kargl To: Peter Jeremy Message-ID: <20120807211914.GA58818@troutmask.apl.washington.edu> References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu> <20120805030609.R3101@besplex.bde.org> <501D9C36.2040207@missouri.edu> <20120805175106.X3574@besplex.bde.org> <501EC015.3000808@missouri.edu> <20120805191954.GA50379@troutmask.apl.washington.edu> <20120807205725.GA10572@server.rulingia.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120807205725.GA10572@server.rulingia.com> User-Agent: Mutt/1.4.2.3i Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Aug 2012 21:19:21 -0000 On Wed, Aug 08, 2012 at 06:57:25AM +1000, Peter Jeremy wrote: > On 2012-Aug-05 12:19:54 -0700, Steve Kargl wrote: > >On Sun, Aug 05, 2012 at 01:48:53PM -0500, Stephen Montgomery-Smith wrote: > >> I plan to respect your style (keep out very lengthy comments), but I > >> hope whomever commits my code will respect my style and keep the lengthy > >> comments. > > > >Essay long comments probably belong in the man page if the essay is > >important to the details of implementation. Bruce may disagree with > >me. > > I also believe that implementation details belong in the code and not > the man page. The man page should contain information necessary to > use the function: Domain, range, behaviour with exceptional inputs, > accuracy and similar. Any "unexpected" behaviours as a side-effect of > the implementation can be mentioned under BUGS. The user shouldn't > need to know that it's implemented by range-reduction to [0 .. e/7] > and then evaluated with an 11th order truncated Chebyshev polynomial > that's had coefficients tweaked by a demon I summoned one evening. > > OTOH, the code itself should contain comments explaining what it is > doing - particularly where it is relying on details of the floating > point implementation or the difference between an operation on real > (or complex) numbers and the same operation on float/double/... > numbers. It's also useful to document why alternative (particularly > "obviously better" alternative) approaches won't work. There can be 2 man pages. One that provides the user with the usual information you outline above (e.g., casin.3), and one that provides implementation details (e.g. casin.7). OTOH, I'm just happy someone else is taking up the task(s) of implementing missing C99 libm functionality. -- Steve From owner-freebsd-numerics@FreeBSD.ORG Tue Aug 7 23:45:12 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A8D481065672 for ; Tue, 7 Aug 2012 23:45:12 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 6C5FB8FC15 for ; Tue, 7 Aug 2012 23:45:12 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q77NMuKQ016617 for ; Tue, 7 Aug 2012 18:22:57 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5021A350.40608@missouri.edu> Date: Tue, 07 Aug 2012 18:22:56 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu> <20120805030609.R3101@besplex.bde.org> <501D9C36.2040207@missouri.edu> <20120805175106.X3574@besplex.bde.org> <501EC015.3000808@missouri.edu> <20120805191954.GA50379@troutmask.apl.washington.edu> <20120807205725.GA10572@server.rulingia.com> In-Reply-To: <20120807205725.GA10572@server.rulingia.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Aug 2012 23:45:12 -0000 On 08/07/2012 03:57 PM, Peter Jeremy wrote: > On 2012-Aug-05 12:19:54 -0700, Steve Kargl wrote: >> On Sun, Aug 05, 2012 at 01:48:53PM -0500, Stephen Montgomery-Smith wrote: >>> I plan to respect your style (keep out very lengthy comments), but I >>> hope whomever commits my code will respect my style and keep the lengthy >>> comments. >> >> Essay long comments probably belong in the man page if the essay is >> important to the details of implementation. Bruce may disagree with >> me. > > I also believe that implementation details belong in the code and not > the man page. The man page should contain information necessary to > use the function: Domain, range, behaviour with exceptional inputs, > accuracy and similar. Any "unexpected" behaviours as a side-effect of > the implementation can be mentioned under BUGS. The user shouldn't > need to know that it's implemented by range-reduction to [0 .. e/7] > and then evaluated with an 11th order truncated Chebyshev polynomial > that's had coefficients tweaked by a demon I summoned one evening. > > OTOH, the code itself should contain comments explaining what it is > doing - particularly where it is relying on details of the floating > point implementation or the difference between an operation on real > (or complex) numbers and the same operation on float/double/... > numbers. It's also useful to document why alternative (particularly > "obviously better" alternative) approaches won't work. > I completely agree. I followed this philosophy in my code. From owner-freebsd-numerics@FreeBSD.ORG Wed Aug 8 00:04:13 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 54164106566C for ; Wed, 8 Aug 2012 00:04:13 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id DA0088FC0A for ; Wed, 8 Aug 2012 00:04:12 +0000 (UTC) Received: from server.rulingia.com (c220-239-247-45.belrs5.nsw.optusnet.com.au [220.239.247.45]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q78045DW049406 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Wed, 8 Aug 2012 10:04:06 +1000 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q7803v7n034688 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Wed, 8 Aug 2012 10:03:57 +1000 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id q7803vGk034686 for freebsd-numerics@freebsd.org; Wed, 8 Aug 2012 10:03:57 +1000 (EST) (envelope-from peter) Date: Wed, 8 Aug 2012 10:03:57 +1000 From: Peter Jeremy To: freebsd-numerics@freebsd.org Message-ID: <20120808000357.GA11128@server.rulingia.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="0OAP2g/MAC+5xKAE" Content-Disposition: inline X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.21 (2010-09-15) Subject: cpow(3) implementations. X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Aug 2012 00:04:13 -0000 --0OAP2g/MAC+5xKAE Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable The C99 standard (at least WG14/N1256) is quite liberal as regards cpow() and specifically allows a conforming implementation to just do: cpow(z, w) =3D cexp(w * clog(z)) The downside of this approach is that log() inherently loses precision by pushing (most of) the exponent bits into the fraction, displacing original fraction bits. I've therefore been looking at how to implement cpow(3) with a precision similar to pow(3). The following are some thoughts, together with questions. In the following: w =3D a + I*b z =3D c + I*d cis(r) =3D cos(r) + I*sin(r) t =3D u + I*v =3D clog(c + I*d) =3D log(hypot(c, d)) + I*atan2(d, c) cpow(z, w) =3D cexp(w * clog(z)) =3D cpow(c + I*d, a + I*b) =3D cexp((a + I*b) * clog(c + I*d)) =3D cexp((a + I*b) * (u + I*v)) =3D cexp((a*u - b*v) + I*(a*v + b*u)) =3D exp(a*u - b*v) * cis(a*v + b*u) Unfortunately, either or both of (a*u - b*v) and (a*v + b*u) are potentially subject to catastrophic cancellation. However: exp(a*u - b*v) =3D exp(a*u) / exp(b*v) =3D pow(a, hypot(c, d)) / exp(b*v) Ignoring overflow/underflow issues, does this avoid the cancellation issues associated with the subtraction? (I realise there are still issues when a =E2=89=A4 0 due to the pow(3) definition). Since cis() is periodic, it's also possible to range-reduce each term of (a*v + b*u) independently - but that just fiddles with the issue since it just adds another layer of precision loss. Can anyone suggest an alternative approach? --=20 Peter Jeremy --0OAP2g/MAC+5xKAE Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlAhrO0ACgkQ/opHv/APuIc18gCdEa+5IIDC+yucx8uyX/69ZjUk CsgAn25wsZLxw32By+J2sRTpbOguZoEj =+T1K -----END PGP SIGNATURE----- --0OAP2g/MAC+5xKAE-- From owner-freebsd-numerics@FreeBSD.ORG Wed Aug 8 01:35:54 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D17EE106564A for ; Wed, 8 Aug 2012 01:35:54 +0000 (UTC) (envelope-from sgk@troutmask.apl.washington.edu) Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu [128.95.76.21]) by mx1.freebsd.org (Postfix) with ESMTP id A7B658FC0A for ; Wed, 8 Aug 2012 01:35:54 +0000 (UTC) Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu [127.0.0.1]) by troutmask.apl.washington.edu (8.14.5/8.14.5) with ESMTP id q781Z0Pf019395; Tue, 7 Aug 2012 18:35:00 -0700 (PDT) (envelope-from sgk@troutmask.apl.washington.edu) Received: (from sgk@localhost) by troutmask.apl.washington.edu (8.14.5/8.14.5/Submit) id q781Yx77019394; Tue, 7 Aug 2012 18:34:59 -0700 (PDT) (envelope-from sgk) Date: Tue, 7 Aug 2012 18:34:59 -0700 From: Steve Kargl To: Peter Jeremy Message-ID: <20120808013459.GA19280@troutmask.apl.washington.edu> References: <20120808000357.GA11128@server.rulingia.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120808000357.GA11128@server.rulingia.com> User-Agent: Mutt/1.4.2.3i Cc: freebsd-numerics@freebsd.org Subject: Re: cpow(3) implementations. X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Aug 2012 01:35:54 -0000 On Wed, Aug 08, 2012 at 10:03:57AM +1000, Peter Jeremy wrote: > The C99 standard (at least WG14/N1256) is quite liberal as regards > cpow() and specifically allows a conforming implementation to just do: > cpow(z, w) = cexp(w * clog(z)) > > The downside of this approach is that log() inherently loses precision > by pushing (most of) the exponent bits into the fraction, displacing > original fraction bits. I've therefore been looking at how to > implement cpow(3) with a precision similar to pow(3). The following > are some thoughts, together with questions. > > In the following: > w = a + I*b > z = c + I*d > cis(r) = cos(r) + I*sin(r) > t = u + I*v = clog(c + I*d) > = log(hypot(c, d)) + I*atan2(d, c) > > cpow(z, w) = cexp(w * clog(z)) > = cpow(c + I*d, a + I*b) > = cexp((a + I*b) * clog(c + I*d)) > = cexp((a + I*b) * (u + I*v)) > = cexp((a*u - b*v) + I*(a*v + b*u)) > = exp(a*u - b*v) * cis(a*v + b*u) > > Unfortunately, either or both of (a*u - b*v) and (a*v + b*u) are > potentially subject to catastrophic cancellation. However: > exp(a*u - b*v) = exp(a*u) / exp(b*v) > = pow(a, hypot(c, d)) / exp(b*v) > Ignoring overflow/underflow issues, does this avoid the cancellation > issues associated with the subtraction? (I realise there are still > issues when a ??? 0 due to the pow(3) definition). > > Since cis() is periodic, it's also possible to range-reduce each term > of (a*v + b*u) independently - but that just fiddles with the issue > since it just adds another layer of precision loss. Can anyone > suggest an alternative approach? I cheated and peaked at NetBSD's implementation, which is based on Moshier's cephes implementation. Your approach not only has to worry about catastrophic cancellation, but may need to switch between exp() and expm1() and log() and log1p() to achieve accuracy near the problematic areas for these functions. Anyway, Moshier converts z in polar coordinates and use pow(), exp(), sin(), and cos(). psuedo-code: x = creal(z) y = cimag(z) r = hypot(x,y) t = carg(z) if y == 0 if x == 0 return 0 else return pow(r,x)*(cos(t*x)+I*sin(t*x)) else return pow(r,x)*exp(-t*y)*(cos(t*x+y)+I*sin(t*x+y)) I don't if this is any better. -- Steve From owner-freebsd-numerics@FreeBSD.ORG Wed Aug 8 02:58:41 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7C675106566C for ; Wed, 8 Aug 2012 02:58:41 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 2A6288FC0A for ; Wed, 8 Aug 2012 02:58:41 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q782wdOY030787 for ; Tue, 7 Aug 2012 21:58:40 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5021D5DF.7010709@missouri.edu> Date: Tue, 07 Aug 2012 21:58:39 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <20120808000357.GA11128@server.rulingia.com> In-Reply-To: <20120808000357.GA11128@server.rulingia.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: cpow(3) implementations. X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Aug 2012 02:58:41 -0000 On 08/07/2012 07:03 PM, Peter Jeremy wrote: > The C99 standard (at least WG14/N1256) is quite liberal as regards > cpow() and specifically allows a conforming implementation to just do: > cpow(z, w) = cexp(w * clog(z)) > > The downside of this approach is that log() inherently loses precision > by pushing (most of) the exponent bits into the fraction, displacing > original fraction bits. I've therefore been looking at how to > implement cpow(3) with a precision similar to pow(3). The following > are some thoughts, together with questions. > > In the following: > w = a + I*b > z = c + I*d > cis(r) = cos(r) + I*sin(r) > t = u + I*v = clog(c + I*d) > = log(hypot(c, d)) + I*atan2(d, c) > > cpow(z, w) = cexp(w * clog(z)) > = cpow(c + I*d, a + I*b) > = cexp((a + I*b) * clog(c + I*d)) > = cexp((a + I*b) * (u + I*v)) > = cexp((a*u - b*v) + I*(a*v + b*u)) > = exp(a*u - b*v) * cis(a*v + b*u) I wouldn't regard errors in a*u-b*v as catastrophic cancellation. This is because exp(0) = 1. So if the error in computing a*u-b*v is approx DBL_EPSILON, and a*u-b*v approx zero, even though the relative error in computing a*u-b*v is going to large (perhaps even infinite), nevertheless the error in exp(a*u-b*v) may still be bounded by 1 or 2 ulp. More generally, as a mathematician, I would be far more concerned that cpow(z,w) return accurate answers when w is real, and especially when w is a small integer. Real life applications of cpow(z,y) when w is not real are very few and far between. I would be pleased if cpow(x,y) made special provisions for when y is a small integer, and so, for example, cpow(z,2) was computed as z*z = (x+y)*(x-y) + 2x*y*I. For cpow(z,3), you are going to have a hard time avoiding large relative errors when x^2 = 3y^2 (i.e. z is parallel to a cube root of 1). Frankly I see that as somewhat unavoidable. Nevertheless, if you pumped up cpow(z,w) so that when w was a relatively small integer, that it broke w into its base 2 expansion, and then multiplied lots of terms of the form cpow(z,2^n), each of which was computed by n repetitions of cpow(z,2), and then for negative integers by taking the reciprocal of the whole thing, and then in all other cases simply use cexp(w*clog(z)), I would be very happy. Other than that, if your cpow produced cpow(-1,0.5) = I + 1e-16, I wouldn't be shocked at all, and I would find this kind of error totally acceptable. From owner-freebsd-numerics@FreeBSD.ORG Wed Aug 8 09:24:58 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 14690106564A for ; Wed, 8 Aug 2012 09:24:58 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id 9C5878FC08 for ; Wed, 8 Aug 2012 09:24:57 +0000 (UTC) Received: from server.rulingia.com (c220-239-247-45.belrs5.nsw.optusnet.com.au [220.239.247.45]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q789Ot27050752 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Wed, 8 Aug 2012 19:24:56 +1000 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q789OkvT066394 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Wed, 8 Aug 2012 19:24:46 +1000 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id q789OjF4066393 for freebsd-numerics@freebsd.org; Wed, 8 Aug 2012 19:24:45 +1000 (EST) (envelope-from peter) Date: Wed, 8 Aug 2012 19:24:45 +1000 From: Peter Jeremy To: freebsd-numerics@freebsd.org Message-ID: <20120808092445.GB65056@server.rulingia.com> References: <20120808000357.GA11128@server.rulingia.com> <5021D5DF.7010709@missouri.edu> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="5vNYLRcllDrimb99" Content-Disposition: inline In-Reply-To: <5021D5DF.7010709@missouri.edu> X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: cpow(3) implementations. X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Aug 2012 09:24:58 -0000 --5vNYLRcllDrimb99 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2012-Aug-07 18:34:59 -0700, Steve Kargl wrote: >On Wed, Aug 08, 2012 at 10:03:57AM +1000, Peter Jeremy wrote: >> cpow(z, w) =3D exp(a*u - b*v) * cis(a*v + b*u) =2E.. >> exp(a*u - b*v) =3D exp(a*u) / exp(b*v) >> =3D pow(a, hypot(c, d)) / exp(b*v) Oops, that last line is wrong and should be: exp(a*u - b*v) =3D pow(hypot(c, d), a) / exp(b*v) >I cheated and peaked at NetBSD's implementation, which is >based on Moshier's cephes implementation. The pseudocode didn't make sense to me (I couldn't see the 2nd argument anywhere) so I had a peek at both NetBSD and cephes-2.8 and the code is identical in both. I believe the pseudo code should be: cpow(z,w): x =3D creal(w) y =3D cimag(w) r =3D cabs(z) t =3D carg(z) if y =3D=3D 0 if x =3D=3D 0 return 0 else return pow(r,x)*(cos(t*x)+I*sin(t*x)) else return pow(r,x)*exp(-t*y)*(cos(t*x+y*log(r))+I*sin(t*x+y*log(r))) Note that the final line looks the same as my final line (modulo algebraic errors). The rest is just special casing. On 2012-Aug-07 21:58:39 -0500, Stephen Montgomery-Smith wrote: >I wouldn't regard errors in a*u-b*v as catastrophic cancellation. This=20 >is because exp(0) =3D 1. So if the error in computing a*u-b*v is approx= =20 >DBL_EPSILON, and a*u-b*v approx zero, even though the relative error in=20 >computing a*u-b*v is going to large (perhaps even infinite),=20 >nevertheless the error in exp(a*u-b*v) may still be bounded by 1 or 2 ulp. OK. >I would be pleased if cpow(x,y) made special provisions for when y is a=20 >small integer, and so, for example, cpow(z,2) was computed as z*z =3D=20 >(x+y)*(x-y) + 2x*y*I. Thu square-and-multiply approach makes sense. Where do you draw the line on "small"? At some point the rounding errors will exceed the trig approach. >Other than that, if your cpow produced cpow(-1,0.5) =3D I + 1e-16, I=20 >wouldn't be shocked at all, and I would find this kind of error totally=20 >acceptable. I like the idea of also special-casing "simple" rational powers. This offers things like cpow(-1.,0.5) =3D I and cpow(-8.,1./3.) =3D -2 Thank you both for your input. --=20 Peter Jeremy --5vNYLRcllDrimb99 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlAiMF0ACgkQ/opHv/APuIebdACfeAT7FOE2T0hKyq4Pe9cAkvuc xUsAn3rJsudDpsacUnPZbDIAN6wT6/IX =J5iy -----END PGP SIGNATURE----- --5vNYLRcllDrimb99-- From owner-freebsd-numerics@FreeBSD.ORG Wed Aug 8 16:50:43 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E8BE11065673 for ; Wed, 8 Aug 2012 16:50:43 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail01.syd.optusnet.com.au (mail01.syd.optusnet.com.au [211.29.132.182]) by mx1.freebsd.org (Postfix) with ESMTP id 67E4E8FC0C for ; Wed, 8 Aug 2012 16:50:42 +0000 (UTC) Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au (c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246]) by mail01.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q78GoYvH014335 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 9 Aug 2012 02:50:35 +1000 Date: Thu, 9 Aug 2012 02:50:34 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <501EC015.3000808@missouri.edu> Message-ID: <20120809013542.D3915@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu> <20120805030609.R3101@besplex.bde.org> <501D9C36.2040207@missouri.edu> <20120805175106.X3574@besplex.bde.org> <501EC015.3000808@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@freebsd.org, Bruce Evans Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Aug 2012 16:50:44 -0000 On Sun, 5 Aug 2012, Stephen Montgomery-Smith wrote: > On 08/05/2012 05:05 AM, Bruce Evans wrote: >> On Sat, 4 Aug 2012, Stephen Montgomery-Smith wrote: > >>> >>> What is the purpose of the macros norm and spadd? >> >> [Answer snipped for brevity] > > Thank you. Your answers are very helpful. I had no idea what norm and spadd > did, and now just even glancing through your answers tells me exactly what > they do (even if I haven't yet fully understood why they work). I googled Dekker's algorithm and found norm() nearby in the book by Muller et al "Handbook of Floating Point...". They call it 2sumF (fast 2sum). It is Knuth's Theorem B They call Knuth's Theorem B 2sum. These are even worse names than norm() or das's dd_add(), especially for C :-). Dekker had a lot to do with these algorithms too. Muller discusses refinements that aren't in Knuth. In clog(), I switched to the full Dekker algorithm for multiplication and something more like a Dekker algorithm for addition, and now don't use spadd() or so many norm()s. My norm() is normally algorithm 2sumF, but in clog() I made it have the effect of 2sum (don't depend on the order of the args). I changed it back to 2sumF and named this normF(), and made norm() == 2sumF (do more arithmetic operations to avoid fabs()'s and branches for swapping the args), and used normF() where possible and otherwise norm(). Just one of each. das's dd_mul() seems to be not quite Dekker's algorithm as given in Muller. It is slower. Maybe it is actually Dekker's original algorithm: % /* % * Compute a*b exactly, returning the exact result in a struct dd. We assume % * that both a and b are normalized, so no underflow or overflow will occur. % * The current rounding mode must be round-to-nearest. % */ % static inline struct dd % dd_mul(double a, double b) % { % static const double split = 0x1p27 + 1.0; % struct dd ret; % double ha, hb, la, lb, p, q; % % p = a * split; % ha = a - p; % ha += p; % la = a - ha; The splitting part is Veldkamp's (?) algorithm. I now use it for efficiency, and had to work around compiler bugfeatures, and partially work around the fudamental inefficiency of writing it like the above. Double expressions may be evaluated in extra precision (and are on i386 if someone changes the rounding precision to FP_PE). To try to get rid of the extra precision, the above assigns the result of all expressions to a variable. C90 implicitly and C99 explicitly require assigments and casts to discard extra precision. But discarding extra precision is slow, and in most code it just reduces accuracy, so gcc on i386 never did this. This totally breaks the above algorithm. (FreeBSD uses a default rounding precision of FP_PD mainly to avoid such breakage for doubles, although not for floats.) The right way to fix this is to use double_t for most variables, so that there is no extra precision. @ static const double_t split = 0x1pNN + 1.0; @ struct dd ret; @ double_t ha, hb, la, lb, p, q; @ @ p = a * split; @ ha = a - p; @ ha += p; @ la = a - ha; Unfortunately, then NN is hard to determine efficiently. The rounding precision must be FP_PD or FP_PE, else doubles won't work at all. On i386: when it is FP_PD, NN must be 27, the same as now; when it is FP_PE, NN must be 38 (11 more because of 11 extra bits of precision). Similarly for float precision, except NN = 12 only works for the unusual rounding precision FP_PS; it must be 41 with FP_PD and 52 with FP_PE. This is too hard to configure, so I use an intermediate way: @ static const double_t split = 0x1p53 + 1; @ struct dd ret; @ double_t ha, hb, la, lb, p, q; @ @ p = (double)(a * split); @ ha = (double)(a - p) + p; @ la = a - ha; The casts are like assignments, and would have no effect on i386 due to the compiler bugfeatures if the variables had type double. They correctly have no effect if there is no extra precision (then double_t == double). On i386, double_t is long double, and the cast is now not null. This tricks the compiler into generating the correct code. STRICT_ASSIGN() can also be used to do essentially the same thing, but the above looks nicer and is more portable. Style changes in the above: - write 1 as 1, not as 1.0. Let the compiler promote it to the type of the other operand. - don't use an assignment for the intermediate step in (a - p + p). An assignment is neither necessary nor (with compiler bugfeatures) sufficient for discarding extra precision. Muller's book is careful about making rounding explicit. E.g., it writes the splitting part as something like: @ p = round(a * split); @ ha = round(a - p); @ ha = round(ha + p); @ la = round(a - ha); IIRC, it writes round() even when the result is exact, but that would not be careful so hopefully it doesn't. This could be done in C code too, by casting every operation that must be rounded (to kill extra precision), except with the compiler bugfeatures this doesn't actually work. This is hard to write and harder to read, so I prefer to use expressions like a - p + p, without any intermediate variables or casts, and depend on strict left to right evaluation and omit parentheses too. Using double_t makes this work, modulo other compiler bugfeatures. % % p = b * split; % hb = b - p; % hb += p; % lb = b - hb; % % p = ha * hb; In Dekker's algorithm as given by Muller, p is a * b (rounded). This has the technical advantage that the hi term becomes available earlier. % q = ha * lb + la * hb; % % ret.hi = p + q; % ret.lo = p - ret.hi + q + la * lb; Dekker's algorithm is: @ ret.hi = a * b; @ ret.lo = ha * hb - ret.hi + ha * lb + la * hb + la * lb; This looks shorter, but actually has the same number of operations (9). I now see that it has the technical disadvantage that the lo term becomes available later. So dd_add() is probably better if you want the full ret available fastest, while the version from Muller is better if you want ret.hi available fastest. Muller's organization is also easier to understand -- you calculate the rounded product, then a correction term. It is fairly clear that (ha * hb - ret.hi) is exact. Then more terms are added, from high to low, presumably with enough cancelations to make them exact too. I wonder why the term equal to q isn't parenthesized?: @ ret.lo = ha * hb - ret.hi + (q = ha * lb + la * hb) + la * lb; Parenthesizing this term makes ret.lo available earlier, if there is some otherwise-unused paralleism. % return (ret); % } > If you don't want lengthy comments in the code, perhaps you could simply > include the following link in a comment: > http://lists.freebsd.org/pipermail/freebsd-numerics/2012-August/000020.html Link to the whole list :-). > Let me say that in my proposed code for casinh, etc, that I do have essay > length comments. > http://people.freebsd.org/~stephen/catrig.c > > I plan to respect your style (keep out very lengthy comments), but I hope > whomever commits my code will respect my style and keep the lengthy comments. I respect style to a fault, but try to get you to change it before committing much :-). Bruce From owner-freebsd-numerics@FreeBSD.ORG Wed Aug 8 17:12:22 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0EE37106566C for ; Wed, 8 Aug 2012 17:12:22 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail14.syd.optusnet.com.au (mail14.syd.optusnet.com.au [211.29.132.195]) by mx1.freebsd.org (Postfix) with ESMTP id 9675D8FC08 for ; Wed, 8 Aug 2012 17:12:21 +0000 (UTC) Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au (c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246]) by mail14.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q78HCI2t009761 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 9 Aug 2012 03:12:20 +1000 Date: Thu, 9 Aug 2012 03:12:18 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Peter Jeremy In-Reply-To: <20120807205725.GA10572@server.rulingia.com> Message-ID: <20120809025220.N4114@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu> <20120805030609.R3101@besplex.bde.org> <501D9C36.2040207@missouri.edu> <20120805175106.X3574@besplex.bde.org> <501EC015.3000808@missouri.edu> <20120805191954.GA50379@troutmask.apl.washington.edu> <20120807205725.GA10572@server.rulingia.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Aug 2012 17:12:22 -0000 On Wed, 8 Aug 2012, Peter Jeremy wrote: > On 2012-Aug-05 12:19:54 -0700, Steve Kargl wrote: >> On Sun, Aug 05, 2012 at 01:48:53PM -0500, Stephen Montgomery-Smith wrote: >>> I plan to respect your style (keep out very lengthy comments), but I >>> hope whomever commits my code will respect my style and keep the lengthy >>> comments. >> >> Essay long comments probably belong in the man page if the essay is >> important to the details of implementation. Bruce may disagree with >> me. Of course only details visible to applications belong in the man page. But I think many details belong in meta-comments in something more like a readme than in every source file. > I also believe that implementation details belong in the code and not > the man page. The man page should contain information necessary to > use the function: Domain, range, behaviour with exceptional inputs, > accuracy and similar. Any "unexpected" behaviours as a side-effect of > the implementation can be mentioned under BUGS. Note that FreeBSD man pages mostly have less than this, and less than even the non-Annex parts of C99, after removing some out-of date meta-info from math.3. > The user shouldn't > need to know that it's implemented by range-reduction to [0 .. e/7] > and then evaluated with an 11th order truncated Chebyshev polynomial > that's had coefficients tweaked by a demon I summoned one evening. Unless the implementation affects the user-visible behaviour. For example, on i386 trig functions are still implemented using the i387 (except in my version), so applications are exposed to the huge errors near multiples of pi/2 (starting at 1*pi/2) that are normal for the i387. A note about this would point to the right place to blame :-). > OTOH, the code itself should contain comments explaining what it is > doing - particularly where it is relying on details of the floating > point implementation or the difference between an operation on real > (or complex) numbers and the same operation on float/double/... > numbers. That's too detailed. Almost every line depends on IEEE754 arithmetic and on this not being too broken. Many depend on the arithmetic _not_ being the same as real arithmetic. An example is almost every line in Dekker's algorithm. > It's also useful to document why alternative (particularly > "obviously better" alternative) approaches won't work. Again too detailed. On most lines I consider a couple of routine alternatives. For critical lines, I consider and try many more. Bruce From owner-freebsd-numerics@FreeBSD.ORG Wed Aug 8 18:11:58 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 66DEC1065674 for ; Wed, 8 Aug 2012 18:11:58 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail35.syd.optusnet.com.au (mail35.syd.optusnet.com.au [211.29.133.51]) by mx1.freebsd.org (Postfix) with ESMTP id EFC7A8FC08 for ; Wed, 8 Aug 2012 18:11:57 +0000 (UTC) Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au (c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246]) by mail35.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q78IBnX6003719 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 9 Aug 2012 04:11:50 +1000 Date: Thu, 9 Aug 2012 04:11:49 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <5021D5DF.7010709@missouri.edu> Message-ID: <20120809033123.V4514@besplex.bde.org> References: <20120808000357.GA11128@server.rulingia.com> <5021D5DF.7010709@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@freebsd.org Subject: Re: cpow(3) implementations. X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Aug 2012 18:11:58 -0000 On Tue, 7 Aug 2012, Stephen Montgomery-Smith wrote: > On 08/07/2012 07:03 PM, Peter Jeremy wrote: >> The C99 standard (at least WG14/N1256) is quite liberal as regards >> cpow() and specifically allows a conforming implementation to just do: >> cpow(z, w) = cexp(w * clog(z)) >> >> The downside of this approach is that log() inherently loses precision >> by pushing (most of) the exponent bits into the fraction, displacing >> original fraction bits. I've therefore been looking at how to >> implement cpow(3) with a precision similar to pow(3). The following >> are some thoughts, together with questions. >> >> In the following: >> w = a + I*b >> z = c + I*d >> cis(r) = cos(r) + I*sin(r) >> t = u + I*v = clog(c + I*d) >> = log(hypot(c, d)) + I*atan2(d, c) >> >> cpow(z, w) = cexp(w * clog(z)) >> = cpow(c + I*d, a + I*b) >> = cexp((a + I*b) * clog(c + I*d)) >> = cexp((a + I*b) * (u + I*v)) >> = cexp((a*u - b*v) + I*(a*v + b*u)) >> = exp(a*u - b*v) * cis(a*v + b*u) > > I wouldn't regard errors in a*u-b*v as catastrophic cancellation. This is > because exp(0) = 1. So if the error in computing a*u-b*v is approx > DBL_EPSILON, and a*u-b*v approx zero, even though the relative error in > computing a*u-b*v is going to large (perhaps even infinite), nevertheless the > error in exp(a*u-b*v) may still be bounded by 1 or 2 ulp. Indeed. It's sort of the opposite of catastrophic cancelation. We might have a*u-b*v off by a factor of 2, with value DBL_EPSILON when it should be 2*DBL_EPSILON. That is about 2**52 ulps in double precision. But then adding 1 turns it into an error of just 1 ulp. Even more so for the cos() part of cis(). Now the small term with the large error gets squared before adding 1. But for the sin() part of cis(), the final error is the one that the cancelation gave. It may be small absolute for cis(), but on multiplication by large exp() it will become large absolute (though no larger relative, and always small relative to the cos() part which is about 1). > More generally, as a mathematician, I would be far more concerned that > cpow(z,w) return accurate answers when w is real, and especially when w is a > small integer. Real life applications of cpow(z,y) when w is not real are > very few and far between. This is easy when z is also real, by using pow(). Then there is the issue of continuity -- how close is cpow(x + I*eps, y) to pow(x, y)? Similarly for cpow(x, y + I*eps). > I would be pleased if cpow(x,y) made special provisions for when y is a small > integer, and so, for example, cpow(z,2) was computed as z*z = (x+y)*(x-y) + > 2x*y*I. x^2 - y^2 is easier than x^2 + y^2 - 1 for clog(). > For cpow(z,3), you are going to have a hard time avoiding large relative > errors when x^2 = 3y^2 (i.e. z is parallel to a cube root of 1). Frankly I > see that as somewhat unavoidable. x^2 - 3y^2 is still easier than for clog(). But go to z^4, and the real part is x^4 - 6 x^2 y^2 + y^4, so it's much harder than clog(). Seems to need quadrupled precision for the sum of the 4th powers, then a trick to avoid maybe octupled precision :-). > Nevertheless, if you pumped up cpow(z,w) so that when w was a relatively > small integer, that it broke w into its base 2 expansion, and then multiplied > lots of terms of the form cpow(z,2^n), each of which was computed by n > repetitions of cpow(z,2), and then for negative integers by taking the > reciprocal of the whole thing, and then in all other cases simply use > cexp(w*clog(z)), I would be very happy. Is it obvious that this is reasonably continuous? I suppose it is, except what "reasonably" is. > Other than that, if your cpow produced cpow(-1,0.5) = I + 1e-16, I wouldn't > be shocked at all, and I would find this kind of error totally acceptable. cexp(0.5 * I * M_PI) in fact produces I + 0.612e-16. This is probably very accurate, since M_PI differs from pi by up to half an ulp. clog(-1) can do no better than produce 0.5 * I * M_PI (and it should have no difficult producing exactly that). Bruce From owner-freebsd-numerics@FreeBSD.ORG Wed Aug 8 19:49:09 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B32D3106566C for ; Wed, 8 Aug 2012 19:49:09 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 89ACF8FC18 for ; Wed, 8 Aug 2012 19:49:09 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q78Jn65A097379; Wed, 8 Aug 2012 14:49:07 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5022C2B2.2030803@missouri.edu> Date: Wed, 08 Aug 2012 14:49:06 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: Bruce Evans References: <20120808000357.GA11128@server.rulingia.com> <5021D5DF.7010709@missouri.edu> <20120809033123.V4514@besplex.bde.org> In-Reply-To: <20120809033123.V4514@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@freebsd.org Subject: Re: cpow(3) implementations. X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Aug 2012 19:49:09 -0000 On 08/08/2012 01:11 PM, Bruce Evans wrote: > On Tue, 7 Aug 2012, Stephen Montgomery-Smith wrote: > >> On 08/07/2012 07:03 PM, Peter Jeremy wrote: >>> The C99 standard (at least WG14/N1256) is quite liberal as regards >>> cpow() and specifically allows a conforming implementation to just do: >>> cpow(z, w) = cexp(w * clog(z)) >>> >>> The downside of this approach is that log() inherently loses precision >>> by pushing (most of) the exponent bits into the fraction, displacing >>> original fraction bits. I've therefore been looking at how to >>> implement cpow(3) with a precision similar to pow(3). The following >>> are some thoughts, together with questions. >>> >>> In the following: >>> w = a + I*b >>> z = c + I*d >>> cis(r) = cos(r) + I*sin(r) >>> t = u + I*v = clog(c + I*d) >>> = log(hypot(c, d)) + I*atan2(d, c) >>> >>> cpow(z, w) = cexp(w * clog(z)) >>> = cpow(c + I*d, a + I*b) >>> = cexp((a + I*b) * clog(c + I*d)) >>> = cexp((a + I*b) * (u + I*v)) >>> = cexp((a*u - b*v) + I*(a*v + b*u)) >>> = exp(a*u - b*v) * cis(a*v + b*u) >> >> I wouldn't regard errors in a*u-b*v as catastrophic cancellation. >> This is because exp(0) = 1. So if the error in computing a*u-b*v is >> approx DBL_EPSILON, and a*u-b*v approx zero, even though the relative >> error in computing a*u-b*v is going to large (perhaps even infinite), >> nevertheless the error in exp(a*u-b*v) may still be bounded by 1 or 2 >> ulp. > > Indeed. It's sort of the opposite of catastrophic cancelation. We > might have a*u-b*v off by a factor of 2, with value DBL_EPSILON when > it should be 2*DBL_EPSILON. That is about 2**52 ulps in double > precision. But then adding 1 turns it into an error of just 1 ulp. > > Even more so for the cos() part of cis(). Now the small term with the > large error gets squared before adding 1. > > But for the sin() part of cis(), the final error is the one that the > cancelation gave. It may be small absolute for cis(), but on > multiplication by large exp() it will become large absolute (though > no larger relative, and always small relative to the cos() part which > is about 1). > >> More generally, as a mathematician, I would be far more concerned that >> cpow(z,w) return accurate answers when w is real, and especially when >> w is a small integer. Real life applications of cpow(z,y) when w is >> not real are very few and far between. > > This is easy when z is also real, by using pow(). Then there is the > issue of continuity -- how close is cpow(x + I*eps, y) to pow(x, y)? > Similarly for cpow(x, y + I*eps). > >> I would be pleased if cpow(x,y) made special provisions for when y is >> a small integer, and so, for example, cpow(z,2) was computed as z*z = >> (x+y)*(x-y) + 2x*y*I. > > x^2 - y^2 is easier than x^2 + y^2 - 1 for clog(). > >> For cpow(z,3), you are going to have a hard time avoiding large >> relative errors when x^2 = 3y^2 (i.e. z is parallel to a cube root of >> 1). Frankly I see that as somewhat unavoidable. > > x^2 - 3y^2 is still easier than for clog(). But go to z^4, and the real > part is x^4 - 6 x^2 y^2 + y^4, so it's much harder than clog(). Seems > to need quadrupled precision for the sum of the 4th powers, then a trick > to avoid maybe octupled precision :-). How about doing the real part of z^4 as ((x-y)*(x+y)-2*x*y)*((x-y)*(x+y)+2*x*y) ? That is what my suggested process would do. From owner-freebsd-numerics@FreeBSD.ORG Thu Aug 9 03:19:14 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 59624106566B for ; Thu, 9 Aug 2012 03:19:14 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 305858FC12 for ; Thu, 9 Aug 2012 03:19:13 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q793J65p039860 for ; Wed, 8 Aug 2012 22:19:06 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <50232C2A.8040704@missouri.edu> Date: Wed, 08 Aug 2012 22:19:06 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <20120808000357.GA11128@server.rulingia.com> <5021D5DF.7010709@missouri.edu> <20120809033123.V4514@besplex.bde.org> <5022C2B2.2030803@missouri.edu> In-Reply-To: <5022C2B2.2030803@missouri.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: cpow(3) implementations. X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Aug 2012 03:19:14 -0000 On 08/08/2012 02:49 PM, Stephen Montgomery-Smith wrote: > On 08/08/2012 01:11 PM, Bruce Evans wrote: >> On Tue, 7 Aug 2012, Stephen Montgomery-Smith wrote: >> >>> On 08/07/2012 07:03 PM, Peter Jeremy wrote: >>>> The C99 standard (at least WG14/N1256) is quite liberal as regards >>>> cpow() and specifically allows a conforming implementation to just do: >>>> cpow(z, w) = cexp(w * clog(z)) >>>> >>>> The downside of this approach is that log() inherently loses precision >>>> by pushing (most of) the exponent bits into the fraction, displacing >>>> original fraction bits. I've therefore been looking at how to >>>> implement cpow(3) with a precision similar to pow(3). The following >>>> are some thoughts, together with questions. >>>> >>>> In the following: >>>> w = a + I*b >>>> z = c + I*d >>>> cis(r) = cos(r) + I*sin(r) >>>> t = u + I*v = clog(c + I*d) >>>> = log(hypot(c, d)) + I*atan2(d, c) >>>> >>>> cpow(z, w) = cexp(w * clog(z)) >>>> = cpow(c + I*d, a + I*b) >>>> = cexp((a + I*b) * clog(c + I*d)) >>>> = cexp((a + I*b) * (u + I*v)) >>>> = cexp((a*u - b*v) + I*(a*v + b*u)) >>>> = exp(a*u - b*v) * cis(a*v + b*u) >>> >>> I wouldn't regard errors in a*u-b*v as catastrophic cancellation. >>> This is because exp(0) = 1. So if the error in computing a*u-b*v is >>> approx DBL_EPSILON, and a*u-b*v approx zero, even though the relative >>> error in computing a*u-b*v is going to large (perhaps even infinite), >>> nevertheless the error in exp(a*u-b*v) may still be bounded by 1 or 2 >>> ulp. >> >> Indeed. It's sort of the opposite of catastrophic cancelation. We >> might have a*u-b*v off by a factor of 2, with value DBL_EPSILON when >> it should be 2*DBL_EPSILON. That is about 2**52 ulps in double >> precision. But then adding 1 turns it into an error of just 1 ulp. >> >> Even more so for the cos() part of cis(). Now the small term with the >> large error gets squared before adding 1. >> >> But for the sin() part of cis(), the final error is the one that the >> cancelation gave. It may be small absolute for cis(), but on >> multiplication by large exp() it will become large absolute (though >> no larger relative, and always small relative to the cos() part which >> is about 1). >> >>> More generally, as a mathematician, I would be far more concerned that >>> cpow(z,w) return accurate answers when w is real, and especially when >>> w is a small integer. Real life applications of cpow(z,y) when w is >>> not real are very few and far between. >> >> This is easy when z is also real, by using pow(). Then there is the >> issue of continuity -- how close is cpow(x + I*eps, y) to pow(x, y)? >> Similarly for cpow(x, y + I*eps). >> >>> I would be pleased if cpow(x,y) made special provisions for when y is >>> a small integer, and so, for example, cpow(z,2) was computed as z*z = >>> (x+y)*(x-y) + 2x*y*I. >> >> x^2 - y^2 is easier than x^2 + y^2 - 1 for clog(). >> >>> For cpow(z,3), you are going to have a hard time avoiding large >>> relative errors when x^2 = 3y^2 (i.e. z is parallel to a cube root of >>> 1). Frankly I see that as somewhat unavoidable. >> >> x^2 - 3y^2 is still easier than for clog(). But go to z^4, and the real >> part is x^4 - 6 x^2 y^2 + y^4, so it's much harder than clog(). Seems >> to need quadrupled precision for the sum of the 4th powers, then a trick >> to avoid maybe octupled precision :-). > > How about doing the real part of z^4 as > ((x-y)*(x+y)-2*x*y)*((x-y)*(x+y)+2*x*y) ? > That is what my suggested process would do. I tested it. It performs very poorly. I wrote a cpow function, just for integers. cpow(x+I*y,n) does very poorly when x/y approx tan(Pi/(2n)), that is, x+I*y is parallel to an nth root of -1. Only the cases n=0,1,2,-1,-2 do well. As a mathematician, I wouldn't expect any better. In fact I would be pleasantly surprised if it got the case n=2 correct. Just to point out: tan(Pi/4) = 1 tan(Pi/8) = sqrt(2) - 1 tan(Pi/16) = (1 + Sqrt(2)) (sqrt(4 - 2 sqrt(2)) - 1) if a_n = tan(Pi/(2n)), then a_{n+1} = (sqrt(a_n^2+1) - 1) / a_n From owner-freebsd-numerics@FreeBSD.ORG Thu Aug 9 21:34:00 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6C274106564A for ; Thu, 9 Aug 2012 21:34:00 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by mx1.freebsd.org (Postfix) with ESMTP id DE2C08FC08 for ; Thu, 9 Aug 2012 21:33:59 +0000 (UTC) Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au (c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q79LXuvx028045 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 10 Aug 2012 07:33:57 +1000 Date: Fri, 10 Aug 2012 07:33:55 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <50232C2A.8040704@missouri.edu> Message-ID: <20120810072650.E4653@besplex.bde.org> References: <20120808000357.GA11128@server.rulingia.com> <5021D5DF.7010709@missouri.edu> <20120809033123.V4514@besplex.bde.org> <5022C2B2.2030803@missouri.edu> <50232C2A.8040704@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@freebsd.org Subject: Re: cpow(3) implementations. X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Aug 2012 21:34:00 -0000 On Wed, 8 Aug 2012, Stephen Montgomery-Smith wrote: > On 08/08/2012 02:49 PM, Stephen Montgomery-Smith wrote: >> On 08/08/2012 01:11 PM, Bruce Evans wrote: >>> On Tue, 7 Aug 2012, Stephen Montgomery-Smith wrote: > ... >>>> For cpow(z,3), you are going to have a hard time avoiding large >>>> relative errors when x^2 = 3y^2 (i.e. z is parallel to a cube root of >>>> 1). Frankly I see that as somewhat unavoidable. >>> >>> x^2 - 3y^2 is still easier than for clog(). But go to z^4, and the real >>> part is x^4 - 6 x^2 y^2 + y^4, so it's much harder than clog(). Seems >>> to need quadrupled precision for the sum of the 4th powers, then a trick >>> to avoid maybe octupled precision :-). >> >> How about doing the real part of z^4 as >> ((x-y)*(x+y)-2*x*y)*((x-y)*(x+y)+2*x*y) ? >> That is what my suggested process would do. > > I tested it. It performs very poorly. x^2 - y^2 in z^2 may have large cancelation, and then squaring again may have further large cancelation, so the problem doesn't really change > I wrote a cpow function, just for integers. cpow(x+I*y,n) does very poorly > when x/y approx tan(Pi/(2n)), that is, x+I*y is parallel to an nth root of > -1. Only the cases n=0,1,2,-1,-2 do well. > > As a mathematician, I wouldn't expect any better. In fact I would be > pleasantly surprised if it got the case n=2 correct. I think that is "as a numerical analyst" :-). The magic of analytic functions involves delicate cancelations. Bruce