From owner-freebsd-numerics@FreeBSD.ORG Sun Jul 29 22:10:25 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1FBAA106566B for ; Sun, 29 Jul 2012 22:10:25 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id C3B7E8FC14 for ; Sun, 29 Jul 2012 22:10:21 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q6TMAKqv042288 for ; Sun, 29 Jul 2012 17:10:20 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5015B4CC.3090007@missouri.edu> Date: Sun, 29 Jul 2012 17:10:20 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: freebsd-numerics@FreeBSD.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: PR bin/170206 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 29 Jul 2012 22:10:25 -0000 Now that this mailing list has been created, let me just provide a link to PR bin/170206 http://www.freebsd.org/cgi/query-pr.cgi?pr=170206 Two main threads to this discussion: 1. I am submitting complex arc-trig and complex arc-hyperbolic functions casin, cacos, catan, casinh, cacosh, catanh. 2. The main difficulty in trying to implement clog is computing it's real part: log(|z|) = 0.5*log(x^2 + y^2) when |z| is very close to 1. It is very hard to get a reasonably low relative error. I hope these discussions, as well as many other, move to this mailing list. Stephen From owner-freebsd-numerics@FreeBSD.ORG Sun Jul 29 22:27:31 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 85D9D106566B; Sun, 29 Jul 2012 22:27:31 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id C24578FC16; Sun, 29 Jul 2012 22:27:27 +0000 (UTC) Received: from server.rulingia.com (c220-239-247-45.belrs5.nsw.optusnet.com.au [220.239.247.45]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q6TMREDt080437 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 30 Jul 2012 08:27:14 +1000 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q6TMR7u7063127 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 30 Jul 2012 08:27:07 +1000 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id q6TMR6Vg063126; Mon, 30 Jul 2012 08:27:06 +1000 (EST) (envelope-from peter) Date: Mon, 30 Jul 2012 08:27:06 +1000 From: Peter Jeremy To: freebsd-numerics@freebsd.org Message-ID: <20120729222706.GA29048@server.rulingia.com> References: <20120718123627.D1575@besplex.bde.org> <20120722121219.GC73662@server.rulingia.com> <500DAD41.5030104@missouri.edu> <20120724113214.G934@besplex.bde.org> <501204AD.30605@missouri.edu> <20120727032611.GB25690@server.rulingia.com> <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu> <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="a8Wt8u1KmwUX3Y2C" Content-Disposition: inline In-Reply-To: <50148F02.4020104@missouri.edu> X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.21 (2010-09-15) X-Mailman-Approved-At: Sun, 29 Jul 2012 22:36:51 +0000 Cc: Diane Bruce , Bruce Evans , John Baldwin , David Chisnall , Stephen Montgomery-Smith , Bruce Evans , Steve Kargl , David Schultz , Warner Losh Subject: Re: Use of C99 extra long double math functions after r236148 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: freebsd-numerics@freebsd.org List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 29 Jul 2012 22:27:31 -0000 --a8Wt8u1KmwUX3Y2C Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable [For anyone not previously in on this discussion, this is part of an ongoing discussion that has been occurring off-list in the absence of a suitable list. It is planned to archive that discussion here, subject to all the participants agreeing]. On 2012-Jul-28 20:16:50 -0500, Stephen Montgomery-Smith wrote: >One thing your program doesn't check are things like: > >real part of casinh(-0+I*x) is -0 >imaginary part of casinh(x-I*0) is -0 > >etc, where x is finite, non-zero. (This follows from casinh being odd=20 >and conjugate invariant.) As it currently exists, my program solely implements the checks in WG14/N1256 G.6. I hadn't considered extending that to verifying that purely real or imaginary inputs give purely real or imaginary outputs, with the appropriately signed zero. This might be reasonable but it's not completely trivial to implement in general since the domains of the real part can be different. I'm less sure of the next logical step, which is to check things like casinh(x + I*0) =3D asinh(x) + I*0 --=20 Peter Jeremy --a8Wt8u1KmwUX3Y2C Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlAVuLoACgkQ/opHv/APuIcCtQCeL99bycVrFH4+cqxv/OaTyEyT lbMAn3MfLsRw9agSh6jcz4bgJFzMmFE2 =iTj6 -----END PGP SIGNATURE----- --a8Wt8u1KmwUX3Y2C-- From owner-freebsd-numerics@FreeBSD.ORG Sun Jul 29 22:39:37 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 42C08106566B; Sun, 29 Jul 2012 22:39:37 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id D8C2E8FC12; Sun, 29 Jul 2012 22:39:36 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q6TMdR65044239; Sun, 29 Jul 2012 17:39:27 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5015BB9F.90807@missouri.edu> Date: Sun, 29 Jul 2012 17:39:27 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: freebsd-numerics@FreeBSD.org References: <20120718123627.D1575@besplex.bde.org> <20120722121219.GC73662@server.rulingia.com> <500DAD41.5030104@missouri.edu> <20120724113214.G934@besplex.bde.org> <501204AD.30605@missouri.edu> <20120727032611.GB25690@server.rulingia.com> <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu> <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu> <20120729222706.GA29048@server.rulingia.com> In-Reply-To: <20120729222706.GA29048@server.rulingia.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Mailman-Approved-At: Sun, 29 Jul 2012 23:14:30 +0000 Cc: Diane Bruce , Bruce Evans , John Baldwin , David Chisnall , Bruce Evans , Steve Kargl , David Schultz , Peter Jeremy , Warner Losh Subject: Re: Use of C99 extra long double math functions after r236148 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 29 Jul 2012 22:39:37 -0000 On 07/29/2012 05:27 PM, Peter Jeremy wrote: > [For anyone not previously in on this discussion, this is part of an > ongoing discussion that has been occurring off-list in the absence of > a suitable list. It is planned to archive that discussion here, > subject to all the participants agreeing]. > > On 2012-Jul-28 20:16:50 -0500, Stephen Montgomery-Smith wrote: >> One thing your program doesn't check are things like: >> >> real part of casinh(-0+I*x) is -0 >> imaginary part of casinh(x-I*0) is -0 >> >> etc, where x is finite, non-zero. (This follows from casinh being odd >> and conjugate invariant.) > > As it currently exists, my program solely implements the checks in > WG14/N1256 G.6. I hadn't considered extending that to verifying that > purely real or imaginary inputs give purely real or imaginary outputs, > with the appropriately signed zero. This might be reasonable but it's > not completely trivial to implement in general since the domains of > the real part can be different. Maybe this should be a different program, since its logical structure would be quite different. In particular, you wouldn't be checking the value of the non-zero parts. Also I forgot that the real part of casinh(0+I*x) isn't always 0. If |x|>1, it is something non-zero. And so you need to check that creal(casinh(0+I*x)) and creal(casinh(-0+I*x)) have opposite signs in this case. > I'm less sure of the next logical > step, which is to check things like > casinh(x + I*0) = asinh(x) + I*0 Does C99 mandate this? My programs probably won't satisfy this, because I realized that the computation works in these cases anyway. Of course, it would be easy to make it happen. From owner-freebsd-numerics@FreeBSD.ORG Sun Jul 29 23:53:12 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2D279106564A for ; Sun, 29 Jul 2012 23:53:12 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id CD4988FC0C for ; Sun, 29 Jul 2012 23:53:11 +0000 (UTC) Received: from server.rulingia.com (c220-239-247-45.belrs5.nsw.optusnet.com.au [220.239.247.45]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q6TNr9FF080672 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Mon, 30 Jul 2012 09:53:09 +1000 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q6TNr3Yl077823 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Mon, 30 Jul 2012 09:53:03 +1000 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id q6TNr2uV077818 for freebsd-numerics@FreeBSD.org; Mon, 30 Jul 2012 09:53:02 +1000 (EST) (envelope-from peter) Date: Mon, 30 Jul 2012 09:53:02 +1000 From: Peter Jeremy To: freebsd-numerics@FreeBSD.org Message-ID: <20120729235302.GA77192@server.rulingia.com> References: <500DAD41.5030104@missouri.edu> <20120724113214.G934@besplex.bde.org> <501204AD.30605@missouri.edu> <20120727032611.GB25690@server.rulingia.com> <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu> <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu> <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="6c2NcOVqGQ03X4Wi" Content-Disposition: inline In-Reply-To: <5015BB9F.90807@missouri.edu> X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Subject: Re: Use of C99 extra long double math functions after r236148 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 29 Jul 2012 23:53:12 -0000 --6c2NcOVqGQ03X4Wi Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable [Pruning CC list to keep mailman happy]] On 2012-Jul-29 17:39:27 -0500, Stephen Montgomery-Smith wrote: >On 07/29/2012 05:27 PM, Peter Jeremy wrote: >> WG14/N1256 G.6. I hadn't considered extending that to verifying that >> purely real or imaginary inputs give purely real or imaginary outputs, >> with the appropriately signed zero. This might be reasonable but it's >> not completely trivial to implement in general since the domains of >> the real part can be different. > >Maybe this should be a different program, since its logical structure=20 >would be quite different. In particular, you wouldn't be checking the=20 >value of the non-zero parts. Adding code to skip checks on the real or imaginary part of the result is quite easy. >Also I forgot that the real part of casinh(0+I*x) isn't always 0. If=20 >|x|>1, it is something non-zero. And so you need to check that=20 >creal(casinh(0+I*x)) and creal(casinh(-0+I*x)) have opposite signs in=20 >this case. This is related to my "domains can be different" comment. Adding code to restrict the domain of the argument to be compatible with the real function isn't too hard (off the top of my head, I think the domains are all one of [-1,1], [0,Inf] or (0,Inf]). Handling behaviour outside that domain requires more special-casing because the behaviour is less consistent. >> I'm less sure of the next logical >> step, which is to check things like >> casinh(x + I*0) =3D asinh(x) + I*0 > >Does C99 mandate this? Nope. They are just mathematical equivalences (at least within the domains supported by the real function). POLA implies that they should be true but unless they are special-cased, the complex variant probably has less accuracy as a result of the additional calculations to support the imaginary component. > My programs probably won't satisfy this, because=20 >I realized that the computation works in these cases anyway. Of course,= =20 >it would be easy to make it happen. It's probably up to the implementation - special casing pure real or imaginary arguments should give those cases a shorter and simpler (and therefore faster and more accurate) calculation but it's a matter of whether this case in common enough to justify the additional test(s) in all cases. It's also just occurred to me that doing so may result in unexpected output discontinuities between cfoo(x-I*tiny), cfoo(x-I*0), cfoo(x+I*0) and cfoo(x+I*tiny). --=20 Peter Jeremy --6c2NcOVqGQ03X4Wi Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlAVzN4ACgkQ/opHv/APuIe5UwCdF6wpQAi6u98WpyPu8PlZDdQZ iqMAn2R5qrF3jIKUiL+P9RkSc73ShnIQ =yZ7u -----END PGP SIGNATURE----- --6c2NcOVqGQ03X4Wi-- From owner-freebsd-numerics@FreeBSD.ORG Mon Jul 30 00:21:06 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EAC84106564A for ; Mon, 30 Jul 2012 00:21:06 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id BC5908FC0C for ; Mon, 30 Jul 2012 00:21:06 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q6U0L5cD051097 for ; Sun, 29 Jul 2012 19:21:06 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5015D372.2060502@missouri.edu> Date: Sun, 29 Jul 2012 19:21:06 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <500DAD41.5030104@missouri.edu> <20120724113214.G934@besplex.bde.org> <501204AD.30605@missouri.edu> <20120727032611.GB25690@server.rulingia.com> <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu> <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu> <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu> <20120729235302.GA77192@server.rulingia.com> In-Reply-To: <20120729235302.GA77192@server.rulingia.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Use of C99 extra long double math functions after r236148 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 Jul 2012 00:21:07 -0000 On 07/29/2012 06:53 PM, Peter Jeremy wrote: > [Pruning CC list to keep mailman happy]] I had mailman complain to me in my last post. But it only took the moderator a few minutes to let it through. > > On 2012-Jul-29 17:39:27 -0500, Stephen Montgomery-Smith wrote: >> On 07/29/2012 05:27 PM, Peter Jeremy wrote: >>> WG14/N1256 G.6. I hadn't considered extending that to verifying that >>> purely real or imaginary inputs give purely real or imaginary outputs, >>> with the appropriately signed zero. This might be reasonable but it's >>> not completely trivial to implement in general since the domains of >>> the real part can be different. >> >> Maybe this should be a different program, since its logical structure >> would be quite different. In particular, you wouldn't be checking the >> value of the non-zero parts. > > Adding code to skip checks on the real or imaginary part of the > result is quite easy. > >> Also I forgot that the real part of casinh(0+I*x) isn't always 0. If >> |x|>1, it is something non-zero. And so you need to check that >> creal(casinh(0+I*x)) and creal(casinh(-0+I*x)) have opposite signs in >> this case. > > This is related to my "domains can be different" comment. Adding code > to restrict the domain of the argument to be compatible with the real > function isn't too hard (off the top of my head, I think the domains > are all one of [-1,1], [0,Inf] or (0,Inf]). Handling behaviour > outside that domain requires more special-casing because the behaviour > is less consistent. I have a rather good handle on what that behavior will be for the complex arc-functions (since I have been working hard on them recently). casinh(x+I*y) and casin(x+I*y) have positive real and imaginary parts if x and y are positive. (Positive in this context includes 0, but not -0.) In particular: the sign of creal(casinh(z)) is the same as the sign of x; the sign of cimag(casinh(z)) is the same as the sign of y; the sign of creal(casin(z)) is the same as the sign of x; the sign of cimag(casin(z)) is the same as the sign of y; cacosh(z) and cacos(z) always have positive real part. The imaginary part of cacos(x+I*y) has the opposite sign to x (since it is PI/2 - casin(x+I*y) The imaginary part of cacosh(x+I*y) has the same sign as x. The signs for catanh and catan are exactly the same as for casinh and casin. For clog: The imaginary part of clog(x+I*y) has the same sign as y. The real part of clog will never be -0, and this doesn't have to be checked. I am fairly sure I got these correct. If your program starts spitting out huge numbers of errors, then I am wrong. But it won't take me long to figure out which ones I got wrong. > >>> I'm less sure of the next logical >>> step, which is to check things like >>> casinh(x + I*0) = asinh(x) + I*0 >> >> Does C99 mandate this? > > Nope. They are just mathematical equivalences (at least within the > domains supported by the real function). POLA implies that they > should be true but unless they are special-cased, the complex variant > probably has less accuracy as a result of the additional calculations > to support the imaginary component. > >> My programs probably won't satisfy this, because >> I realized that the computation works in these cases anyway. Of course, >> it would be easy to make it happen. > > It's probably up to the implementation - special casing pure real or > imaginary arguments should give those cases a shorter and simpler (and > therefore faster and more accurate) calculation but it's a matter of > whether this case in common enough to justify the additional test(s) > in all cases. > > It's also just occurred to me that doing so may result in unexpected > output discontinuities between cfoo(x-I*tiny), cfoo(x-I*0), cfoo(x+I*0) > and cfoo(x+I*tiny). Since the functions are correct within a few ULP, you shouldn't see any discontinuities like this. The algorithms I coded are meant to be rather good at handling these situations. From owner-freebsd-numerics@FreeBSD.ORG Mon Jul 30 00:51:23 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 50CC51065670 for ; Mon, 30 Jul 2012 00:51:23 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 010658FC08 for ; Mon, 30 Jul 2012 00:51:22 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q6U0pMVb053462 for ; Sun, 29 Jul 2012 19:51:22 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5015DA8A.10200@missouri.edu> Date: Sun, 29 Jul 2012 19:51:22 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Bad error computing 1/z X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 Jul 2012 00:51:23 -0000 As I was debugging catanh, I noticed the following oddness. If z = cpack(x,y), where x = 1 y = 0x1.25691d4068c910p+512, approximately 1.53672e+154 then the real part of 1/z is wrong by about 4ULP. The real part of 1/z is approx 4.2346e-309. I know the reciprocal of this would cause an overflow. I tried writing my own reciprocal algorithm, and it had exactly the same error. Does anyone have an explanation? I thought it might be that the number was somehow "denormalized" and it couldn't store enough digits in the mantissa. Or maybe the error is when mpfr converts the correct answer to double. I don't know. From owner-freebsd-numerics@FreeBSD.ORG Mon Jul 30 01:43:23 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 974471065672 for ; Mon, 30 Jul 2012 01:43:23 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 690DE8FC17 for ; Mon, 30 Jul 2012 01:43:23 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q6U1hMWE056861 for ; Sun, 29 Jul 2012 20:43:22 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5015E6BA.3020401@missouri.edu> Date: Sun, 29 Jul 2012 20:43:22 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <500DAD41.5030104@missouri.edu> <20120724113214.G934@besplex.bde.org> <501204AD.30605@missouri.edu> <20120727032611.GB25690@server.rulingia.com> <20120728125824.GA26553@server.rulingia.com> <501460BB.30806@missouri.edu> <20120728231300.GA20741@server.rulingia.com> <50148F02.4020104@missouri.edu> <20120729222706.GA29048@server.rulingia.com> <5015BB9F.90807@missouri.edu> <20120729235302.GA77192@server.rulingia.com> <5015D372.2060502@missouri.edu> In-Reply-To: <5015D372.2060502@missouri.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Use of C99 extra long double math functions after r236148 X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 Jul 2012 01:43:23 -0000 On 07/29/2012 07:21 PM, Stephen Montgomery-Smith wrote: > On 07/29/2012 06:53 PM, Peter Jeremy wrote: >> [Pruning CC list to keep mailman happy]] > > I had mailman complain to me in my last post. But it only took the > moderator a few minutes to let it through. > >> >> On 2012-Jul-29 17:39:27 -0500, Stephen Montgomery-Smith >> wrote: >>> On 07/29/2012 05:27 PM, Peter Jeremy wrote: >>>> WG14/N1256 G.6. I hadn't considered extending that to verifying that >>>> purely real or imaginary inputs give purely real or imaginary outputs, >>>> with the appropriately signed zero. This might be reasonable but it's >>>> not completely trivial to implement in general since the domains of >>>> the real part can be different. >>> >>> Maybe this should be a different program, since its logical structure >>> would be quite different. In particular, you wouldn't be checking the >>> value of the non-zero parts. >> >> Adding code to skip checks on the real or imaginary part of the >> result is quite easy. >> >>> Also I forgot that the real part of casinh(0+I*x) isn't always 0. If >>> |x|>1, it is something non-zero. And so you need to check that >>> creal(casinh(0+I*x)) and creal(casinh(-0+I*x)) have opposite signs in >>> this case. >> >> This is related to my "domains can be different" comment. Adding code >> to restrict the domain of the argument to be compatible with the real >> function isn't too hard (off the top of my head, I think the domains >> are all one of [-1,1], [0,Inf] or (0,Inf]). Handling behaviour >> outside that domain requires more special-casing because the behaviour >> is less consistent. > > I have a rather good handle on what that behavior will be for the > complex arc-functions (since I have been working hard on them recently). > > casinh(x+I*y) and casin(x+I*y) have positive real and imaginary parts if > x and y are positive. (Positive in this context includes 0, but not -0.) > > In particular: > the sign of creal(casinh(z)) is the same as the sign of x; > the sign of cimag(casinh(z)) is the same as the sign of y; > the sign of creal(casin(z)) is the same as the sign of x; > the sign of cimag(casin(z)) is the same as the sign of y; > > cacosh(z) and cacos(z) always have positive real part. > > The imaginary part of cacos(x+I*y) has the opposite sign to x (since it > is PI/2 - casin(x+I*y) Oops The imaginary part of cacos(x+I*y) has the opposite sign to y (since it is PI/2 - casin(x+I*y) > > The imaginary part of cacosh(x+I*y) has the same sign as x. The imaginary part of cacosh(x+I*y) has the same sign as y. > > The signs for catanh and catan are exactly the same as for casinh and > casin. > > > > For clog: > > The imaginary part of clog(x+I*y) has the same sign as y. > The real part of clog will never be -0, and this doesn't have to be > checked. > > > I am fairly sure I got these correct. If your program starts spitting > out huge numbers of errors, then I am wrong. But it won't take me long > to figure out which ones I got wrong. > >> >>>> I'm less sure of the next logical >>>> step, which is to check things like >>>> casinh(x + I*0) = asinh(x) + I*0 >>> >>> Does C99 mandate this? >> >> Nope. They are just mathematical equivalences (at least within the >> domains supported by the real function). POLA implies that they >> should be true but unless they are special-cased, the complex variant >> probably has less accuracy as a result of the additional calculations >> to support the imaginary component. >> >>> My programs probably won't satisfy this, because >>> I realized that the computation works in these cases anyway. Of course, >>> it would be easy to make it happen. >> >> It's probably up to the implementation - special casing pure real or >> imaginary arguments should give those cases a shorter and simpler (and >> therefore faster and more accurate) calculation but it's a matter of >> whether this case in common enough to justify the additional test(s) >> in all cases. >> >> It's also just occurred to me that doing so may result in unexpected >> output discontinuities between cfoo(x-I*tiny), cfoo(x-I*0), cfoo(x+I*0) >> and cfoo(x+I*tiny). > > Since the functions are correct within a few ULP, you shouldn't see any > discontinuities like this. The algorithms I coded are meant to be > rather good at handling these situations. > > > _______________________________________________ > freebsd-numerics@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-numerics > To unsubscribe, send any mail to "freebsd-numerics-unsubscribe@freebsd.org" > > From owner-freebsd-numerics@FreeBSD.ORG Mon Jul 30 02:51:30 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 956991065670 for ; Mon, 30 Jul 2012 02:51:30 +0000 (UTC) (envelope-from peter@rulingia.com) Received: from vps.rulingia.com (host-122-100-2-194.octopus.com.au [122.100.2.194]) by mx1.freebsd.org (Postfix) with ESMTP id 0B3CD8FC0A for ; Mon, 30 Jul 2012 02:51:29 +0000 (UTC) Received: from server.rulingia.com (c220-239-247-45.belrs5.nsw.optusnet.com.au [220.239.247.45]) by vps.rulingia.com (8.14.5/8.14.5) with ESMTP id q6U2pQ7Y081126 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Mon, 30 Jul 2012 12:51:27 +1000 (EST) (envelope-from peter@rulingia.com) X-Bogosity: Ham, spamicity=0.000000 Received: from server.rulingia.com (localhost.rulingia.com [127.0.0.1]) by server.rulingia.com (8.14.5/8.14.5) with ESMTP id q6U2pJqC082836 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Mon, 30 Jul 2012 12:51:19 +1000 (EST) (envelope-from peter@server.rulingia.com) Received: (from peter@localhost) by server.rulingia.com (8.14.5/8.14.5/Submit) id q6U2pI2D082835 for freebsd-numerics@freebsd.org; Mon, 30 Jul 2012 12:51:18 +1000 (EST) (envelope-from peter) Date: Mon, 30 Jul 2012 12:51:18 +1000 From: Peter Jeremy To: freebsd-numerics@freebsd.org Message-ID: <20120730025118.GB7958@server.rulingia.com> References: <5015DA8A.10200@missouri.edu> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="CE+1k2dSO48ffgeK" Content-Disposition: inline In-Reply-To: <5015DA8A.10200@missouri.edu> X-PGP-Key: http://www.rulingia.com/keys/peter.pgp User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: Bad error computing 1/z X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 Jul 2012 02:51:30 -0000 --CE+1k2dSO48ffgeK Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable [Excuse the long lines] On 2012-Jul-29 19:51:22 -0500, Stephen Montgomery-Smith wrote: >As I was debugging catanh, I noticed the following oddness. > >If z =3D cpack(x,y), where >x =3D 1 >y =3D 0x1.25691d4068c910p+512, approximately 1.53672e+154 >then the real part of 1/z is wrong by about 4ULP. The real part of 1/z=20 >is approx 4.2346e-309. I know the reciprocal of this would cause an=20 >overflow. I think I'm missing a bit of context here. If I calculate it using doubles with gcc, I get: z: +1.0000000000000000e+00 0x3ff0000000000000 +1.5367160172612364e+1= 54 0x5ff25691d4068c91 1.0/z: +4.2346036163332497e-309 0x00030b8596824af0 -6.5073832039716640e-1= 55 0x9febeb7fc100edd7 Doing the same thing with 40 digits precision in emacs-calc, I calculate 1/z as: (16#C.2E165A092BC21FF1FBEB5F45FAB8DA33*16.^-257, -16#D.F5BFE08076EBB470CE55= E8BF55A1B0B5*16.^-129) (4.234603616333252354574574265966349487777e-309, -6.50738320397166433428813= 7901223264158669e-155) That give me fractions of: 0x0.30B8596824AF087FC7EFAD7D17EAE368D 0x1.BEB7FC100EDD768E19CABD17EAB43617 Therefore the perfectly rounded double result should be 0x00030b8596824af1 0x9febeb7fc100edd7 ie, the real part calculates as 1ULP low when using doubles. Note that since the real part is a denormal, it only has 50 bits of precision. Are you sure your ULP calculations are taking into account the number of fraction bits available when numbers are denormals? --=20 Peter Jeremy --CE+1k2dSO48ffgeK Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iEYEARECAAYFAlAV9qYACgkQ/opHv/APuIcvPgCgqBX4lYN/wwnAlkEk1lzgUBbh eJ8AoMD3ABR4zgi9TC2ew2WSLl3Gz4Pq =nz4n -----END PGP SIGNATURE----- --CE+1k2dSO48ffgeK-- From owner-freebsd-numerics@FreeBSD.ORG Mon Jul 30 03:05:14 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 605E6106566B for ; Mon, 30 Jul 2012 03:05:14 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 274E98FC16 for ; Mon, 30 Jul 2012 03:05:14 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q6U35D3X062352 for ; Sun, 29 Jul 2012 22:05:13 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5015F9E9.9070408@missouri.edu> Date: Sun, 29 Jul 2012 22:05:13 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <5015DA8A.10200@missouri.edu> <20120730025118.GB7958@server.rulingia.com> In-Reply-To: <20120730025118.GB7958@server.rulingia.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Bad error computing 1/z X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 Jul 2012 03:05:14 -0000 On 07/29/2012 09:51 PM, Peter Jeremy wrote: > [Excuse the long lines] > > On 2012-Jul-29 19:51:22 -0500, Stephen Montgomery-Smith wrote: >> As I was debugging catanh, I noticed the following oddness. >> >> If z = cpack(x,y), where >> x = 1 >> y = 0x1.25691d4068c910p+512, approximately 1.53672e+154 >> then the real part of 1/z is wrong by about 4ULP. The real part of 1/z >> is approx 4.2346e-309. I know the reciprocal of this would cause an >> overflow. > > I think I'm missing a bit of context here. If I calculate it using > doubles with gcc, I get: > z: +1.0000000000000000e+00 0x3ff0000000000000 +1.5367160172612364e+154 0x5ff25691d4068c91 > 1.0/z: +4.2346036163332497e-309 0x00030b8596824af0 -6.5073832039716640e-155 0x9febeb7fc100edd7 > > Doing the same thing with 40 digits precision in emacs-calc, I > calculate 1/z as: > (16#C.2E165A092BC21FF1FBEB5F45FAB8DA33*16.^-257, -16#D.F5BFE08076EBB470CE55E8BF55A1B0B5*16.^-129) > (4.234603616333252354574574265966349487777e-309, -6.507383203971664334288137901223264158669e-155) > > That give me fractions of: > 0x0.30B8596824AF087FC7EFAD7D17EAE368D 0x1.BEB7FC100EDD768E19CABD17EAB43617 > Therefore the perfectly rounded double result should be > 0x00030b8596824af1 0x9febeb7fc100edd7 > ie, the real part calculates as 1ULP low when using doubles. > > Note that since the real part is a denormal, it only has 50 bits of > precision. Are you sure your ULP calculations are taking into account > the number of fraction bits available when numbers are denormals? > You answered my question. The real part is denormal, and so my ULP calculation were misleading. Thanks, Stephen From owner-freebsd-numerics@FreeBSD.ORG Mon Jul 30 03:35:15 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 953E7106564A for ; Mon, 30 Jul 2012 03:35:15 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail04.syd.optusnet.com.au (mail04.syd.optusnet.com.au [211.29.132.185]) by mx1.freebsd.org (Postfix) with ESMTP id 2D17E8FC0A for ; Mon, 30 Jul 2012 03:35:14 +0000 (UTC) Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au (c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246]) by mail04.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q6U3Z5RR028963 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Mon, 30 Jul 2012 13:35:06 +1000 Date: Mon, 30 Jul 2012 13:35:05 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <5015F9E9.9070408@missouri.edu> Message-ID: <20120730131929.Y876@besplex.bde.org> References: <5015DA8A.10200@missouri.edu> <20120730025118.GB7958@server.rulingia.com> <5015F9E9.9070408@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@FreeBSD.org Subject: Re: Bad error computing 1/z X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 Jul 2012 03:35:15 -0000 On Sun, 29 Jul 2012, Stephen Montgomery-Smith wrote: > On 07/29/2012 09:51 PM, Peter Jeremy wrote: >> [Excuse the long lines] >> >> On 2012-Jul-29 19:51:22 -0500, Stephen Montgomery-Smith >> wrote: >>> As I was debugging catanh, I noticed the following oddness. >>> >>> If z = cpack(x,y), where >>> x = 1 >>> y = 0x1.25691d4068c910p+512, approximately 1.53672e+154 >>> then the real part of 1/z is wrong by about 4ULP. The real part of 1/z >>> is approx 4.2346e-309. I know the reciprocal of this would cause an >>> overflow. >> ... >> Note that since the real part is a denormal, it only has 50 bits of >> precision. Are you sure your ULP calculations are taking into account >> the number of fraction bits available when numbers are denormals? > > You answered my question. The real part is denormal, and so my ULP > calculation were misleading. A large error is expected anyway. The calculation involves 1/(x*x+y*y). gcc won't be as careful as hypot(), so it will give various overflow and underflow and accuracy errors for x*x+y*y. After avoiding the overflow and underflow errors, I think x*x+y*y has an accuracy error of at most 1 ulp. The division adds another 0.5 ulps. 1/z calculated by gcc is much more accurate than the singular function z*z :-). The real part of the latter is x*x-y*y. This may have an error of almost 2**MANT_DIG ulps due to cancelation. The C has some examples with pseudocode for some of the worst problems. This code is too slow to actually use. I think only hardware overflow and underflow traps can be fast enough to use (when they don't occur often). Anything that uses fenv to test for exceptions after every operation is too slow, and testing after every few operation makes it hard to recover. A "few" probably needs to be a few hundreds or thousands since fenv accesses take about as long as a few tens or hundreds of additions or multiplications on at least x86. Bruce From owner-freebsd-numerics@FreeBSD.ORG Mon Jul 30 22:56:31 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 395531065670 for ; Mon, 30 Jul 2012 22:56:31 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 039F78FC19 for ; Mon, 30 Jul 2012 22:56:30 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q6UMuUmT022826 for ; Mon, 30 Jul 2012 17:56:30 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5017111E.6060003@missouri.edu> Date: Mon, 30 Jul 2012 17:56:30 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 Jul 2012 22:56:31 -0000 I will be posting my updates on this web page: http://people.freebsd.org/~stephen/ The file catrig.c contains implementations of cacos(h), casin(h), catan(h). I have been working on it everyday - I seem unable to put it down. I have now tested the following values for x and y: 1. Random values normally distributed N(0,1) 2. Random values e^X where X is uniformly distributed on [-300,300] 3. Random values 1+e^x or -1+e^x where X is uniformly distributed on [-20,0]. 4. Combinations from (2) and (3). The worst case errors seem to be below 4 ULP. It passes Peter Jeremy's program http://www.rulingia.com/~peter/ctest.c for checking edge cases like inf, nan, -0 etc. I am looking for suggestions: 1. For the implementations - improvements to the program to make it more accurate. 2. Other ranges of values to test for. 3. And yes, I am now ready at accept changes to the writing style. Thanks, Stephen From owner-freebsd-numerics@FreeBSD.ORG Tue Jul 31 20:46:23 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 522791065670 for ; Tue, 31 Jul 2012 20:46:23 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 1A1F48FC19 for ; Tue, 31 Jul 2012 20:46:23 +0000 (UTC) Received: from [128.206.184.213] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q6VKkK7b024207; Tue, 31 Jul 2012 15:46:22 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <5018441C.5040909@missouri.edu> Date: Tue, 31 Jul 2012 15:46:20 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:14.0) Gecko/20120728 Thunderbird/14.0 MIME-Version: 1.0 To: Bruce Evans References: <5015DA8A.10200@missouri.edu> <20120730025118.GB7958@server.rulingia.com> <5015F9E9.9070408@missouri.edu> <20120730131929.Y876@besplex.bde.org> In-Reply-To: <20120730131929.Y876@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@FreeBSD.org Subject: Re: Bad error computing 1/z X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 31 Jul 2012 20:46:23 -0000 On 07/29/12 22:35, Bruce Evans wrote: > 1/z calculated by gcc is much more accurate than the singular function > z*z :-). The real part of the latter is x*x-y*y. This may have an > error of almost 2**MANT_DIG ulps due to cancelation. Could one avoid the cancellation errors in computing x*x-y*y by using the formula (x-y)*(x+y) instead? Here is my proposed argument: Case 1: x and y have the same sign. If they are nearly equal, then the computation (x-y) should be exact. The error in computing (x+y) shouldn't be worse than 0.5 ULP. So the whole calculation could probably be performed with an error less than 1 ULP, maybe even 0.5 ULP if one took a lot of care. Case 2: x and y have different signs - just switch the roles of (x+y) and (x-y) in the preceding argument. Do you know for a fact that gcc uses x*x-y*y? To use (x-y)*(x+y) would be no worse performance wise - perhaps better if addition is cheaper than multiplication. It would be the simplest programming switch in the world, as long as one knew where to find the code fragment that needed to be changed. From owner-freebsd-numerics@FreeBSD.ORG Wed Aug 1 08:13:04 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D3EFA10656AA for ; Wed, 1 Aug 2012 08:13:04 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail06.syd.optusnet.com.au (mail06.syd.optusnet.com.au [211.29.132.187]) by mx1.freebsd.org (Postfix) with ESMTP id 520238FC0C for ; Wed, 1 Aug 2012 08:13:03 +0000 (UTC) Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au (c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246]) by mail06.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q718D0sh030560 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 1 Aug 2012 18:13:01 +1000 Date: Wed, 1 Aug 2012 18:12:59 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <5018441C.5040909@missouri.edu> Message-ID: <20120801180703.B1153@besplex.bde.org> References: <5015DA8A.10200@missouri.edu> <20120730025118.GB7958@server.rulingia.com> <5015F9E9.9070408@missouri.edu> <20120730131929.Y876@besplex.bde.org> <5018441C.5040909@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@FreeBSD.org, Bruce Evans Subject: Re: Bad error computing 1/z X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 01 Aug 2012 08:13:04 -0000 On Tue, 31 Jul 2012, Stephen Montgomery-Smith wrote: > On 07/29/12 22:35, Bruce Evans wrote: > >> 1/z calculated by gcc is much more accurate than the singular function >> z*z :-). The real part of the latter is x*x-y*y. This may have an >> error of almost 2**MANT_DIG ulps due to cancelation. > > Could one avoid the cancellation errors in computing x*x-y*y by using the > formula (x-y)*(x+y) instead? Yes, but there seems to be no similar trick for general complex multiplication or higher powers of z. > Here is my proposed argument: Case 1: x and y have the same sign. If they > are nearly equal, then the computation (x-y) should be exact. The error in > computing (x+y) shouldn't be worse than 0.5 ULP. So the whole calculation > could probably be performed with an error less than 1 ULP, maybe even 0.5 ULP > if one took a lot of care. Case 2: x and y have different signs - just > switch the roles of (x+y) and (x-y) in the preceding argument. 1.5 or even 2.0 ulps I think (since there are 3 operations). > Do you know for a fact that gcc uses x*x-y*y? To use (x-y)*(x+y) would be no > worse performance wise - perhaps better if addition is cheaper than > multiplication. It would be the simplest programming switch in the world, as > long as one knew where to find the code fragment that needed to be changed. I checked, and it doesn't seem to. Bruce From owner-freebsd-numerics@FreeBSD.ORG Fri Aug 3 20:35:47 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C90B1106566B for ; Fri, 3 Aug 2012 20:35:47 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id 8E1438FC0A for ; Fri, 3 Aug 2012 20:35:47 +0000 (UTC) Received: from [128.206.184.213] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q73KZfD0087564 for ; Fri, 3 Aug 2012 15:35:41 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <501C361D.4010807@missouri.edu> Date: Fri, 03 Aug 2012 15:35:41 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:14.0) Gecko/20120728 Thunderbird/14.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <5017111E.6060003@missouri.edu> In-Reply-To: <5017111E.6060003@missouri.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 03 Aug 2012 20:35:47 -0000 On 07/30/12 17:56, Stephen Montgomery-Smith wrote: > I will be posting my updates on this web page: > > http://people.freebsd.org/~stephen/ > > The file catrig.c contains implementations of cacos(h), casin(h), > catan(h). I have been working on it everyday - I seem unable to put it > down. I know you guys are busy and haven't had time to look this over. But in many ways this is a good thing, as I am constantly finding problems with the code. One of the hardest things is trying to avoid unwarranted underflows. For example, when |z| is large (bigger than 1e20) I use the approximation catanh(z) = 1/z + cpack(0, copysign(M_PI_2, cimag(z))) If the real part of 1/z underflows, then this is a legitimate underflow. But if the imaginary part of 1/z underflows, then this is not legitimate, because then this really small number is being added to plus or minus I*PI/2, and it should just disappear into rounding error. I found this out by testing the underflow flag for lots of random inputs. I ended up writing my own code to compute the real part of 1/z without computing the imaginary part. I have a working solution, but I'm sure it isn't optimal. (I see there is code in the IEEE standard PDF document which I might copy, or I might copy ideas from the hypot function.) Another problem I experienced - I sometimes need to compute log(0.5*y) when x is extremely small. I was shocked to find that this produced spurious underflows when x was very close to DBL_MIN. Of course with hindsight it is obvious why it happens, but these errors are hard to think of in advance, and hard to track down. Right now I think I have working code. But I also thought I had working code last week. So who knows. Anyway, thanks for reading my ramblings. I do find this stuff interesting, so if anyone wants to reply and continue the conversation, please do so. Stephen From owner-freebsd-numerics@FreeBSD.ORG Sat Aug 4 08:49:57 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id DCD491065670 for ; Sat, 4 Aug 2012 08:49:57 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail27.syd.optusnet.com.au (mail27.syd.optusnet.com.au [211.29.133.168]) by mx1.freebsd.org (Postfix) with ESMTP id 7403A8FC0A for ; Sat, 4 Aug 2012 08:49:56 +0000 (UTC) Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au (c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246]) by mail27.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q748nmhm032279 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 4 Aug 2012 18:49:49 +1000 Date: Sat, 4 Aug 2012 18:49:48 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <501C361D.4010807@missouri.edu> Message-ID: <20120804165555.X1231@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 04 Aug 2012 08:49:58 -0000 On Fri, 3 Aug 2012, Stephen Montgomery-Smith wrote: > On 07/30/12 17:56, Stephen Montgomery-Smith wrote: >> I will be posting my updates on this web page: >> >> http://people.freebsd.org/~stephen/ >> >> The file catrig.c contains implementations of cacos(h), casin(h), >> catan(h). I have been working on it everyday - I seem unable to put it >> down. > > I know you guys are busy and haven't had time to look this over. But in many > ways this is a good thing, as I am constantly finding problems with the code. :-). I made good progress with clog[fl](). The accuracy problems near 0 are long solved and proved to be solved. Several more special cases were needed to avoid underflow. It now has about as many special cases as hypot(), with underflow avoidance in similar places. I'm having most difficulty with accuracy problems not so near 0 (x*x + y*y near 1/4 or 2). I expected to get accuracies of 1-1.5 ulps but got 2-3 ulps. > One of the hardest things is trying to avoid unwarranted underflows. For > example, when |z| is large (bigger than 1e20) I use the approximation > > catanh(z) = 1/z + cpack(0, copysign(M_PI_2, cimag(z))) > > If the real part of 1/z underflows, then this is a legitimate underflow. But > if the imaginary part of 1/z underflows, then this is not legitimate, because > then this really small number is being added to plus or minus I*PI/2, and it > should just disappear into rounding error. The same techniques as used in hypot() and now clog() internals should work. Here it is for clog(): % /* Avoid underflow when ax is not 1. Also handle zero args. */ % if (ay < DBL_MAX / 0x1p53 && ay * 0x1p53 <= ax) % return (cpack(log(ax), atan2(y, x))); [Underflow can occur when ax is 1, but in no other cases, because x*x+y*y cannot be very close to 1 in other cases. The proof that doubled precision is enough for x*x-1+y*y includes showing by a general argument that the closeness is at most 2**-157.] hypot() does this using a more magic bit test (use difference of exponents). The magic 53 here is not just DBL_MANT_DIG, but is a good enough threshold for when log(ax*ax+ay*ay) = log(ax) + log(1+(ay/ax)*(ay/ax)) rounds to almost the same as log(ax). hypot() doesn't take logs so has a different magic number (60). 1/z sqares the hypotenuse but has a factor of x or y, so it probably needs another different magic number. > I found this out by testing the underflow flag for lots of random inputs. I I didn't test the underflow flag, but noticed underflow from inaccuracies. Gradual underflow (denormals) occurs more often, so it would be worth tsting for it too. > ended up writing my own code to compute the real part of 1/z without > computing the imaginary part. I have a working solution, but I'm sure it > isn't optimal. (I see there is code in the IEEE standard PDF document which > I might copy, or I might copy ideas from the hypot function.) I think the above will work -- just ignore x or y in x*x+y*y when it is tiny relative to y or x, and ignore y in y/(x*x+y*y) when it is tiny relative to x*x+y*y. Then there is the problem of x and y tiny or denormal but with their ration not tiny enough to ignore one of them. 1/z is then large or infinite, but it it should rarely be infinite, and underflow shouldn't occur for just x*x+y*y. This is easily handled as in hypot() (not sure about the method here) and now clog() by scaling: % /* Avoid underflow. */ % if (ax < 0x1p-511) % return (cpack( % log(hypot(x * 0x1p511, y * 0x1p511)) - 511 * ln2_lo - % 511 * ln2_hi, % atan2(y, x))); Bruce From owner-freebsd-numerics@FreeBSD.ORG Sat Aug 4 16:46:17 2012 Return-Path: Delivered-To: freebsd-numerics@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id CC2B3106566C for ; Sat, 4 Aug 2012 16:46:17 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id A3EAD8FC0C for ; Sat, 4 Aug 2012 16:46:17 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q74GkFUE097238; Sat, 4 Aug 2012 11:46:16 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <501D51D7.1020101@missouri.edu> Date: Sat, 04 Aug 2012 11:46:15 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> <20120804165555.X1231@besplex.bde.org> In-Reply-To: <20120804165555.X1231@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@FreeBSD.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 04 Aug 2012 16:46:17 -0000 On 08/04/2012 03:49 AM, Bruce Evans wrote: > On Fri, 3 Aug 2012, Stephen Montgomery-Smith wrote: > I made good progress with clog[fl](). The accuracy problems near 0 > are long solved and proved to be solved. I would be interested to see how you solved this. Would you be willing to post your code somewhere? > Several more special cases > were needed to avoid underflow. It now has about as many special cases > as hypot(), with underflow avoidance in similar places. I'm having > most difficulty with accuracy problems not so near 0 (x*x + y*y near > 1/4 or 2). I expected to get accuracies of 1-1.5 ulps but got 2-3 > ulps. For the inverse trig functions, my accuracy is about 4 ulps, and I am not looking to improve it at all. In particular, the algorithm for casin(h) and cacos(h) is so complicated that I don't see how this can be improved without a lot of work. However if you solve this problem for clog, I can see that it is very likely that your solution will also work for catan(h). If you get this problem licked, I will be interested to see how you do it. >> One of the hardest things is trying to avoid unwarranted underflows. >> For example, when |z| is large (bigger than 1e20) I use the approximation >> >> catanh(z) = 1/z + cpack(0, copysign(M_PI_2, cimag(z))) >> >> If the real part of 1/z underflows, then this is a legitimate >> underflow. But if the imaginary part of 1/z underflows, then this is >> not legitimate, because then this really small number is being added >> to plus or minus I*PI/2, and it should just disappear into rounding >> error. > > The same techniques as used in hypot() and now clog() internals should > work. > Here it is for clog(): > > % /* Avoid underflow when ax is not 1. Also handle zero args. */ > % if (ay < DBL_MAX / 0x1p53 && ay * 0x1p53 <= ax) > % return (cpack(log(ax), atan2(y, x))); > > [Underflow can occur when ax is 1, but in no other cases, because > x*x+y*y cannot be very close to 1 in other cases. The proof that > doubled precision is enough for x*x-1+y*y includes showing by a general > argument that the closeness is at most 2**-157.] > > hypot() does this using a more magic bit test (use difference of > exponents). The magic 53 here is not just DBL_MANT_DIG, but is a good > enough threshold for when log(ax*ax+ay*ay) = log(ax) + > log(1+(ay/ax)*(ay/ax)) rounds to almost the same as log(ax). hypot() > doesn't take logs so has a different magic number (60). 1/z sqares > the hypotenuse but has a factor of x or y, so it probably needs another > different magic number. Again, I would be interested to see the details when you are finished. I am guessing that you have many special cases to handle things when ax or ay are close to DBL_MAX or DBL_MIN. And then in the middle of the night (about when you sent your email), I realized that you can also have inputs like DBL_MIN/0x1p45. These denormals can really mess things up, and trigger false underflows. I see below that you have given thought to this as well. > >> I found this out by testing the underflow flag for lots of random >> inputs. I > > I didn't test the underflow flag, but noticed underflow from inaccuracies. > Gradual underflow (denormals) occurs more often, so it would be worth > tsting for it too. > >> ended up writing my own code to compute the real part of 1/z without >> computing the imaginary part. I have a working solution, but I'm sure >> it isn't optimal. (I see there is code in the IEEE standard PDF >> document which I might copy, or I might copy ideas from the hypot >> function.) > > I think the above will work -- just ignore x or y in x*x+y*y when it > is tiny relative to y or x, and ignore y in y/(x*x+y*y) when it is > tiny relative to x*x+y*y. > > Then there is the problem of x and y tiny or denormal but with their > ration not tiny enough to ignore one of them. 1/z is then large or > infinite, but it it should rarely be infinite, and underflow shouldn't > occur for just x*x+y*y. This is easily handled as in hypot() (not sure > about the method here) and now clog() by scaling: > > % /* Avoid underflow. */ > % if (ax < 0x1p-511) > % return (cpack( > % log(hypot(x * 0x1p511, y * 0x1p511)) - 511 * ln2_lo - > % 511 * ln2_hi, > % atan2(y, x))); Wow - this is really getting down and dirty! Why do you first subtract 511 * ln_2 lo and then 511 * ln2_hi? I am guessing that you are aiming at saving a few ulp by trying to represent M_LN_2 in some kind of 1.5 times double precision? This could quite likely help with my implementation of catan(h). And from a mathematical point of view, catan(h) is somehow more fundamental than casin(h) and cacos(h), so it might be worthwhile pushing for those extra few ulps of accuracy for catan(h). I'll wait until I can see your entire code, then I will peruse it for inspiration. From owner-freebsd-numerics@FreeBSD.ORG Sat Aug 4 16:59:12 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 41B74106566C for ; Sat, 4 Aug 2012 16:59:12 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id E23EA8FC0A for ; Sat, 4 Aug 2012 16:59:11 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q74GxAPH098101 for ; Sat, 4 Aug 2012 11:59:10 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <501D54DE.4010707@missouri.edu> Date: Sat, 04 Aug 2012 11:59:10 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: freebsd-numerics@freebsd.org References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu> In-Reply-To: <501D51D7.1020101@missouri.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 04 Aug 2012 16:59:12 -0000 On 08/04/2012 11:46 AM, Stephen Montgomery-Smith wrote: > On 08/04/2012 03:49 AM, Bruce Evans wrote: >> On Fri, 3 Aug 2012, Stephen Montgomery-Smith wrote: > >> I made good progress with clog[fl](). The accuracy problems near 0 >> are long solved and proved to be solved. Do you have access to a logl and log1pl? I found hypotl and atan2l. From owner-freebsd-numerics@FreeBSD.ORG Sat Aug 4 20:46:00 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 045921065672 for ; Sat, 4 Aug 2012 20:46:00 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail26.syd.optusnet.com.au (mail26.syd.optusnet.com.au [211.29.133.167]) by mx1.freebsd.org (Postfix) with ESMTP id 2E9B18FC15 for ; Sat, 4 Aug 2012 20:45:58 +0000 (UTC) Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au (c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246]) by mail26.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q74KjmCw016206 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 5 Aug 2012 06:45:50 +1000 Date: Sun, 5 Aug 2012 06:45:48 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <501D51D7.1020101@missouri.edu> Message-ID: <20120805030609.R3101@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu> MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="0-1856759337-1344113148=:3101" Cc: freebsd-numerics@freebsd.org, Bruce Evans Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 04 Aug 2012 20:46:00 -0000 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --0-1856759337-1344113148=:3101 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed On Sat, 4 Aug 2012, Stephen Montgomery-Smith wrote: > On 08/04/2012 03:49 AM, Bruce Evans wrote: >> On Fri, 3 Aug 2012, Stephen Montgomery-Smith wrote: > >> I made good progress with clog[fl](). The accuracy problems near 0 >> are long solved and proved to be solved. > > I would be interested to see how you solved this. Would you be willing to > post your code somewhere? I just subtracted 1 in the right place: x*x - 1 + y*y needs doubled precision (and |x| >= |y|) (also need x*x >= 0.5) (x*x - 0.5) + (y*y - 0.5) adjustment for above when x*x and y*y are between 0.25 and 0.5 x*x + y*y - 1 needs tripled precision (and |x| != 1) log(x*x + y*y) log() would subtract 1 at the same point as the previous expression, so would need a tripled-precision log. To see that doubled precision is enough: - no problem if x*x + y*y is far from 1, so assume it is close to 1 - doubled precision makes x*x and y*y exact (It must be full doubled precision. I first thought that there was a problem splitting types with odd MANT_DIG. E.g., MANT_DIG = 53 for doubles. This must be split as 26 high bits and 27 low bits, but then squaring the 27 low bits in double precision at first seems to need 54 bits, and we only have 53. This problem is avoided by rounding the high term to nearest. Then the low term only needs 26 bits.) - sort so that |x| >= |y|, and take absolute values so that I don't need to write '|*|' again - if x*x >= 0.5, then x*x - 1 is exact in doubled precision (since x*x must be <= 2 by the assumption that x*x + y*y is near 1. - if x*x < 0.5, then it must be >= 0.25, and y*y must be in the same range, with both slightly less than 0.5, to get x*x + y*y. So x*x - 0.5 and y*y - 0.5 are exact in doubled precision. - we have created 2 exact terms that add up to x*x + y*y - 1 in infinite precision. In doubled precision, the sum is usually not exact. But in the problem case where there is a large cancelation, the sum is exact; other cases are easier: - when the exponents of the 2 terms differ by more than 1, there is no cancelation and we get a doubled precision result with all bits correct (the last one correctly rounded). This is more than enough. - when the exponents of the 2 terms differ by at most 1, first line up the bits by shifting as necessary. Clearly at most 2 more bits are required to hold an exact result. But cancelation of just 2 leading bits opens space to hold the exact result in doubled precision. Otherwise, we are in the previous case, with 1-2 bits less than doubled precision. This is still more than enough. So we have x*x + y*y - 1 in doubled precision except possibly for errors other than the rounding error in the lowest 2 bits. To see that the difference is not very small: - if x > 1, then x >= 1 + DBL_EPSILON, and x*x is already not very close to 1 - if x < sqrt(2)/2, similarly (I didn't check this properly) - if sqrt(2)/2 < x < 1, write x = 1 - u where u is positive and not large. Then x*x = 1 - 2*u + u*u. Any y such that y*y when added to this gets close to 1 must be near sqrt(2*u). Since u can't be very small, y can't be very small. The worst cast is when u = XXX_EPSILON/2. In double precision, u = 2**-53. Then 2*u = 2**-52 and y ~= 2**-26. When we line up bits to add the doubled precision terms the don't extend very far to the right. They fit in less than 3 53-bit bit strings, with the leading bit representing 1 and the rightmost bit representing 2**-158. (x*x and -1 fit in 2 53-bit bit-strings. Since y ~= 2**-26, y*y's exponent is 52 or 53 less than 1's exponent. Shifting to line up the bits requires just 1 more 53-bit bit string to the right of the other 2.) If everything canceled except for the rightmost bit, then the value would be 2**-158. This is the smallest possible value, since the above worst case has u*u's bits as far as possible to the right (after shifting), and the final result gets its lowest bits from u*u. The smallest value observed in double precision was 2**-156. This happens when u = 2**-52. (Due to numerical accidents, when u = 2**-53, y*y can't kill enough bits of -2*u + u*u to get close.) Limiting the closeness is important for showing that underflow can't happen either in the calculations or on conversion of the doubled precision result, but underflow tends to be avoided naturally. It is important that y*y doesn't underflow (including when y is split into hi+lo, lo*lo must not underflow). So tiny y must not be handled by this case. The "lining up bits" argument shows that tiny y can't get close to 1. > For the inverse trig functions, my accuracy is about 4 ulps, and I am not > looking to improve it at all. In particular, the algorithm for casin(h) and > cacos(h) is so complicated that I don't see how this can be improved without > a lot of work. > > However if you solve this problem for clog, I can see that it is very likely > that your solution will also work for catan(h). If you get this problem > licked, I will be interested to see how you do it. I seem to have finally got it below 2.0 (mostly below 1.5) for clog(). clog() is simpler, but it was still hard to find all the exceptional behaviours, so that the worse cases could even be noticed. >> Then there is the problem of x and y tiny or denormal but with their >> ration not tiny enough to ignore one of them. 1/z is then large or >> infinite, but it it should rarely be infinite, and underflow shouldn't >> occur for just x*x+y*y. This is easily handled as in hypot() (not sure >> about the method here) and now clog() by scaling: >> >> % /* Avoid underflow. */ >> % if (ax < 0x1p-511) >> % return (cpack( >> % log(hypot(x * 0x1p511, y * 0x1p511)) - 511 * ln2_lo - >> % 511 * ln2_hi, >> % atan2(y, x))); > > Wow - this is really getting down and dirty! Why do you first subtract 511 * > ln_2 lo and then 511 * ln2_hi? I am guessing that you are aiming at saving a > few ulp by trying to represent M_LN_2 in some kind of 1.5 times double > precision? Representing log(2) and hi+lo is standard technique, but is needed in contexts like this mainly for multiplication by a variable k. Here I should probably use a double constant ln2_times_511. This constant shouldn't be calculated as 511*ln2 since this might lose accuracy (just half an ulp?) unnecessarily. The terms are supposed to be added from low to high for maximal accuracy, but I didn't check that the logh(hypot()) term is actually lowest. I tried replacing this by a sloppy 511*ln2 and it lost more than half an ulp, so maybe the addition is doing something good too. All the -511's are wrong, in different ways. I'm too tired to get this right now, but had changed them to -468 (off by 10 error adding 53 to them). This seemed to work, but has several more off-by-53 errors. The -511's for the boost are wrong magic unrelated to the -511 for the threshold. Full code attached. % ... % double complex % clog(double complex z) % { % ... % /* Avoid spurious underflow, and reduce inaccuracies when ax is 1. */ % if (ax == 1) { % if (ay < 0x1p-511) % return (cpack((ay * 0.5) * ay, atan2(y, x))); % return (cpack(log1p(ay * ay) * 0.5, atan2(y, x))); % } As before, but moved. % % /* Avoid underflow when ax is not small. Also handle zero args. */ % if (ay < DBL_MAX / 0x1p53 && ay * 0x1p53 <= ax) % return (cpack(log(ax), atan2(y, x))); This is the case of large differences in exponents in hypot(). % % /* Avoid overflow. Also handle infinite args. */ % if (ax > 0x1p1023) % return (cpack(log(hypot(x * 0.5, y * 0.5)) + ln2_lo + ln2_hi, % atan2(y, x))); Use ln2* a lot. Avoid divisions and M_E. % if (ax > 0x1p511) % return (cpack(log(hypot(x, y)), atan2(y, x))); Subcase for overflow. The 0x1p1023 threshold is to avoid hypot() overflowing in the general expression. This one is to avoid squares overflowing, now that we avoid hypot() in the usual case and do our own squaring. % % /* Avoid underflow when ax is small. */ % if (ax < 0x1p-468) % return (cpack(log(hypot(x * 0x1p468, y * 0x1p468)) - % 468 * ln2_lo - 468 * ln2_hi, atan2(y, x))); Buggy -468. See above. I tested mainly in float precision where the corresponding constants are less buggy. % % /* Take extra care when the real part of the result is near 0. */ % EXTRACT_WORDS(hx, lx, ax); % bits = ((uint64_t)hx << 32) | lx; % bits += 0x04000000; % hx = bits >> 32; % lx = bits & 0xf8000000; % INSERT_WORDS(xh, hx, lx); % xl = ax - xh; % EXTRACT_WORDS(hy, ly, ay); % bits = ((uint64_t)hy << 32) | ly; % bits += 0x04000000; % hy = bits >> 32; % ly = bits & 0xf8000000; % INSERT_WORDS(yh, hy, ly); % yl = ay - yh; The hi+lo decomposition is painful in double precision. We fall through to here and use this rather inefficient decomposition in most cases, to reduce special cases above and make it easier to be more accurate (just need log*() with more accuracy. In testing the float case, I replaced log*f() by log*l() and got perfect rounding from the extra- precision code. This helped verify the algorithm.) % % xh2 = xh * xh; % yh2 = yh * yh; % % h = ax * ax + ay * ay; % using_log1p = 0; % if (h < 1 - 0x1p-5 || h > 1 + 0x1p-4) { Not very close to 1. % t2 = yh2; % t4 = xl * xl + yl * yl + 2 * (xh * xl + yh * yl); % norm(t2, t4); % spadd(t2, t4, xh2); Mostly, full multi-precision code isn't useful, so this just adds up terms from low to high, with extra precision for the final step only (this extra precision is probably only useful if we subtract 1 later). This could be optimized a bit by subtracting 1 or 0.5 and 0.5 as in the more delicate case, but 1 would still need to be subtracted later in some cases, so I just fall through to that. % } else { Very close to 1. % if (xh2 >= 0.5 && xh2 <= 2) { % using_log1p = 1; % xh2 -= 1; % } else if (xh2 >= 0.25 && xh2 < 0.5 && yh2 >= 0.25) { % using_log1p = 1; % xh2 -= 0.5; % yh2 -= 0.5; % } % t2 = xl * xl; % t1 = 2 * xh * xl; Exact so far. % norm(t1, t2); % spadd(t1, t2, xh2); % norm(t1, t2); % t3 = t1; % t4 = t2; x*x [-1 or -0.5] is in (t3, t4), exactly I hope. % % t1 = 2 * yh * yl; % t2 = yl * yl; % norm(t1, t2); % spadd(t1, t2, yh2); % norm(t1, t2); y*y [-0.5] is in (t1, t2), exactly I hope. % % norm(t2, t4); % norm(t1, t3); % % spadd(t2, t4, t3); % spadd(t2, t4, t1); Add up terms. Not exact in all cases, and not a full Dekkers algorithm AFAIK. % } % if (!using_log1p && h > 0.5 && h < 3) { % using_log1p = 1; % spadd(t2, t4, -1); Sloppier addition of -1 for cases not near 1. Only guaranteed not to lose accuracy when the infinite precision h is between 0.5 and 2, but it helps up to about 3. % } % if (using_log1p) % return (cpack(log1p(t2 + t4) * 0.5, atan2(y, x))); % return (cpack(log(t2 + t4) * 0.5, atan2(y, x))); % } Subtracting 1 is no good for h far outside of [0.5, 2], so we can't use log1p() in all cases. t2+t4 for the log() case is just x*x + y*y evaluated as accurately as possible (saves an ulp or 2 in the final result. % float complex % clogf(float complex z) % { % ... % /* Avoid underflow when ax is not small. Also handle zero args. */ % if (ay < FLT_MAX / 0x1p24F && ay * 0x1p24F <= ax) % return (cpackf(logf(ax), atan2f(y, x))); Everything is lexically identical with log() if possible. Magic numbers are always different. I should have converted FOO_MAX to magic. FP constants in float functions must have float type to avoid defeating half the point of having float functions by evaluating some expressions in non-float precision. % /* Avoid underflow when ax is small. */ % if (ay < 0x1p-39F) % return (cpackf(logf(hypotf(x * 0x1p39F, y * 0x1p39F)) - % 39 * ln2f_lo - 39 * ln2f_hi, atan2f(y, x))); -39 = -63 + 24 has the same bugs as -458 = -511 + 53 in double precision. % % /* Take extra care when the real part of the result is near 0. */ % #if 1 I develop in float precision for easier testing, and duplicate the algorithm under this part of the ifdef. % GET_FLOAT_WORD(hx, ax); % SET_FLOAT_WORD(xh, hx & 0xfffff000); % xl = ax - xh; % GET_FLOAT_WORD(hy, ay); % SET_FLOAT_WORD(yh, hy & 0xfffff000); % yl = ay - yh; The hi+lo decomposition is less painful in float precision. % ... % #else % dh = (double_t)ax * ax + (double_t)ay * ay; % if (dh > 0.5 && dh < 3) { % dh = (double_t)ax * ax - 1 + (double_t)ay * ay; % return (cpackf(log1pf(dh) * 0.5F, atan2f(y, x))); % } % return (cpackf(logf(dh) * 0.5F, atan2f(y, x))); More than doubled precision is avaiable in hardware on most supported arches and is easy to use. Your version had the -1 in the wrong place. double_t is used for minor efficiency reasons. Elswhere, clogf() uses uses float_t and clog() uses double_t to avoid compiler bugfeatures (norm() and spadd() don't work without them) and for efficiency. I made norm() and spadd() type-generic, and removed the STRICT_ASSIGN()' in them that were to work around the compiler bugs. It is now the responsibility of callers to pass them float_t or double_t. % #endif % } % % long double complex % clogl(long double complex z) % { % ... % /* Avoid spurious underflow, and reduce inaccuracies when ax is 1. */ % if (ax == 1) { % if (ay < 0x1p-8191L) % return (cpackl((ay * 0.5) * ay, atan2l(y, x))); % return (cpackl(log1pl(ay * ay) * 0.5, atan2l(y, x))); % } Magic numbers for ld80 only. I can translate this to ld128 in about 10 minutes while I rememeber what the magic numbers are for. Ifdefing them would take longer and give a mess. % /* Take extra care when the real part of the result is near 0. */ % xh = ax + 0x1p32 - 0x1p32; % xl = ax - xh; % yh = ay + 0x1p32 - 0x1p32; % yl = ay - yh; The hi+lo decomposition is painless in long double precision, because ilong double expressions are not evaluated in extra precision. % if (!using_log1p && h > 0.5F && h < 3) { % using_log1p = 1; % spadd(t2, t4, -1); % } Forgot to translate away the F suffix. I avoid F and L suffixes wherever possible, so that no translation or diffs for it are necessary wherever possible, but there are squllions of 0.5F's. % if (using_log1p) % return (cpackl(log1pl(t2 + t4) * 0.5, atan2l(y, x))); % return (cpackl(logl(t2 + t4) * 0.5, atan2l(y, x))); % } Bruce --0-1856759337-1344113148=:3101 Content-Type: TEXT/PLAIN; charset=US-ASCII; name="cplex.c" Content-Transfer-Encoding: BASE64 Content-ID: <20120805064548.W3101@besplex.bde.org> Content-Description: Content-Disposition: attachment; filename="cplex.c" I2luY2x1ZGUgPGNvbXBsZXguaD4NCiNpbmNsdWRlIDxmbG9hdC5oPg0KDQoj aW5jbHVkZSAiZnBtYXRoLmgiDQojaW5jbHVkZSAibWF0aC5oIg0KI2luY2x1 ZGUgIm1hdGhfcHJpdmF0ZS5oIg0KDQojdW5kZWYgbm9ybQ0KI2RlZmluZQlu b3JtKGEsIGIpIGRvIHsJCVwNCglfX3R5cGVvZihhKSBfX3c7CVwNCgkJCQlc DQoJaWYgKGZhYnMoYSkgPCBmYWJzKGIpKSB7IFwNCgkJX193ID0gKGEpOwlc DQoJCShhKSA9IChiKTsJXA0KCQkoYikgPSBfX3c7CVwNCgl9CQkJXA0KCV9f dyA9IChhKSArIChiKTsJXA0KCShiKSA9ICgoYSkgLSBfX3cpICsgKGIpOyBc DQoJKGEpID0gX193OwkJXA0KfSB3aGlsZSAoMCkNCg0KI3VuZGVmIHNwYWRk DQojZGVmaW5lCXNwYWRkKGEsIGIsIGMpIGRvIHsJXA0KCV9fdHlwZW9mKGEp IF9fdG1wOwlcDQoJCQkJXA0KCV9fdG1wID0gKGMpOwkJXA0KCW5vcm0oX190 bXAsIChhKSk7CVwNCgkoYikgKz0gKGEpOwkJXA0KCShhKSA9IF9fdG1wOwkJ XA0KfSB3aGlsZSAoMCkNCg0Kc3RhdGljIGNvbnN0IGRvdWJsZQ0KbG4yX2hp ID0gNi45MzE0NzE4MDU1ODI5ODcxZS0xLAkJLyogIDB4MTYyZTQyZmVmYTAw MDAuMHAtNTMgKi8NCmxuMl9sbyA9IDEuNjQ2NTk0OTU4Mjg5NzA4MmUtMTI7 CS8qICAweDFjZjc5YWJjOWUzYjNhLjBwLTkyICovDQoNCmRvdWJsZSBjb21w bGV4DQpjbG9nKGRvdWJsZSBjb21wbGV4IHopDQp7DQoJZG91YmxlX3QgaCwg dCwgdDEsIHQyLCB0MywgdDQsIHgsIHk7DQoJZG91YmxlX3QgYXgsIGF5LCB4 aCwgeGgyLCB4bCwgeWgsIHloMiwgeWw7DQoJdWludDY0X3QgYml0czsNCgl1 aW50MzJfdCBoeCwgaHksIGx4LCBseTsNCglpbnQgdXNpbmdfbG9nMXA7DQoN Cgl4ID0gY3JlYWwoeik7DQoJeSA9IGNpbWFnKHopOw0KDQoJLyogSGFuZGxl IE5hTnMgdXNpbmcgdGhlIGdlbmVyYWwgZm9ybXVsYSB0byBtaXggdGhlbSBy aWdodC4gKi8NCglpZiAoeCAhPSB4IHx8IHkgIT0geSkNCgkJcmV0dXJuIChj cGFjayhsb2coaHlwb3QoeCwgeSkpLCBhdGFuMih5LCB4KSkpOw0KDQoJYXgg PSBmYWJzKHgpOw0KCWF5ID0gZmFicyh5KTsNCglpZiAoYXggPCBheSkgew0K CQl0ID0gYXg7DQoJCWF4ID0gYXk7DQoJCWF5ID0gdDsNCgl9DQoNCgkvKiBB dm9pZCBzcHVyaW91cyB1bmRlcmZsb3csIGFuZCByZWR1Y2UgaW5hY2N1cmFj aWVzIHdoZW4gYXggaXMgMS4gKi8NCglpZiAoYXggPT0gMSkgew0KCQlpZiAo YXkgPCAweDFwLTUxMSkNCgkJCXJldHVybiAoY3BhY2soKGF5ICogMC41KSAq IGF5LCBhdGFuMih5LCB4KSkpOw0KCQlyZXR1cm4gKGNwYWNrKGxvZzFwKGF5 ICogYXkpICogMC41LCBhdGFuMih5LCB4KSkpOw0KCX0NCg0KCS8qIEF2b2lk IHVuZGVyZmxvdyB3aGVuIGF4IGlzIG5vdCBzbWFsbC4gIEFsc28gaGFuZGxl IHplcm8gYXJncy4gKi8NCglpZiAoYXkgPCBEQkxfTUFYIC8gMHgxcDUzICYm IGF5ICogMHgxcDUzIDw9IGF4KQ0KCQlyZXR1cm4gKGNwYWNrKGxvZyhheCks IGF0YW4yKHksIHgpKSk7DQoNCgkvKiBBdm9pZCBvdmVyZmxvdy4gIEFsc28g aGFuZGxlIGluZmluaXRlIGFyZ3MuICovDQoJaWYgKGF4ID4gMHgxcDEwMjMp DQoJCXJldHVybiAoY3BhY2sobG9nKGh5cG90KHggKiAwLjUsIHkgKiAwLjUp KSArIGxuMl9sbyArIGxuMl9oaSwNCgkJICAgIGF0YW4yKHksIHgpKSk7DQoJ aWYgKGF4ID4gMHgxcDUxMSkNCgkJcmV0dXJuIChjcGFjayhsb2coaHlwb3Qo eCwgeSkpLCBhdGFuMih5LCB4KSkpOw0KDQoJLyogQXZvaWQgdW5kZXJmbG93 IHdoZW4gYXggaXMgc21hbGwuICovDQoJaWYgKGF4IDwgMHgxcC00NjgpDQoJ CXJldHVybiAoY3BhY2sobG9nKGh5cG90KHggKiAweDFwNDY4LCB5ICogMHgx cDQ2OCkpIC0NCgkJICAgIDQ2OCAqIGxuMl9sbyAtIDQ2OCAqIGxuMl9oaSwg YXRhbjIoeSwgeCkpKTsNCg0KCS8qIFRha2UgZXh0cmEgY2FyZSB3aGVuIHRo ZSByZWFsIHBhcnQgb2YgdGhlIHJlc3VsdCBpcyBuZWFyIDAuICovDQoJRVhU UkFDVF9XT1JEUyhoeCwgbHgsIGF4KTsNCgliaXRzID0gKCh1aW50NjRfdClo eCA8PCAzMikgfCBseDsNCgliaXRzICs9IDB4MDQwMDAwMDA7DQoJaHggPSBi aXRzID4+IDMyOw0KCWx4ID0gYml0cyAmIDB4ZjgwMDAwMDA7DQoJSU5TRVJU X1dPUkRTKHhoLCBoeCwgbHgpOw0KCXhsID0gYXggLSB4aDsNCglFWFRSQUNU X1dPUkRTKGh5LCBseSwgYXkpOw0KCWJpdHMgPSAoKHVpbnQ2NF90KWh5IDw8 IDMyKSB8IGx5Ow0KCWJpdHMgKz0gMHgwNDAwMDAwMDsNCgloeSA9IGJpdHMg Pj4gMzI7DQoJbHkgPSBiaXRzICYgMHhmODAwMDAwMDsNCglJTlNFUlRfV09S RFMoeWgsIGh5LCBseSk7DQoJeWwgPSBheSAtIHloOw0KDQoJeGgyID0geGgg KiB4aDsNCgl5aDIgPSB5aCAqIHloOw0KDQoJaCA9IGF4ICogYXggKyBheSAq IGF5Ow0KCXVzaW5nX2xvZzFwID0gMDsNCglpZiAoaCA8IDEgLSAweDFwLTUg fHwgaCA+IDEgKyAweDFwLTQpIHsNCgkJdDIgPSB5aDI7DQoJCXQ0ID0geGwg KiB4bCArIHlsICogeWwgKyAyICogKHhoICogeGwgKyB5aCAqIHlsKTsNCgkJ bm9ybSh0MiwgdDQpOw0KCQlzcGFkZCh0MiwgdDQsIHhoMik7DQoJfSBlbHNl IHsNCgkJaWYgKHhoMiA+PSAwLjUgJiYgeGgyIDw9IDIpIHsNCgkJCXVzaW5n X2xvZzFwID0gMTsNCgkJCXhoMiAtPSAxOw0KCQl9IGVsc2UgaWYgKHhoMiA+ PSAwLjI1ICYmIHhoMiA8IDAuNSAmJiB5aDIgPj0gMC4yNSkgew0KCQkJdXNp bmdfbG9nMXAgPSAxOw0KCQkJeGgyIC09IDAuNTsNCgkJCXloMiAtPSAwLjU7 DQoJCX0NCgkJdDIgPSB4bCAqIHhsOw0KCQl0MSA9IDIgKiB4aCAqIHhsOw0K CQlub3JtKHQxLCB0Mik7DQoJCXNwYWRkKHQxLCB0MiwgeGgyKTsNCgkJbm9y bSh0MSwgdDIpOw0KCQl0MyA9IHQxOw0KCQl0NCA9IHQyOw0KDQoJCXQxID0g MiAqIHloICogeWw7DQoJCXQyID0geWwgKiB5bDsNCgkJbm9ybSh0MSwgdDIp Ow0KCQlzcGFkZCh0MSwgdDIsIHloMik7DQoJCW5vcm0odDEsIHQyKTsNCg0K CQlub3JtKHQyLCB0NCk7DQoJCW5vcm0odDEsIHQzKTsNCg0KCQlzcGFkZCh0 MiwgdDQsIHQzKTsNCgkJc3BhZGQodDIsIHQ0LCB0MSk7DQoJfQ0KCWlmICgh dXNpbmdfbG9nMXAgJiYgaCA+IDAuNSAmJiBoIDwgMykgew0KCQl1c2luZ19s b2cxcCA9IDE7DQoJCXNwYWRkKHQyLCB0NCwgLTEpOw0KCX0NCglpZiAodXNp bmdfbG9nMXApDQoJCXJldHVybiAoY3BhY2sobG9nMXAodDIgKyB0NCkgKiAw LjUsIGF0YW4yKHksIHgpKSk7DQoJcmV0dXJuIChjcGFjayhsb2codDIgKyB0 NCkgKiAwLjUsIGF0YW4yKHksIHgpKSk7DQp9DQoNCnN0YXRpYyBjb25zdCBm bG9hdA0KbG4yZl9oaSA9ICA2LjkzMTQ1NzUxOTVlLTEsCQkvKiAgMHhiMTcy MDAuMHAtMjQgKi8NCmxuMmZfbG8gPSAgMS40Mjg2MDY3NjUzZS02OwkJLyog IDB4YmZiZThlLjBwLTQzICovDQoNCmZsb2F0IGNvbXBsZXgNCmNsb2dmKGZs b2F0IGNvbXBsZXggeikNCnsNCglkb3VibGVfdCBkaDsNCglmbG9hdF90IGgs IHQsIHQxLCB0MiwgdDMsIHQ0LCB4LCB5Ow0KCWZsb2F0X3QgYXgsIGF5LCB4 aCwgeGgyLCB4bCwgeWgsIHloMiwgeWw7DQoJdWludDMyX3QgaHgsIGh5Ow0K CWludCB1c2luZ19sb2cxcDsNCg0KCXggPSBjcmVhbGYoeik7DQoJeSA9IGNp bWFnZih6KTsNCg0KCS8qIEhhbmRsZSBOYU5zIHVzaW5nIHRoZSBnZW5lcmFs IGZvcm11bGEgdG8gbWl4IHRoZW0gcmlnaHQuICovDQoJaWYgKHggIT0geCB8 fCB5ICE9IHkpDQoJCXJldHVybiAoY3BhY2tmKGxvZ2YoaHlwb3RmKHgsIHkp KSwgYXRhbjJmKHksIHgpKSk7DQoNCglheCA9IGZhYnNmKHgpOw0KCWF5ID0g ZmFic2YoeSk7DQoJaWYgKGF4IDwgYXkpIHsNCgkJdCA9IGF4Ow0KCQlheCA9 IGF5Ow0KCQlheSA9IHQ7DQoJfQ0KDQoJLyogQXZvaWQgc3B1cmlvdXMgdW5k ZXJmbG93LCBhbmQgcmVkdWNlIGluYWNjdXJhY2llcyB3aGVuIGF4IGlzIDEu ICovDQoJaWYgKGF4ID09IDEpIHsNCgkJaWYgKGF5IDwgMHgxcC02M0YpDQoJ CQlyZXR1cm4gKGNwYWNrZigoYXkgKiAwLjVGKSAqIGF5LCBhdGFuMmYoeSwg eCkpKTsNCgkJcmV0dXJuIChjcGFja2YobG9nMXBmKGF5ICogYXkpICogMC41 RiwgYXRhbjJmKHksIHgpKSk7DQoJfQ0KDQoJLyogQXZvaWQgdW5kZXJmbG93 IHdoZW4gYXggaXMgbm90IHNtYWxsLiAgQWxzbyBoYW5kbGUgemVybyBhcmdz LiAqLw0KCWlmIChheSA8IEZMVF9NQVggLyAweDFwMjRGICYmIGF5ICogMHgx cDI0RiA8PSBheCkNCgkJcmV0dXJuIChjcGFja2YobG9nZihheCksIGF0YW4y Zih5LCB4KSkpOw0KDQoJLyogQXZvaWQgb3ZlcmZsb3cuICBBbHNvIGhhbmRs ZSBpbmZpbml0ZSBhcmdzLiAqLw0KCWlmIChheCA+IDB4MXAxMjdGKQ0KCQly ZXR1cm4gKGNwYWNrZigNCgkJICAgIGxvZ2YoaHlwb3RmKHggKiAwLjVGLCB5 ICogMC41RikpICsgbG4yZl9sbyArIGxuMmZfaGksDQoJCSAgICBhdGFuMmYo eSwgeCkpKTsNCglpZiAoYXggPiAweDFwNjNGKQ0KCQlyZXR1cm4gKGNwYWNr Zihsb2dmKGh5cG90Zih4LCB5KSksIGF0YW4yZih5LCB4KSkpOw0KDQoJLyog QXZvaWQgdW5kZXJmbG93IHdoZW4gYXggaXMgc21hbGwuICovDQoJaWYgKGF5 IDwgMHgxcC0zOUYpDQoJCXJldHVybiAoY3BhY2tmKGxvZ2YoaHlwb3RmKHgg KiAweDFwMzlGLCB5ICogMHgxcDM5RikpIC0NCgkJICAgIDM5ICogbG4yZl9s byAtIDM5ICogbG4yZl9oaSwgYXRhbjJmKHksIHgpKSk7DQoNCgkvKiBUYWtl IGV4dHJhIGNhcmUgd2hlbiB0aGUgcmVhbCBwYXJ0IG9mIHRoZSByZXN1bHQg aXMgbmVhciAwLiAqLw0KI2lmIDENCglHRVRfRkxPQVRfV09SRChoeCwgYXgp Ow0KCVNFVF9GTE9BVF9XT1JEKHhoLCBoeCAmIDB4ZmZmZmYwMDApOw0KCXhs ID0gYXggLSB4aDsNCglHRVRfRkxPQVRfV09SRChoeSwgYXkpOw0KCVNFVF9G TE9BVF9XT1JEKHloLCBoeSAmIDB4ZmZmZmYwMDApOw0KCXlsID0gYXkgLSB5 aDsNCg0KCXhoMiA9IHhoICogeGg7DQoJeWgyID0geWggKiB5aDsNCg0KCWgg PSBheCAqIGF4ICsgYXkgKiBheTsNCgl1c2luZ19sb2cxcCA9IDA7DQoJaWYg KGggPCAoZmxvYXQpKDEgLSAweDFwLTUpIHx8IGggPiAoZmxvYXQpKDEgKyAw eDFwLTQpKSB7DQoJCXQyID0geWgyOw0KCQl0NCA9IHhsICogeGwgKyB5bCAq IHlsICsgMiAqICh4aCAqIHhsICsgeWggKiB5bCk7DQoJCW5vcm0odDIsIHQ0 KTsNCgkJc3BhZGQodDIsIHQ0LCB4aDIpOw0KCX0gZWxzZSB7DQoJCWlmICh4 aDIgPj0gMC41RiAmJiB4aDIgPD0gMikgew0KCQkJdXNpbmdfbG9nMXAgPSAx Ow0KCQkJeGgyIC09IDE7DQoJCX0gZWxzZSBpZiAoeGgyID49IDAuMjVGICYm IHhoMiA8IDAuNUYgJiYgeWgyID49IDAuMjVGKSB7DQoJCQl1c2luZ19sb2cx cCA9IDE7DQoJCQl4aDIgLT0gMC41RjsNCgkJCXloMiAtPSAwLjVGOw0KCQl9 DQoJCXQyID0geGwgKiB4bDsNCgkJdDEgPSAyICogeGggKiB4bDsNCgkJbm9y bSh0MSwgdDIpOw0KCQlzcGFkZCh0MSwgdDIsIHhoMik7DQoJCW5vcm0odDEs IHQyKTsNCgkJdDMgPSB0MTsNCgkJdDQgPSB0MjsNCg0KCQl0MSA9IDIgKiB5 aCAqIHlsOw0KCQl0MiA9IHlsICogeWw7DQoJCW5vcm0odDEsIHQyKTsNCgkJ c3BhZGQodDEsIHQyLCB5aDIpOw0KCQlub3JtKHQxLCB0Mik7DQoNCgkJbm9y bSh0MiwgdDQpOw0KCQlub3JtKHQxLCB0Myk7DQoNCgkJc3BhZGQodDIsIHQ0 LCB0Myk7DQoJCXNwYWRkKHQyLCB0NCwgdDEpOw0KCX0NCglpZiAoIXVzaW5n X2xvZzFwICYmIGggPiAwLjVGICYmIGggPCAzKSB7DQoJCXVzaW5nX2xvZzFw ID0gMTsNCgkJc3BhZGQodDIsIHQ0LCAtMSk7DQoJfQ0KCWlmICh1c2luZ19s b2cxcCkNCgkJcmV0dXJuIChjcGFja2YobG9nMXBmKHQyICsgdDQpICogMC41 RiwgYXRhbjJmKHksIHgpKSk7DQoJcmV0dXJuIChjcGFja2YobG9nZih0MiAr IHQ0KSAqIDAuNUYsIGF0YW4yZih5LCB4KSkpOw0KI2Vsc2UNCglkaCA9IChk b3VibGVfdClheCAqIGF4ICsgKGRvdWJsZV90KWF5ICogYXk7DQoJaWYgKGRo ID4gMC41ICYmIGRoIDwgMykgew0KCQlkaCA9IChkb3VibGVfdClheCAqIGF4 IC0gMSArIChkb3VibGVfdClheSAqIGF5Ow0KCQlyZXR1cm4gKGNwYWNrZihs b2cxcGYoZGgpICogMC41RiwgYXRhbjJmKHksIHgpKSk7DQoJfQ0KCXJldHVy biAoY3BhY2tmKGxvZ2YoZGgpICogMC41RiwgYXRhbjJmKHksIHgpKSk7DQoj ZW5kaWYNCn0NCg0KbG9uZyBkb3VibGUgY29tcGxleA0KY2xvZ2wobG9uZyBk b3VibGUgY29tcGxleCB6KQ0Kew0KCWxvbmcgZG91YmxlIGgsIHQsIHQxLCB0 MiwgdDMsIHQ0LCB4LCB5Ow0KCWxvbmcgZG91YmxlIGF4LCBheSwgeGgsIHho MiwgeGwsIHloLCB5aDIsIHlsOw0KCWludCB1c2luZ19sb2cxcDsNCg0KCXgg PSBjcmVhbGwoeik7DQoJeSA9IGNpbWFnbCh6KTsNCg0KCS8qIEhhbmRsZSBO YU5zIHVzaW5nIHRoZSBnZW5lcmFsIGZvcm11bGEgdG8gbWl4IHRoZW0gcmln aHQuICovDQoJaWYgKHggIT0geCB8fCB5ICE9IHkpDQoJCXJldHVybiAoY3Bh Y2tsKGxvZ2woaHlwb3RsKHgsIHkpKSwgYXRhbjJsKHksIHgpKSk7DQoNCglh eCA9IGZhYnNsKHgpOw0KCWF5ID0gZmFic2woeSk7DQoJaWYgKGF4IDwgYXkp IHsNCgkJdCA9IGF4Ow0KCQlheCA9IGF5Ow0KCQlheSA9IHQ7DQoJfQ0KDQoJ LyogQXZvaWQgc3B1cmlvdXMgdW5kZXJmbG93LCBhbmQgcmVkdWNlIGluYWNj dXJhY2llcyB3aGVuIGF4IGlzIDEuICovDQoJaWYgKGF4ID09IDEpIHsNCgkJ aWYgKGF5IDwgMHgxcC04MTkxTCkNCgkJCXJldHVybiAoY3BhY2tsKChheSAq IDAuNSkgKiBheSwgYXRhbjJsKHksIHgpKSk7DQoJCXJldHVybiAoY3BhY2ts KGxvZzFwbChheSAqIGF5KSAqIDAuNSwgYXRhbjJsKHksIHgpKSk7DQoJfQ0K DQoJLyogQXZvaWQgdW5kZXJmbG93IHdoZW4gYXggaXMgbm90IHNtYWxsLiAg QWxzbyBoYW5kbGUgemVybyBhcmdzLiAqLw0KCWlmIChheSA8IExEQkxfTUFY IC8gMHgxcDY0ICYmIGF5ICogMHgxcDY0IDw9IGF4KQ0KCQlyZXR1cm4gKGNw YWNrbChsb2dsKGF4KSwgYXRhbjJsKHksIHgpKSk7DQoNCgkvKiBBdm9pZCBv dmVyZmxvdy4gIEFsc28gaGFuZGxlIGluZmluaXRlIGFyZ3MuICovDQoJaWYg KGF4ID4gMHgxcDE2MzgzTCkNCgkJcmV0dXJuIChjcGFja2wobG9nbChoeXBv dGwoeCAqIDAuNSwgeSAqIDAuNSkpICsgbG4yX2xvICsgbG4yX2hpLA0KCQkg ICAgYXRhbjJsKHksIHgpKSk7DQoJaWYgKGF4ID4gMHgxcDgxOTEpDQoJCXJl dHVybiAoY3BhY2tsKGxvZ2woaHlwb3RsKHgsIHkpKSwgYXRhbjJsKHksIHgp KSk7DQoNCgkvKiBBdm9pZCB1bmRlcmZsb3cgd2hlbiBheCBpcyBzbWFsbC4g Ki8NCglpZiAoYXggPCAweDFwLTgxMjdMKQ0KCQlyZXR1cm4gKGNwYWNrbChs b2dsKGh5cG90bCh4ICogMHgxcDgxMjdMLCB5ICogMHgxcDgxMjdMKSkgLQ0K CQkgICAgODEyNyAqIGxuMl9sbyAtIDgxMjcgKiBsbjJfaGksIGF0YW4ybCh5 LCB4KSkpOw0KDQoJLyogVGFrZSBleHRyYSBjYXJlIHdoZW4gdGhlIHJlYWwg cGFydCBvZiB0aGUgcmVzdWx0IGlzIG5lYXIgMC4gKi8NCgl4aCA9IGF4ICsg MHgxcDMyIC0gMHgxcDMyOw0KCXhsID0gYXggLSB4aDsNCgl5aCA9IGF5ICsg MHgxcDMyIC0gMHgxcDMyOw0KCXlsID0gYXkgLSB5aDsNCg0KCXhoMiA9IHho ICogeGg7DQoJeWgyID0geWggKiB5aDsNCg0KCWggPSBheCAqIGF4ICsgYXkg KiBheTsNCgl1c2luZ19sb2cxcCA9IDA7DQoJaWYgKGggPCAxIC0gMHgxcC01 IHx8IGggPiAxICsgMHgxcC00KSB7DQoJCXQyID0geWgyOw0KCQl0NCA9IHhs ICogeGwgKyB5bCAqIHlsICsgMiAqICh4aCAqIHhsICsgeWggKiB5bCk7DQoJ CW5vcm0odDIsIHQ0KTsNCgkJc3BhZGQodDIsIHQ0LCB4aDIpOw0KCX0gZWxz ZSB7DQoJCWlmICh4aDIgPj0gMC41ICYmIHhoMiA8PSAyKSB7DQoJCQl1c2lu Z19sb2cxcCA9IDE7DQoJCQl4aDIgLT0gMTsNCgkJfSBlbHNlIGlmICh4aDIg Pj0gMC4yNSAmJiB4aDIgPCAwLjUgJiYgeWgyID49IDAuMjUpIHsNCgkJCXVz aW5nX2xvZzFwID0gMTsNCgkJCXhoMiAtPSAwLjU7DQoJCQl5aDIgLT0gMC41 Ow0KCQl9DQoJCXQyID0geGwgKiB4bDsNCgkJdDEgPSAyICogeGggKiB4bDsN CgkJbm9ybSh0MSwgdDIpOw0KCQlzcGFkZCh0MSwgdDIsIHhoMik7DQoJCW5v cm0odDEsIHQyKTsNCgkJdDMgPSB0MTsNCgkJdDQgPSB0MjsNCg0KCQl0MSA9 IDIgKiB5aCAqIHlsOw0KCQl0MiA9IHlsICogeWw7DQoJCW5vcm0odDEsIHQy KTsNCgkJc3BhZGQodDEsIHQyLCB5aDIpOw0KCQlub3JtKHQxLCB0Mik7DQoN CgkJbm9ybSh0MiwgdDQpOw0KCQlub3JtKHQxLCB0Myk7DQoNCgkJc3BhZGQo dDIsIHQ0LCB0Myk7DQoJCXNwYWRkKHQyLCB0NCwgdDEpOw0KCX0NCglpZiAo IXVzaW5nX2xvZzFwICYmIGggPiAwLjVGICYmIGggPCAzKSB7DQoJCXVzaW5n X2xvZzFwID0gMTsNCgkJc3BhZGQodDIsIHQ0LCAtMSk7DQoJfQ0KCWlmICh1 c2luZ19sb2cxcCkNCgkJcmV0dXJuIChjcGFja2wobG9nMXBsKHQyICsgdDQp ICogMC41LCBhdGFuMmwoeSwgeCkpKTsNCglyZXR1cm4gKGNwYWNrbChsb2ds KHQyICsgdDQpICogMC41LCBhdGFuMmwoeSwgeCkpKTsNCn0NCg== --0-1856759337-1344113148=:3101-- From owner-freebsd-numerics@FreeBSD.ORG Sat Aug 4 20:52:13 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7BC72106566B for ; Sat, 4 Aug 2012 20:52:13 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail16.syd.optusnet.com.au (mail16.syd.optusnet.com.au [211.29.132.197]) by mx1.freebsd.org (Postfix) with ESMTP id EA8998FC08 for ; Sat, 4 Aug 2012 20:52:12 +0000 (UTC) Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au (c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246]) by mail16.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q74KqAKt015288 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 5 Aug 2012 06:52:11 +1000 Date: Sun, 5 Aug 2012 06:52:10 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stephen Montgomery-Smith In-Reply-To: <501D54DE.4010707@missouri.edu> Message-ID: <20120805064612.H3101@besplex.bde.org> References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu> <501D54DE.4010707@missouri.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 04 Aug 2012 20:52:13 -0000 On Sat, 4 Aug 2012, Stephen Montgomery-Smith wrote: > On 08/04/2012 11:46 AM, Stephen Montgomery-Smith wrote: >> On 08/04/2012 03:49 AM, Bruce Evans wrote: >>> On Fri, 3 Aug 2012, Stephen Montgomery-Smith wrote: >> >>> I made good progress with clog[fl](). The accuracy problems near 0 >>> are long solved and proved to be solved. > > Do you have access to a logl and log1pl? I found hypotl and atan2l. I have my development version which hasn't been committed for too long. It was handy for testing some things in clog(), but you can do almost the same by testing float functions and using double log/log1p() in them when you want to see errors in your functions except ones from when they call logf/log1pf(). Bruce From owner-freebsd-numerics@FreeBSD.ORG Sat Aug 4 22:03:41 2012 Return-Path: Delivered-To: freebsd-numerics@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E7F9F106566C for ; Sat, 4 Aug 2012 22:03:41 +0000 (UTC) (envelope-from stephen@missouri.edu) Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu [128.206.184.213]) by mx1.freebsd.org (Postfix) with ESMTP id AEA3C8FC08 for ; Sat, 4 Aug 2012 22:03:41 +0000 (UTC) Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213]) by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id q74M3YDA017807; Sat, 4 Aug 2012 17:03:35 -0500 (CDT) (envelope-from stephen@missouri.edu) Message-ID: <501D9C36.2040207@missouri.edu> Date: Sat, 04 Aug 2012 17:03:34 -0500 From: Stephen Montgomery-Smith User-Agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0 MIME-Version: 1.0 To: Bruce Evans References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu> <20120805030609.R3101@besplex.bde.org> In-Reply-To: <20120805030609.R3101@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions X-BeenThere: freebsd-numerics@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Discussions of high quality implementation of libm functions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 04 Aug 2012 22:03:42 -0000 On 08/04/2012 03:45 PM, Bruce Evans wrote: > On Sat, 4 Aug 2012, Stephen Montgomery-Smith wrote: > >> On 08/04/2012 03:49 AM, Bruce Evans wrote: >>> On Fri, 3 Aug 2012, Stephen Montgomery-Smith wrote: >> >>> I made good progress with clog[fl](). The accuracy problems near 0 >>> are long solved and proved to be solved. >> >> I would be interested to see how you solved this. Would you be >> willing to post your code somewhere? I am starting to go through your code. What is the purpose of the macros norm and spadd? Also, the proof you provided (which I haven't gone through in detail yet) - do you intend to put them as comments in the code? For example, when you write: % EXTRACT_WORDS(hx, lx, ax); % bits = ((uint64_t)hx << 32) | lx; % bits += 0x04000000; % hx = bits >> 32; % lx = bits & 0xf8000000; % INSERT_WORDS(xh, hx, lx); % xl = ax - xh; it would be very helpful to include a comment like % /* * xh is ax rounded to nearest 26 bit floating point. * xl is such that x = xh + xl. * Because xh is rounded and not truncated to 26 bits, * only the top 26 bits (not 27=53-26) of the mantissa of * xl can be non-zero. */ because while that can be deduced from reading the code, it is by no means obvious. By the way, that rounding to the nearest 26 bits is a very smart idea.