From owner-freebsd-numerics@FreeBSD.ORG  Sun Sep 16 20:53:45 2012
Return-Path: <owner-freebsd-numerics@FreeBSD.ORG>
Delivered-To: freebsd-numerics@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id F3447106566C
	for <freebsd-numerics@freebsd.org>;
	Sun, 16 Sep 2012 20:53:44 +0000 (UTC)
	(envelope-from stephen@missouri.edu)
Received: from wilberforce.math.missouri.edu (wilberforce.math.missouri.edu
	[128.206.184.213])
	by mx1.freebsd.org (Postfix) with ESMTP id BC1708FC08
	for <freebsd-numerics@freebsd.org>;
	Sun, 16 Sep 2012 20:53:44 +0000 (UTC)
Received: from [127.0.0.1] (wilberforce.math.missouri.edu [128.206.184.213])
	by wilberforce.math.missouri.edu (8.14.5/8.14.5) with ESMTP id
	q8GKrhpE064673; Sun, 16 Sep 2012 15:53:43 -0500 (CDT)
	(envelope-from stephen@missouri.edu)
Message-ID: <50563C57.60806@missouri.edu>
Date: Sun, 16 Sep 2012 15:53:43 -0500
From: Stephen Montgomery-Smith <stephen@missouri.edu>
User-Agent: Mozilla/5.0 (X11; Linux i686;
	rv:15.0) Gecko/20120827 Thunderbird/15.0
MIME-Version: 1.0
To: Bruce Evans <brde@optusnet.com.au>
References: <5017111E.6060003@missouri.edu> <502A780B.2010106@missouri.edu>
	<20120815223631.N1751@besplex.bde.org>
	<502C0CF8.8040003@missouri.edu>
	<20120906221028.O1542@besplex.bde.org>
	<5048D00B.8010401@missouri.edu> <504D3CCD.2050006@missouri.edu>
	<504FF726.9060001@missouri.edu>
	<20120912191556.F1078@besplex.bde.org>
	<20120912225847.J1771@besplex.bde.org>
	<50511B40.3070009@missouri.edu>
	<20120913204808.T1964@besplex.bde.org>
	<5051F59C.6000603@missouri.edu>
	<20120914014208.I2862@besplex.bde.org>
	<50526050.2070303@missouri.edu>
	<20120914212403.H1983@besplex.bde.org>
	<50538E28.6050400@missouri.edu>
	<20120915231032.C2669@besplex.bde.org>
	<50548E15.3010405@missouri.edu> <5054C027.2040008@missouri.edu>
	<5054C200.7090307@missouri.edu>
	<20120916041132.D6344@besplex.bde.org>
	<50553424.2080902@missouri.edu>
	<20120916134730.Y957@besplex.bde.org>
	<5055ECA8.2080008@missouri.edu>
	<20120917022614.R2943@besplex.bde.org>
	<50562213.9020400@missouri.edu>
	<20120917060116.G3825@besplex.bde.org>
In-Reply-To: <20120917060116.G3825@besplex.bde.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-numerics@freebsd.org
Subject: Re: Complex arg-trig functions
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
	<freebsd-numerics.freebsd.org>
X-List-Received-Date: Sun, 16 Sep 2012 20:53:45 -0000

On 09/16/2012 03:29 PM, Bruce Evans wrote:
> On Sun, 16 Sep 2012, Stephen Montgomery-Smith wrote:
>
>> On 09/16/2012 11:51 AM, Bruce Evans wrote:
>>>
>>> I don't like that.  It will be much slower on almost 1/4 of arg space.
>>> The only reason to consider not doing it is that the args that it
>>> applies to are not very likely, and optimizing for them may pessimize
>>> the usual case.
>>
>> The pessimization when |z| is not small is tiny.  It takes no time at
>> all to check that |z| is small.
>
> Not necessarily on out-of-order machines (most x86).  The CPU executes
> multiple paths speculatively and concurrently.  If it does more on an
> unused path, then it might do less on the used path.  It may mispredict
> the branch on the size of |z| and thus misguess which path to do more
> on.  (I don't know many details of this.  For example, does it do
> anything at all on paths predicted to be not taken?)  Losses from this
> are usually described as branch mispredictions.  They might cost 20
> (50? 100?) cycles after taking about 2 cycles to actually check |z|
> (2 cycles pipelined but more like <length of pipe> + 8 in real time,
> and it is the latter time that you lose by backing out).
>
> The only sure way to avoid branch mispredictions is to not have any,
> and catrig is too complicated for that.

Yes, but I did a timing test.  And in my case the test for small |z| was 
almost always failing.

>
>> On the other hand let me go through the code and see what happens when
>> |x| is small or |y| is small.  There are actually specific formulas
>> that work well in these two cases, and they are probably not that much
>> slower than the formulas I decided to remove.  And when you chase
>> through all the logic and "if" statements, you may find that you
>> didn't use up a whole bunch of time for these very special cases of
>> |z| small - most of the extra time merely being the decisions invoked
>> by the "if" statements.
>
> But all general cases end up going through an extern function like
> acos() or atan2(), and just calling another function is a significant
> overhead.  When |z| is small, the arg(s) to the other function will
> probably be a special case for it (e.g., acos(small)).  The other
> function should optimize this and not take as long as an average call.
> However, since it is special, it may cause branch mispredictions for
> other uses of the function.

I understand what you are saying.  I guess it just seems to me that the 
"proper" way to do it is to make the C compiler really awesome and have it 
do this for you.  (Doesn't the Intel compiler try to inline functions 
when it knows that will speed things up?)

>> Furthermore, casinh etc are not commonly used functions.  Putting huge
>> amounts of effort looking at special cases to speed it up a little
>> somehow feels wrong to me.  In fact, if the programmer knows that he
>> will be wanting casinh, and evaluated very fast, then he should be
>> motivated enough to try out using z in the case when |z| is small, and
>> see if that really speeds things up.

Well, if casinh goes 20% slower, you're not going to be testing too many 
fewer cases.

> True.  Now I mainly want it to be fast so that I can test more cases.

I understand.  But putting those special cases into casinh offends my 
sense of taste.