From: Stephen Montgomery-Smith
Date: Sun, 16 Sep 2012 15:53:43 -0500
To: Bruce Evans
Cc: freebsd-numerics@freebsd.org
Subject: Re: Complex arg-trig functions

On 09/16/2012 03:29 PM, Bruce Evans wrote:
> On Sun, 16 Sep 2012, Stephen Montgomery-Smith wrote:
>
>> On 09/16/2012 11:51 AM, Bruce Evans wrote:
>>>
>>> I don't like that.  It will be much slower on almost 1/4 of arg space.
>>> The only reason to consider not doing it is that the args that it
>>> applies to are not very likely, and optimizing for them may pessimize
>>> the usual case.
>>
>> The pessimization when |z| is not small is tiny.  It takes no time at
>> all to check that |z| is small.
>
> Not necessarily on out-of-order machines (most x86).  The CPU executes
> multiple paths speculatively and concurrently.  If it does more on an
> unused path, then it might do less on the used path.  It may mispredict
> the branch on the size of |z| and thus misguess which path to do more
> on.  (I don't know many details of this.  For example, does it do
> anything at all on paths predicted to be not taken?)  Losses from this
> are usually described as branch mispredictions.  They might cost 20
> (50? 100?) cycles after taking about 2 cycles to actually check |z|
> (2 cycles pipelined but more like + 8 in real time,
> and it is the latter time that you lose by backing out).
>
> The only sure way to avoid branch mispredictions is to not have any,
> and catrig is too complicated for that.

Yes, but I did a time test.  And in my case the test for small |z| was
almost always failing.

>
>> On the other hand let me go through the code and see what happens when
>> |x| is small or |y| is small.
>> There are actually specific formulas
>> that work well in these two cases, and they are probably not that much
>> slower than the formulas I decided to remove.  And when you chase
>> through all the logic and "if" statements, you may find that you
>> didn't use up a whole bunch of time for these very special cases of
>> |z| small - most of the extra time merely being the decisions invoked
>> by the "if" statements.
>
> But all general cases end up going through an extern function like
> acos() or atan2(), and just calling another function is a significant
> overhead.  When |z| is small, the arg(s) to the other function will
> probably be a special case for it (e.g., acos(small)).  The other
> function should optimize this and not take as long as an average call.
> However, since it is special, it may cause branch mispredictions for
> other uses of the function.

I understand what you are saying.  I guess it just seems to me that the
"proper" way to do it is to make the C compiler really awesome and do
this for you.  (Doesn't the Intel compiler try to embed functions inline
if it knows it will speed things up?)

>> Furthermore, casinh etc. are not commonly used functions.  Putting huge
>> amounts of effort looking at special cases to speed it up a little
>> somehow feels wrong to me.  In fact, if the programmer knows that he
>> will be wanting casinh, and evaluated very fast, then he should be
>> motivated enough to try out using z in the case when |z| is small, and
>> see if that really speeds things up.

Well, if casinh goes 20% slower, you're not going to be testing too many
fewer cases.

> True.  Now I mainly want it to be fast so that I can test more cases.

I understand.  But putting those special cases into casinh offends my
sense of taste.