From owner-freebsd-numerics@freebsd.org  Fri Sep  7 17:54:09 2018
Return-Path: <owner-freebsd-numerics@freebsd.org>
Delivered-To: freebsd-numerics@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9C7BF1007B78
 for <freebsd-numerics@mailman.ysv.freebsd.org>;
 Fri,  7 Sep 2018 17:54:09 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
 [128.95.76.21])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client CN "troutmask", Issuer "troutmask" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id 3322372868
 for <freebsd-numerics@freebsd.org>; Fri,  7 Sep 2018 17:54:09 +0000 (UTC)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (localhost [127.0.0.1])
 by troutmask.apl.washington.edu (8.15.2/8.15.2) with ESMTPS id w87Hs7l6049699
 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO);
 Fri, 7 Sep 2018 10:54:07 -0700 (PDT)
 (envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
 by troutmask.apl.washington.edu (8.15.2/8.15.2/Submit) id w87Hs6nr049698;
 Fri, 7 Sep 2018 10:54:06 -0700 (PDT) (envelope-from sgk)
Date: Fri, 7 Sep 2018 10:54:06 -0700
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: freebsd-numerics@freebsd.org
Subject: Re: j0 (and y0) in the range 2 <= x < (p/2)*log(2)
Message-ID: <20180907175406.GA49351@troutmask.apl.washington.edu>
Reply-To: sgk@troutmask.apl.washington.edu
References: <20180903235724.GA95333@troutmask.apl.washington.edu>
 <20180905201540.D1142@besplex.bde.org>
 <20180905152104.GA26453@troutmask.apl.washington.edu>
 <20180906034525.A2959@besplex.bde.org>
 <20180905190223.GA27865@troutmask.apl.washington.edu>
 <20180906202658.K978@besplex.bde.org>
 <20180906223026.GA43005@troutmask.apl.washington.edu>
 <20180907230825.A933@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180907230825.A933@besplex.bde.org>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-BeenThere: freebsd-numerics@freebsd.org
X-Mailman-Version: 2.1.27
Precedence: list
List-Id: "Discussions of high quality implementation of libm functions."
 <freebsd-numerics.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-numerics/>
List-Post: <mailto:freebsd-numerics@freebsd.org>
List-Help: <mailto:freebsd-numerics-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-numerics>, 
 <mailto:freebsd-numerics-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 07 Sep 2018 17:54:09 -0000

On Sat, Sep 08, 2018 at 12:07:26AM +1000, Bruce Evans wrote:
> On Thu, 6 Sep 2018, Steve Kargl wrote:
> 
> > On Fri, Sep 07, 2018 at 02:02:20AM +1000, Bruce Evans wrote:
> >> ...
> >> I now think that no single fudge factor works for both P0 and Q0.  P0
> >> and Q0 have very different characteristics.  Merely choosing the
> >> ...
> >> To find different fudge factors, I would try starting with a common one
> >> and then iteratively adjust to different ones.
> >>
> >> j0 has a zero in the middle of at least the first subinterval, and the
> >> relative error should be minimized for this.  I think the choice of
> >> subintervals is related to this.  With only one zero per subinterval,
> >> the relative error near it can be minimized without messing up the
> >> relative error near other zeros.  The adjustment for this is absolutely
> >> tiny, so it is easier to arrange that it doesn't mess up the absolute
> >> error too much.
> 
> In fact, this must more or less work, and the difficulty is to make it work
> more than less.  Start with any reasonable formula like (u*cc-v*ss).  (This
> is not specific to j0, except it gives an example of a nontrivial formula.)
> cc and ss are held fixed.  The best choice of u is not known, but we if we
> can get fairly close to it (perhaps even off by a factor of 2), then we can
> throw away any previous approximation for v and solve for v (in high
> precision), then approximate that.  There must be no singularities for this
> to work in high precision.  We expect to avoid the singularities by some
> combination of starting with u close enough, keeping the interval small,
> and special handling of zeros.
> 
> Here the formula for v is something divided by ss, so we want to avoid
> zeros of ss.  ss is sin(x) - cos(x) which is close to sqrt(2) of the
> entire interval [2,2.857], so there is no problem with pole singularities.
> In the next interval, ss has a zero in the middle of the interval, so we
> should switch the roles of u and v so as to divide by cc instead (it is
> not so close to -sqrt(2)).
> 
> Some magic is needed to make the resulting v easy to approximate.
> Hopefully not much more than a good u and a small interval.
> 
> Then there is the known zero in the middle of the interval.  v should be
> chosen to at least have the correct signs on the FP values on each side
> of the zero.

I don't see how the above works.  We have 

j0(x) = sqrt(1/pi) * (cc*u-ss*v) / sqrt(x).

If we take u to be accurate (with some amount of error from the
asymptotic series) and assume v as an unknown, we can solve for
v to high accuracy.  But, u is still inaccurate.  So, in the
interval [2,2.857] we accept whatever the truncated asymptotic
series gives for u and adjust v to compensate.  In the next
interval, [2.857,4.547], we flip the roles of u and v based on
the behavior of cc and ss.  Eventually, we get to the interval
[8,inf).  The divergent asymptotic series in 8 <= x < p/2*log(2)
give u and v, which yield poor values for j0(x).  There are
4 zeros of j0(x) in this range, and cc and ss have 4 and 3 zeros,
respectfully.    

There are some subtleties in your approach that I don't
understand, yet!

I'll probably try a different approach tomorrow.  j0(x),
cc, and ss are smoothly varying functions of x.  u and v
appear to be slowly varying (in small intervals), so we can
have

j0(x)   = sqrt(1/pi) * (cc(x)  *u-ss(x)  *v) / sqrt(x).
j0(x+e) = sqrt(1/pi) * (cc(x+e)*u-ss(x+e)*v) / sqrt(x+e).
 
where e is some small perturbution.  Two equation and two
unknowns is easily solved.  Simply need to do this across
the interval to generate u(x) and v(x).

> I noticed that this method cannot work for jn, since u and v depend on n
> so there is no way to generate before runtime.  jn actually uses a different
> method and I suspect that it is much more inaccurate.  My tests don't
> cover jn since its n arg is an integer and I only have coverage for 2
> real args or 1 complex arg.

jn with n > 1 only uses the large argument asymptotic expansion for
the case of very large x where p0(x) = 1 and q0(x) = 0.  This then
gives (from the comment in e_jn.c)

     *     Jn(x) = cos(x-(2n+1)*pi/4)*sqrt(2/x*pi)

The work is in computing cos().  Elsewhere jn is computed via
upward or downward recursion.

> > I'm beginning to like my use of the product formula.  It
> > allows one to remove the zero from an interval.  I did some
> 
> I haven't seen that.  Is it for the final evaluation or for determining
> the approximation?  I don't know of any better way determining approximations
> near zeros except to multiply by a linear factor to remove the zero.
> 
> It is becoming clearer why a separate interval is needed for each zero.
> Polynomials can only give good approximations for one zero at a time
> except in special cases.

I haven't sent the code to you as its a work in progress.  I did 
send you some of my testing results for 2 <= x < 4 a couple weeks
ago.  For j0(x), the infinite product is (https://dlmf.nist.gov/10.21)

j0(x) = prod(1 - (x/zj)**2), j = 0, 1, 2, ...

with z0 the first zero of j0(x), etc.  This can be rewritten and
approximated by 

j0(x) / ((x-z0)*(x+z0)) = R(x) / S(x) 

While developing the rational approximation, on the LHS we use
z0 = z0hi + z0lo.  Both z0hi and z0lo have p-bits of precision.
The final result is

j0(x) = (x - z0hi - z0lo) * (x + z0) * R(x) / S(x)

-- 
Steve