Date: Thu, 28 Feb 2019 07:15:14 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Steve Kargl <sgk@troutmask.apl.washington.edu> Cc: freebsd-numerics@freebsd.org Subject: Re: Update ENTERI() macro Message-ID: <20190228060920.R4413@besplex.bde.org> In-Reply-To: <20190227161906.GA77785@troutmask.apl.washington.edu> References: <20190226191825.GA68479@troutmask.apl.washington.edu> <20190227145002.P907@besplex.bde.org> <20190227074811.GA75972@troutmask.apl.washington.edu> <20190227201214.V1823@besplex.bde.org> <20190227161906.GA77785@troutmask.apl.washington.edu>
index | next in thread | previous in thread | raw e-mail
On Wed, 27 Feb 2019, Steve Kargl wrote:
> On Wed, Feb 27, 2019 at 09:15:52PM +1100, Bruce Evans wrote:
>>
>> ENTERI() hard-codes the long double for simplicity. Remember, it is only
>> needed for long double precision on i386. But I forgot about long double
>> complex types, and didn't dream about indirect long double types in sincosl().
>
> That simplicity does not work for long double complex. We will
>
> need either ENTERIC as in
>
> #define ENTERIC() ENTERIT(long double complex)
>
> or a direct use of ENTERIT as you have done s_clogl.c
I wrote ENTERIT() to work around this problem.
>>> I'm fine with making ENTERI() only toggle precision, and adding
>>> a LEAVEI() to reset precision. RETURNI(r) would then be
>>>
>>> #define RETURNI(r) \
>>> do { \
>>> LEAVEI(); \
>>> return (r); \
>>> } while (0)
>>
>> No, may be an expression, so it must be evaluated before LEAVEI(). This
>> is the reason for existence of the variable to hold the result.
>
> So, we'll need RETURNI for long double and one for long double complex.
> Or, we give RETURNI a second parameter, which is the input parameter of
> the function
I said to use your method of __typeof(). I tested this:
XX --- /tmp/math_private.h Sun Nov 27 17:58:57 2005
XX +++ ./math_private.h Thu Feb 28 06:17:26 2019
XX @@ -474,21 +474,22 @@
XX /* Support switching the mode to FP_PE if necessary. */
XX #if defined(__i386__) && !defined(NO_FPSETPREC)
XX -#define ENTERI() ENTERIT(long double)
XX -#define ENTERIT(returntype) \
XX - returntype __retval; \
XX +#define ENTERI() \
XX fp_prec_t __oprec; \
XX \
XX if ((__oprec = fpgetprec()) != FP_PE) \
XX fpsetprec(FP_PE)
XX -#define RETURNI(x) do { \
XX - __retval = (x); \
XX - if (__oprec != FP_PE) \
XX - fpsetprec(__oprec); \
XX +#define LEAVEI() \
XX + if ((__oprec = fpgetprec()) != FP_PE) \
XX + fpsetprec(FP_PE)
XX +#define RETURNI(expr) do { \
XX + __typeof(expr) __retval = (expr); \
XX + \
XX + LEAVEI(); \
XX RETURNF(__retval); \
XX } while (0)
XX #else
XX #define ENTERI()
XX -#define ENTERIT(x)
XX -#define RETURNI(x) RETURNF(x)
XX +#define LEAVEI()
XX +#define RETURNI(expr) RETURNF(expr)
XX #endif
XX
This compiles, but has minor problems. Note that the apparent style
bug of initializing __retval in its declaration is needed in cases
where __typeof() gives a const type. This happens in my code that
uses RETURNI(1 + tiny) to set inexact. I think it would also happen
for RETURNI(1). The type is then int instead of floating point, and
I need to check that this is harmless.
clogl() is the only user of ENTERIT(). Its size expands from 2302
bytes text to 2399 when compiled by gcc-3.3.3. I hope that this is
just gcc not doing a very good job optimizing the returns (there are
many RETURNI()s fpr clogl()). Repeating the return code instead of
jumping to it might even be optimal.
> #define RETURNI(x, r) \
> do { \
> x = (r) \
> LEAVEI(); \
> return (r); \
> } while (0)
>
> This will cause a lot of churn.
Indeed.
My version causes 1 line of churn:
XX --- /tmp/s_clogl.c Fri Jul 20 16:00:11 2018
XX +++ ./s_clogl.c Thu Feb 28 05:58:05 2019
XX @@ -66,5 +66,5 @@
XX int kx, ky;
XX
XX - ENTERIT(long double complex);
XX + ENTERI();
XX
XX x = creall(z);
>> Combined sin and cos probably does work better outside of benchmarks for
>> sin and cos alone, since it does less work so leaves more resources for
>> the, more useful things.
>
> Exactly! I have a significant amount of Fortran code that does
>
> z = cmplx(cos(x), sin(x))
>
> in modern C this is 'z = CMPLX(cos(x), sin(x))'. GCC with optimization
> enables will convert this to z = cexp(cmplx(0,x)) where it expects cexp
> to optimize this to sincos().
This is an pessimization unless everything is inlined. An optimization
would convert cexp(cmplx(0,x)) to sin(x) and cos(x) or sincos(x).
> GCC on FreeBSD will not do this optimization
> because FreeBSD's libm is not C99 compliant.
It is more conformant than most for cexp(). I think old gcc just doesn't
attempt such optimizations.
> When I worked on sincos() I tried a few variations. This included
> the simpliest implementation:
>
> void
> sincos(double x, double *s, double *c)
> {
> *c = cos(x);
> *s= sin(x);
> }
>
> I tried argument reduction with kernels.
>
> void
> sincos(double x, double *s, double *c)
> {
> a = inline argument reduction done to set a.
> *c = k_cos(x);
> *s= k_sin(x);
> }
You mean *c = s_cos(x), etc. That was good enough.
> And finally the version that was committed where k_cos and k_sin
> were manually inlined and re-arranged to reduce redundant computations.
That has excessive manual inlining. It should have only inlined s_cos()
and s_sin(), and changed k_cos() and k_sin() from extern to static inline.
Someday the data for these inline functions should be deduplicated, but
the data is small compared with that for the expl kernel.
Bruce
home |
help
Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190228060920.R4413>
