Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 23 Sep 2012 04:05:37 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Stephen Montgomery-Smith <stephen@missouri.edu>
Cc:        freebsd-numerics@freebsd.org
Subject:   Re: Complex arg-trig functions
Message-ID:  <20120923030719.E1209@besplex.bde.org>
In-Reply-To: <505D4F84.90005@missouri.edu>
References:  <5017111E.6060003@missouri.edu> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu> <20120922081607.F3613@besplex.bde.org> <20120922091625.Y3828@besplex.b! de.org> <505D1037.8010202@missouri.edu> <20120922142349.X4599@besplex.bde.org> <505D4BFA.5050401@missouri.edu> <505D4F84.90005@missouri.edu>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 22 Sep 2012, Stephen Montgomery-Smith wrote:

> I am finding some errors with catrigl.c in real_part_reciprocal.  I don't 
> know how SET_LDBL_EXPSIGN is meant to work.  But I needed to add the extra 
> statement:
>
> +        scale = 1;
>        SET_LDBL_EXPSIGN(scale, 0x7fff - ix);

Good fix.  I forgot that the normalization bit is not part of the exponent
for ld80.  So setting ony the exponent bits gives a pseudo-zero (zero
normalized mantissa and nonzero exponent).  I think pseudo-zeros are
treated as zeros on i387.

Your fix works by setting the normalization bit.  On i387, scale = 1
gives some exponent and sign that won't be used, and and a mantissa
of 0x8000000000000000ULL.  SET_LDBL_EXPSIGN() keeps this mantissa
and overrides the exponent and sign to (0, whatever).

I don't understand why my tests didn't discover this bug.  They only
cover the exponent range of doubles, but that is plenty to reach the
buggy code.

In logl() I spent a lot of time optimizing settings of long doubles
as bits, end ended up using just SET_LDBL_EXPSIGN() to modify a
normal value that didn't need special setting.  Alternative algorithms
that created a special normal value first or set all the mantissa bits
as bits were slower.  The access macros for setting the mantissa bits
weren't even committed.  Many long double functions use direct bit-field
accesses instead.  This is unportable and tends to be slower.

Here is the method used in ld80/s_expl.c for setting 2**k:

@ 	/* Prepare scale factors. */
@ 	v.xbits.man = 1ULL << 63;

This is the non-implicit normalization bit for ld80.  ld128 has implicit
normalization so it uses 0 here.  The macros in _fpmath.h for handling
the normalization bit are poor, and the normalization is known for ld80,
so this just hard-codes the value.

scale = 0 or scale = 1 here tends to be slower, since it asks to set the
sign and exponent bits too.  I used it in catrig to reduce unportabilities
(there is only the expsign access, and there is a macro for that).
Compilers may be able to optimize away the extra setting of the sign
and exponent bits by noticing that they will be overwritten soon, and
when they don't it turns out that setting things twice is often the best
method for confusing compilers into generating optimal memory accesses,
since optimal often doesn't equal least number.

@ 	if (k >= LDBL_MIN_EXP) {
@ 		v.xbits.expsign = BIAS + k;
@ 		twopk = v.e;
@ 	} else {
@ 		v.xbits.expsign = BIAS + k + 10000;
@ 		twopkp10000 = v.e;
@ 	}

This has complications to avoid setting unrepresentable exponent bits for
infinities and denormals.  In catrig, these complications are not at
runtime (the original exponent is large so negating it doesn't ask for
an infinity; negating it might ask for a denormal so 1 is added to
the negation of it to produce the new exponent, and this cannot ask for
an infinity either).

I don't like the direct bit-field accesses in the above although I wrote
them.  Efficiency tests show that these particular bit-field accesses
are optimized well enough on amd64 and i386.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120923030719.E1209>