Date: Sun, 23 Sep 2012 04:05:37 +1000 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Stephen Montgomery-Smith <stephen@missouri.edu> Cc: freebsd-numerics@freebsd.org Subject: Re: Complex arg-trig functions Message-ID: <20120923030719.E1209@besplex.bde.org> In-Reply-To: <505D4F84.90005@missouri.edu> References: <5017111E.6060003@missouri.edu> <50553424.2080902@missouri.edu> <20120916134730.Y957@besplex.bde.org> <5055ECA8.2080008@missouri.edu> <20120917022614.R2943@besplex.bde.org> <50562213.9020400@missouri.edu> <20120917060116.G3825@besplex.bde.org> <50563C57.60806@missouri.edu> <20120918012459.V5094@besplex.bde.org> <5057A932.3000603@missouri.edu> <5057F24B.7020605@missouri.edu> <20120918162105.U991@besplex.bde.org> <20120918232850.N2144@besplex.bde.org> <20120919010613.T2493@besplex.bde.org> <505BD9B4.8020801@missouri.edu> <20120921172402.W945@besplex.bde.org> <20120921212525.W1732@besplex.bde.org> <505C7490.90600@missouri.edu> <20120922042112.E3044@besplex.bde.org> <505CBF14.70908@missouri.edu> <505CC11A.5030502@missouri.edu> <20120922081607.F3613@besplex.bde.org> <20120922091625.Y3828@besplex.b! de.org> <505D1037.8010202@missouri.edu> <20120922142349.X4599@besplex.bde.org> <505D4BFA.5050401@missouri.edu> <505D4F84.90005@missouri.edu>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 22 Sep 2012, Stephen Montgomery-Smith wrote: > I am finding some errors with catrigl.c in real_part_reciprocal. I don't > know how SET_LDBL_EXPSIGN is meant to work. But I needed to add the extra > statement: > > + scale = 1; > SET_LDBL_EXPSIGN(scale, 0x7fff - ix); Good fix. I forgot that the normalization bit is not part of the exponent for ld80. So setting ony the exponent bits gives a pseudo-zero (zero normalized mantissa and nonzero exponent). I think pseudo-zeros are treated as zeros on i387. Your fix works by setting the normalization bit. On i387, scale = 1 gives some exponent and sign that won't be used, and and a mantissa of 0x8000000000000000ULL. SET_LDBL_EXPSIGN() keeps this mantissa and overrides the exponent and sign to (0, whatever). I don't understand why my tests didn't discover this bug. They only cover the exponent range of doubles, but that is plenty to reach the buggy code. In logl() I spent a lot of time optimizing settings of long doubles as bits, end ended up using just SET_LDBL_EXPSIGN() to modify a normal value that didn't need special setting. Alternative algorithms that created a special normal value first or set all the mantissa bits as bits were slower. The access macros for setting the mantissa bits weren't even committed. Many long double functions use direct bit-field accesses instead. This is unportable and tends to be slower. Here is the method used in ld80/s_expl.c for setting 2**k: @ /* Prepare scale factors. */ @ v.xbits.man = 1ULL << 63; This is the non-implicit normalization bit for ld80. ld128 has implicit normalization so it uses 0 here. The macros in _fpmath.h for handling the normalization bit are poor, and the normalization is known for ld80, so this just hard-codes the value. scale = 0 or scale = 1 here tends to be slower, since it asks to set the sign and exponent bits too. I used it in catrig to reduce unportabilities (there is only the expsign access, and there is a macro for that). Compilers may be able to optimize away the extra setting of the sign and exponent bits by noticing that they will be overwritten soon, and when they don't it turns out that setting things twice is often the best method for confusing compilers into generating optimal memory accesses, since optimal often doesn't equal least number. @ if (k >= LDBL_MIN_EXP) { @ v.xbits.expsign = BIAS + k; @ twopk = v.e; @ } else { @ v.xbits.expsign = BIAS + k + 10000; @ twopkp10000 = v.e; @ } This has complications to avoid setting unrepresentable exponent bits for infinities and denormals. In catrig, these complications are not at runtime (the original exponent is large so negating it doesn't ask for an infinity; negating it might ask for a denormal so 1 is added to the negation of it to produce the new exponent, and this cannot ask for an infinity either). I don't like the direct bit-field accesses in the above although I wrote them. Efficiency tests show that these particular bit-field accesses are optimized well enough on amd64 and i386. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120923030719.E1209>