Date:      Sun, 5 Aug 2012 06:45:48 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Stephen Montgomery-Smith <stephen@missouri.edu>
Cc:        freebsd-numerics@freebsd.org, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: Complex arg-trig functions
Message-ID:  <20120805030609.R3101@besplex.bde.org>
In-Reply-To: <501D51D7.1020101@missouri.edu>
References:  <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu> <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu>


On Sat, 4 Aug 2012, Stephen Montgomery-Smith wrote:

> On 08/04/2012 03:49 AM, Bruce Evans wrote:
>> On Fri, 3 Aug 2012, Stephen Montgomery-Smith wrote:
>
>> I made good progress with clog[fl]().  The accuracy problems near 0
>> are long solved and proved to be solved.
>
> I would be interested to see how you solved this.  Would you be willing to 
> post your code somewhere?

I just subtracted 1 in the right place:

         x*x - 1 + y*y              needs doubled precision (and |x| >= |y|)
                                    (also need x*x >= 0.5)
         (x*x - 0.5) + (y*y - 0.5)  adjustment for above when x*x and y*y
                                    are between 0.25 and 0.5
         x*x + y*y - 1              needs tripled precision (and |x| != 1)
         log(x*x + y*y)             log() would subtract 1 at the same point
                                    as the previous expression, so would need
                                    a tripled-precision log.
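
For a concrete (hypothetical) example of why plain double precision is not
enough: take x = 1 - 2**-30, so that x*x = 1 - 2**-29 + 2**-60 exactly, but
the rounded double product is just 1 - 2**-29.  If y*y happens to be close
to 2**-29, then x*x + y*y - 1 is of the same order as the discarded 2**-60
term, so the naively computed value can have few or no correct bits.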

To see that doubled precision is enough:
- no problem if x*x + y*y is far from 1, so assume it is close to 1
- doubled precision makes x*x and y*y exact
     (It must be full doubled precision.  I first thought that there was a
     problem splitting types with odd MANT_DIG.  E.g., MANT_DIG = 53
     for doubles.  This must be split as 26 high bits and 27 low bits,
     but then squaring the 27 low bits in double precision at first
     seems to need 54 bits, and we only have 53.  This problem is avoided
     by rounding the high term to nearest.  Then the low term only needs
     26 bits.  A sketch of such a split is given after this list.)
- sort so that |x| >= |y|, and take absolute values so that I don't need
   to write '|*|' again
- if x*x >= 0.5, then x*x - 1 is exact in doubled precision (since x*x
   must be <= 2 by the assumption that x*x + y*y is near 1).
- if x*x < 0.5, then it must be >= 0.25, and y*y must be in the same range,
   with both slightly less than 0.5, to get x*x + y*y near 1.  So x*x - 0.5
   and y*y - 0.5 are exact in doubled precision.
- we have created 2 exact terms that add up to x*x + y*y - 1 in infinite
   precision.  In doubled precision, the sum is usually not exact.  But
   in the problem case where there is a large cancelation, the sum is
   exact; other cases are easier:
   - when the exponents of the 2 terms differ by more than 1, there is no
     cancelation and we get a doubled precision result with all bits
     correct (the last one correctly rounded).  This is more than enough.
   - when the exponents of the 2 terms differ by at most 1, first line up
     the bits by shifting as necessary.  Clearly at most 2 more bits are
     required to hold an exact result.  But cancelation of just 2 leading
     bits opens space to hold the exact result in doubled precision.
     Otherwise, we are in the previous case, with 1-2 bits less than
     doubled precision.  This is still more than enough.
   So we have x*x + y*y - 1 in doubled precision except possibly for
   errors other than the rounding error in the lowest 2 bits.
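
A minimal sketch of the kind of split this argument assumes (the standard
Veltkamp-style split; the attached code instead splits by bit masking, and
split() here is only a hypothetical helper):

	/*
	 * Split a double into hi + lo with hi rounded to 26 significant
	 * bits, so that hi*hi, 2*hi*lo and lo*lo are all exact doubles and
	 *     x*x == hi*hi + (2*hi*lo + lo*lo)
	 * holds exactly (barring overflow of c and underflow of lo*lo).
	 */
	static void
	split(double x, double *hi, double *lo)
	{
		double c = x * (0x1p27 + 1);	/* Veltkamp constant 2**27 + 1 */

		*hi = c - (c - x);		/* x rounded to 26 bits */
		*lo = x - *hi;			/* fits in 26 bits, sign included */
	}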

To see that the difference is not very small:
- if x > 1, then x >= 1 + DBL_EPSILON, and x*x is already not very close
   to 1
- if x < sqrt(2)/2, similarly (I didn't check this properly)
- if sqrt(2)/2 < x < 1, write x = 1 - u where u is positive and not large.
   Then x*x = 1 - 2*u + u*u.  Any y such that y*y when added to this gets
   close to 1 must be near sqrt(2*u).  Since u can't be very small, y
   can't be very small.  The worst case is when u = XXX_EPSILON/2.  In
   double precision, u = 2**-53.  Then 2*u = 2**-52 and y ~= 2**-26.  When
   we line up bits to add the doubled precision terms, they don't extend
   very far to the right.  They fit in less than 3 53-bit bit strings, with
   the leading bit representing 1 and the rightmost bit representing
   2**-158.
     (x*x and -1 fit in 2 53-bit bit-strings.  Since y ~= 2**-26, y*y's
     exponent is 52 or 53 less than 1's exponent.  Shifting to line up
     the bits requires just 1 more 53-bit bit string to the right of
     the other 2.)
   If everything canceled except for the rightmost bit, then the value
   would be 2**-158.  This is the smallest possible value, since the
   above worst case has u*u's bits as far as possible to the right
   (after shifting), and the final result gets its lowest bits from u*u.
   The smallest value observed in double precision was 2**-156.  This
   happens when u = 2**-52.  (Due to numerical accidents, when u = 2**-53,
   y*y can't kill enough bits of -2*u + u*u to get close.)

Limiting the closeness is important for showing that underflow can't
happen either in the calculations or on conversion of the doubled
precision result, but underflow tends to be avoided naturally.  It
is important that y*y doesn't underflow (including when y is split
into hi+lo, lo*lo must not underflow).  So tiny y must not be handled
by this case.  The "lining up bits" argument shows that tiny y can't
get close to 1.

> For the inverse trig functions, my accuracy is about 4 ulps, and I am not 
> looking to improve it at all.  In particular, the algorithm for casin(h) and 
> cacos(h) is so complicated that I don't see how this can be improved without 
> a lot of work.
>
> However if you solve this problem for clog, I can see that it is very likely 
> that your solution will also work for catan(h).  If you get this problem 
> licked, I will be interested to see how you do it.

I seem to have finally got it below 2.0 (mostly below 1.5) for clog().
clog() is simpler, but it was still hard to find all the exceptional
behaviours, so that the worst cases could even be noticed.

>> Then there is the problem of x and y tiny or denormal but with their
>> ratio not tiny enough to ignore one of them.  1/z is then large or
>> infinite, but it should rarely be infinite, and underflow shouldn't
>> occur for just x*x+y*y.  This is easily handled as in hypot() (not sure
>> about the method here) and now clog() by scaling:
>> 
>> %     /* Avoid underflow. */
>> %     if (ax < 0x1p-511)
>> %         return (cpack(
>> %             log(hypot(x * 0x1p511, y * 0x1p511)) - 511 * ln2_lo -
>> %             511 * ln2_hi,
>> %             atan2(y, x)));
>
> Wow - this is really getting down and dirty!  Why do you first subtract 511 *
> ln2_lo and then 511 * ln2_hi?  I am guessing that you are aiming at saving a
> few ulp by trying to represent M_LN2 in some kind of 1.5 times double
> precision?

Representing log(2) as hi+lo is standard technique, but is needed in
contexts like this mainly for multiplication by a variable k.  Here I
should probably use a double constant ln2_times_511.  This constant
shouldn't be calculated as 511*ln2 since this might lose accuracy (just
half an ulp?) unnecessarily.  The terms are supposed to be added from
low to high for maximal accuracy, but I didn't check that the
log(hypot()) term is actually lowest.  I tried replacing this by a
sloppy 511*ln2 and it lost more than half an ulp, so maybe the addition
is doing something good too.
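
For reference, the attached cplex.c splits log(2) in the usual way, with
enough trailing zero bits in ln2_hi that small integer multiples such as
511 * ln2_hi are exact:

% static const double
% ln2_hi = 6.9314718055829871e-1,		/*  0x162e42fefa0000.0p-53 */
% ln2_lo = 1.6465949582897082e-12;	/*  0x1cf79abc9e3b3a.0p-92 */

Only the multiples of ln2_lo carry rounding error, and they are the small
terms that should be added first.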

All the -511's are wrong, in different ways.  I'm too tired to get this
right now, but had changed them to -468 (an off-by-10 error in adding 53
to them).  This seemed to work, but has several more off-by-53 errors.
The -511's for the boost are wrong magic unrelated to the -511 for the
threshold.

Full code attached.

% ...
% double complex
% clog(double complex z)
% {
% ...
% 	/* Avoid spurious underflow, and reduce inaccuracies when ax is 1. */
% 	if (ax == 1) {
% 		if (ay < 0x1p-511)
% 			return (cpack((ay * 0.5) * ay, atan2(y, x)));
% 		return (cpack(log1p(ay * ay) * 0.5, atan2(y, x)));
% 	}

As before, but moved.

% 
% 	/* Avoid underflow when ax is not small.  Also handle zero args. */
% 	if (ay < DBL_MAX / 0x1p53 && ay * 0x1p53 <= ax)
% 		return (cpack(log(ax), atan2(y, x)));

This is the case of large differences in exponents in hypot(): when
ay <= ax * 0x1p-53, hypot(ax, ay) differs from ax by a relative amount of
at most about 0x1p-107, so log(ax) is accurate enough (the delicate
ax == 1 case was already handled above).

% 
% 	/* Avoid overflow.  Also handle infinite args. */
% 	if (ax > 0x1p1023)
% 		return (cpack(log(hypot(x * 0.5, y * 0.5)) + ln2_lo + ln2_hi,
% 		    atan2(y, x)));

Use the ln2* hi+lo constants a lot (log(hypot(x, y)) == log(hypot(x/2, y/2))
+ log(2), and the scaling by 0.5 is exact).  Avoid divisions and M_E.


% 	if (ax > 0x1p511)
% 		return (cpack(log(hypot(x, y)), atan2(y, x)));

Subcase for overflow.  The 0x1p1023 threshold is to avoid hypot()
overflowing in the general expression.  This one is to avoid squares
overflowing, now that we avoid hypot() in the usual case and do our
own squaring.

% 
% 	/* Avoid underflow when ax is small. */
% 	if (ax < 0x1p-468)
% 		return (cpack(log(hypot(x * 0x1p468, y * 0x1p468)) -
% 		    468 * ln2_lo - 468 * ln2_hi, atan2(y, x)));

Buggy -468.  See above.  I tested mainly in float precision where the
corresponding constants are less buggy.

% 
% 	/* Take extra care when the real part of the result is near 0. */
% 	EXTRACT_WORDS(hx, lx, ax);
% 	bits = ((uint64_t)hx << 32) | lx;
% 	bits += 0x04000000;
% 	hx = bits >> 32;
% 	lx = bits & 0xf8000000;
% 	INSERT_WORDS(xh, hx, lx);
% 	xl = ax - xh;
% 	EXTRACT_WORDS(hy, ly, ay);
% 	bits = ((uint64_t)hy << 32) | ly;
% 	bits += 0x04000000;
% 	hy = bits >> 32;
% 	ly = bits & 0xf8000000;
% 	INSERT_WORDS(yh, hy, ly);
% 	yl = ay - yh;

The hi+lo decomposition is painful in double precision.  (Adding
0x04000000 before masking rounds ax to the 26-bit xh instead of
truncating it, which is what lets xl fit in 26 bits as described above.)

We fall through to here and use this rather inefficient decomposition in
most cases, to reduce special cases above and make it easier to be more
accurate (we just need log*() with more accuracy).  In testing the float
case, I replaced log*f() by log*l() and got perfect rounding from the
extra-precision code.  This helped verify the algorithm.

% 
% 	xh2 = xh * xh;
% 	yh2 = yh * yh;
% 
% 	h = ax * ax + ay * ay;
% 	using_log1p = 0;
% 	if (h < 1 - 0x1p-5 || h > 1 + 0x1p-4) {

Not very close to 1.

% 		t2 = yh2;
% 		t4 = xl * xl + yl * yl + 2 * (xh * xl + yh * yl);
% 		norm(t2, t4);
% 		spadd(t2, t4, xh2);

Mostly, full multi-precision code isn't useful, so this just adds up
terms from low to high, with extra precision for the final step only
(this extra precision is probably only useful if we subtract 1 later).
This could be optimized a bit by subtracting 1 or 0.5 and 0.5 as in the
more delicate case, but 1 would still need to be subtracted later in
some cases, so I just fall through to that.
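
For reference, norm() and spadd() in the attached cplex.c are essentially
the usual two-sum building blocks: norm(a, b) renormalizes a+b into a
hi+lo pair in (a, b), and spadd(a, b, c) adds c into that pair:

% #define	norm(a, b) do {			\
% 	__typeof(a) __w;		\
% 					\
% 	if (fabs(a) < fabs(b)) {	\
% 		__w = (a);		\
% 		(a) = (b);		\
% 		(b) = __w;		\
% 	}				\
% 	__w = (a) + (b);		\
% 	(b) = ((a) - __w) + (b);	\
% 	(a) = __w;			\
% } while (0)
% 
% #define	spadd(a, b, c) do {		\
% 	__typeof(a) __tmp;		\
% 					\
% 	__tmp = (c);			\
% 	norm(__tmp, (a));		\
% 	(b) += (a);			\
% 	(a) = __tmp;			\
% } while (0)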

% 	} else {

Very close to 1.

% 		if (xh2 >= 0.5 && xh2 <= 2) {
% 			using_log1p = 1;
% 			xh2 -= 1;
% 		} else if (xh2 >= 0.25 && xh2 < 0.5 && yh2 >= 0.25) {
% 			using_log1p = 1;
% 			xh2 -= 0.5;
% 			yh2 -= 0.5;
% 		}
% 		t2 = xl * xl;
% 		t1 = 2 * xh * xl;

Exact so far: the products of 26-bit halves fit in 53 bits, and the
subtractions of 1 or 0.5 from xh2 and yh2 are exact since those values
lie in [0.25, 2].

% 		norm(t1, t2);
% 		spadd(t1, t2, xh2);
% 		norm(t1, t2);
% 		t3 = t1;
% 		t4 = t2;

x*x [-1 or -0.5] is in (t3, t4), exactly I hope.

% 
% 		t1 = 2 * yh * yl;
% 		t2 = yl * yl;
% 		norm(t1, t2);
% 		spadd(t1, t2, yh2);
% 		norm(t1, t2);

y*y [-0.5] is in (t1, t2), exactly I hope.

% 
% 		norm(t2, t4);
% 		norm(t1, t3);
% 
% 		spadd(t2, t4, t3);
% 		spadd(t2, t4, t1);

Add up terms.  Not exact in all cases, and not a full Dekker's algorithm
AFAIK.

% 	}
% 	if (!using_log1p && h > 0.5 && h < 3) {
% 		using_log1p = 1;
% 		spadd(t2, t4, -1);

Sloppier addition of -1 for cases not near 1.  Only guaranteed not to
lose accuracy when the infinite precision h is between 0.5 and 2, but
it helps up to about 3.

% 	}
% 	if (using_log1p)
% 		return (cpack(log1p(t2 + t4) * 0.5, atan2(y, x)));
% 	return (cpack(log(t2 + t4) * 0.5, atan2(y, x)));
% }

Subtracting 1 is no good for h far outside of [0.5, 2], so we can't use
log1p() in all cases.  t2+t4 for the log() case is just x*x + y*y
evaluated as accurately as possible (saves an ulp or 2 in the final
result).

% float complex
% clogf(float complex z)
% {
% ...
% 	/* Avoid underflow when ax is not small.  Also handle zero args. */
% 	if (ay < FLT_MAX / 0x1p24F && ay * 0x1p24F <= ax)
% 		return (cpackf(logf(ax), atan2f(y, x)));

Everything is lexically identical with clog() where possible.  Magic
numbers are always different.  I should have converted FOO_MAX to magic.

FP constants in float functions must have float type to avoid defeating
half the point of having float functions by evaluating some expressions
in non-float precision.
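
For example, the threshold test in the attached clogf() casts its constant
expressions so that they have float type:

% 	if (h < (float)(1 - 0x1p-5) || h > (float)(1 + 0x1p-4)) {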

% 	/* Avoid underflow when ax is small. */
% 	if (ay < 0x1p-39F)
% 		return (cpackf(logf(hypotf(x * 0x1p39F, y * 0x1p39F)) -
% 		    39 * ln2f_lo - 39 * ln2f_hi, atan2f(y, x)));

-39 = -63 + 24 has the same bugs as -458 = -511 + 53 in double precision.

% 
% 	/* Take extra care when the real part of the result is near 0. */
% #if 1

I develop in float precision for easier testing, and duplicate the
algorithm under this part of the ifdef.

% 	GET_FLOAT_WORD(hx, ax);
% 	SET_FLOAT_WORD(xh, hx & 0xfffff000);
% 	xl = ax - xh;
% 	GET_FLOAT_WORD(hy, ay);
% 	SET_FLOAT_WORD(yh, hy & 0xfffff000);
% 	yl = ay - yh;

The hi+lo decomposition is less painful in float precision.  (FLT_MANT_DIG
= 24 is even, so a truncating 12+12 split works and no rounding step is
needed as in the double case.)

% ...
% #else
% 	dh = (double_t)ax * ax + (double_t)ay * ay;
% 	if (dh > 0.5 && dh < 3) {
% 		dh = (double_t)ax * ax - 1 + (double_t)ay * ay;
% 		return (cpackf(log1pf(dh) * 0.5F, atan2f(y, x)));
% 	}
% 	return (cpackf(logf(dh) * 0.5F, atan2f(y, x)));

More than doubled precision is available in hardware on most supported
arches and is easy to use.  Your version had the -1 in the wrong place.

double_t is used for minor efficiency reasons.  Elsewhere, clogf() uses
float_t and clog() uses double_t to avoid compiler bugfeatures
(norm() and spadd() don't work without them) and for efficiency.

I made norm() and spadd() type-generic, and removed the STRICT_ASSIGN()s
in them that were there to work around the compiler bugs.  It is now the
responsibility of callers to pass them float_t or double_t.

% #endif
% }
% 
% long double complex
% clogl(long double complex z)
% {
% ...
% 	/* Avoid spurious underflow, and reduce inaccuracies when ax is 1. */
% 	if (ax == 1) {
% 		if (ay < 0x1p-8191L)
% 			return (cpackl((ay * 0.5) * ay, atan2l(y, x)));
% 		return (cpackl(log1pl(ay * ay) * 0.5, atan2l(y, x)));
% 	}

Magic numbers for ld80 only.  I can translate this to ld128 in about 10
minutes while I remember what the magic numbers are for.  Ifdefing them
would take longer and give a mess.

% 	/* Take extra care when the real part of the result is near 0. */
% 	xh = ax + 0x1p32 - 0x1p32;
% 	xl = ax - xh;
% 	yh = ay + 0x1p32 - 0x1p32;
% 	yl = ay - yh;

The hi+lo decomposition is painless in long double precision, because
long double expressions are not evaluated in extra precision.

% 	if (!using_log1p && h > 0.5F && h < 3) {
% 		using_log1p = 1;
% 		spadd(t2, t4, -1);
% 	}

Forgot to translate away the F suffix.  I avoid F and L suffixes wherever
possible, so that no translation or diffs for them are necessary, but
there are squillions of 0.5F's.

% 	if (using_log1p)
% 		return (cpackl(log1pl(t2 + t4) * 0.5, atan2l(y, x)));
% 	return (cpackl(logl(t2 + t4) * 0.5, atan2l(y, x)));
% }

Bruce
[Attachment: cplex.c]


