From owner-freebsd-numerics@FreeBSD.ORG Sat Aug 4 20:46:00 2012
Date: Sun, 5 Aug 2012 06:45:48 +1000 (EST)
From: Bruce Evans
X-X-Sender: bde@besplex.bde.org
To: Stephen Montgomery-Smith
Cc: freebsd-numerics@freebsd.org, Bruce Evans
Subject: Re: Complex arg-trig functions
In-Reply-To: <501D51D7.1020101@missouri.edu>
Message-ID: <20120805030609.R3101@besplex.bde.org>
References: <5017111E.6060003@missouri.edu> <501C361D.4010807@missouri.edu>
 <20120804165555.X1231@besplex.bde.org> <501D51D7.1020101@missouri.edu>
List-Id: "Discussions of high quality implementation of libm functions."

On Sat, 4 Aug 2012, Stephen Montgomery-Smith wrote:

> On 08/04/2012 03:49 AM, Bruce Evans wrote:
>> On Fri, 3 Aug 2012, Stephen Montgomery-Smith wrote:
>
>> I made good progress with clog[fl]().  The accuracy problems near 0
>> are long solved and proved to be solved.
>
> I would be interested to see how you solved this.  Would you be willing
> to post your code somewhere?

I just subtracted 1 in the right place:

    x*x - 1 + y*y       needs doubled precision (and |x| >= |y|)
                        (also needs x*x >= 0.5)
    (x*x - 0.5) + (y*y - 0.5)
                        adjustment of the above for when x*x and y*y
                        are both between 0.25 and 0.5
    x*x + y*y - 1       needs tripled precision (and |x| != 1)
    log(x*x + y*y)      log() would subtract 1 at the same point as the
                        previous expression, so it would need a
                        tripled-precision log()

To see that doubled precision is enough:
- no problem if x*x + y*y is far from 1, so assume it is close to 1
- doubled precision makes x*x and y*y exact
  (It must be full doubled precision.  I first thought that there was a
  problem splitting types with odd MANT_DIG.  E.g., MANT_DIG = 53 for
  doubles.  This must be split as 26 high bits and 27 low bits, but
  then squaring the 27 low bits in double precision at first seems to
  need 54 bits, and we only have 53.  This problem is avoided by
  rounding the high term to nearest.  Then the low term only needs 26
  bits.  A small sketch of such a split is given below.)
- sort so that |x| >= |y|, and take absolute values so that I don't
  need to write '|*|' again
- if x*x >= 0.5, then x*x - 1 is exact in doubled precision (since x*x
  must be <= 2 by the assumption that x*x + y*y is near 1)
- if x*x < 0.5, then it must be >= 0.25, and y*y must be in the same
  range, with both slightly less than 0.5, for x*x + y*y to be near 1.
  So x*x - 0.5 and y*y - 0.5 are exact in doubled precision.
- we have created 2 exact terms that add up to x*x + y*y - 1 in
  infinite precision.  In doubled precision, the sum is usually not
  exact.  But in the problem case where there is a large cancelation,
  the sum is exact; other cases are easier:
  - when the exponents of the 2 terms differ by more than 1, there is
    no cancelation and we get a doubled precision result with all bits
    correct (the last one correctly rounded).  This is more than enough.
  - when the exponents of the 2 terms differ by at most 1, first line
    up the bits by shifting as necessary.  Clearly at most 2 more bits
    are required to hold an exact result.  But cancelation of just 2
    leading bits opens space to hold the exact result in doubled
    precision.  Otherwise, we are in the previous case, with 1-2 bits
    less than doubled precision.  This is still more than enough.

So we have x*x + y*y - 1 in doubled precision, with no errors other
than possibly a rounding error in the lowest 2 bits.

To see that the difference is not very small:
- if x > 1, then x >= 1 + DBL_EPSILON, and x*x is already not very
  close to 1
- if x < sqrt(2)/2, similarly (I didn't check this properly)
- if sqrt(2)/2 < x < 1, write x = 1 - u where u is positive and not
  large.  Then x*x = 1 - 2*u + u*u.  Any y such that y*y, when added
  to this, gets close to 1 must be near sqrt(2*u).  Since u can't be
  very small, y can't be very small.  The worst case is when u =
  XXX_EPSILON/2.  In double precision, u = 2**-53.  Then 2*u = 2**-52
  and y ~= 2**-26.  When we line up bits to add the doubled precision
  terms, they don't extend very far to the right.  They fit in less
  than 3 53-bit bit strings, with the leading bit representing 1 and
  the rightmost bit representing 2**-158.  (x*x and -1 fit in 2 53-bit
  bit strings.  Since y ~= 2**-26, y*y's exponent is 52 or 53 less
  than 1's exponent.  Shifting to line up the bits requires just 1
  more 53-bit bit string to the right of the other 2.)  If everything
  canceled except for the rightmost bit, then the value would be
  2**-158.  This is the smallest possible value, since the above worst
  case has u*u's bits as far as possible to the right (after
  shifting), and the final result gets its lowest bits from u*u.

The smallest value observed in double precision was 2**-156.  This
happens when u = 2**-52.  (Due to numerical accidents, when u =
2**-53, y*y can't kill enough bits of -2*u + u*u to get close.)

Limiting the closeness is important for showing that underflow can't
happen either in the calculations or on conversion of the doubled
precision result, but underflow tends to be avoided naturally.  It is
important that y*y doesn't underflow (including when y is split into
hi+lo, lo*lo must not underflow).  So tiny y must not be handled by
this case.  The "lining up bits" argument shows that tiny y can't get
close to 1.

> For the inverse trig functions, my accuracy is about 4 ulps, and I am
> not looking to improve it at all.  In particular, the algorithm for
> casin(h) and cacos(h) is so complicated that I don't see how this can
> be improved without a lot of work.
>
> However if you solve this problem for clog, I can see that it is very
> likely that your solution will also work for catan(h).  If you get
> this problem licked, I will be interested to see how you do it.

I seem to have finally got it below 2.0 ulps (mostly below 1.5) for
clog().  clog() is simpler, but it was still hard to find all the
exceptional behaviours, so that the worst cases could even be noticed.
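To make the splitting argument concrete, here is a small standalone
sketch (mine, for illustration only, not the committed code).  It uses
the usual Veltkamp-style split with the constant 2**27 + 1; since the
high part is rounded to nearest, the low part has at most 26
significant bits, so every partial product of the square is exact in
double precision:

#include <stdio.h>

/*
 * Sketch only: split x into a 26-bit high part and a low part, then
 * square it as a hi+lo pair.  Rounding the high part to nearest keeps
 * the low part to at most 26 significant bits, so xh*xh, 2*xh*xl and
 * xl*xl are all exact in double precision (assuming no overflow and
 * round-to-nearest).
 */
static void
split_square(double x, double *hi, double *lo)
{
        double c, xh, xl;

        c = (0x1p27 + 1) * x;
        xh = c - (c - x);               /* x rounded to 26 bits */
        xl = x - xh;                    /* exact; at most 26 bits */
        *hi = xh * xh;                  /* exact */
        *lo = 2 * xh * xl + xl * xl;    /* each product exact; the final
                                           add may round by half an ulp */
}

int
main(void)
{
        double hi, lo, x = 1 - 0x1p-27;

        split_square(x, &hi, &lo);
        printf("x*x rounded: %a\n", x * x);
        printf("hi + lo:     %a + %a\n", hi, lo);
        return (0);
}

The clog() code quoted below does the equivalent split by rounding and
masking the words directly instead of multiplying by 2**27 + 1.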
>> Then there is the problem of x and y tiny or denormal but with their
>> ratio not tiny enough to ignore one of them.  1/z is then large or
>> infinite, but it should rarely be infinite, and underflow shouldn't
>> occur for just x*x+y*y.  This is easily handled as in hypot() (not
>> sure about the method here) and now clog() by scaling:
>>
>> %         /* Avoid underflow. */
>> %         if (ax < 0x1p-511)
>> %                 return (cpack(
>> %                     log(hypot(x * 0x1p511, y * 0x1p511)) - 511 * ln2_lo -
>> %                     511 * ln2_hi,
>> %                     atan2(y, x)));
>
> Wow - this is really getting down and dirty!  Why do you first
> subtract 511 * ln2_lo and then 511 * ln2_hi?  I am guessing that you
> are aiming at saving a few ulps by trying to represent M_LN2 in some
> kind of 1.5 times double precision?

Representing log(2) as hi+lo is a standard technique, but it is needed
in contexts like this mainly for multiplication by a variable k.  Here
I should probably use a double constant ln2_times_511.  This constant
shouldn't be calculated as 511*ln2, since that might lose accuracy
(just half an ulp?) unnecessarily.  The terms are supposed to be added
from low to high for maximal accuracy, but I didn't check that the
log(hypot()) term is actually lowest.  I tried replacing this by a
sloppy 511*ln2 and it lost more than half an ulp, so maybe the
addition is doing something good too.

All the -511's are wrong, in different ways.  I'm too tired to get
this right now, but had changed them to -468 (an off-by-10 error when
adding 53 to them).  This seemed to work, but has several more
off-by-53 errors.  The -511's for the boost are wrong magic, unrelated
to the -511 for the threshold.

Full code attached.

% ...
% double complex
% clog(double complex z)
% {
%         ...
%         /* Avoid spurious underflow, and reduce inaccuracies when ax is 1. */
%         if (ax == 1) {
%                 if (ay < 0x1p-511)
%                         return (cpack((ay * 0.5) * ay, atan2(y, x)));
%                 return (cpack(log1p(ay * ay) * 0.5, atan2(y, x)));
%         }

As before, but moved.

% 
%         /* Avoid underflow when ax is not small.  Also handle zero args. */
%         if (ay < DBL_MAX / 0x1p53 && ay * 0x1p53 <= ax)
%                 return (cpack(log(ax), atan2(y, x)));

This is the case of large differences in exponents in hypot().

% 
%         /* Avoid overflow.  Also handle infinite args. */
%         if (ax > 0x1p1023)
%                 return (cpack(log(hypot(x * 0.5, y * 0.5)) + ln2_lo + ln2_hi,
%                     atan2(y, x)));

Use ln2* a lot.  Avoid divisions and M_E.

%         if (ax > 0x1p511)
%                 return (cpack(log(hypot(x, y)), atan2(y, x)));

Subcase for overflow.  The 0x1p1023 threshold is to avoid hypot()
overflowing in the general expression.  This one is to avoid squares
overflowing, now that we avoid hypot() in the usual case and do our
own squaring.

% 
%         /* Avoid underflow when ax is small. */
%         if (ax < 0x1p-468)
%                 return (cpack(log(hypot(x * 0x1p468, y * 0x1p468)) -
%                     468 * ln2_lo - 468 * ln2_hi, atan2(y, x)));

Buggy -468.  See above.  I tested mainly in float precision, where the
corresponding constants are less buggy.

% 
%         /* Take extra care when the real part of the result is near 0. */
%         EXTRACT_WORDS(hx, lx, ax);
%         bits = ((uint64_t)hx << 32) | lx;
%         bits += 0x04000000;
%         hx = bits >> 32;
%         lx = bits & 0xf8000000;
%         INSERT_WORDS(xh, hx, lx);
%         xl = ax - xh;
%         EXTRACT_WORDS(hy, ly, ay);
%         bits = ((uint64_t)hy << 32) | ly;
%         bits += 0x04000000;
%         hy = bits >> 32;
%         ly = bits & 0xf8000000;
%         INSERT_WORDS(yh, hy, ly);
%         yl = ay - yh;

The hi+lo decomposition is painful in double precision.
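Going back to the underflow scaling and the ln2_hi/ln2_lo question:
the scaled branches rely on the identity log(hypot(x, y)) ==
log(hypot(x * 2**k, y * 2**k)) - k*log(2), with log(2) carried as
hi+lo so that k times the hi part can stay exact (the usual reason for
the hi+lo representation).  A rough standalone sketch of just that
identity (my own illustration; the constants and the k = 511 here are
not the ones that belong in the real code):

#include <math.h>
#include <stdio.h>

/*
 * Sketch only: Re(clog(z)) for tiny |z|, computed by scaling z up by
 * 2**k and subtracting k*log(2) afterwards.  The hi part of log(2)
 * has trailing zero bits, so k * ln2_hi is exact for the k's of
 * interest.  These are the well-known fdlibm log() constants, not the
 * ones from the attached cplex.c.
 */
static const double
ln2_hi = 6.93147180369123816490e-01,    /* 0x1.62e42feep-1 */
ln2_lo = 1.90821492927058770002e-10;    /* 0x1.a39ef35793c76p-33 */

static double
log_hypot_scaled(double x, double y, int k)
{
        /* log(hypot(x, y)) == log(hypot(x * 2**k, y * 2**k)) - k*log(2) */
        return (log(hypot(ldexp(x, k), ldexp(y, k))) -
            k * ln2_lo - k * ln2_hi);
}

int
main(void)
{
        double x = 0x1p-540, y = 0x1p-541;

        /* Squaring directly underflows: x*x + y*y is 0 here. */
        printf("via x*x + y*y: %g\n", 0.5 * log(x * x + y * y));
        printf("scaled:        %g\n", log_hypot_scaled(x, y, 511));
        return (0);
}

In the real code the thresholds and scale factors are the -511/-468
magic discussed above, and those are exactly what I still have to fix.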
In most cases we fall through to the EXTRACT_WORDS/INSERT_WORDS
decomposition quoted above and use this rather inefficient splitting,
to reduce the number of special cases and to make it easier to be more
accurate.  (We just need log*() with more accuracy.  In testing the
float case, I replaced log*f() by log*l() and got perfect rounding
from the extra-precision code.  This helped verify the algorithm.)

% 
%         xh2 = xh * xh;
%         yh2 = yh * yh;
% 
%         h = ax * ax + ay * ay;
%         using_log1p = 0;
%         if (h < 1 - 0x1p-5 || h > 1 + 0x1p-4) {

Not very close to 1.

%                 t2 = yh2;
%                 t4 = xl * xl + yl * yl + 2 * (xh * xl + yh * yl);
%                 norm(t2, t4);
%                 spadd(t2, t4, xh2);

Mostly, full multi-precision code isn't useful, so this just adds up
terms from low to high, with extra precision for the final step only
(this extra precision is probably only useful if we subtract 1 later).
This could be optimized a bit by subtracting 1, or 0.5 and 0.5, as in
the more delicate case, but 1 would still need to be subtracted later
in some cases, so I just fall through to that.

%         } else {

Very close to 1.

%                 if (xh2 >= 0.5 && xh2 <= 2) {
%                         using_log1p = 1;
%                         xh2 -= 1;
%                 } else if (xh2 >= 0.25 && xh2 < 0.5 && yh2 >= 0.25) {
%                         using_log1p = 1;
%                         xh2 -= 0.5;
%                         yh2 -= 0.5;
%                 }
%                 t2 = xl * xl;
%                 t1 = 2 * xh * xl;

Exact so far.

%                 norm(t1, t2);
%                 spadd(t1, t2, xh2);
%                 norm(t1, t2);
%                 t3 = t1;
%                 t4 = t2;

x*x [-1 or -0.5] is in (t3, t4), exactly I hope.

% 
%                 t1 = 2 * yh * yl;
%                 t2 = yl * yl;
%                 norm(t1, t2);
%                 spadd(t1, t2, yh2);
%                 norm(t1, t2);

y*y [-0.5] is in (t1, t2), exactly I hope.

% 
%                 norm(t2, t4);
%                 norm(t1, t3);
% 
%                 spadd(t2, t4, t3);
%                 spadd(t2, t4, t1);

Add up terms.  Not exact in all cases, and not a full Dekker's
algorithm AFAIK.

%         }
%         if (!using_log1p && h > 0.5 && h < 3) {
%                 using_log1p = 1;
%                 spadd(t2, t4, -1);

Sloppier addition of -1 for cases not near 1.  This is only guaranteed
not to lose accuracy when the infinite-precision h is between 0.5 and
2, but it helps up to about 3.

%         }
%         if (using_log1p)
%                 return (cpack(log1p(t2 + t4) * 0.5, atan2(y, x)));
%         return (cpack(log(t2 + t4) * 0.5, atan2(y, x)));
% }

Subtracting 1 is no good for h far outside of [0.5, 2], so we can't
use log1p() in all cases.  t2+t4 for the log() case is just x*x + y*y
evaluated as accurately as possible (this saves an ulp or 2 in the
final result).

% float complex
% clogf(float complex z)
% {
%         ...
%         /* Avoid underflow when ax is not small.  Also handle zero args. */
%         if (ay < FLT_MAX / 0x1p24F && ay * 0x1p24F <= ax)
%                 return (cpackf(logf(ax), atan2f(y, x)));

Everything is lexically identical to the double clog() where possible.
Magic numbers are always different.  I should have converted FOO_MAX
to magic.  FP constants in float functions must have float type, to
avoid defeating half the point of having float functions by evaluating
some expressions in non-float precision.

%         /* Avoid underflow when ax is small. */
%         if (ay < 0x1p-39F)
%                 return (cpackf(logf(hypotf(x * 0x1p39F, y * 0x1p39F)) -
%                     39 * ln2f_lo - 39 * ln2f_hi, atan2f(y, x)));

-39 = -63 + 24 has the same bugs as -458 = -511 + 53 in double
precision.

% 
%         /* Take extra care when the real part of the result is near 0. */
% #if 1

I develop in float precision for easier testing, and duplicate the
algorithm under this part of the ifdef.

%         GET_FLOAT_WORD(hx, ax);
%         SET_FLOAT_WORD(xh, hx & 0xfffff000);
%         xl = ax - xh;
%         GET_FLOAT_WORD(hy, ay);
%         SET_FLOAT_WORD(yh, hy & 0xfffff000);
%         yl = ay - yh;

The hi+lo decomposition is less painful in float precision.

% ...
% #else
%         dh = (double_t)ax * ax + (double_t)ay * ay;
%         if (dh > 0.5 && dh < 3) {
%                 dh = (double_t)ax * ax - 1 + (double_t)ay * ay;
%                 return (cpackf(log1pf(dh) * 0.5F, atan2f(y, x)));
%         }
%         return (cpackf(logf(dh) * 0.5F, atan2f(y, x)));

More than doubled precision is available in hardware on most supported
arches and is easy to use.  Your version had the -1 in the wrong
place.  double_t is used for minor efficiency reasons.  Elsewhere,
clogf() uses float_t and clog() uses double_t to avoid compiler
bugfeatures (norm() and spadd() don't work without them) and for
efficiency.  I made norm() and spadd() type-generic, and removed the
STRICT_ASSIGN()s in them that were there to work around the compiler
bugs.  It is now the responsibility of callers to pass them float_t or
double_t.

% #endif
% }
% 
% long double complex
% clogl(long double complex z)
% {
%         ...
%         /* Avoid spurious underflow, and reduce inaccuracies when ax is 1. */
%         if (ax == 1) {
%                 if (ay < 0x1p-8191L)
%                         return (cpackl((ay * 0.5) * ay, atan2l(y, x)));
%                 return (cpackl(log1pl(ay * ay) * 0.5, atan2l(y, x)));
%         }

Magic numbers for ld80 only.  I can translate this to ld128 in about
10 minutes while I remember what the magic numbers are for.  Ifdefing
them would take longer and give a mess.

%         /* Take extra care when the real part of the result is near 0. */
%         xh = ax + 0x1p32 - 0x1p32;
%         xl = ax - xh;
%         yh = ay + 0x1p32 - 0x1p32;
%         yl = ay - yh;

The hi+lo decomposition is painless in long double precision, because
long double expressions are not evaluated in extra precision.

%         if (!using_log1p && h > 0.5F && h < 3) {
%                 using_log1p = 1;
%                 spadd(t2, t4, -1);
%         }

Forgot to translate away the F suffix here.  I avoid F and L suffixes
wherever possible, so that no translation or diffs for them are
necessary, but there are squillions of 0.5F's.
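Since norm() and spadd() only appear above through their call sites:
the real ones are the type-generic macros in the attached cplex.c, but
the following function sketch (my notation, plain double for brevity)
shows roughly what they do.  float_t/double_t matters, as above,
because the error term of Fast2Sum is only exact when no extra
precision is involved:

#include <math.h>

/*
 * Sketch only, not the committed macros (those are type-generic and
 * take float_t/double_t so that extra precision can't spoil the error
 * term; plain double is used here, assuming FLT_EVAL_METHOD == 0).
 *
 * norm(a, b): Dekker's Fast2Sum after ordering by magnitude; on
 * return, *a holds the rounded sum and *b the exact rounding error.
 */
static void
norm(double *a, double *b)
{
        double w;

        if (fabs(*a) < fabs(*b)) {
                w = *a;
                *a = *b;
                *b = w;
        }
        w = *a + *b;
        *b = (*a - w) + *b;     /* exact error of the addition */
        *a = w;
}

/*
 * spadd(a, b, c): add c into the hi+lo pair (a, b).  The new hi part
 * is the rounded sum c + *a, and the error of that addition is folded
 * into the lo part *b; this last addition may itself round, so the
 * pair is not exact in all cases (as noted above).
 */
static void
spadd(double *a, double *b, double c)
{
        double tmp;

        tmp = c;
        norm(&tmp, a);
        *b += *a;
        *a = tmp;
}

With those, the tail of clogl() is the same pair of returns as in the
double version: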
%         if (using_log1p)
%                 return (cpackl(log1pl(t2 + t4) * 0.5, atan2l(y, x)));
%         return (cpackl(logl(t2 + t4) * 0.5, atan2l(y, x)));
% }

Bruce

[Attachment: cplex.c]