From owner-freebsd-numerics@FreeBSD.ORG Fri Mar 13 12:16:57 2015
Date: Fri, 13 Mar 2015 23:16:41 +1100 (EST)
From: Bruce Evans
To: Bruce Evans
Cc: enh, freebsd-numerics@freebsd.org
Subject: Re: libm functions sensitive to current rounding mode
In-Reply-To: <20150313192625.B885@besplex.bde.org>
Message-ID: <20150313224213.W2994@besplex.bde.org>
References: <20150313192625.B885@besplex.bde.org>
List-Id: "Discussions of high quality implementation of libm functions."

On Fri, 13 Mar 2015, Bruce Evans wrote:

> On Thu, 12 Mar 2015, enh via freebsd-numerics wrote:
> ...
> - ... but FreeBSD software sin has an enormous error for DOWNWARDS on
> 1.570796 -- it rounds in the wrong direction by 0x11553 ulps :-(.
> About 1 Mulp (1 Mulp = 1 mega-ulp).
>
>> (glibc's fixed bugs here in the past. see
>> https://sourceware.org/bugzilla/show_bug.cgi?id=3976 for example.)
>
> This is interesting. Vincent L certainly knows what he is doing. As
> stated in the thread, the library can do anything it wants in
> non-default rounding modes provided it documents this. It apparently
> uses the bad fix of changing the rounding mode to FE_TONEAREST. This
> is slow and guarantees an error of 1 ulp in most cases, accepted so
> as to avoid much larger errors in hopefully most cases.
>
> Long ago, I tested all of the most common functions for such errors,
> exhaustively for float functions, and didn't find any of more than an
> ulp or 2. I might have only done this on i386. i386 protects against
> such errors in float precision by evaluating intermediate results in
> higher precision. In non-FreeBSD, the default rounding precision is
> not double, so i386 should also protect against such errors in double
> precision. Special cases like multiples of Pi/2 are too hard to find
> without special tests except in float precision.

A quick rerun of the test finds problems in much the same areas as the
glibc thread. I must have put off fixing this since most functions work
OK and there are larger bugs to fix (like the multi-Pulp errors given
by using the i387).

Test of most float functions relative to double, on i386, testing 1/256
of possible values.

X acos: max_er = 0x129b24c8 0.5814, avg_er = 0.170, #>=1:0.5 = 0:2724
X acos: max_er = 0x228da264 1.0797, avg_er = 0.299, #>=1:0.5 = 1334:7453322
X acos: max_er = 0x2266570c 1.0750, avg_er = 0.197, #>=1:0.5 = 1324:869716
X acos: max_er = 0x228da264 1.0797, avg_er = 0.299, #>=1:0.5 = 1334:7453322

The 4 cases are FP_RN, FP_RZ, FP_RP, FP_RM (ieeefp has better names).
The errors are given in bits, then in ulps. With perfect rounding,
max_er would be at most 0x0fffffff (0.5- ulps) for FP_RN and 0x1fffffff
(1.0- ulps) for the other rounding modes.
This is achieved mainly by sqrtf. Less than 0.5 ulps more than that is
OK. A few functions (e.g., cos) try to get less than 0.01 or 0.1 ulps
more, and some functions should be exact.

X acosh: max_er = 0x2bf776b9 1.3740, avg_er = 0.063, #>=1:0.5 = 62:29183
X acosh: max_er = 0x517afdff 2.5462, avg_er = 0.128, #>=1:0.5 = 57310:2149989
X acosh: max_er = 0x619b49f6 3.0503, avg_er = 0.129, #>=1:0.5 = 60042:2153013
X acosh: max_er = 0x517afdff 2.5462, avg_er = 0.128, #>=1:0.5 = 57310:2149989
X asin: max_er = 0x1359931a 0.6047, avg_er = 0.012, #>=1:0.5 = 0:12924
X asin: max_er = 0x22938df5 1.0805, avg_er = 0.022, #>=1:0.5 = 3592:367188
X asin: max_er = 0x2459bbfe 1.1360, avg_er = 0.024, #>=1:0.5 = 6373:393217
X asin: max_er = 0x23dc2d74 1.1206, avg_er = 0.023, #>=1:0.5 = 6325:393217
X asinh: max_er = 0x2f6f84c3 1.4824, avg_er = 0.141, #>=1:0.5 = 2644:244920
X asinh: max_er = 0x5d1b55c0 2.9095, avg_er = 0.301, #>=1:0.5 = 464816:4941040
X asinh: max_er = 0x5ee3fd1a 2.9654, avg_er = 0.301, #>=1:0.5 = 464616:4938660
X asinh: max_er = 0x5d1b55c0 2.9095, avg_er = 0.301, #>=1:0.5 = 464816:4941040
X atan: max_er = 0x1089602a 0.5168, avg_er = 0.185, #>=1:0.5 = 0:2956
X atan: max_er = 0x208b8696 1.0170, avg_er = 0.325, #>=1:0.5 = 936:7876354
X atan: max_er = 0x208b8696 1.0171, avg_er = 0.275, #>=1:0.5 = 1439:4587521
X atan: max_er = 0x208b8696 1.0170, avg_er = 0.274, #>=1:0.5 = 1439:4587521
X atanh: max_er = 0x2e56025d 1.4480, avg_er = 0.016, #>=1:0.5 = 2584:180474
X atanh: max_er = 0x5e75b4c4 2.9518, avg_er = 0.045, #>=1:0.5 = 359898:667576
X atanh: max_er = 0x5e325a52 2.9437, avg_er = 0.032, #>=1:0.5 = 180535:437158
X atanh: max_er = 0x5e75b4c4 2.9518, avg_er = 0.031, #>=1:0.5 = 182422:434622
X cbrt: max_er = 0xffffec5 0.5000, avg_er = 0.250, #>=1:0.5 = 0:0
X cbrt: max_er = 0x1fffffff 0.9999, avg_er = 0.497, #>=1:0.5 = 1724:8326990
X cbrt: max_er = 0x1fffffff 1.0000, avg_er = 0.499, #>=1:0.5 = 1636:8355750
X cbrt: max_er = 0x1fffffff 0.9999, avg_er = 0.498, #>=1:0.5 = 1380:8355494
X ceil: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X ceil: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X ceil: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X ceil: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0

The above errors are a bit larger than for the default FP_RN, but not
too bad.

X cos: max_er = 0x10074700 0.5009, avg_er = 0.138, #>=1:0.5 = 0:2512
X cos: max_er = 0x1bb65457418000 14529186.7267, avg_er = 43.479, #>=1:0.5 = 458712:4761870
X cos: max_er = 0x1bb65477410000 14529187.7267, avg_er = 43.508, #>=1:0.5 = 525970:4867436
X cos: max_er = 0x1bb65457418000 14529186.7267, avg_er = 43.486, #>=1:0.5 = 488650:4793876

The algorithm breaks down for cosf. This is with my i386 system not
using the i387 for sin like -current still does (neither uses it for
sinf). Otherwise these errors of a mere 14 Mulps would be masked by
errors of 17 Gulps. That is 17 Gulps of error in 24-bit precision from
the reference function. Multiply by 2**29 for the error in double
precision.

X cosh: max_er = 0x104af206 0.5091, avg_er = 0.019, #>=1:0.5 = 0:2180
X cosh: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 809.671, #>=1:0.5 = 8088592:8619926
X cosh: max_er = 0x1d9252a8f 14.7858, avg_er = 0.516, #>=1:0.5 = 6750688:8175486
X cosh: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 809.671, #>=1:0.5 = 8088592:8619926

The algorithm breaks down for cosh, presumably due to its breakdown
for exp. This is with my i386 system not using the i387 for exp like
-current still does (neither uses it for expf). The glibc thread says
that the 32-bit exp worked but the 64-bit exp didn't work. This would
be explained by algorithmic errors in the non-i387 version only, and
glibc using the i387 in 32-bit mode only for the same bad/historical
reasons as FreeBSD.
X erf: max_er = 0x14faf158 0.6556, avg_er = 0.126, #>=1:0.5 = 0:109523
X erf: max_er = 0x4c3e1ab7 2.3825, avg_er = 0.260, #>=1:0.5 = 153763:4375088
X erf: max_er = 0x4041d2d3 2.0081, avg_er = 0.253, #>=1:0.5 = 67182:4227700
X erf: max_er = 0x4a70f434 2.3262, avg_er = 0.252, #>=1:0.5 = 78494:4227994
X exp: max_er = 0x1047a00a 0.5087, avg_er = 0.033, #>=1:0.5 = 0:2032
X exp: max_er = 0x3800000020000000 7516192769.0000, avg_er = 1343.574, #>=1:0.5 = 7259405:8747670
X exp: max_er = 0x39a138016 28.8149, avg_er = 1.162, #>=1:0.5 = 7256189:8776222
X exp: max_er = 0x3800000020000000 7516192769.0000, avg_er = 1343.574, #>=1:0.5 = 7259405:8747670
X exp2: max_er = 0x1006a936 0.5008, avg_er = 0.033, #>=1:0.5 = 0:581
X exp2: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 1056.147, #>=1:0.5 = 6412147:8351665
X exp2: max_er = 0x20669100 1.0126, avg_er = 0.500, #>=1:0.5 = 6429089:8371868
X exp2: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 1056.147, #>=1:0.5 = 6412147:8351665
X expm1: max_er = 0x105e998d 0.5115, avg_er = 0.031, #>=1:0.5 = 0:508
X expm1: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 1342.318, #>=1:0.5 = 3985247:4976979
X expm1: max_er = 0x231f0351 1.0976, avg_er = 0.061, #>=1:0.5 = 1888:1014283
X expm1: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 1342.317, #>=1:0.5 = 3986102:4957995

All the functions that are exp*f or use exp*f have problems, except
exp*f in RP mode. coshf has lesser problems in RP mode. Its errors are
then large but not enormous. Probably a small error in expf turns into
a larger one after addition. (My coshf uses an expf kernel like coshl
uses an expl kernel in -current.)
X fabs: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X fabs: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X fabs: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X fabs: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X floor: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X floor: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X floor: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X floor: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X j0: max_er = 0x2f51e6f4f1c 6056.9511, avg_er = 0.390, #>=1:0.5 = 2401634:5079654
X j0: max_er = 0x109227b28a731a 8687933.5794, avg_er = 17.443, #>=1:0.5 = 6602082:9050730
X j0: max_er = 0x80108bbf4ba8d7e8 17188543994.3644, avg_er = 29.127, #>=1:0.5 = 5773002:7488366
X j0: max_er = 0x109227b28a731a 8687933.5794, avg_er = 17.492, #>=1:0.5 = 6521176:9046266
X j1: max_er = 0x1b08a6a7544 3460.3255, avg_er = 0.395, #>=1:0.5 = 2415186:5077432
X j1: max_er = 0x800eac41530e07cc 17187561994.5954, avg_er = 28.930, #>=1:0.5 = 4387856:7844188
X j1: max_er = 0x64e89d1202af4 3306574.5352, avg_er = 16.048, #>=1:0.5 = 6010996:7994846
X j1: max_er = 0x800eac41530e07ca 17187561994.5954, avg_er = 29.129, #>=1:0.5 = 5858116:8158408
X lgamma: max_er = 0x1d957c2a8000 60587.8802, avg_er = 0.232, #>=1:0.5 = 140016:899042
X lgamma: max_er = 0x65ee2cf0b75af4 53440871.5223, avg_er = 1682.939, #>=1:0.5 = 1398802:7250814
X lgamma: max_er = 0xd6b9b6e60000 439757.7156, avg_er = 0.476, #>=1:0.5 = 920256:6777199
X lgamma: max_er = 0x65ee2cf0b75af4 53440871.5223, avg_er = 1682.914, #>=1:0.5 = 1097301:6940713

Bessel and gamma functions have large errors even in RN. These get
worse in other rounding modes.
X log: max_er = 0xfffffe6 0.5000, avg_er = 0.124, #>=1:0.5 = 0:0
X log: max_er = 0x1fffffdd 0.9999, avg_er = 0.249, #>=1:0.5 = 0:4178615
X log: max_er = 0x1fffffdd 1.0000, avg_er = 0.249, #>=1:0.5 = 0:4177376
X log: max_er = 0x1fffff83 0.9999, avg_er = 0.249, #>=1:0.5 = 0:4178462
X log10: max_er = 0xfffff6d 0.5000, avg_er = 0.125, #>=1:0.5 = 0:0
X log10: max_er = 0x1fffffff 0.9999, avg_er = 0.249, #>=1:0.5 = 6:4177256
X log10: max_er = 0x1fffffff 1.0000, avg_er = 0.250, #>=1:0.5 = 6:4177785
X log10: max_er = 0x1fffffff 0.9999, avg_er = 0.249, #>=1:0.5 = 6:4178059
X log1p: max_er = 0x10288a73 0.5049, avg_er = 0.088, #>=1:0.5 = 0:604
X log1p: max_er = 0x2028aaef 1.0049, avg_er = 0.174, #>=1:0.5 = 341:2883620
X log1p: max_er = 0x2025c234 1.0047, avg_er = 0.175, #>=1:0.5 = 251:2896319
X log1p: max_er = 0x2028aaef 1.0049, avg_er = 0.173, #>=1:0.5 = 300:2870804
X log2: max_er = 0x100128e0 0.5001, avg_er = 0.125, #>=1:0.5 = 0:17
X log2: max_er = 0x20002e79 1.0000, avg_er = 0.249, #>=1:0.5 = 5:4177477
X log2: max_er = 0x20005cf1 1.0001, avg_er = 0.250, #>=1:0.5 = 10:4187131
X log2: max_er = 0x20011f0c 1.0001, avg_er = 0.248, #>=1:0.5 = 1:4168439

log and log10 still use the i387, for both the test and reference
functions, so they are deceptively consistent/accurate. The others use
software that calls the hardware log. Apparently the software has no
algorithmic problems.
X logb: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X logb: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X logb: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X logb: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X rint: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X rint: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X rint: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X rint: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X round: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X round: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X round: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X round: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X significand: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X significand: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X significand: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X significand: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X sin: max_er = 0x10073f96 0.5009, avg_er = 0.137, #>=1:0.5 = 0:2436
X sin: max_er = 0x117c97fcf3c000 9168063.9047, avg_er = 39.254, #>=1:0.5 = 457300:4795627
X sin: max_er = 0x117c981cf34000 9168064.9047, avg_er = 39.281, #>=1:0.5 = 524127:4853666
X sin: max_er = 0x117c97fcf3c000 9168063.9047, avg_er = 39.260, #>=1:0.5 = 486966:4806289
X sinh: max_er = 0x10431a04 0.5082, avg_er = 0.019, #>=1:0.5 = 0:1734
X sinh: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 809.674, #>=1:0.5 = 8089378:8598420
X sinh: max_er = 0x380000001fffffff 7516192769.0000, avg_er = 1428.864, #>=1:0.5 = 4108592:4646473
X sinh: max_er = 0x380000001fffffff 7516192768.9999, avg_er = 1428.893, #>=1:0.5 = 4123299:4654026
X sqrt: max_er = 0xffffc47 0.5000, avg_er = 0.124, #>=1:0.5 = 0:0
X sqrt: max_er = 0x1ffffcca 0.9999, avg_er = 0.247, #>=1:0.5 = 0:4147423
X sqrt: max_er = 0x1fffd9d7 1.0000, avg_er = 0.250, #>=1:0.5 = 0:4192033
X sqrt: max_er = 0x1ffffcca 0.9999, avg_er = 0.247, #>=1:0.5 = 0:4147423
X tan: max_er = 0x1997ff78 0.7998, avg_er = 0.150, #>=1:0.5 = 0:1188178
X tan: max_er = 0xe0f2b5b350a325 117937581.6035, avg_er = 1715.855, #>=1:0.5 = 1378705:5016746
X tan: max_er = 0xe0f2b5b35198ca 117937581.6038, avg_er = 1715.881, #>=1:0.5 = 1385644:5030977
X tan: max_er = 0xe0f2b5b35479ad 117937581.6040, avg_er = 1715.880, #>=1:0.5 = 1385644:5030977
X tanh: max_er = 0x11710b6d 0.5450, avg_er = 0.016, #>=1:0.5 = 0:5702
X tanh: max_er = 0x349ef98c7 26.3104, avg_er = 0.981, #>=1:0.5 = 14807030:16230146
X tanh: max_er = 0x367cb0c6c 27.2436, avg_er = 0.272, #>=1:0.5 = 3378729:4287978
X tanh: max_er = 0x369ef98c7 27.3104, avg_er = 0.516, #>=1:0.5 = 7436992:8380189
X trunc: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X trunc: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X trunc: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X trunc: max_er = 0 0.0000, avg_er = 0.000, #>=1:0.5 = 0:0
X y0: max_er = 0x830232be000 16769.0991, avg_er = 0.315, #>=1:0.5 = 1356271:4560040
X y0: max_er = 0x1a77ca8c125e1f 13876820.3772, avg_er = 10.160, #>=1:0.5 = 5678376:7876031
X y0: max_er = 0x800c3393437ab82d 17186266266.1088, avg_er = 1038.278, #>=1:0.5 = 4674471:7301111
X y0: max_er = 0x1a77cacc125e17 13876822.3772, avg_er = 9.955, #>=1:0.5 = 3116737:4399584
X y1: max_er = 0x859fa2773502b 4378577.2328, avg_er = 1.307, #>=1:0.5 = 1472099:4660542
X y1: max_er = 0xc45f308dc9c883 102955396.4308, avg_er = 1577.846, #>=1:0.5 = 4247855:7239448
X y1: max_er = 0xc45f308dc9c883 102955396.4309, avg_er = 1572.738, #>=1:0.5 = 4781943:7380419
X y1: max_er = 0x19ddad63a68667 13561195.1140, avg_er = 14.622, #>=1:0.5 = 2746030:3875201

The other trig, hyperbolic and Bessel functions are consistently
broken. pow() is probably broken like exp(). My quick test didn't
reach it. It is mentioned in the glibc thread. The glibc fixes for exp
seem to be more subtle than clobbering the rounding mode.

Bruce