From owner-svn-src-head@freebsd.org Mon Nov 19 14:40:18 2018 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 165F111054D8; Mon, 19 Nov 2018 14:40:18 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 4ED8477A3A; Mon, 19 Nov 2018 14:40:11 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id C8B193D6534; Tue, 20 Nov 2018 01:40:00 +1100 (AEDT) Date: Tue, 20 Nov 2018 01:39:59 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Warner Losh cc: Andriy Gapon , "Rodney W. Grimes" , Allan Jude , Warner Losh , src-committers , svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r340450 - head/sys/sys In-Reply-To: Message-ID: <20181120005944.B1050@besplex.bde.org> References: <201811190104.wAJ14CaE059062@pdx.rh.CN85.dnsmgr.net> <5e227743-6463-d395-f2ba-da8d4ba248ca@FreeBSD.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=FNpr/6gs c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=6I5d2MoRAAAA:8 a=LIQbiBchzmXMh4q9X1oA:9 a=Qyu579rOfU3iaBEE:21 a=csjIOMLcv_6KWWZE:21 a=CjuIK1q_8ugA:10 a=IjZwj45LgO3ly-622nXo:22 X-Rspamd-Queue-Id: 4ED8477A3A X-Spamd-Result: default: False [-0.86 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-0.83)[-0.827,0]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23]; FREEMAIL_FROM(0.00)[optusnet.com.au]; MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au]; NEURAL_SPAM_SHORT(0.28)[0.285,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; MX_GOOD(-0.01)[extmail.optusnet.com.au]; RCPT_COUNT_SEVEN(0.00)[8]; IP_SCORE(-0.01)[country: AU(-0.04)]; RCVD_NO_TLS_LAST(0.10)[]; RCVD_IN_DNSWL_LOW(-0.10)[42.132.29.211.list.dnswl.org : 127.0.5.1]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[optusnet.com.au]; ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU]; RCVD_COUNT_TWO(0.00)[2]; FROM_EQ_ENVFROM(0.00)[] X-Rspamd-Server: mx1.freebsd.org X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Nov 2018 14:40:18 -0000 On Mon, 19 Nov 2018, Warner Losh wrote: > On Mon, Nov 19, 2018 at 12:31 AM Andriy Gapon wrote: > >> On 19/11/2018 03:38, Warner Losh wrote: >>> I'll talk to Allan to see if he can test that. the bare 1 should be >> handled >>> properly because of C's promotion rules. 1ull << 32 is an unsigned long >> long. >>> What I really wanted was "~(uint32_t)0" but that construct has bit me in >> the past. Using the long long abomination is only one of the bugs. Not a critical one. The critical one in this constant is that it is garbage. The correct constant is 2**64 divided by a power of 10. For nanoseconds, the power of 10 os 10**9, and 2**64/10**9 resembles the garbage constant. It is 4.29 times larger. This 4.29 rounded down is SBT_1NS, and it also works to add SBT_1NS outside of the shift... >> I think that you could just do (unsigned int)-1 or UINT_MAX. > > Perhaps. This avoids using the abomination, but still uses a garbage constant and the garbage constant is off by 1 relative to the intended garbage constant. >> As a side note, I wonder if those functions are ever used on negative >> values, >> given the type of the argument, and if anyone checked their correctness in >> that >> case. > > I don't think so, but an assert would be good to make sure. I checked them on all valid values. They obviously can't work, since the constant is garbage. Unfortunately, I lost the test program. IIRC, it finds about 80 million out of 1 billion wrong results for nsec precision, 32 out of 1 million wrong for usec precision, and 0 out of 1000 wrong for msec precision. > > But I think I understand the problem now. > > mstosbt(1000) is overflowing with my change, but not without because we're > adding 2^32 to a number that's ~900 away from overflowing changing a very This value is just invalid. Negative values are obviously invalid. Positive values of more than 1 second are invalid. In the old version, values of precisely 1 second don't overflow since the scale factor is rounded down; the result is just 1 unit lower than 1 second. Overflow (actually truncation) occurs at 1 unit above 1 second. E.g., sbttoms(mstosbt(1000)) was 999 and sbttoms(mstosbt(1001)) was 0. Now in my fixed version, sbttoms(mstosbt(1000)) is truncated to 0 and sbttoms(mstosbt(1001)) is truncated to 1. The test program also showed that in the old version, all valid args except 0 are reduced by precisely 1 by conversion to sbt and back. > large sleep of 1 second to a tiny sleep of 0 (which is 1ms at the default > Hz). I think we get this in the mlx code because there's a msleep(1000) in > at least one place. msleep(1001) would have failed before my change. Now I > think msleep(999) works, but msleep(1000) fails. Since the code is waiting > a second for hardware to initialize, a 1ms instead is likely to catch the > hardware before it's finished. I think this will cause problems no matter > cold or not, since it's all pause_sbt() under the covers and a delay of 0 > is still 0 either way. Bug in the mlx code. The seconds part must be converted separately, as in tstosbt(). > The fix I think is something like: > > diff --git a/sys/sys/time.h b/sys/sys/time.h > index 2207767813f2..00c6bb4a20fb 100644 > --- a/sys/sys/time.h > +++ b/sys/sys/time.h > @@ -204,8 +204,12 @@ sbttoms(sbintime_t _sbt) > static __inline sbintime_t > mstosbt(int64_t _ms) > { > + sbintime_t sb; > > - return ((_ms * (((uint64_t)1 << 63) / 500) + (1ull << 32) - 1) >> > 32); > + sb = (_ms / 1000) * SBT_1S; > + _ms = _ms % 1000; > + sb += (_ms * (((uint64_t)1 << 42) / 1000) + 1023) >> 10; > + return (sb); > } sbt never supported this. This pessimizes correct code and bloats the functions with divisions. I know too much about this because I wrote similar code for {ts,tv}<->bt conversions but phk disapproved of it and only committed a comment in sys/time.h saying not to do it. The comment forbids doing it for sbt too. However, the comment allows truncation to correct values. The sbt values are truncated to far below the floor of the infinite-precision values, because the scaling is sloppy. The scaling is much sloppier for sbt than for bt since the scale factor is in fixed point with 64 bits for bt but only 32 bits for sbt. 32 bits is barely enough for the error to be only 1 after converting back. This is because SBT_1NS is 4, so nanonseconds can be represented to an accuracy of about 1/4 using sbt; rounding down loses at most 1/4 on the nsec scale so converting back loses at most 1 provided scaling and other rounding steps don't lose several quarters. Here is a working version with too many comments. XX Index: time.h XX =================================================================== XX --- time.h (revision 340473) XX +++ time.h (working copy) XX @@ -97,11 +97,11 @@ XX { XX uint64_t _p1, _p2; XX XX - _p1 = (_bt->frac & 0xffffffffull) * _x; XX + _p1 = (_bt->frac & 0xffffffff) * _x; XX _p2 = (_bt->frac >> 32) * _x + (_p1 >> 32); XX _bt->sec *= _x; XX _bt->sec += (_p2 >> 32); XX - _bt->frac = (_p2 << 32) | (_p1 & 0xffffffffull); XX + _bt->frac = (_p2 << 32) | (_p1 & 0xffffffff); XX } XX XX static __inline void Fix other instances of the long long abomination. These were especially gratuitous, since the natural constants are 32 bits. The natural promotions of these are also correct. Using the abomination doesn't simplify analysis of conversions. long long may be larger than 64 bits. Then you have to analyze demotions instead of promotions. On LP64 systems, long long has higher rank than uint64_t, so you still must analyze demotions. XX @@ -162,9 +162,56 @@ XX * large roundoff errors which sbttons() and nstosbt() avoid. Millisecond and XX * microsecond functions are also provided for completeness. XX * XX - * These functions return the smallest sbt larger or equal to the number of XX - * seconds requested so that sbttoX(Xtosbt(y)) == y. The 1 << 32 - 1 term added XX - * transforms the >> 32 from floor() to ceil(). XX + * Violate the comment about "Background information": XX + * XX + * All these functions do sloppy scaling and/or rounding, but it is arranged XX + * that the errors compensate for each other so that sbttoX(Xtosbt(x)) == x XX + * for all valid args. XX + * XX + * Sloppy scaling in the decimal->sbt functions tends to give a result that XX + * is significantly far below the floor of the correctly rounded result (call XX + * this the "true floor"). Similary sloppy scaling in the sbt->decimal XX + * functions reduces further below the true floor. The result was that XX + * sbttoX(Xtosbt(x)) was 1 too small for all cases except x = 0. XX + * XX + * Doing correct even scaling and rounding up to the true ceiling in the XX + * decimal->sbt functions alone is useless for fixing this, since, the XX + * sbt->decimal functions reduc even true ceilings to below true floors. XX + * XX + * Instead of correcting all the functions, return a fake ceiling in the XX + * decimal->sbt functions. This ceiling is approx. 1 decimal unit above the XX + * sloppily scaled value. It is thus above the true ceiling but not 1 XX + * decimal unit or more above. It is unobvious that the sbt->decimal XX + * functions don't reduce even this fake ceiling to below the true floor. XX + * Exhaustive testing shows that they don't. The fake ceiling must be XX + * significantly more than 1 sbt unit above the true ceiling for the XX + * reduction to be not too large. Fortunately, there are no numerical XX + * accidents that give a too-low fake ceiling. XX + * XX + * The previous version claimed to calculate true ceilings. These never XX + * work. But it actually calculated fake ceilings of 2**32-1 sbt units XX + * above the sloppily scaled value. This value is nonsense, but worked XX + * accidentally in some cases. The decimal unit of 1 nsec is XX + * 2**64/10**9 ~= 18e9 sbt units. 2**32-1 is thus about 4.29 times too XX + * small to work non-accidentally. In practice, it worked accidentally in XX + * about 92% of cases. For conversions of microseconds, the value used was XX + * about 4.29e3 times too small, but worked accidentally in all except 32 XX + * cases. For conversions of milliseconds, the value used was about 4.29e6 XX + * times too small, but worked accidentally in all cases. XX + * XX + * Note that the range of valid args is significantly limited, since the XX + * efficient sloppy scaling and rounding methods used here only work up to 1 XX + * second. E.g., _ms must be between 0 and 999 inclusive, since XX + * ustosbt(1000) = 0 due to overflow of of the multiplication to almost XX + * exactly 0. This allows exhaustive testing that the sloppy methods work XX + * It is unclear if similar sloppy scaling would work above 1 second, but XX + * non-sloppy scaling is very easy for whole seconds (see tstosbt()). XX + * XX + * Uncommitted code for ts/tv<->bt was more careful, but was disapproved and XX + * the comment about "Background information" was added to inhibit commits XX + * in this area. It does non-sloppy scaling of fractional parts, then rounds XX + * to nearest. I don't remember doing exhaustive testing of it, and now don't XX + * see why rounding to nearest is enough. XX */ XX static __inline int64_t XX sbttons(sbintime_t _sbt) XX @@ -177,7 +224,79 @@ XX nstosbt(int64_t _ns) XX { XX XX - return ((_ns * (((uint64_t)1 << 63) / 500000000) + (1ull << 32) - 1) >> 32); XX + /* XX + * The adjustment is essentially to add SBT_1NS to the result of XX + * sloppy scaling and rounding down. That happens to work too, but XX + * use the slightly less sloppy method of combining shifts and XX + * subtracting 1. The result in most cses is much higher than the XX + * true ceiling, but sloppy rounding for the reverse conversion XX + * discards the extras. XX + * XX + * E.g., 2**64/10**9 has a fractional part of ~0.71. Multiplying XX + * this by the maximum valid _ns of 10**9-1 gives ~0.71e9, so XX + * rounding down after scaling gives a raw sbt that is ~0.71e9 below XX + * the true floor (only ~0.17 below the true floor in seconds). XX + * Replacing _ns by (_ns + 1) gives a raw sbt that is ~0.29e9 above XX + * the true ceiling (only ~0.068 above the true ceiling in seconds). XX + * Subtracting 1 from this makes no significant difference. Sloppy XX + * scaling and rounding in the reverse conversion recovers the XX + * original value: the sloppy scaling reduces the value from above the XX + * true ceiling to below the true ceiling (but not below the true XX + * floor); then rounding down recovers the true floor. XX + * XX + * Similarly for _all_ valid args and for the us and ms conversions. XX + * It is fairly obvious that the result of conversion back to ns is XX + * never 1 too low (since we add 1), but not obvious that the result XX + * is never 1 too high (since adding 1 risks this). XX + * XX + * Conversion from sbt to decimal and back has no chance of always XX + * restoring the original values since the decimal formats have less XX + * precision. Adding 1 is too sloppy to do anything good for this XX + * direction. The uncommitted ts/tv<->bt conversions are supposed to XX + * work as well as possible in both directions. Rounding to nearest XX + * works better and non-sloppy scaling is obviously needed to work in XX + * both directions. However, non-sloppy scaling doesn't work with XX + * mixed or sloppy rounding. In the above example with _ns = 10**9-1, XX + * it gives a raw sbt that is just 1 below the true floor. Adding 1 XX + * to this gives the true ceiling in the sbt, but unless the reverse XX + * conversion is changed similarly, it reduces from the true ceiling XX + * to below the true floor, so the result of converting back has much XX + * the same off-by-1 errors as simpler method (always 1 too low except XX + * for an arg of 0). XX + * XX + * Further examples: XX + * - when _ns = 0, sbt is SBT_1NS, but converting this back to ns XX + * gives 0. If we start with this sbt, oue conversion functions XX + * are still unusable for printing it in nsec. XX + * - when _ns = 10**9-1, we are close to overflow. Adding 1 to it XX + * doesn't quite reach the overflow threshold. XX + * - in previous versions, when _ns = 10**9, overflow was not quite XX + * reached, and the result of a double conversion had the usual XX + * off-by-1 error. XX + * - when _ns = 10**9, overflow now occurs and the result of a double XX + * conversion is 0 XX + * - when _ns = 10**9+1, overflow occurs in previous versions and the XX + * result of a double conversion is 0 (overflow gives an off-by-10**9 XX + * error and then the usual off-by-1-error occurs. XX + */ XX +#if 1 XX + /* XX + * This version does non-sloppy scaling which after rounding down is XX + * close to or equal 1 below the true ceiling for all nonzero args. XX + * Then adding 1 gives an adequate true or fake ceiling. Some of the XX + * above comments are wrong about the true ceiling not working. XX + * Unfortunately, this is about 30% slower (a whole cycle extra in XX + * the test). The unnecessary subtraction of 1 in the main version XX + * costs about 1/2 of a cycle in the test. XX + */ XX + uint64_t factor, frac; XX + XX + factor = ((uint64_t)1 << 63) / 500000000; XX + frac = ((0 - factor * 1000000000) << 32) / 1000000000; XX + return (((_ns * factor + ((_ns * frac) >> 32)) >> 32) + 1); XX +#else XX + return (((_ns + 1) * (((uint64_t)1 << 63) / 500000000) - 1) >> 32); XX +#endif XX } XX XX static __inline int64_t XX @@ -191,7 +310,7 @@ XX ustosbt(int64_t _us) XX { XX XX - return ((_us * (((uint64_t)1 << 63) / 500000) + (1ull << 32) - 1) >> 32); XX + return (((_us + 1) * (((uint64_t)1 << 63) / 500000) - 1) >> 32); XX } XX XX static __inline int64_t XX @@ -205,7 +324,7 @@ XX mstosbt(int64_t _ms) XX { XX XX - return ((_ms * (((uint64_t)1 << 63) / 500) + (1ull << 32) - 1) >> 32); XX + return (((_ms + 1) * (((uint64_t)1 << 63) / 500) - 1) >> 32); XX } XX XX /*- Here is my old code for the corresponding disapproved scaling for bt. This uses rounding to nearest in all functions, and does less sloppy scaling, but has wrong constants. It seemed to work, but I don't remember do exhaustive testing of it. The rounding to nearest probably compensates for errors in the constants. YY Index: time.h YY =================================================================== YY RCS file: /home/ncvs/src/sys/sys/time.h,v YY retrieving revision 1.50 YY diff -u -2 -r1.50 time.h YY --- time.h 8 Feb 2002 03:55:37 -0000 1.50 YY +++ time.h 7 Mar 2002 10:04:34 -0000 YY @@ -127,5 +125,6 @@ YY YY ts->tv_sec = bt->sec; YY - ts->tv_nsec = (1000000000ULL * (u_int32_t)(bt->frac >> 32)) >> 32; YY + ts->tv_nsec = ((uint64_t)1000000000 * (uint32_t)(bt->frac >> 32) + YY + 0x80000000) >> 32; YY } YY The power-of-2 constant is correct in this direction. YY @@ -133,7 +132,17 @@ YY timespec2bintime(struct timespec *ts, struct bintime *bt) YY { YY + uint64_t factor, frac; YY YY bt->sec = ts->tv_sec; YY - bt->frac = ts->tv_nsec * 18446744073ULL; /* int(2^64 / 1000000000) */ YY + YY + /* YY + * Even the imperfect rounding here is much more expensive than the YY + * perfect rounding in bintime2timespec(), but this is not a problem YY + * because this function is rarely used. In fact, it is not used at YY + * all (yet?) (only timeval2bintime() is used). YY + */ YY + factor = (uint64_t)(-1) / 1000000000; YY + frac = (((uint64_t)(-1) - factor * 100000000 + 1) << 32) / 1000000000; YY + bt->frac = ts->tv_nsec * factor + ((ts->tv_nsec * frac) >> 32); YY } YY This shows how to do the scaling accurately to within 1 bt unit, except the constant is wrong in the new code (for frac) -- one of the powers of 10 is missing an 0 digit :-(. As the comment says, this doesn't bother to round bt->frac specially, so does rounding down. bt has so much more precision than ts (about 2**34 times more) that the rounding doesn't matter, provided you round to nearest in the reverse direction. Perfect rounding down in both directions of course gives a reduction by 1 unit for the double conversion for all args except 0. Unlike for sbt, the bt conversion functions support args larger than 1 second, so there are too many args for exhaustive testing, but the conversions don't lose precision for whole seconds so there is no need to test args larger than 1 second. YY @@ -143,5 +152,6 @@ YY YY tv->tv_sec = bt->sec; YY - tv->tv_usec = (1000000ULL * (u_int32_t)(bt->frac >> 32)) >> 32; YY + tv->tv_usec = ((uint64_t)1000000 * (uint32_t)(bt->frac >> 32) + YY + 0x80000000) >> 32; YY } YY YY @@ -149,10 +159,13 @@ YY timeval2bintime(struct timeval *tv, struct bintime *bt) YY { YY + uint64_t factor, frac; YY YY bt->sec = tv->tv_sec; YY - bt->frac = tv->tv_usec * 18446744073709ULL; /* int(2^64 / 1000000) */ YY -} YY YY -/* end of struct bintime stuff */ YY + /* See above rounding. */ YY + factor = (uint64_t)(-1) / 1000000; YY + frac = (((uint64_t)(-1) - factor * 100000 + 1) << 32) / 1000000; This is also missing an 0 digit. YY + bt->frac = tv->tv_usec * factor + ((tv->tv_usec * frac) >> 32); YY +} YY YY #ifdef _KERNEL sys/time.h still uses the long long abominations and the obfuscated constant 1844... in the code. Only verbose comments which spell this constant constant correctly (in pseudo-code) were committed. These bugs are missing for the corresponding constants in sbt conversions. Bruce