From owner-svn-src-all@freebsd.org Sat Oct 26 11:51:32 2019
Date: Sat, 26 Oct 2019 22:51:25 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Andriy Gapon
Cc: Ian Lepore, src-committers@freebsd.org, svn-src-all@freebsd.org,
 svn-src-head@freebsd.org
Subject: Re: svn commit: r354076 - head/sys/dev/ow
In-Reply-To: <2fde5b7b-6186-0969-08a5-83524b6aa274@FreeBSD.org>
Message-ID: <20191026204953.A894@besplex.bde.org>
References: <201910251538.x9PFc9ii028313@repo.freebsd.org>
 <2fde5b7b-6186-0969-08a5-83524b6aa274@FreeBSD.org>
List-Id: "SVN commit messages for the entire src tree \(except for "user" and "projects"\)"

On Fri, 25 Oct 2019, Andriy Gapon wrote:

> On 25/10/2019 18:46, Ian Lepore wrote:
>> On Fri, 2019-10-25 at 15:38 +0000, Andriy Gapon wrote:
>>> Author: avg
>>> Date: Fri Oct 25 15:38:09 2019
>>> New Revision: 354076
>>> URL: https://svnweb.freebsd.org/changeset/base/354076
>>>
>>> Log:
>>>   owc_gpiobus_read_data: compare times in sbintime_t units
>>>
>>>   Previously the code used sbttous() before microseconds comparison in
>>>   one place, sbttons() and nanoseconds in another, division by SBT_1US
>>>   and microseconds in yet another.
>>>
>>>   Now the code consistently uses multiplication by SBT_1US to convert
>>>   microseconds to sbintime_t before comparing them with periods between
>>>   calls to sbinuptime().  This is fast, this is precise enough (below
>>>   0.03%) and the periods defined by the protocol cannot overflow.
>>>
>>>   Reviewed by:  imp (D22108)
>>>   MFC after:    2 weeks
>>>
>>> Modified:
>>>   head/sys/dev/ow/owc_gpiobus.c
>>>
>>> Modified: head/sys/dev/ow/owc_gpiobus.c
>>> ==============================================================================
>>> --- head/sys/dev/ow/owc_gpiobus.c  Fri Oct 25 15:02:50 2019  (r354075)
>>> +++ head/sys/dev/ow/owc_gpiobus.c  Fri Oct 25 15:38:09 2019  (r354076)
>>> @@ -296,10 +296,10 @@ owc_gpiobus_read_data(device_t dev, struct ow_timing *
>>>  	do {
>>>  		now = sbinuptime();
>>>  		GETPIN(sc, &sample);
>>> -	} while (sbttous(now - then) < t->t_rdv + 2 && sample == 0);
>>> +	} while (now - then < (t->t_rdv + 2) * SBT_1US && sample == 0);
>>>  	critical_exit();
>>>
>>> -	if (sbttons(now - then) < t->t_rdv * 1000)
>>> +	if (now - then < t->t_rdv * SBT_1US)
>>>  		*bit = 1;
>>>  	else
>>>  		*bit = 0;
>>> @@ -307,7 +307,7 @@ owc_gpiobus_read_data(device_t dev, struct ow_timing *
>>>  	/* Wait out the rest of t_slot */
>>>  	do {
>>>  		now = sbinuptime();
>>> -	} while ((now - then) / SBT_1US < t->t_slot);
>>> +	} while (now - then < t->t_slot * SBT_1US);
>>>
>>>  	RELBUS(sc);
>>>
>>
>> Unit conversions with sbt times should be done using the macros that
>> carefully avoid roundoff errors.  I don't understand why you've changed
>> the code that correctly did use those macros to inline math.
>
> I think that the commit message explains it:
>   This is fast, this is precise enough (below 0.03%) and the periods
>   defined by the protocol cannot overflow.
>
> Do you disagree?
> Could you please explain in which of the three lines changed the new code
> is worse and why?

The old code is worse.  This is partly because the inline functions are
over-engineered and poorly implemented and don't even work.
The first thing improved in the change is:

>>> 	do {
>>> 		now = sbinuptime();
>>> 		GETPIN(sc, &sample);
>>> -	} while (sbttous(now - then) < t->t_rdv + 2 && sample == 0);
>>> +	} while (now - then < (t->t_rdv + 2) * SBT_1US && sample == 0);

sbintime_t is signed so that negative differences work.  Scaling negative
differences by SBT_1*S works right.  Scaling of negative differences by the
inline functions attempts to panic, but even the panic is broken
(unreachable).  Here 'now' is an uptime and 'then' is presumably a previous
uptime, so now < then "can't happen".  Similar code with non-monotonic or
untrusted times might produce a negative difference.

For efficiency, t_rdv + 2 could be kept as an sbintime.  Short of that,
scaling it at runtime gives the cleaner code:

 	} while (now - then < ustosbt(t->t_rdv + 2) && sample == 0);

Sign extension bugs are further away in this expression.  The arg
(t->t_rdv + 2) is more obviously >= 0 since it is a limit, and all of the
inline functions return a signed type (sbintime_t or int64_t) so that
signed differences work.

The next thing improved is:

>>> -	if (sbttons(now - then) < t->t_rdv * 1000)
>>> +	if (now - then < t->t_rdv * SBT_1US)

This preserves full accuracy and signs for 'now - then' but loses a little
more accuracy than needed for the conversion of t_rdv.  But accuracy of
t_rdv is apparently unimportant.  If the last bit in it were important,
then it should be kept as an sbintime, and differences like 'now - then'
should never be scaled, since even perfect rounding would lose bits.

The final thing improved is:

>>> -	} while ((now - then) / SBT_1US < t->t_slot);
>>> +	} while (now - then < t->t_slot * SBT_1US);

This already used the unscaled 'now - then', but with a style bug
(excessive parentheses).

imp still hasn't replied to my mails that fix some of the bugs in the
over-engineered committed version.  My version is even more over-engineered.
I maintain the following version, but only use the non-comment parts of
sbttons().
First look at the bugs in the committed sbttons():

XX /*
XX  * Large comment.  The function is not very overengineered except in the
XX  * comment.  Half the details are wrong.
XX  */
XX static __inline int64_t
XX sbttons(sbintime_t _sbt)
XX {
XX 	uint64_t ns;
XX
XX #ifdef KASSERT
XX 	KASSERT(_sbt >= 0, ("Negative values illegal for sbttons: %jx", _sbt));
XX #endif

Multiplication by SBT_1NS used to be the proper conversion KPI.  It loses
some accuracy, but works for negative _sbt (since SBT_1NS is signed).  Here
the conversion is more complicated.  It still uses mostly signed types
(0xffffffffu gets promoted), but doesn't work if _sbt is negative.  The
above attempts to detect and panic for this, but has no effect in most
cases: due to standard namespace pollution, <sys/time.h> is normally
included (via <sys/param.h>) before <sys/systm.h>, which defines KASSERT,
so the KASSERT here is only compiled in for kernel files that have style
bugs in their #include order or selection.

XX 	ns = _sbt;

Bad style (uses ns as scratch for an sbintime_t value).

XX 	if (ns >= SBT_1S)
XX 		ns = (ns >> 32) * 1000000000;
XX 	else
XX 		ns = 0;

This avoids overflow for ns large and >= 0.  For _sbt < 0 (reinterpreted as
a huge unsigned ns), it doesn't reduce/adjust ns to a sane value, so later
code adds a garbage seconds term to the scaled fractional part of _sbt.

XX
XX 	return (ns + (1000000000 * (_sbt & 0xffffffffu) >> 32));
XX }

The large buggy comment is attached only to this function, but it only
applies to the functions converting in the other direction.  In this
direction, the complications and overengineering are small.  The functions
just avoid overflow for large ns.
XX --- /home/bde/sys13/sys/time.h	2019-09-19 17:44:48.507873000 +0000
XX +++ time.h	2018-11-16 07:19:07.542131000 +0000
XX @@ -30,5 +30,5 @@
XX   *
XX   *	@(#)time.h	8.5 (Berkeley) 5/4/95
XX - * $FreeBSD: head/sys/sys/time.h 346176 2019-04-13 04:46:35Z imp $
XX + * $FreeBSD: head/sys/sys/time.h 340450 2018-11-15 16:02:13Z imp $
XX   */
XX
XX @@ -98,9 +98,9 @@
XX  	uint64_t _p1, _p2;
XX
XX -	_p1 = (_bt->frac & 0xffffffffull) * _x;
XX +	_p1 = (_bt->frac & 0xffffffff) * _x;
XX  	_p2 = (_bt->frac >> 32) * _x + (_p1 >> 32);
XX  	_bt->sec *= _x;
XX  	_bt->sec += (_p2 >> 32);
XX -	_bt->frac = (_p2 << 32) | (_p1 & 0xffffffffull);
XX +	_bt->frac = (_p2 << 32) | (_p1 & 0xffffffff);
XX  }
XX

Vestiges of my old fixes for similar problems converting between bintimes
and timevals/timespecs.  Although bintimes have much more precision, they
can't represent timevals or timespecs exactly, so conversions to bintimes
and back tend to lose a bit.  E.g., 1 second survives, but 999999 usec
becomes 999998 usec.

The use of ull is just a style bug here.  We just want to extract low bits
from _bt->frac.  The type of the sub-expression remains as uint64_t.

XX @@ -163,38 +163,60 @@
XX   * microsecond functions are also provided for completeness.
XX   *
XX - * These functions return the smallest sbt larger or equal to the
XX - * number of seconds requested so that sbttoX(Xtosbt(y)) == y.  Unlike

These functions are not as accurate as claimed.

XX - * top of second computations below, which require that we tick at the
XX - * top of second, these need to be rounded up so we do whatever for at
XX - * least as long as requested.
XX - *
XX - * The naive computation we'd do is this
XX - *	((unit * 2^64 / SIFACTOR) + 2^32-1) >> 32
XX - * However, that overflows.  Instead, we compute
XX - *	((unit * 2^63 / SIFACTOR) + 2^31-1) >> 32
XX - * and use pre-computed constants that are the ceil of the 2^63 / SIFACTOR
XX - * term to ensure we are using exactly the right constant.
XX - * We use the lesser

This method is routine and shouldn't be documented here.  However, there
are some complications and shortcuts in the details that should be
documented or avoided.

XX - * evil of ull rather than a uint64_t cast to ensure we have well defined
XX - * right shift semantics. With these changes, we get all the ns, us and ms
XX - * conversions back and forth right.

Using ull instead of the correct cast is just a gross style bug.  Using any
unsigned type asks for unsign extension bugs, but may be needed for
technical reasons (we start with int64_t types and must handle negative
values before combining with uint64_t constants).

XX - * Note: This file is used for both kernel and userland includes, so we can't
XX - * rely on KASSERT being defined, nor can we pollute the namespace by including
XX - * assert.h.

Remove the nonsense about KASSERT.

XX + * Violate the comment about "Background information":

phk didn't like my rounding fixes for bintimes and responded by adding the
"Background information" comment.  This demands always perfectly rounding
down.  The sbintime conversion functions violate this by always rounding
up.  Multiplication by SBT_1*S rounds down a bit too much, but always
perfectly rounding down would do much the same after a few iterations.

XX + *
XX + * All these functions do sloppy scaling and/or rounding, but it is arranged
XX + * that the errors compensate for each other so that sbttoX(Xtosbt(x)) == x
XX + * for all valid args.
XX + *
XX + * Sloppy scaling in the decimal->sbt functions tends to give a result that
XX + * is significantly far below the floor of the correctly rounded result (call
XX + * this the "true floor").  Similarly, sloppy scaling in the sbt->decimal
XX + * functions reduces further below the true floor.  The result was that
XX + * sbttoX(Xtosbt(x)) was 1 too small for all cases except x = 0.
XX + *
XX + * Doing correct even scaling and rounding up to the true ceiling in the
XX + * decimal->sbt functions alone is useless for fixing this, since the
XX + * sbt->decimal functions reduce even true ceilings to below true floors.
XX + *
XX + * Instead of correcting all the functions, return a fake ceiling in the
XX + * decimal->sbt functions.  This ceiling is approx. 1 decimal unit above the
XX + * sloppily scaled value.  It is thus above the true ceiling but not 1
XX + * decimal unit or more above.  It is unobvious that the sbt->decimal
XX + * functions don't reduce even this fake ceiling to below the true floor.
XX + * Exhaustive testing shows that they don't.  The fake ceiling must be
XX + * significantly more than 1 sbt unit above the true ceiling for the
XX + * reduction to be not too large.  Fortunately, there are no numerical
XX + * accidents that give a too-low fake ceiling.
XX + *
XX + * The previous version claimed to calculate true ceilings.  These never
XX + * work.  But it actually calculated fake ceilings of 2**32-1 sbt units
XX + * above the sloppily scaled value.  This value is nonsense, but worked
XX + * accidentally in some cases.  The decimal unit of 1 nsec is
XX + * 2**64/10**9 ~= 18e9 sbt units.  2**32-1 is thus about 4.29 times too
XX + * small to work non-accidentally.  In practice, it worked accidentally in
XX + * about 92% of cases.  For conversions of microseconds, the value used was
XX + * about 4.29e3 times too small, but worked accidentally in all except 32
XX + * cases.  For conversions of milliseconds, the value used was about 4.29e6
XX + * times too small, but worked accidentally in all cases.
XX + *
XX + * Note that the range of valid args is significantly limited, since the
XX + * efficient sloppy scaling and rounding methods used here only work up to 1
XX + * second.  E.g., _ms must be between 0 and 999 inclusive, since
XX + * mstosbt(1000) ~= 0 due to overflow of the multiplication to almost
XX + * exactly 0.
XX + * This allows exhaustive testing that the sloppy methods work.
XX + * It is unclear if similar sloppy scaling would work above 1 second, but
XX + * non-sloppy scaling is very easy for whole seconds (see tstosbt()).
XX + *
XX + * Uncommitted code for ts/tv<->bt was more careful, but was disapproved and
XX + * the comment about "Background information" was added to inhibit commits
XX + * in this area.  It does non-sloppy scaling of fractional parts, then rounds
XX + * to nearest.  I don't remember doing exhaustive testing of it, and now don't
XX + * see why rounding to nearest is enough.
XX + */
XX  static __inline int64_t
XX  sbttons(sbintime_t _sbt)
XX  {
XX -	uint64_t ns;
XX -
XX -#ifdef KASSERT
XX -	KASSERT(_sbt >= 0, ("Negative values illegal for sbttons: %jx", _sbt));
XX -#endif
XX -	ns = _sbt;
XX -	if (ns >= SBT_1S)
XX -		ns = (ns >> 32) * 1000000000;
XX -	else
XX -		ns = 0;
XX
XX -	return (ns + (1000000000 * (_sbt & 0xffffffffu) >> 32));
XX +	return ((1000000000 * _sbt) >> 32);
XX  }

In this direction, I just removed the buggy overflow handling (but kept the
accurate scaling).  Overflow occurs after only ~2 seconds for nanoseconds.

XX
XX @@ -202,16 +224,78 @@
XX  nstosbt(int64_t _ns)
XX  {
XX -	sbintime_t sb = 0;
XX
XX -#ifdef KASSERT
XX -	KASSERT(_ns >= 0, ("Negative values illegal for nstosbt: %jd", _ns));
XX +	/*
XX +	 * The adjustment is essentially to add SBT_1NS to the result of
XX +	 * sloppy scaling and rounding down.  That happens to work too, but
XX +	 * use the slightly less sloppy method of combining shifts and
XX +	 * subtracting 1.  The result in most cases is much higher than the
XX +	 * true ceiling, but sloppy rounding for the reverse conversion
XX +	 * discards the extras.
XX +	 *
XX +	 * E.g., 2**64/10**9 has a fractional part of ~0.71.  Multiplying
XX +	 * this by the maximum valid _ns of 10**9-1 gives ~0.71e9, so
XX +	 * rounding down after scaling gives a raw sbt that is ~0.71e9 below
XX +	 * the true floor (only ~0.17 sbt units below the true floor after
XX +	 * the final shift).
XX +	 * Replacing _ns by (_ns + 1) gives a raw sbt that is ~0.29e9 above
XX +	 * the true ceiling (only ~0.068 sbt units above the true ceiling
XX +	 * after the final shift).
XX +	 * Subtracting 1 from this makes no significant difference.  Sloppy
XX +	 * scaling and rounding in the reverse conversion recovers the
XX +	 * original value: the sloppy scaling reduces the value from above the
XX +	 * true ceiling to below the true ceiling (but not below the true
XX +	 * floor); then rounding down recovers the true floor.
XX +	 *
XX +	 * Similarly for _all_ valid args and for the us and ms conversions.
XX +	 * It is fairly obvious that the result of conversion back to ns is
XX +	 * never 1 too low (since we add 1), but not obvious that the result
XX +	 * is never 1 too high (since adding 1 risks this).
XX +	 *
XX +	 * Conversion from sbt to decimal and back has no chance of always
XX +	 * restoring the original values since the decimal formats have less
XX +	 * precision.  Adding 1 is too sloppy to do anything good for this
XX +	 * direction.  The uncommitted ts/tv<->bt conversions are supposed to
XX +	 * work as well as possible in both directions.  Rounding to nearest
XX +	 * works better, and non-sloppy scaling is obviously needed to work in
XX +	 * both directions.  However, non-sloppy scaling doesn't work with
XX +	 * mixed or sloppy rounding.  In the above example with _ns = 10**9-1,
XX +	 * it gives a raw sbt that is just 1 below the true floor.  Adding 1
XX +	 * to this gives the true ceiling in the sbt, but unless the reverse
XX +	 * conversion is changed similarly, it reduces from the true ceiling
XX +	 * to below the true floor, so the result of converting back has much
XX +	 * the same off-by-1 errors as the simpler method (always 1 too low
XX +	 * except for an arg of 0).
XX +	 *
XX +	 * Further examples:
XX +	 * - when _ns = 0, sbt is SBT_1NS, but converting this back to ns
XX +	 *   gives 0.  If we start with this sbt, our conversion functions
XX +	 *   are still unusable for printing it in nsec.
XX +	 * - when _ns = 10**9-1, we are close to overflow.  Adding 1 to it
XX +	 *   doesn't quite reach the overflow threshold.
XX +	 * - in previous versions, when _ns = 10**9, overflow was not quite
XX +	 *   reached, and the result of a double conversion had the usual
XX +	 *   off-by-1 error.
XX +	 * - when _ns = 10**9, overflow now occurs and the result of a double
XX +	 *   conversion is 0.
XX +	 * - when _ns = 10**9+1, overflow occurs in previous versions and the
XX +	 *   result of a double conversion is 0 (overflow gives an
XX +	 *   off-by-10**9 error and then the usual off-by-1 error occurs).
XX +	 */
XX +#if 1
XX +	/*
XX +	 * This version does non-sloppy scaling which after rounding down is
XX +	 * close to or equal to 1 below the true ceiling for all nonzero args.
XX +	 * Then adding 1 gives an adequate true or fake ceiling.  Some of the
XX +	 * above comments are wrong about the true ceiling not working.
XX +	 * Unfortunately, this is about 30% slower (a whole cycle extra in
XX +	 * the test).  The unnecessary subtraction of 1 in the main version
XX +	 * costs about 1/2 of a cycle in the test.
XX +	 */
XX +	uint64_t factor, frac;
XX +
XX +	factor = ((uint64_t)1 << 63) / 500000000;
XX +	frac = ((0 - factor * 1000000000) << 32) / 1000000000;
XX +	return (((_ns * factor + ((_ns * frac) >> 32)) >> 32) + 1);

This is the fully overengineered version.  It is too slow to use.

XX +#else
XX +	return (((_ns + 1) * (((uint64_t)1 << 63) / 500000000) - 1) >> 32);
XX #endif
XX -	if (_ns >= SBT_1S) {
XX -		sb = (_ns / 1000000000) * SBT_1S;
XX -		_ns = _ns % 1000000000;
XX -	}
XX -	/* 9223372037 = ceil(2^63 / 1000000000) */
XX -	sb += ((_ns * 9223372037ull) + 0x7fffffff) >> 31;

Rounding to the ceiling gives an integer scale factor which is a bit too
high; after multiplying by _ns, the error in nanoseconds can be more than 1
and so can the error in sbintimes.  Using (_ns + 1) instead of _ns is
supposed to fix this.
The version that I actually use is:

	sb += (_ns << 32) / 1000000000 + (_ns == 0 ? 0 : 1);

This does simpler scaling to first round down, then violates the
"Background information" comment by always rounding up except for _ns == 0,
so that it is clear that the rounding is as intended (don't do extra work
to avoid rounding up if _ns is exactly representable).

XX -	return (sb);
XX  }
XX
XX @@ -226,16 +310,6 @@
XX  ustosbt(int64_t _us)
XX  {
XX -	sbintime_t sb = 0;
XX
XX -#ifdef KASSERT
XX -	KASSERT(_us >= 0, ("Negative values illegal for ustosbt: %jd", _us));
XX -#endif
XX -	if (_us >= SBT_1S) {
XX -		sb = (_us / 1000000) * SBT_1S;
XX -		_us = _us % 1000000;
XX -	}
XX -	/* 9223372036855 = ceil(2^63 / 1000000) */
XX -	sb += ((_us * 9223372036855ull) + 0x7fffffff) >> 31;
XX -	return (sb);
XX +	return (((_us + 1) * (((uint64_t)1 << 63) / 500000) - 1) >> 32);
XX  }
XX
XX @@ -250,16 +324,6 @@
XX  mstosbt(int64_t _ms)
XX  {
XX -	sbintime_t sb = 0;
XX
XX -#ifdef KASSERT
XX -	KASSERT(_ms >= 0, ("Negative values illegal for mstosbt: %jd", _ms));
XX -#endif
XX -	if (_ms >= SBT_1S) {
XX -		sb = (_ms / 1000) * SBT_1S;
XX -		_ms = _ms % 1000;
XX -	}
XX -	/* 9223372036854776 = ceil(2^63 / 1000) */
XX -	sb += ((_ms * 9223372036854776ull) + 0x7fffffff) >> 31;
XX -	return (sb);
XX +	return (((_ms + 1) * (((uint64_t)1 << 63) / 500) - 1) >> 32);
XX  }
XX

Bruce