Date: Wed, 21 Nov 2018 02:35:32 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Warner Losh <imp@bsdimp.com> Cc: Bruce Evans <brde@optusnet.com.au>, Andriy Gapon <avg@freebsd.org>, "Rodney W. Grimes" <rgrimes@freebsd.org>, Allan Jude <allanjude@freebsd.org>, Warner Losh <imp@freebsd.org>, src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r340450 - head/sys/sys Message-ID: <20181120225901.H1386@besplex.bde.org> In-Reply-To: <CANCZdfp-e2UQ=xPt_kfCcqzWsYsUaaP%2BDYyQf4uy574-VuQ4kA@mail.gmail.com> References: <CANCZdfqr=R-MCpDEGWDqGYJbcQ46Hqw7PMrVinAsYTRPjLjJPA@mail.gmail.com> <201811190104.wAJ14CaE059062@pdx.rh.CN85.dnsmgr.net> <CANCZdfoGKKcL70ESKow=hfNABpO=5%2BQtUYcNmpt4gReRkeUvrA@mail.gmail.com> <5e227743-6463-d395-f2ba-da8d4ba248ca@FreeBSD.org> <CANCZdfo4=TknaHR0%2B3fTCDzKO_EHF0Vfmkooi92ZaGEnVfq2bA@mail.gmail.com> <20181120005944.B1050@besplex.bde.org> <20181120014951.J1250@besplex.bde.org> <20181120023823.M1428@besplex.bde.org> <CANCZdfp-e2UQ=xPt_kfCcqzWsYsUaaP%2BDYyQf4uy574-VuQ4kA@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, 19 Nov 2018, Warner Losh wrote: > On Mon, Nov 19, 2018 at 9:05 AM Bruce Evans <brde@optusnet.com.au> wrote: > >> On Tue, 20 Nov 2018, Bruce Evans wrote: >> >>> On Tue, 20 Nov 2018, Bruce Evans wrote: >>> >>>> On Mon, 19 Nov 2018, Warner Losh wrote: >>> ... >>> I found my test program. >>> >>>>> But I think I understand the problem now. >>>>> >>>>> mstosbt(1000) is overflowing with my change, but not without because >> we're >>>>> adding 2^32 to a number that's ~900 away from overflowing changing a >> very >>>> >>>> This value is just invalid. Negative values are obviously invalid. >>>> Positive >>>> values of more than 1 second are invalid. In the old version, values of >>>> precisely 1 second don't overflow since the scale factor is rounded >> down; >>>> the result is just 1 unit lower than 1 second. Overflow (actually >>>> truncation) >>>> occurs at 1 unit above 1 second. E.g., sbttoms(mstosbt(1000)) was 999 >> and >>>> sbttoms(mstosbt(1001)) was 0. Now in my fixed version, >>>> sbttoms(mstosbt(1000)) >>>> is truncated to 0 and sbttoms(mstosbt(1001)) is truncated to 1. This remains very broken. I just noticed that the inverse functions don't support sbt's much larger than 1 second either. E.g., sbttons multiplies by 1 billion, so it overflows for sbt's larger than 4.29 seconds. It is better to not support times larger than 1 second in these functions than to pessimize them all. You only pessimized some of them. The inverse functions still overflow at small values; the direct functions still overflow at large values. >> Here are all uses of these functions in kern outside of sys/time.h: >> >> XX arm/arm/mpcore_timer.c: sc->et.et_min_period = nstosbt(20); >> >> OK since 20 ns is less than 1 second. >> >> XX cddl/compat/opensolaris/sys/kcondvar.h: return >> (cv_timedwait_sbt(cvp, mp, nstosbt(tim), nstosbt(res), 0)); >> XX cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c: >> pause_sbt("dmu_tx_delay", nstosbt(wakeup), >> XX cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c: >> nstosbt(zfs_delay_resolution_ns), C_ABSOLUTE); >> XX cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c: sbintime_t sleep = >> nstosbt((zilog->zl_last_lwb_latency * pct) / 100); >> XX compat/linuxkpi/common/include/linux/delay.h: >> pause_sbt("lnxsleep", mstosbt(ms), 0, C_HARDCLOCK); >> XX compat/linuxkpi/common/src/linux_hrtimer.c: >> nstosbt(hrtimer->expires), nstosbt(hrtimer->precision), 0); >> XX compat/linuxkpi/common/src/linux_hrtimer.c: >> callout_reset_sbt(&hrtimer->callout, nstosbt(ktime_to_ns(time)), >> XX compat/linuxkpi/common/src/linux_hrtimer.c: nstosbt(nsec), >> hrtimer_call_handler, hrtimer, 0); >> XX compat/linuxkpi/common/src/linux_hrtimer.c: >> callout_reset_sbt(&hrtimer->callout, nstosbt(ktime_to_ns(interval)), >> XX compat/linuxkpi/common/src/linux_hrtimer.c: >> nstosbt(hrtimer->precision), hrtimer_call_handler, hrtimer, 0); >> XX compat/linuxkpi/common/src/linux_schedule.c: ret = >> -pause_sbt("lnxsleep", mstosbt(ms), 0, C_HARDCLOCK | C_CATCH); >> >> All of the above might are broken unless their timeout arg is restricted to >> less than 1 second. >> >> Also, at least the sleep and pause calls in the above probably have a >> style bug in knowing about sbt's at all. More important functions >> like msleep() hide the use of sbt's and use fuzzier scale factors like >> tick_sbt. > > Yes. It's for these users I'm fixing the >= 1s cases. These functions are broken in FreeBSD-11 and FreeBSD-12 too. They were last correct in FreeBSD-10, where there weren't so many linuxkpi functions, but where the zfs code is similar except it is missing the overflow bugs. E.g., where the above does nstosbt(wakeup), FreeBSD-10 does wakeup * SBT_1NS. SBT_1NS is 4, but the infinite-precision scale factor is 4.294967296. The sloppy scale factor is gives large absolute errors for large args, but at least it doesn't overflow for merely large args; it does overflow for huge args. Maybe zfs's wakeup arg is always less that 1 second, so it wasn't broken by blind conversion to use nstosbt(), but that is unclear. Conversions in the other direction are very rare, at least using the conversion functions instead of divisions by SBT_*. sbttous() is used once in dev/ow and once in kern_sysctl.c; sbttoms() is used once in kern_sysctl.c; sbttons() is never used. Thus the overflow in sbttons() doesn't matter, and overflow in kern_sysctl.c's misuse of sbttous() only occurs at 4294+ seconds. > >> XX dev/iicbus/nxprtc.c: pause_sbt("nxpotp", mstosbt(100), >> mstosbt(10), 0); >> XX dev/iicbus/nxprtc.c: pause_sbt("nxpbat", mstosbt(100), 0, 0); >> >> OK since 100 ms is less than 1 second. >> >> XX kern/kern_sysctl.c: sb = ustosbt(tt); >> XX kern/kern_sysctl.c: sb = mstosbt(tt); >> >> Recently added bugs. The args is supplied by applications so can be far >> above >> 1 second. >> >> Also, these functions have a bogus API that takes uint64_t args. sbt's >> can't represent that high, and the API doesn't support failure, so callers >> have the burden of passing valid values. It is easiest to only support >> args >> up to 1 second and require the caller to handle seconds. > > It's just as easy to cope. No, it is not. In your next committed versions, all functions still overflow, just at larger values. >> Lots of itimer and select() code handle the corresponding problem for >> timevals and timespecs almost correctly. Timevals and timespecs already >> have the seconds part separate and itimer and select() syscalls do validity >> checking on the fractional seconds, so there is no problem converting the >> fractional sections to an sbt. Howver, the seconds part has type time_t >> and this can be int64_t, which sbt's cannot represent. Also, POSIX doesn't >> permit failure for seconds values that are representable by time_t. The >> almost correct handling is to split up large seconds values in some cases >> and break POSIX conformance by silently reducing the seconds value in >> other cases. The reduction is usually to INT32_MAX / 2. This allows >> adding 2 limited values but it is not obvious that this is enough since >> rounding up twice or just adding 2 more would give overflow. > > Right, that's beyond the scope of what I'm fixing. Better fix the overflows in FreeBSD-11 and FreeBSD-12 (or get their author to fix them) before adding complications. > https://reviews.freebsd.org/D18051 > > Contains the changes. They address all the actual problems you've raised, > but I'm going to disagree that 'ull' is bogus. For FreeBSD, it's fine and > it's a lot more readable than the alternatives. I can't quote it in mail from there, but saw a committed version which I don't like, though it passes my tests. I said that long long is an abomination. 'ull' doesn't even have any effect in this case. It was last needed for gcc -std=before-c99 on i386, since C before C99 doesn't have the abomination and gcc warns about constants that are so large that there type is [unsigned] long long. There are no such constants on 64-bit arches, but there are some on 32-bit arches. clang is too broken to warn about this. Now kern.pre.mk enforces std=c99 although this is wrong (gnu99 is needed, as in userland), so the warning never occurs even for gcc. If you don't use explicitly typed constants, then constants automatically have the correct type. This is long long or unsigned long long for large constants on i386 -- there is no way to avoid the abomination then. But on amd64, the correct type for large constants is long or unsigned long. You might need ull constants to get unsigned arithmetic instead of signed arithmetic, where the natural type of the large constants is only long or long long. Your comments allude to this. But the conversions are too simple or bogusly typed to need this here -- almost all variable types are uint64_t although things overflow at about UINT32_MAX, so the natural arithmetic is uint64_t except on exotic unsupported/unsupportable arches where uint64_t has lower rank than int (then uint64_t promotes to only int, so there may be sign extension bugs, but using explicit unsigned long long constants gives some further promotions to unsigned long long). Using 'ull' is also a lexical style bug. Old code consistently uses LL. This was needed for the obfuscation given by using large decimal constants 10-15 years ago before -std was c99. This gives signed constants, and old code also consistently bogusly casts these constants to uint64_t to ensure unsigned arithmetic (except on the exotic arches). Bugs in the committed version: XX Log: XX Ensure that all values of ns, us and ms work for {n,u,m}stosbt All values don't work. E.g., since nstosbt multiplies a 64-bit unsigned value by approx. 4 and the result is a signed 64-bit value, values larger than about 1/8 of the maximum in ns overflow to a garbage negative sbt value. XX Also, use a shift one fewer left to avoid integer overflow causing XX incorrect results, and adjust the equasion accordingly. Document this. I'm not sure how this helps. When I tried it with my most accurate version, it reduced the accuracy of sbt values from +-1 in sbt units to +-2, for the obvious reason than the sloppy shift loses 1 bit. XX A test program to test all cases will be committed shortly. The test XX exaustively tries every value (thanks to bde for the test). Testing all 2**64 cases given by pretending to support times above 1 second makes exhaustive testing impossible. XX Modified: head/sys/sys/time.h XX ============================================================================== XX --- head/sys/sys/time.h Tue Nov 20 01:59:57 2018 (r340663) XX +++ head/sys/sys/time.h Tue Nov 20 07:11:23 2018 (r340664) XX @@ -162,9 +162,24 @@ sbttobt(sbintime_t _sbt) XX * large roundoff errors which sbttons() and nstosbt() avoid. Millisecond and XX * microsecond functions are also provided for completeness. XX * XX - * These functions return the smallest sbt larger or equal to the number of XX - * seconds requested so that sbttoX(Xtosbt(y)) == y. The 1 << 32 - 1 term added XX - * transforms the >> 32 from floor() to ceil(). XX + * These functions return the smallest sbt larger or equal to the XX + * number of seconds requested so that sbttoX(Xtosbt(y)) == y. Unlike No, these functions return an sbt that is much larger than the perfectly rounded one. See my old mail for the details, and for a variant that does perfect rounding. XX + * The naive computation we'd do is this XX + * ((unit * 2^64 / SIFACTOR) + 2^32-1) >> 32 XX + * However, that overflows. Instead, we compute XX + * ((unit * 2^63 / SIFACTOR) + 2^31-1) >> 32 XX + * and use pre-computed constants that are the ceil of the 2^63 / SIFACTOR XX + * term to ensure we are using exactly the right constant. We use the lesser XX + * evil of ull rather than a uint64_t cast to ensure we have well defined XX + * right shift semantics. With these changes, we get all the ns, us and ms XX + * conversions back and forth right. XX + * Note: This file is used for both kernel and userland includes, so we can't XX + * rely on KASSERT being defined, nor can we pollute the namespace by including XX + * assert.h. Many style bugs: - this comment has fancy formatting, but it is not protected by '/*-'. - this comment misformats sentence breaks as 1 spaces. Old comments in this file were carefully formatted to use 2 spaces. There is no need to calculate these constants manually. ISTR spending many mails instructing ian@ about their previous correct caclulation. The KASSERTs are never defined in the kernel either. See below. XX */ XX static __inline int64_t XX sbttons(sbintime_t _sbt) XX @@ -176,8 +191,18 @@ sbttons(sbintime_t _sbt) XX static __inline sbintime_t XX nstosbt(int64_t _ns) XX { XX + sbintime_t sb = 0; Style bug: initialization in declaration. XX XX - return ((_ns * (((uint64_t)1 << 63) / 500000000) + (1ull << 32) - 1) >> 32); XX +#ifdef KASSERT XX + KASSERT(_ns >= 0, ("Negative values illegal for nstosbt: %jd", _ns)); XX +#endif KASSERT is never defined here in correct kernel code, due to standard namespace pollution. Correct kernel code includes <sys/param.h> first, and in the kernel <sys/param.h> includes this file. KASSERT is not defined then. Correct kernel code includes <sys/systm.h> second. It takes chumminess with the misimplementation to get KASSERT defined here. <sys/systm.h> has grown much more massive and disgusting pollution than <sys/param.h>, but not enough for it to be self-sufficient. First you have to provide a replacement for <sys/param.h> that includes almost everything in it except <sys/time.h>. Include that and then <sys/systm.h>. <sys/systm.h> defines KASSERT early. It includes <sys/param.h> in is grosser pollution section later. This includes <sys/time.h>, and KASSERT is defined before it is used in this case. XX + if (_ns >= SBT_1S) { XX + sb = (_ns / 1000000000) * SBT_1S; XX + _ns = _ns % 1000000000; XX + } Not too pessimal since this case is rarely reached. XX + /* 9223372037 = ceil(2^63 / 1000000000) */ XX + sb += ((_ns * 9223372037ull) + 0x7fffffff) >> 31; This is harder to understand than my versions. First I fix the obfuscations and other style bugs in it: - contrary to what I said above, everything isn't uint64_t, so there is a sign extension bug in practice which the explicit unsigned long long constant fixes in an ugly way. _ns is only int64_t, and the natural constant is not unsigned, so the multiplication by the natural constant overflows to a negative value at about 0.5 seconds and then the shift is signed and messes up the negative value some more - however, there is no need to reduce the powers of 2 by 1. Cleaned up version with manually calculated multiplicative constant: /* 18446744973 = ceil(2**64 / 10**9) */ sb += (_ns * (uint64_t)18446744973 + 0xffffffff) >> 32; (Also remove comments about reducing the powers by 1.) Cleaned up version with the compiler calculating the multiplicative constant: sb += (_ns * (((uint64_t)1 << 63) / 500000000 + 1) + 0xffffffff) >> 32; The additive constant is harder to understand. Rounding up the multiplicative constant gives an increment that is higher than the infinite-precision value before the addition and the shift, but the shift often loses the extras. The additive constant keeps enough extras. The resulting sbt is often too high, but the important case when _ns = 0 is not converted to a nonzero sbt like some of my versions do. Since rounding up to the ceiling increases the multiplier by ~0.3 in units of 2**32, when _ns = 10**9-1, the increase from the scaling is ~300e8 and the increase from the additive constant is ~2**32 for a total increase of not much more than 1.1*2**32 in units of 2**32 or 1.1 in sbt units. This means that the sbt value is often 1 higher than the correctly rounded ceiling. The versions that round down the multiplier reduce it by ~0.7 in units of 2**32 and need an additive correction of about 4 sbt units to compensate. Most or all of my versions added this even when _ns = 0 when it is too much. It is not obvious that converting sbt values that are 1.1 units too high back to nanoseconds restores the original value (by rounding down). Increasing the multiplier by 1 more gives sbt values that are only 0.2 units higher than 1.1 when _ns = 10**9-1, but breaks the conversion back in 85% of cases (not just for large _ns). XX + return (sb); XX } XX XX static __inline int64_t XX ... Similarly. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20181120225901.H1386>