From owner-svn-src-head@freebsd.org Mon Nov 19 16:05:42 2018 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BCD21110804C; Mon, 19 Nov 2018 16:05:42 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au [211.29.132.42]) by mx1.freebsd.org (Postfix) with ESMTP id 445B57C598; Mon, 19 Nov 2018 16:05:41 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 439063D7C56; Tue, 20 Nov 2018 03:05:38 +1100 (AEDT) Date: Tue, 20 Nov 2018 03:05:37 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: Warner Losh , Andriy Gapon , "Rodney W. Grimes" , Allan Jude , Warner Losh , src-committers , svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r340450 - head/sys/sys In-Reply-To: <20181120014951.J1250@besplex.bde.org> Message-ID: <20181120023823.M1428@besplex.bde.org> References: <201811190104.wAJ14CaE059062@pdx.rh.CN85.dnsmgr.net> <5e227743-6463-d395-f2ba-da8d4ba248ca@FreeBSD.org> <20181120005944.B1050@besplex.bde.org> <20181120014951.J1250@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=FNpr/6gs c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=6I5d2MoRAAAA:8 a=4dtU_Np7U4CNcjolt7IA:9 a=CjuIK1q_8ugA:10 a=IjZwj45LgO3ly-622nXo:22 X-Rspamd-Queue-Id: 445B57C598 X-Spamd-Result: default: False [-1.64 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-0.83)[-0.828,0]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; FREEMAIL_FROM(0.00)[optusnet.com.au]; R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23]; MIME_GOOD(-0.10)[text/plain]; DMARC_NA(0.00)[optusnet.com.au]; TO_MATCH_ENVRCPT_SOME(0.00)[]; MX_GOOD(-0.01)[cached: extmail.optusnet.com.au]; NEURAL_HAM_SHORT(-0.50)[-0.499,0]; RCPT_COUNT_SEVEN(0.00)[9]; IP_SCORE(-0.01)[country: AU(-0.04)]; FREEMAIL_TO(0.00)[optusnet.com.au]; RCVD_NO_TLS_LAST(0.10)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[optusnet.com.au]; ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU]; RCVD_COUNT_TWO(0.00)[2]; RCVD_IN_DNSWL_LOW(-0.10)[42.132.29.211.list.dnswl.org : 127.0.5.1] X-Rspamd-Server: mx1.freebsd.org X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Nov 2018 16:05:43 -0000 On Tue, 20 Nov 2018, Bruce Evans wrote: > On Tue, 20 Nov 2018, Bruce Evans wrote: > >> On Mon, 19 Nov 2018, Warner Losh wrote: >> >>> On Mon, Nov 19, 2018 at 12:31 AM Andriy Gapon wrote: > ... > I found my test program. > >>> But I think I understand the problem now. >>> >>> mstosbt(1000) is overflowing with my change, but not without because we're >>> adding 2^32 to a number that's ~900 away from overflowing changing a very >> >> This value is just invalid. Negative values are obviously invalid. >> Positive >> values of more than 1 second are invalid. In the old version, values of >> precisely 1 second don't overflow since the scale factor is rounded down; >> the result is just 1 unit lower than 1 second. Overflow (actually >> truncation) >> occurs at 1 unit above 1 second. E.g., sbttoms(mstosbt(1000)) was 999 and >> sbttoms(mstosbt(1001)) was 0. Now in my fixed version, >> sbttoms(mstosbt(1000)) >> is truncated to 0 and sbttoms(mstosbt(1001)) is truncated to 1. >> >> The test program also showed that in the old version, all valid args except >> 0 are reduced by precisely 1 by conversion to sbt and back. >> >>> large sleep of 1 second to a tiny sleep of 0 (which is 1ms at the default >>> Hz). I think we get this in the mlx code because there's a msleep(1000) in >>> at least one place. msleep(1001) would have failed before my change. Now I >>> think msleep(999) works, but msleep(1000) fails. Since the code is waiting >>> a second for hardware to initialize, a 1ms instead is likely to catch the >>> hardware before it's finished. I think this will cause problems no matter >>> cold or not, since it's all pause_sbt() under the covers and a delay of 0 >>> is still 0 either way. >> >> Bug in the mlx code. The seconds part must be converted separately, as in >> tstosbt(). > > mlx doesn't seem to use sbt directly, or even msleep(), so the bug does > seem to be central. > > mlx actually uses mtx_sleep() with timo = hz(). mtx_sleep() is actually > an obfuscated macro wrapping _sleep(). The conversion to sbt is done by > the macro and is sloppy. It multiplies timo by tick_sbt. > > tick_sbt is SBT_1S / hz, so the sbt is SBT_1S / hz * hz which is SBT_1S > reduced a little. This is not affected by the new code, and it still isn't > converted back to 1 second in ms, us or ns. Even if it is converted back > and then forth to sbt using the new code, it remains less than 1 second as > an sbt so shouldn't cause any new problems. Here are all uses of these functions in kern outside of sys/time.h: XX arm/arm/mpcore_timer.c: sc->et.et_min_period = nstosbt(20); OK since 20 ns is less than 1 second. XX cddl/compat/opensolaris/sys/kcondvar.h: return (cv_timedwait_sbt(cvp, mp, nstosbt(tim), nstosbt(res), 0)); XX cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c: pause_sbt("dmu_tx_delay", nstosbt(wakeup), XX cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c: nstosbt(zfs_delay_resolution_ns), C_ABSOLUTE); XX cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c: sbintime_t sleep = nstosbt((zilog->zl_last_lwb_latency * pct) / 100); XX compat/linuxkpi/common/include/linux/delay.h: pause_sbt("lnxsleep", mstosbt(ms), 0, C_HARDCLOCK); XX compat/linuxkpi/common/src/linux_hrtimer.c: nstosbt(hrtimer->expires), nstosbt(hrtimer->precision), 0); XX compat/linuxkpi/common/src/linux_hrtimer.c: callout_reset_sbt(&hrtimer->callout, nstosbt(ktime_to_ns(time)), XX compat/linuxkpi/common/src/linux_hrtimer.c: nstosbt(nsec), hrtimer_call_handler, hrtimer, 0); XX compat/linuxkpi/common/src/linux_hrtimer.c: callout_reset_sbt(&hrtimer->callout, nstosbt(ktime_to_ns(interval)), XX compat/linuxkpi/common/src/linux_hrtimer.c: nstosbt(hrtimer->precision), hrtimer_call_handler, hrtimer, 0); XX compat/linuxkpi/common/src/linux_schedule.c: ret = -pause_sbt("lnxsleep", mstosbt(ms), 0, C_HARDCLOCK | C_CATCH); All of the above might are broken unless their timeout arg is restricted to less than 1 second. Also, at least the sleep and pause calls in the above probably have a style bug in knowing about sbt's at all. More important functions like msleep() hide the use of sbt's and use fuzzier scale factors like tick_sbt. XX dev/iicbus/nxprtc.c: pause_sbt("nxpotp", mstosbt(100), mstosbt(10), 0); XX dev/iicbus/nxprtc.c: pause_sbt("nxpbat", mstosbt(100), 0, 0); OK since 100 ms is less than 1 second. XX kern/kern_sysctl.c: sb = ustosbt(tt); XX kern/kern_sysctl.c: sb = mstosbt(tt); Recently added bugs. The args is supplied by applications so can be far above 1 second. Also, these functions have a bogus API that takes uint64_t args. sbt's can't represent that high, and the API doesn't support failure, so callers have the burden of passing valid values. It is easiest to only support args up to 1 second and require the caller to handle seconds. Lots of itimer and select() code handle the corresponding problem for timevals and timespecs almost correctly. Timevals and timespecs already have the seconds part separate and itimer and select() syscalls do validity checking on the fractional seconds, so there is no problem converting the fractional sections to an sbt. Howver, the seconds part has type time_t and this can be int64_t, which sbt's cannot represent. Also, POSIX doesn't permit failure for seconds values that are representable by time_t. The almost correct handling is to split up large seconds values in some cases and break POSIX conformance by silently reducing the seconds value in other cases. The reduction is usually to INT32_MAX / 2. This allows adding 2 limited values but it is not obvious that this is enough since rounding up twice or just adding 2 more would give overflow. XX kern/subr_rtc.c: sbt = nstosbt(waitns); This is correct, since waitns is the nanoseconds part of a timespec. Bruce