From owner-freebsd-hackers@freebsd.org Fri Mar 1 19:42:31 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4FE1715227CD; Fri, 1 Mar 2019 19:42:31 +0000 (UTC) (envelope-from kib@freebsd.org) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B4F1A8EC03; Fri, 1 Mar 2019 19:42:30 +0000 (UTC) (envelope-from kib@freebsd.org) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x21JgHCF094212 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Fri, 1 Mar 2019 21:42:20 +0200 (EET) (envelope-from kib@freebsd.org) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x21JgHCF094212 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id x21JgHeH094211; Fri, 1 Mar 2019 21:42:17 +0200 (EET) (envelope-from kib@freebsd.org) X-Authentication-Warning: tom.home: kostik set sender to kib@freebsd.org using -f Date: Fri, 1 Mar 2019 21:42:17 +0200 From: Konstantin Belousov To: Bruce Evans Cc: Mark Millard , freebsd-hackers Hackers , FreeBSD PowerPC ML Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] Message-ID: <20190301194217.GB68879@kib.kiev.ua> References: <20190228145542.GT2420@kib.kiev.ua> <20190228150811.GU2420@kib.kiev.ua> <962D78C3-65BE-40C1-BB50-A0088223C17B@yahoo.com> <28C2BB0A-3DAA-4D18-A317-49A8DD52778F@yahoo.com> <20190301112717.GW2420@kib.kiev.ua> <20190302043936.A4444@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190302043936.A4444@besplex.bde.org> User-Agent: Mutt/1.11.3 (2019-02-01) X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Mar 2019 19:42:31 -0000 On Sat, Mar 02, 2019 at 05:40:58AM +1100, Bruce Evans wrote: > On Fri, 1 Mar 2019, Konstantin Belousov wrote: > > > On Thu, Feb 28, 2019 at 11:39:06PM -0800, Mark Millard wrote: > >> [The new, trial code also has truncation occurring.] > > In fact no, I do not think it is. > > > >> An example observation of diff_scaled having an overflowed > >> value was: > >> > >> scale_factor == 0x80da2067ac > >> scale_factor*freq overflows unsigned, 64 bit representation. > >> tim_offset == 0x3da0eaeb > >> tim_cnt == 0x42dea3c4 > >> tim_diff == 0x53db8d9 > >> For reference: 0x1fc9d43 == 0xffffffffffffffffull/scale_factor > >> scaled_diff == 0xA353A5BF3FF780CC (truncated to 64 bits) > >> > >> But for the new, trail code: > >> > >> 0x80da2067ac is 40 bits > >> 0x53db8d9 is 27 bits > >> So 67 bits, more than 63. Then: > >> > >> x > >> == (0x80da2067ac>>32) * 0x53db8d9 > >> == 0x80 * 0x53db8d9 > >> == 0x29EDC6C80 > >> > >> x>>32 > >> == 0x2 > >> > >> x<<32 > >> == 0x9EDC6C8000000000 (limited to 64 bits) > >> Note the truncation of: 0x29EDC6C8000000000. > > Right, this is how the patch is supposed to work. Note that the overflow > > bits 'lost' due to overflow of the left shift are the same bits that as > > used to increment bt->sec: > > bt->sec += x >> 32; > > So the 2 seconds are accounted for. > > > >> > >> Thus the "bintime_addx(bt, x << 32)" is still > >> based on a truncated value. > > > > I must admit that 2 seconds of interval where the timehands where > > not updated is too much. This might be the real cause of all ppc > > troubles. I tried to see if the overflow case is possible on amd64, > > and did not get a single case of the '> 63' branch executed during the > > /usr/tests/lib/libc run. > > The algorithm requires the update interval to be less than 1 second. > th_scale is 2**64 / tc_frequency, so whatever tc_frequency is, after > 1 second the value of the multiplication is approximately 2**64 so > it overflows about then (depending on rounding). > > The most useful timecounters are TSC's, and these give another overflow > in tc_delta() after 1 second when their frequency is 4 GHz (except the > bogus TSC-low timecounter reduces the frequency to below 2 binary GHz, > so the usual case is overflow after 2 seconds). As I said, I was unable to trigger the overflow on amd64. > > > Actually, the same overflow-prone code exists in libc, so below is the > > updated patch: > > - I added __predict_false() > > - libc multiplication is also done separately for high-order bits. > > (fftclock counterpart is still pending). > > > > diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c > > index 3749e0473af..a14576988ff 100644 > > --- a/lib/libc/sys/__vdso_gettimeofday.c > > +++ b/lib/libc/sys/__vdso_gettimeofday.c > > @@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$"); > > #include > > #include > > #include > > +#include > > +#include > > #include > > #include > > #include "libc_private.h" > > @@ -62,6 +64,7 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs) > > { > > struct vdso_timehands *th; > > uint32_t curr, gen; > > + uint64_t scale, x; > > u_int delta; > > int error; > > > > @@ -78,7 +81,14 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs) > > continue; > > if (error != 0) > > return (error); > > - bintime_addx(bt, th->th_scale * delta); > > + scale = th->th_scale; > > + if (__predict_false(fls(scale) + fls(delta) > 63)) { > > This is unnecessarily pessimal. Updates must be frequent enough to prevent > tc_delta() overflowing, and it is even easier to arrange that this > multiplication doesn't overflow (since the necessary update interval for > the latter is constant). > > `scale' is 64 bits, so fls(scale) is broken on 32-bit arches, and > flls(scale) is an especially large pessimization. I saw this on my > calcru1() fixes -- the flls()s take almost as long as long long > divisions when the use the pessimal C versions. Ok, fixed. > > The algorithm requires tc_delta() to be only 32 bits, since otherwise the > multiplication would be 64 x 64 bits so would be much slower and harder > to write. > > If tc_freqency is far above 4GHz, then th_scale is far below 4G, so the > scaling is not so accurate. But 0.25 parts per billion is much more than > enough. Even 1 part per million is enough for a TSC, since TSC instability > is more than 1ppm. The overflows could be pushed off to 1024 seconds by > dividing by 1024 at suitable places. A 32-bit scale times a 64-bit delta > would be simple compared with both 64 bits (much like the code here). > > The code here can be optimized using values calculated at initialization > time instead of fls*(). The overflow threshold for delta is approximately > 2**64 / tc_frequency. > > > + x = (scale >> 32) * delta; > > + scale &= UINT_MAX; > > + bt->sec += x >> 32; > > + bintime_addx(bt, x << 32); > > + } > > + bintime_addx(bt, scale * delta); > > if (abs) > > bintime_add(bt, &th->th_boottime); > > When the timecounter is the i8254, as it often was when timecounters > were new, tc_windup() had to be called more often than every i8254 rollover > (in practice once every hardclock tick), partly to keep tc_delta() small > (since rollover gives a form of overflow). This was not so easy to arrange. > It requires not losing any hardclock ticks and also not having any with > high latency and also complications to detect the rollover when there is > small latency. Most hardware is easier to handle now. With tickless > kernels, hardclock() is often not called for about 1 second, but it must > be called at least that often to prevent the overflow here. Updated patch. diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c index 3749e0473af..fdefda08e39 100644 --- a/lib/libc/sys/__vdso_gettimeofday.c +++ b/lib/libc/sys/__vdso_gettimeofday.c @@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include +#include #include #include #include "libc_private.h" @@ -62,7 +64,8 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs) { struct vdso_timehands *th; uint32_t curr, gen; - u_int delta; + uint64_t scale, x; + u_int delta, scale_bits; int error; do { @@ -78,7 +81,19 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs) continue; if (error != 0) return (error); - bintime_addx(bt, th->th_scale * delta); + scale = th->th_scale; +#ifdef _LP64 + scale_bits = ffsl(scale); +#else + scale_bits = ffsll(scale); +#endif + if (__predict_false(scale_bits + fls(delta) > 63)) { + x = (scale >> 32) * delta; + scale &= UINT_MAX; + bt->sec += x >> 32; + bintime_addx(bt, x << 32); + } + bintime_addx(bt, scale * delta); if (abs) bintime_add(bt, &th->th_boottime); diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c index 2656fb4d22f..eedea5183c0 100644 --- a/sys/kern/kern_tc.c +++ b/sys/kern/kern_tc.c @@ -72,6 +72,7 @@ struct timehands { struct timecounter *th_counter; int64_t th_adjustment; uint64_t th_scale; + u_int th_scale_bits; u_int th_offset_count; struct bintime th_offset; struct bintime th_bintime; @@ -355,13 +356,22 @@ void binuptime(struct bintime *bt) { struct timehands *th; - u_int gen; + uint64_t scale, x; + u_int delta, gen; do { th = timehands; gen = atomic_load_acq_int(&th->th_generation); *bt = th->th_offset; - bintime_addx(bt, th->th_scale * tc_delta(th)); + scale = th->th_scale; + delta = tc_delta(th); + if (__predict_false(th->th_scale_bits + fls(delta) > 63)) { + x = (scale >> 32) * delta; + scale &= UINT_MAX; + bt->sec += x >> 32; + bintime_addx(bt, x << 32); + } + bintime_addx(bt, scale * delta); atomic_thread_fence_acq(); } while (gen == 0 || gen != th->th_generation); } @@ -388,13 +398,22 @@ void bintime(struct bintime *bt) { struct timehands *th; - u_int gen; + uint64_t scale, x; + u_int delta, gen; do { th = timehands; gen = atomic_load_acq_int(&th->th_generation); *bt = th->th_bintime; - bintime_addx(bt, th->th_scale * tc_delta(th)); + scale = th->th_scale; + delta = tc_delta(th); + if (__predict_false(th->th_scale_bits + fls(delta) > 63)) { + x = (scale >> 32) * delta; + scale &= UINT_MAX; + bt->sec += x >> 32; + bintime_addx(bt, x << 32); + } + bintime_addx(bt, scale * delta); atomic_thread_fence_acq(); } while (gen == 0 || gen != th->th_generation); } @@ -1464,6 +1483,11 @@ tc_windup(struct bintime *new_boottimebin) scale += (th->th_adjustment / 1024) * 2199; scale /= th->th_counter->tc_frequency; th->th_scale = scale * 2; +#ifdef _LP64 + th->th_scale_bits = ffsl(th->th_scale); +#else + th->th_scale_bits = ffsll(th->th_scale); +#endif /* * Now that the struct timehands is again consistent, set the new