Date: Fri, 1 Mar 2019 21:42:17 +0200
From: Konstantin Belousov <kib@freebsd.org>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Mark Millard <marklmi@yahoo.com>,
	freebsd-hackers Hackers <freebsd-hackers@freebsd.org>,
	FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject: Re: powerpc64 head -r344018 stuck sleeping problems:
	th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID: <20190301194217.GB68879@kib.kiev.ua>
In-Reply-To: <20190302043936.A4444@besplex.bde.org>
References: <D3D7E9F4-9A5E-4320-B3C8-EC5CEF4A2764@yahoo.com>
	<20190228145542.GT2420@kib.kiev.ua>
	<20190228150811.GU2420@kib.kiev.ua>
	<962D78C3-65BE-40C1-BB50-A0088223C17B@yahoo.com>
	<28C2BB0A-3DAA-4D18-A317-49A8DD52778F@yahoo.com>
	<20190301112717.GW2420@kib.kiev.ua>
	<20190302043936.A4444@besplex.bde.org>
On Sat, Mar 02, 2019 at 05:40:58AM +1100, Bruce Evans wrote:
> On Fri, 1 Mar 2019, Konstantin Belousov wrote:
> 
> > On Thu, Feb 28, 2019 at 11:39:06PM -0800, Mark Millard wrote:
> >> [The new, trial code also has truncation occurring.]
> > In fact no, I do not think it is.
> >
> >> An example observation of diff_scaled having an overflowed
> >> value was:
> >>
> >> scale_factor == 0x80da2067ac
> >> scale_factor*freq overflows unsigned, 64 bit representation.
> >> tim_offset == 0x3da0eaeb
> >> tim_cnt == 0x42dea3c4
> >> tim_diff == 0x53db8d9
> >> For reference: 0x1fc9d43 == 0xffffffffffffffffull/scale_factor
> >> scaled_diff == 0xA353A5BF3FF780CC (truncated to 64 bits)
> >>
> >> But for the new, trial code:
> >>
> >> 0x80da2067ac is 40 bits
> >> 0x53db8d9 is 27 bits
> >> So 67 bits, more than 63. Then:
> >>
> >> x
> >> == (0x80da2067ac>>32) * 0x53db8d9
> >> == 0x80 * 0x53db8d9
> >> == 0x29EDC6C80
> >>
> >> x>>32
> >> == 0x2
> >>
> >> x<<32
> >> == 0x9EDC6C8000000000 (limited to 64 bits)
> >> Note the truncation of: 0x29EDC6C8000000000.
> > Right, this is how the patch is supposed to work.  Note that the overflow
> > bits 'lost' due to overflow of the left shift are the same bits that are
> > used to increment bt->sec:
> > 	bt->sec += x >> 32;
> > So the 2 seconds are accounted for.
> >
> >> Thus the "bintime_addx(bt, x << 32)" is still
> >> based on a truncated value.
> >
> > I must admit that 2 seconds of interval where the timehands were
> > not updated is too much.  This might be the real cause of all ppc
> > troubles.  I tried to see if the overflow case is possible on amd64,
> > and did not get a single case of the '> 63' branch executed during the
> > /usr/tests/lib/libc run.
> 
> The algorithm requires the update interval to be less than 1 second.
> th_scale is 2**64 / tc_frequency, so whatever tc_frequency is, after
> 1 second the value of the multiplication is approximately 2**64 so
> it overflows about then (depending on rounding).
> 
> The most useful timecounters are TSC's, and these give another overflow
> in tc_delta() after 1 second when their frequency is 4 GHz (except the
> bogus TSC-low timecounter reduces the frequency to below 2 binary GHz,
> so the usual case is overflow after 2 seconds).
As I said, I was unable to trigger the overflow on amd64.

> > Actually, the same overflow-prone code exists in libc, so below is the
> > updated patch:
> > - I added __predict_false()
> > - libc multiplication is also done separately for high-order bits.
> > (ffclock counterpart is still pending).
> >
> > diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c
> > index 3749e0473af..a14576988ff 100644
> > --- a/lib/libc/sys/__vdso_gettimeofday.c
> > +++ b/lib/libc/sys/__vdso_gettimeofday.c
> > @@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$");
> >  #include <sys/time.h>
> >  #include <sys/vdso.h>
> >  #include <errno.h>
> > +#include <limits.h>
> > +#include <strings.h>
> >  #include <time.h>
> >  #include <machine/atomic.h>
> >  #include "libc_private.h"
> > @@ -62,6 +64,7 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs)
> >  {
> >  	struct vdso_timehands *th;
> >  	uint32_t curr, gen;
> > +	uint64_t scale, x;
> >  	u_int delta;
> >  	int error;
> >
> > @@ -78,7 +81,14 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs)
> >  			continue;
> >  		if (error != 0)
> >  			return (error);
> > -		bintime_addx(bt, th->th_scale * delta);
> > +		scale = th->th_scale;
> > +		if (__predict_false(fls(scale) + fls(delta) > 63)) {
> 
> This is unnecessarily pessimal.  Updates must be frequent enough to prevent
> tc_delta() overflowing, and it is even easier to arrange that this
> multiplication doesn't overflow (since the necessary update interval for
> the latter is constant).
> 
> `scale' is 64 bits, so fls(scale) is broken on 32-bit arches, and
> flls(scale) is an especially large pessimization.
> I saw this on my
> calcru1() fixes -- the flls()s take almost as long as long long
> divisions when they use the pessimal C versions.
Ok, fixed.

> The algorithm requires tc_delta() to be only 32 bits, since otherwise the
> multiplication would be 64 x 64 bits so would be much slower and harder
> to write.
> 
> If tc_frequency is far above 4GHz, then th_scale is far below 4G, so the
> scaling is not so accurate.  But 0.25 parts per billion is much more than
> enough.  Even 1 part per million is enough for a TSC, since TSC instability
> is more than 1ppm.  The overflows could be pushed off to 1024 seconds by
> dividing by 1024 at suitable places.  A 32-bit scale times a 64-bit delta
> would be simple compared with both 64 bits (much like the code here).
> 
> The code here can be optimized using values calculated at initialization
> time instead of fls*().  The overflow threshold for delta is approximately
> 2**64 / tc_frequency.
> 
> > +			x = (scale >> 32) * delta;
> > +			scale &= UINT_MAX;
> > +			bt->sec += x >> 32;
> > +			bintime_addx(bt, x << 32);
> > +		}
> > +		bintime_addx(bt, scale * delta);
> >  		if (abs)
> >  			bintime_add(bt, &th->th_boottime);
> 
> When the timecounter is the i8254, as it often was when timecounters
> were new, tc_windup() had to be called more often than every i8254 rollover
> (in practice once every hardclock tick), partly to keep tc_delta() small
> (since rollover gives a form of overflow).  This was not so easy to arrange.
> It requires not losing any hardclock ticks and also not having any with
> high latency and also complications to detect the rollover when there is
> small latency.  Most hardware is easier to handle now.  With tickless
> kernels, hardclock() is often not called for about 1 second, but it must
> be called at least that often to prevent the overflow here.

Updated patch.
diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c
index 3749e0473af..fdefda08e39 100644
--- a/lib/libc/sys/__vdso_gettimeofday.c
+++ b/lib/libc/sys/__vdso_gettimeofday.c
@@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$");
 #include <sys/time.h>
 #include <sys/vdso.h>
 #include <errno.h>
+#include <limits.h>
+#include <strings.h>
 #include <time.h>
 #include <machine/atomic.h>
 #include "libc_private.h"
@@ -62,7 +64,8 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs)
 {
 	struct vdso_timehands *th;
 	uint32_t curr, gen;
-	u_int delta;
+	uint64_t scale, x;
+	u_int delta, scale_bits;
 	int error;
 
 	do {
@@ -78,7 +81,19 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs)
 			continue;
 		if (error != 0)
 			return (error);
-		bintime_addx(bt, th->th_scale * delta);
+		scale = th->th_scale;
+#ifdef _LP64
+		scale_bits = flsl(scale);
+#else
+		scale_bits = flsll(scale);
+#endif
+		if (__predict_false(scale_bits + fls(delta) > 63)) {
+			x = (scale >> 32) * delta;
+			scale &= UINT_MAX;
+			bt->sec += x >> 32;
+			bintime_addx(bt, x << 32);
+		}
+		bintime_addx(bt, scale * delta);
 		if (abs)
 			bintime_add(bt, &th->th_boottime);
diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 2656fb4d22f..eedea5183c0 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -72,6 +72,7 @@ struct timehands {
 	struct timecounter	*th_counter;
 	int64_t			th_adjustment;
 	uint64_t		th_scale;
+	u_int			th_scale_bits;
 	u_int	 		th_offset_count;
 	struct bintime		th_offset;
 	struct bintime		th_bintime;
@@ -355,13 +356,22 @@ void
 binuptime(struct bintime *bt)
 {
 	struct timehands *th;
-	u_int gen;
+	uint64_t scale, x;
+	u_int delta, gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
 		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		scale = th->th_scale;
+		delta = tc_delta(th);
+		if (__predict_false(th->th_scale_bits + fls(delta) > 63)) {
+			x = (scale >> 32) * delta;
+			scale &= UINT_MAX;
+			bt->sec += x >> 32;
+			bintime_addx(bt, x << 32);
+		}
+		bintime_addx(bt, scale * delta);
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 }
@@ -388,13 +398,22 @@ void
 bintime(struct bintime *bt)
 {
 	struct timehands *th;
-	u_int gen;
+	uint64_t scale, x;
+	u_int delta, gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
 		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		scale = th->th_scale;
+		delta = tc_delta(th);
+		if (__predict_false(th->th_scale_bits + fls(delta) > 63)) {
+			x = (scale >> 32) * delta;
+			scale &= UINT_MAX;
+			bt->sec += x >> 32;
+			bintime_addx(bt, x << 32);
+		}
+		bintime_addx(bt, scale * delta);
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 }
@@ -1464,6 +1483,11 @@ tc_windup(struct bintime *new_boottimebin)
 	scale += (th->th_adjustment / 1024) * 2199;
 	scale /= th->th_counter->tc_frequency;
 	th->th_scale = scale * 2;
+#ifdef _LP64
+	th->th_scale_bits = flsl(th->th_scale);
+#else
+	th->th_scale_bits = flsll(th->th_scale);
+#endif
 
 	/*
 	 * Now that the struct timehands is again consistent, set the new