Date: Sun, 3 Mar 2019 13:19:31 +0200 From: Konstantin Belousov <kostikbel@gmail.com> To: Bruce Evans <brde@optusnet.com.au> Cc: Mark Millard <marklmi@yahoo.com>, freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org> Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] Message-ID: <20190303111931.GI68879@kib.kiev.ua> In-Reply-To: <20190303041441.V4781@besplex.bde.org> References: <962D78C3-65BE-40C1-BB50-A0088223C17B@yahoo.com> <28C2BB0A-3DAA-4D18-A317-49A8DD52778F@yahoo.com> <20190301112717.GW2420@kib.kiev.ua> <20190302043936.A4444@besplex.bde.org> <20190301194217.GB68879@kib.kiev.ua> <20190302071425.G5025@besplex.bde.org> <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org> <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote: > On Sat, 2 Mar 2019, Konstantin Belousov wrote: > > > On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote: > >> On Sat, 2 Mar 2019, Konstantin Belousov wrote: > >>> ... > >>> So I am able to reproduce it with some surprising ease on HPET running > >>> on Haswell. > >> > >> So what is the cause of it? Maybe the tickless code doesn't generate > >> fake clock ticks right. Or it is just a library bug. The kernel has > >> to be slightly real-time to satisfy the requirement of 1 update per. > >> Applications are further from being real-time. But isn't it enough > >> for the kernel to ensure that the timehands cycle more than once per > >> second? > > No, I entered ddb as you suggested. > > But using ddb is not normal. It is convenient that this fixes HPET and > ACPI timecounters after using ddb, but this method doesn't help for > timecounters that wrap fast. TSC-low at 2GHz wraps in 2 seconds, and > i8254 wraps in a few milliseconds. > > >> I don't changing this at all this. binuptime() was carefully written > >> to not need so much 64-bit arithmetic. > >> > >> If this pessimization is allowed, then it can also handle a 64-bit > >> deltas. Using the better kernel method: > >> > >> if (__predict_false(delta >= th->th_large_delta)) { > >> bt->sec += (scale >> 32) * (delta >> 32); > >> x = (scale >> 32) * (delta & 0xffffffff); > >> bt->sec += x >> 32; > >> bintime_addx(bt, x << 32); > >> x = (scale & 0xffffffff) * (delta >> 32); > >> bt->sec += x >> 32; > >> bintime_addx(bt, x << 32); > >> bintime_addx(bt, (scale & 0xffffffff) * > >> (delta & 0xffffffff)); > >> } else > >> bintime_addx(bt, scale * (delta & 0xffffffff)); > > This only makes sense if delta is extended to uint64_t, which requires > > the pass over timecounters. > > Yes, that was its point. It is a bit annoying to have a hardware > timecounter like the TSC that doesn't wrap naturally, but then make it > wrap by masking high bits. > > The masking step is also a bit wasteful. For the TSC, it is 1 step to > discard high bids at the register level, then another step to apply the > nask to discard th high bits again. rdtsc-low is implemented in the natural way, after RDTSC, no register combining into 64bit value is done, instead shrd operates on %edx:%eax to get the final result into %eax. I am not sure what you refer to. > > >> I just noticed that there is a 64 x 32 -> 64 bit multiplication in the > >> current method. This can be changed to do expicit 32 x 32 -> 64 bit > >> multiplications and fix the overflow problem at small extra cost on > >> 32-bit arches: > >> > >> /* 32-bit arches did the next multiplication implicitly. */ > >> x = (scale >> 32) * delta; > >> /* > >> * And they did the following shifts and most of the adds > >> * implicitly too. Except shifting x left by 32 lost the > >> * seconds part that the next line handles. The next line > >> * is the only extra cost for them. > >> */ > >> bt->sec += x >> 32; > >> bintime_addx(bt, (x << 32) + (scale & 0xffffffff) * delta); > > > > Ok, what about the following. > > I'm not sure that I really want this, even if the pessimization is done. > But it avoids using fls*(), so is especially good for 32-bit systems and > OK for 64-bit systems too, especially in userland where fls*() is in the > fast path. For userland I looked at the generated code, and BSR usage seems to be good enough, for default compilation settings with clang. > > > > > diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c > > index 3749e0473af..cfe3d96d001 100644 > > --- a/lib/libc/sys/__vdso_gettimeofday.c > > +++ b/lib/libc/sys/__vdso_gettimeofday.c > > @@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$"); > > #include <sys/time.h> > > #include <sys/vdso.h> > > #include <errno.h> > > +#include <limits.h> > > Not needed with 0xffffffff instead of UINT_MAX. > > The userland part is otherwise little changed. Yes, see above. If ABI for shared page going to be changed in some future, I will export th_large_delta as well. > > > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c > > index 2656fb4d22f..2e28f872229 100644 > > --- a/sys/kern/kern_tc.c > > +++ b/sys/kern/kern_tc.c > > ... > > @@ -351,17 +352,44 @@ fbclock_getmicrotime(struct timeval *tvp) > > } while (gen == 0 || gen != th->th_generation); > > } > > #else /* !FFCLOCK */ > > + > > +static void > > +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta) > > +{ > > + uint64_t x; > > + > > + x = (*scale >> 32) * delta; > > + *scale &= 0xffffffff; > > + bt->sec += x >> 32; > > + bintime_addx(bt, x << 32); > > +} > > It is probably best to not inline the slow path, but clang tends to > inline everything anyway. It does not matter if it inlines it, as far as it is moved out of the linear sequence for the fast path. > > I prefer my way of writing this in 3 lines. Modifying 'scale' for > the next step is especially ugly and pessimal when the next step is > in the caller and this function is not inlined. Can you show exactly what do you want ? > > > + > > void > > binuptime(struct bintime *bt) > > { > > struct timehands *th; > > - u_int gen; > > + uint64_t scale; > > + u_int delta, gen; > > > > do { > > th = timehands; > > gen = atomic_load_acq_int(&th->th_generation); > > *bt = th->th_offset; > > - bintime_addx(bt, th->th_scale * tc_delta(th)); > > + scale = th->th_scale; > > + delta = tc_delta(th); > > +#ifdef _LP64 > > + /* Avoid overflow for scale * delta. */ > > + if (__predict_false(th->th_large_delta <= delta)) > > + bintime_helper(bt, &scale, delta); > > + bintime_addx(bt, scale * delta); > > +#else > > + /* > > + * Also avoid (uint64_t, uint32_t) -> uint64_t > > + * multiplication on 32bit arches. > > + */ > > "Also avoid overflow for ..." > > > + bintime_helper(bt, &scale, delta); > > + bintime_addx(bt, (u_int)scale * delta); > > The cast should be to uint32_t, but better write it as & 0xffffffff as > elsewhere. > > bintime_helper() already reduced 'scale' to 32 bits. The cast might be > needed to tell the compiler this, especially when the function is not > inlined. Better not do it in the function. The function doesn't even > use the reduced value. I used cast to use 32x32 multiplication. I am not sure that all (or any) compilers are smart enough to deduce that they can use 32 bit mul. > > bintime_helper() is in the fast path in this case, so should be inlined. > > > +#endif > > atomic_thread_fence_acq(); > > } while (gen == 0 || gen != th->th_generation); > > } > > This needs lots of testing of course. Current kernel-only part of the change is below, see the question about your preference for binuptime_helper(). diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c index 2656fb4d22f..6c41ab22288 100644 --- a/sys/kern/kern_tc.c +++ b/sys/kern/kern_tc.c @@ -72,6 +71,7 @@ struct timehands { struct timecounter *th_counter; int64_t th_adjustment; uint64_t th_scale; + uint64_t th_large_delta; u_int th_offset_count; struct bintime th_offset; struct bintime th_bintime; @@ -351,17 +351,45 @@ fbclock_getmicrotime(struct timeval *tvp) } while (gen == 0 || gen != th->th_generation); } #else /* !FFCLOCK */ + +static void +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta) +{ + uint64_t x; + + x = (*scale >> 32) * delta; + *scale &= 0xffffffff; + bt->sec += x >> 32; + bintime_addx(bt, x << 32); +} + void binuptime(struct bintime *bt) { struct timehands *th; - u_int gen; + uint64_t scale; + u_int delta, gen; do { th = timehands; gen = atomic_load_acq_int(&th->th_generation); *bt = th->th_offset; - bintime_addx(bt, th->th_scale * tc_delta(th)); + scale = th->th_scale; + delta = tc_delta(th); +#ifdef _LP64 + /* Avoid overflow for scale * delta. */ + if (__predict_false(th->th_large_delta <= delta)) + bintime_helper(bt, &scale, delta); + bintime_addx(bt, scale * delta); +#else + /* + * Avoid both overflow as above and + * (uint64_t, uint32_t) -> uint64_t + * multiplication on 32bit arches. + */ + bintime_helper(bt, &scale, delta); + bintime_addx(bt, (uint32_t)scale * delta); +#endif atomic_thread_fence_acq(); } while (gen == 0 || gen != th->th_generation); } @@ -388,13 +416,29 @@ void bintime(struct bintime *bt) { struct timehands *th; - u_int gen; + uint64_t scale; + u_int delta, gen; do { th = timehands; gen = atomic_load_acq_int(&th->th_generation); *bt = th->th_bintime; - bintime_addx(bt, th->th_scale * tc_delta(th)); + scale = th->th_scale; + delta = tc_delta(th); +#ifdef _LP64 + /* Avoid overflow for scale * delta. */ + if (__predict_false(th->th_large_delta <= delta)) + bintime_helper(bt, &scale, delta); + bintime_addx(bt, scale * delta); +#else + /* + * Avoid both overflow as above and + * (uint64_t, uint32_t) -> uint64_t + * multiplication on 32bit arches. + */ + bintime_helper(bt, &scale, delta); + bintime_addx(bt, (uint32_t)scale * delta); +#endif atomic_thread_fence_acq(); } while (gen == 0 || gen != th->th_generation); } @@ -1464,6 +1508,7 @@ tc_windup(struct bintime *new_boottimebin) scale += (th->th_adjustment / 1024) * 2199; scale /= th->th_counter->tc_frequency; th->th_scale = scale * 2; + th->th_large_delta = ((uint64_t)1 << 63) / scale; /* * Now that the struct timehands is again consistent, set the new
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190303111931.GI68879>