Date: Sun, 3 Mar 2019 00:03:18 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Konstantin Belousov <kostikbel@gmail.com> Cc: Mark Millard <marklmi@yahoo.com>, freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org> Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] Message-ID: <20190302225513.W3408@besplex.bde.org> In-Reply-To: <20190302105140.GC68879@kib.kiev.ua> References: <D3D7E9F4-9A5E-4320-B3C8-EC5CEF4A2764@yahoo.com> <20190228145542.GT2420@kib.kiev.ua> <20190228150811.GU2420@kib.kiev.ua> <962D78C3-65BE-40C1-BB50-A0088223C17B@yahoo.com> <28C2BB0A-3DAA-4D18-A317-49A8DD52778F@yahoo.com> <20190301112717.GW2420@kib.kiev.ua> <20190302043936.A4444@besplex.bde.org> <20190301194217.GB68879@kib.kiev.ua> <20190302071425.G5025@besplex.bde.org> <20190302105140.GC68879@kib.kiev.ua>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, 2 Mar 2019, Konstantin Belousov wrote: > On Sat, Mar 02, 2019 at 07:38:20AM +1100, Bruce Evans wrote: >> On Fri, 1 Mar 2019, Konstantin Belousov wrote: >>> + scale = th->th_scale; >>> + delta = tc_delta(th); >>> + if (__predict_false(th->th_scale_bits + fls(delta) > 63)) { >> >> Better, but shouldn't be changed (and the bug that causes the large intervals >> remains unlocated), and if it is changed then it should use: >> >> if (delta >= th->th_large_delta) >> >>> @@ -1464,6 +1483,11 @@ tc_windup(struct bintime *new_boottimebin) >>> scale += (th->th_adjustment / 1024) * 2199; >>> scale /= th->th_counter->tc_frequency; >>> th->th_scale = scale * 2; >>> +#ifdef _LP64 >>> + th->th_scale_bits = ffsl(th->th_scale); >>> +#else >>> + th->th_scale_bits = ffsll(th->th_scale); >>> +#endif >> >> th->th_large_delta = ((uint64_t)1 << 63) / scale; > > So I am able to reproduce it with some surprising ease on HPET running > on Haswell. So what is the cause of it? Maybe the tickless code doesn't generate fake clock ticks right. Or it is just a library bug. The kernel has to be slightly real-time to satisfy the requirement of 1 update per. Applications are further from being real-time. But isn't it enough for the kernel to ensure that the timehands cycle more than once per second? > I looked at the generated code for libc which still uses ffsll() on 32bit, > due to the ABI issues. At least clang generates two BSF instructions for > this code, so I think that forking vdso_timehands ABI for this is not > reasonable right now. > > diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c > index 3749e0473af..3c3c71207bd 100644 > --- a/lib/libc/sys/__vdso_gettimeofday.c > +++ b/lib/libc/sys/__vdso_gettimeofday.c > @@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$"); > #include <sys/time.h> > #include <sys/vdso.h> > #include <errno.h> > +#include <limits.h> > +#include <strings.h> > #include <time.h> > #include <machine/atomic.h> > #include "libc_private.h" > @@ -62,7 +64,8 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs) > { > struct vdso_timehands *th; > uint32_t curr, gen; > - u_int delta; > + uint64_t scale, x; > + u_int delta, scale_bits; > int error; > > do { > @@ -78,7 +81,19 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs) > continue; > if (error != 0) > return (error); > - bintime_addx(bt, th->th_scale * delta); > + scale = th->th_scale; > +#ifdef _LP64 > + scale_bits = ffsl(scale); > +#else > + scale_bits = ffsll(scale); > +#endif I see that there is an ABI problem adding th_large_delta. > + if (__predict_false(scale_bits + fls(delta) > 63)) { > + x = (scale >> 32) * delta; > + scale &= UINT_MAX; Should be UINT32_MAX or better 0xffffffff. > + bt->sec += x >> 32; > + bintime_addx(bt, x << 32); > + } > + bintime_addx(bt, scale * delta); > if (abs) > bintime_add(bt, &th->th_boottime); I don't changing this at all this. binuptime() was carefully written to not need so much 64-bit arithmetic. If this pessimization is allowed, then it can also handle a 64-bit deltas. Using the better kernel method: if (__predict_false(delta >= th->th_large_delta)) { bt->sec += (scale >> 32) * (delta >> 32); x = (scale >> 32) * (delta & 0xffffffff); bt->sec += x >> 32; bintime_addx(bt, x << 32); x = (scale & 0xffffffff) * (delta >> 32); bt->sec += x >> 32; bintime_addx(bt, x << 32); bintime_addx(bt, (scale & 0xffffffff) * (delta & 0xffffffff)); } else bintime_addx(bt, scale * (delta & 0xffffffff)); I just noticed that there is a 64 x 32 -> 64 bit multiplication in the current method. This can be changed to do expicit 32 x 32 -> 64 bit multiplications and fix the overflow problem at small extra cost on 32-bit arches: /* 32-bit arches did the next multiplication implicitly. */ x = (scale >> 32) * delta; /* * And they did the following shifts and most of the adds * implicitly too. Except shifting x left by 32 lost the * seconds part that the next line handles. The next line * is the only extra cost for them. */ bt->sec += x >> 32; bintime_addx(bt, (x << 32) + (scale & 0xffffffff) * delta); Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190302225513.W3408>