Date: Fri, 1 Mar 2019 12:15:35 -0800 From: Mark Millard <marklmi@yahoo.com> To: Bruce Evans <brde@optusnet.com.au> Cc: Konstantin Belousov <kib@freebsd.org>, freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org> Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] Message-ID: <908615FE-0638-4A80-A3C9-6FC36219ECC3@yahoo.com> In-Reply-To: <20190302043936.A4444@besplex.bde.org> References: <D3D7E9F4-9A5E-4320-B3C8-EC5CEF4A2764@yahoo.com> <20190228145542.GT2420@kib.kiev.ua> <20190228150811.GU2420@kib.kiev.ua> <962D78C3-65BE-40C1-BB50-A0088223C17B@yahoo.com> <28C2BB0A-3DAA-4D18-A317-49A8DD52778F@yahoo.com> <20190301112717.GW2420@kib.kiev.ua> <20190302043936.A4444@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On 2019-Mar-1, at 10:40, Bruce Evans <brde at optusnet.com.au> wrote: > On Fri, 1 Mar 2019, Konstantin Belousov wrote: >=20 >> On Thu, Feb 28, 2019 at 11:39:06PM -0800, Mark Millard wrote: >>> [The new, trial code also has truncation occurring.] >> In fact no, I do not think it is. >>=20 >>> An example observation of diff_scaled having an overflowed >>> value was: >>>=20 >>> scale_factor =3D=3D 0x80da2067ac >>> scale_factor*freq overflows unsigned, 64 bit representation. >>> tim_offset =3D=3D 0x3da0eaeb >>> tim_cnt =3D=3D 0x42dea3c4 >>> tim_diff =3D=3D 0x53db8d9 >>> For reference: 0x1fc9d43 =3D=3D = 0xffffffffffffffffull/scale_factor >>> scaled_diff =3D=3D 0xA353A5BF3FF780CC (truncated to 64 bits) >>>=20 >>> But for the new, trail code: >>>=20 >>> 0x80da2067ac is 40 bits >>> 0x53db8d9 is 27 bits >>> So 67 bits, more than 63. Then: >>>=20 >>> x >>> =3D=3D (0x80da2067ac>>32) * 0x53db8d9 >>> =3D=3D 0x80 * 0x53db8d9 >>> =3D=3D 0x29EDC6C80 >>>=20 >>> x>>32 >>> =3D=3D 0x2 >>>=20 >>> x<<32 >>> =3D=3D 0x9EDC6C8000000000 (limited to 64 bits) >>> Note the truncation of: 0x29EDC6C8000000000. >> Right, this is how the patch is supposed to work. Note that the = overflow >> bits 'lost' due to overflow of the left shift are the same bits that = as >> used to increment bt->sec: >> bt->sec +=3D x >> 32; >> So the 2 seconds are accounted for. >>=20 >>>=20 >>> Thus the "bintime_addx(bt, x << 32)" is still >>> based on a truncated value. >>=20 >> I must admit that 2 seconds of interval where the timehands where >> not updated is too much. This might be the real cause of all ppc >> troubles. I tried to see if the overflow case is possible on amd64, >> and did not get a single case of the '> 63' branch executed during = the >> /usr/tests/lib/libc run. >=20 > The algorithm requires the update interval to be less than 1 second. > th_scale is 2**64 / tc_frequency, so whatever tc_frequency is, after > 1 second the value of the multiplication is approximately 2**64 so > it overflows about then (depending on rounding). I've not tracked down evidence below binuptime on powerpc64 yet. I'm not sure how much of the substructure I can investigate. The context here is 2 sockets, each with 2 cores. > The most useful timecounters are TSC's, and these give another = overflow > in tc_delta() after 1 second when their frequency is 4 GHz (except the > bogus TSC-low timecounter reduces the frequency to below 2 binary GHz, > so the usual case is overflow after 2 seconds). The wording suggests a amd64/i386 context but my report was for powerpc64, specifically for old PowerMac G5's. (I currently have\access to only one ut I've seen the beuavior on others in the last.) FreeBSD reports: # sysctl kern.timecounter kern.timecounter.tc.timebase.quality: 0 kern.timecounter.tc.timebase.frequency: 33333333 kern.timecounter.tc.timebase.counter: 1831468476 kern.timecounter.tc.timebase.mask: 4294967295 kern.timecounter.stepwarnings: 0 kern.timecounter.alloweddeviation: 5 kern.timecounter.hardware: timebase kern.timecounter.choice: timebase(0) dummy(-1000000) kern.timecounter.tick: 1 kern.timecounter.fast_gettime: 1 FreeBSD uses the lower 32 bits of the tbr (via mftb). As I understand there are problems with how close the tbr's can be kept. >> Actually, the same overflow-prone code exists in libc, so below is = the >> updated patch: >> - I added __predict_false() >> - libc multiplication is also done separately for high-order bits. >> (fftclock counterpart is still pending). >>=20 >> diff --git a/lib/libc/sys/__vdso_gettimeofday.c = b/lib/libc/sys/__vdso_gettimeofday.c >> index 3749e0473af..a14576988ff 100644 >> --- a/lib/libc/sys/__vdso_gettimeofday.c >> +++ b/lib/libc/sys/__vdso_gettimeofday.c >> @@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$"); >> #include <sys/time.h> >> #include <sys/vdso.h> >> #include <errno.h> >> +#include <limits.h> >> +#include <strings.h> >> #include <time.h> >> #include <machine/atomic.h> >> #include "libc_private.h" >> @@ -62,6 +64,7 @@ binuptime(struct bintime *bt, struct vdso_timekeep = *tk, int abs) >> { >> struct vdso_timehands *th; >> uint32_t curr, gen; >> + uint64_t scale, x; >> u_int delta; >> int error; >>=20 >> @@ -78,7 +81,14 @@ binuptime(struct bintime *bt, struct vdso_timekeep = *tk, int abs) >> continue; >> if (error !=3D 0) >> return (error); >> - bintime_addx(bt, th->th_scale * delta); >> + scale =3D th->th_scale; >> + if (__predict_false(fls(scale) + fls(delta) > 63)) { >=20 > This is unnecessarily pessimal. Updates must be frequent enough to = prevent > tc_delta() overflowing, and it is even easier to arrange that this > multiplication doesn't overflow (since the necessary update interval = for > the latter is constant). >=20 > `scale' is 64 bits, so fls(scale) is broken on 32-bit arches, and > flls(scale) is an especially large pessimization. I saw this on my > calcru1() fixes -- the flls()s take almost as long as long long > divisions when the use the pessimal C versions. Unlike i386's and amd64's: static __inline __pure2 int fls(int mask) { return (mask =3D=3D 0 ? mask : (int)bsrl((u_int)mask) + 1); } powerpc64's apparently use /usr/src/sys/libkern/fls.c 's: int fls(int mask) { int bit; if (mask =3D=3D 0) return (0); for (bit =3D 1; mask !=3D 1; bit++) mask =3D (unsigned int)mask >> 1; return (bit); } (At least I did not find an alternate for the powerpc64 context.) So, if I got that right, fls does a lot of looping on powerpc64. > The algorithm requires tc_delta() to be only 32 bits, since otherwise = the > multiplication would be 64 x 64 bits so would be much slower and = harder > to write. >=20 > If tc_freqency is far above 4GHz, then th_scale is far below 4G, so = the > scaling is not so accurate. But 0.25 parts per billion is much more = than > enough. Even 1 part per million is enough for a TSC, since TSC = instability > is more than 1ppm. The overflows could be pushed off to 1024 seconds = by > dividing by 1024 at suitable places. A 32-bit scale times a 64-bit = delta > would be simple compared with both 64 bits (much like the code here). >=20 > The code here can be optimized using values calculated at = initialization > time instead of fls*(). The overflow threshold for delta is = approximately > 2**64 / tc_frequency. >=20 >> + x =3D (scale >> 32) * delta; >> + scale &=3D UINT_MAX; >> + bt->sec +=3D x >> 32; >> + bintime_addx(bt, x << 32); >> + } >> + bintime_addx(bt, scale * delta); >> if (abs) >> bintime_add(bt, &th->th_boottime); >=20 > When the timecounter is the i8254, as it often was when timecounters > were new, tc_windup() had to be called more often than every i8254 = rollover > (in practice once every hardclock tick), partly to keep tc_delta() = small > (since rollover gives a form of overflow). This was not so easy to = arrange. > It requires not losing any hardclock ticks and also not having any = with > high latency and also complications to detect the rollover when there = is > small latency. Most hardware is easier to handle now. With tickless > kernels, hardclock() is often not called for about 1 second, but it = must > be called at least that often to prevent the overflow here. Thanks for the notes. I've been wondering around in unfamiliar = territory. =3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?908615FE-0638-4A80-A3C9-6FC36219ECC3>