Date: Sat, 2 Mar 2019 16:25:21 +0200 From: Konstantin Belousov <kostikbel@gmail.com> To: Bruce Evans <brde@optusnet.com.au> Cc: Mark Millard <marklmi@yahoo.com>, freebsd-hackers Hackers <freebsd-hackers@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org> Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] Message-ID: <20190302142521.GE68879@kib.kiev.ua> In-Reply-To: <20190302225513.W3408@besplex.bde.org> References: <20190228145542.GT2420@kib.kiev.ua> <20190228150811.GU2420@kib.kiev.ua> <962D78C3-65BE-40C1-BB50-A0088223C17B@yahoo.com> <28C2BB0A-3DAA-4D18-A317-49A8DD52778F@yahoo.com> <20190301112717.GW2420@kib.kiev.ua> <20190302043936.A4444@besplex.bde.org> <20190301194217.GB68879@kib.kiev.ua> <20190302071425.G5025@besplex.bde.org> <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> 
> > On Sat, Mar 02, 2019 at 07:38:20AM +1100, Bruce Evans wrote:
> >> On Fri, 1 Mar 2019, Konstantin Belousov wrote:
> >>> +		scale = th->th_scale;
> >>> +		delta = tc_delta(th);
> >>> +		if (__predict_false(th->th_scale_bits + fls(delta) > 63)) {
> >>
> >> Better, but shouldn't be changed (and the bug that causes the large intervals
> >> remains unlocated), and if it is changed then it should use:
> >>
> >>  		if (delta >= th->th_large_delta)
> >>
> >>> @@ -1464,6 +1483,11 @@ tc_windup(struct bintime *new_boottimebin)
> >>> 	scale += (th->th_adjustment / 1024) * 2199;
> >>> 	scale /= th->th_counter->tc_frequency;
> >>> 	th->th_scale = scale * 2;
> >>> +#ifdef _LP64
> >>> +	th->th_scale_bits = ffsl(th->th_scale);
> >>> +#else
> >>> +	th->th_scale_bits = ffsll(th->th_scale);
> >>> +#endif
> >>
> >>  	th->th_large_delta = ((uint64_t)1 << 63) / scale;
> >
> > So I am able to reproduce it with some surprising ease on HPET running
> > on Haswell.
> 
> So what is the cause of it?  Maybe the tickless code doesn't generate
> fake clock ticks right.  Or it is just a library bug.  The kernel has
> to be slightly real-time to satisfy the requirement of 1 update per.
> Applications are further from being real-time.  But isn't it enough
> for the kernel to ensure that the timehands cycle more than once per
> second?
No, I entered ddb as you suggested.
> 
> > I looked at the generated code for libc which still uses ffsll() on 32bit,
> > due to the ABI issues.  At least clang generates two BSF instructions for
> > this code, so I think that forking vdso_timehands ABI for this is not
> > reasonable right now.
> >
> > diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c
> > index 3749e0473af..3c3c71207bd 100644
> > --- a/lib/libc/sys/__vdso_gettimeofday.c
> > +++ b/lib/libc/sys/__vdso_gettimeofday.c
> > @@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$");
> > #include <sys/time.h>
> > #include <sys/vdso.h>
> > #include <errno.h>
> > +#include <limits.h>
> > +#include <strings.h>
> > #include <time.h>
> > #include <machine/atomic.h>
> > #include "libc_private.h"
> > @@ -62,7 +64,8 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs)
> > {
> > 	struct vdso_timehands *th;
> > 	uint32_t curr, gen;
> > -	u_int delta;
> > +	uint64_t scale, x;
> > +	u_int delta, scale_bits;
> > 	int error;
> >
> > 	do {
> > @@ -78,7 +81,19 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs)
> > 			continue;
> > 		if (error != 0)
> > 			return (error);
> > -		bintime_addx(bt, th->th_scale * delta);
> > +		scale = th->th_scale;
> > +#ifdef _LP64
> > +		scale_bits = ffsl(scale);
> > +#else
> > +		scale_bits = ffsll(scale);
> > +#endif
> 
> I see that there is an ABI problem adding th_large_delta.
> 
> > +		if (__predict_false(scale_bits + fls(delta) > 63)) {
> > +			x = (scale >> 32) * delta;
> > +			scale &= UINT_MAX;
> 
> Should be UINT32_MAX or better 0xffffffff.
Ok.
> 
> > +			bt->sec += x >> 32;
> > +			bintime_addx(bt, x << 32);
> > +		}
> > +		bintime_addx(bt, scale * delta);
> > 		if (abs)
> > 			bintime_add(bt, &th->th_boottime);
> 
> I don't changing this at all this.  binuptime() was carefully written
> to not need so much 64-bit arithmetic.
> 
> If this pessimization is allowed, then it can also handle a 64-bit
> deltas.  Using the better kernel method:
> 
>  		if (__predict_false(delta >= th->th_large_delta)) {
>  			bt->sec += (scale >> 32) * (delta >> 32);
>  			x = (scale >> 32) * (delta & 0xffffffff);
>  			bt->sec += x >> 32;
>  			bintime_addx(bt, x << 32);
>  			x = (scale & 0xffffffff) * (delta >> 32);
>  			bt->sec += x >> 32;
>  			bintime_addx(bt, x << 32);
>  			bintime_addx(bt, (scale & 0xffffffff) *
>  			    (delta & 0xffffffff));
>  		} else
>  			bintime_addx(bt, scale * (delta & 0xffffffff));
This only makes sense if delta is extended to uint64_t, which requires
the pass over timecounters.
> 
> I just noticed that there is a 64 x 32 -> 64 bit multiplication in the
> current method.  This can be changed to do expicit 32 x 32 -> 64 bit
> multiplications and fix the overflow problem at small extra cost on
> 32-bit arches:
> 
>  		/* 32-bit arches did the next multiplication implicitly. */
>  		x = (scale >> 32) * delta;
>  		/*
>  		 * And they did the following shifts and most of the adds
>  		 * implicitly too.  Except shifting x left by 32 lost the
>  		 * seconds part that the next line handles.  The next line
>  		 * is the only extra cost for them.
>  		 */
>  		bt->sec += x >> 32;
>  		bintime_addx(bt, (x << 32) + (scale & 0xffffffff) * delta);
Ok, what about the following.
diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c
index 3749e0473af..cfe3d96d001 100644
--- a/lib/libc/sys/__vdso_gettimeofday.c
+++ b/lib/libc/sys/__vdso_gettimeofday.c
@@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$");
 #include <sys/time.h>
 #include <sys/vdso.h>
 #include <errno.h>
+#include <limits.h>
+#include <strings.h>
 #include <time.h>
 #include <machine/atomic.h>
 #include "libc_private.h"
@@ -62,7 +64,8 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs)
 {
 	struct vdso_timehands *th;
 	uint32_t curr, gen;
-	u_int delta;
+	uint64_t scale, x;
+	u_int delta, scale_bits;
 	int error;
 
 	do {
@@ -78,7 +81,19 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs)
 			continue;
 		if (error != 0)
 			return (error);
-		bintime_addx(bt, th->th_scale * delta);
+		scale = th->th_scale;
+#ifdef _LP64
+		scale_bits = ffsl(scale);
+#else
+		scale_bits = ffsll(scale);
+#endif
+		if (__predict_false(scale_bits + fls(delta) > 63)) {
+			x = (scale >> 32) * delta;
+			scale &= 0xffffffff;
+			bt->sec += x >> 32;
+			bintime_addx(bt, x << 32);
+		}
+		bintime_addx(bt, scale * delta);
 		if (abs)
 			bintime_add(bt, &th->th_boottime);
 
diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 2656fb4d22f..2e28f872229 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -72,6 +72,7 @@ struct timehands {
 	struct timecounter	*th_counter;
 	int64_t			th_adjustment;
 	uint64_t		th_scale;
+	uint64_t		th_large_delta;
 	u_int	 		th_offset_count;
 	struct bintime		th_offset;
 	struct bintime		th_bintime;
@@ -351,17 +352,44 @@ fbclock_getmicrotime(struct timeval *tvp)
 	} while (gen == 0 || gen != th->th_generation);
 }
 #else /* !FFCLOCK */
+
+static void
+bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
+{
+	uint64_t x;
+
+	x = (*scale >> 32) * delta;
+	*scale &= 0xffffffff;
+	bt->sec += x >> 32;
+	bintime_addx(bt, x << 32);
+}
+
 void
 binuptime(struct bintime *bt)
 {
 	struct timehands *th;
-	u_int gen;
+	uint64_t scale;
+	u_int delta, gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
 		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		scale = th->th_scale;
+		delta = tc_delta(th);
+#ifdef _LP64
+		/* Avoid overflow for scale * delta. */
+		if (__predict_false(th->th_large_delta <= delta))
+			bintime_helper(bt, &scale, delta);
+		bintime_addx(bt, scale * delta);
+#else
+		/*
+		 * Also avoid (uint64_t, uint32_t) -> uint64_t
+		 * multiplication on 32bit arches.
+		 */
+		bintime_helper(bt, &scale, delta);
+		bintime_addx(bt, (u_int)scale * delta);
+#endif
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 }
@@ -388,13 +416,28 @@ void
 bintime(struct bintime *bt)
 {
 	struct timehands *th;
-	u_int gen;
+	uint64_t scale;
+	u_int delta, gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
 		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		scale = th->th_scale;
+		delta = tc_delta(th);
+#ifdef _LP64
+		/* Avoid overflow for scale * delta. */
+		if (__predict_false(th->th_large_delta <= delta))
+			bintime_helper(bt, &scale, delta);
+		bintime_addx(bt, scale * delta);
+#else
+		/*
+		 * Also avoid (uint64_t, uint32_t) -> uint64_t
+		 * multiplication on 32bit arches.
+		 */
+		bintime_helper(bt, &scale, delta);
+		bintime_addx(bt, (u_int)scale * delta);
+#endif
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 }
@@ -1464,6 +1507,7 @@ tc_windup(struct bintime *new_boottimebin)
 	scale += (th->th_adjustment / 1024) * 2199;
 	scale /= th->th_counter->tc_frequency;
 	th->th_scale = scale * 2;
+	th->th_large_delta = ((uint64_t)1 << 63) / scale;
 
 	/*
 	 * Now that the struct timehands is again consistent, set the new
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20190302142521.GE68879>
