From owner-freebsd-hackers@freebsd.org Fri Mar 1 20:38:33 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 546DB1523E40; Fri, 1 Mar 2019 20:38:33 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 9AE989082D; Fri, 1 Mar 2019 20:38:32 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 9FA6543C3C9; Sat, 2 Mar 2019 07:38:22 +1100 (AEDT) Date: Sat, 2 Mar 2019 07:38:20 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: Bruce Evans , Mark Millard , freebsd-hackers Hackers , FreeBSD PowerPC ML Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] In-Reply-To: <20190301194217.GB68879@kib.kiev.ua> Message-ID: <20190302071425.G5025@besplex.bde.org> References: <20190228145542.GT2420@kib.kiev.ua> <20190228150811.GU2420@kib.kiev.ua> <962D78C3-65BE-40C1-BB50-A0088223C17B@yahoo.com> <28C2BB0A-3DAA-4D18-A317-49A8DD52778F@yahoo.com> <20190301112717.GW2420@kib.kiev.ua> <20190302043936.A4444@besplex.bde.org> <20190301194217.GB68879@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=FNpr/6gs c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=qvjnQ3-ODA_JSq_lWvkA:9 a=CjuIK1q_8ugA:10 X-Rspamd-Queue-Id: 9AE989082D X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-6.93 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; NEURAL_HAM_SHORT(-0.93)[-0.929,0]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; REPLY(-4.00)[] X-Mailman-Approved-At: Fri, 01 Mar 2019 21:20:17 +0000 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Mar 2019 20:38:33 -0000 On Fri, 1 Mar 2019, Konstantin Belousov wrote: > On Sat, Mar 02, 2019 at 05:40:58AM +1100, Bruce Evans wrote: >> On Fri, 1 Mar 2019, Konstantin Belousov wrote: >> >>> On Thu, Feb 28, 2019 at 11:39:06PM -0800, Mark Millard wrote: >>>> [The new, trial code also has truncation occurring.] >>> In fact no, I do not think it is. >>> >>>> An example observation of diff_scaled having an overflowed >>>> value was: >>>> >>>> scale_factor == 0x80da2067ac >>>> scale_factor*freq overflows unsigned, 64 bit representation. >>>> tim_offset == 0x3da0eaeb >>>> tim_cnt == 0x42dea3c4 >>>> tim_diff == 0x53db8d9 >>>> For reference: 0x1fc9d43 == 0xffffffffffffffffull/scale_factor >>>> scaled_diff == 0xA353A5BF3FF780CC (truncated to 64 bits) >>>> >>>> But for the new, trail code: >>>> >>>> 0x80da2067ac is 40 bits >>>> 0x53db8d9 is 27 bits >>>> So 67 bits, more than 63. Then: >>>> >>>> x >>>> == (0x80da2067ac>>32) * 0x53db8d9 >>>> == 0x80 * 0x53db8d9 >>>> == 0x29EDC6C80 >>>> >>>> x>>32 >>>> == 0x2 >>>> >>>> x<<32 >>>> == 0x9EDC6C8000000000 (limited to 64 bits) >>>> Note the truncation of: 0x29EDC6C8000000000. >>> Right, this is how the patch is supposed to work. Note that the overflow >>> bits 'lost' due to overflow of the left shift are the same bits that as >>> used to increment bt->sec: >>> bt->sec += x >> 32; >>> So the 2 seconds are accounted for. 67 bits is 4-8 seconds, and it is an error for the adjustment to be >= 1 second. >>> >>>> >>>> Thus the "bintime_addx(bt, x << 32)" is still >>>> based on a truncated value. >>> >>> I must admit that 2 seconds of interval where the timehands where >>> not updated is too much. This might be the real cause of all ppc >>> troubles. I tried to see if the overflow case is possible on amd64, >>> and did not get a single case of the '> 63' branch executed during the >>> /usr/tests/lib/libc run. >> >> The algorithm requires the update interval to be less than 1 second. >> th_scale is 2**64 / tc_frequency, so whatever tc_frequency is, after >> 1 second the value of the multiplication is approximately 2**64 so >> it overflows about then (depending on rounding). >> >> The most useful timecounters are TSC's, and these give another overflow >> in tc_delta() after 1 second when their frequency is 4 GHz (except the >> bogus TSC-low timecounter reduces the frequency to below 2 binary GHz, >> so the usual case is overflow after 2 seconds). > As I said, I was unable to trigger the overflow on amd64. Yes, amd64 doesn't have the bug. It gets tested more with fast TSCs that overflow after about 1 second for other reasons. Try amd64 with an ACPI or HPET timecounter. I sometimes do this to avoid the overflow in tc_delta() while stopped in ddb. This seemed to work. Actually, it doesn't work since the overflow under discussion still occurs. >* ... >>> @@ -78,7 +81,14 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs) >>> continue; >>> if (error != 0) >>> return (error); >>> - bintime_addx(bt, th->th_scale * delta); >>> + scale = th->th_scale; >>> + if (__predict_false(fls(scale) + fls(delta) > 63)) { >> >> This is unnecessarily pessimal. Updates must be frequent enough to prevent >> tc_delta() overflowing, and it is even easier to arrange that this >> multiplication doesn't overflow (since the necessary update interval for >> the latter is constant). >> >> `scale' is 64 bits, so fls(scale) is broken on 32-bit arches, and >> flls(scale) is an especially large pessimization. I saw this on my >> calcru1() fixes -- the flls()s take almost as long as long long >> divisions when the use the pessimal C versions. > Ok, fixed. It is still very slow. > Updated patch. > > diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c > index 3749e0473af..fdefda08e39 100644 > --- a/lib/libc/sys/__vdso_gettimeofday.c > +++ b/lib/libc/sys/__vdso_gettimeofday.c > ... > @@ -78,7 +81,19 @@ binuptime(struct bintime *bt, struct vdso_timekeep *tk, int abs) > continue; > if (error != 0) > return (error); > - bintime_addx(bt, th->th_scale * delta); > + scale = th->th_scale; > +#ifdef _LP64 > + scale_bits = ffsl(scale); > +#else > + scale_bits = ffsll(scale); > +#endif > + if (__predict_false(scale_bits + fls(delta) > 63)) { Userland is still fully pessimized, except when the compiler auto-inlines ffs*(). > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c > index 2656fb4d22f..eedea5183c0 100644 > --- a/sys/kern/kern_tc.c > +++ b/sys/kern/kern_tc.c > @@ -355,13 +356,22 @@ void > binuptime(struct bintime *bt) > { > struct timehands *th; > - u_int gen; > + uint64_t scale, x; > + u_int delta, gen; > > do { > th = timehands; > gen = atomic_load_acq_int(&th->th_generation); > *bt = th->th_offset; > - bintime_addx(bt, th->th_scale * tc_delta(th)); > + scale = th->th_scale; > + delta = tc_delta(th); > + if (__predict_false(th->th_scale_bits + fls(delta) > 63)) { Better, but shouldn't be changed (and the bug that causes the large intervals remains unlocated), and if it is changed then it should use: if (delta >= th->th_large_delta) > @@ -1464,6 +1483,11 @@ tc_windup(struct bintime *new_boottimebin) > scale += (th->th_adjustment / 1024) * 2199; > scale /= th->th_counter->tc_frequency; > th->th_scale = scale * 2; > +#ifdef _LP64 > + th->th_scale_bits = ffsl(th->th_scale); > +#else > + th->th_scale_bits = ffsll(th->th_scale); > +#endif th->th_large_delta = ((uint64_t)1 << 63) / scale; Bruce