From owner-freebsd-hackers@freebsd.org Fri Mar 1 20:15:46 2019 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5FFB815234C7 for ; Fri, 1 Mar 2019 20:15:46 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic309-21.consmr.mail.ne1.yahoo.com (sonic309-21.consmr.mail.ne1.yahoo.com [66.163.184.147]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D6B308FCD4 for ; Fri, 1 Mar 2019 20:15:44 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: UGCe2fUVM1mOKVylvpwuM8JocMdtjeD0oRJaclN8uxKfmEBx4AadKxCY2Pni1Ls 49RyrKL8qw_63XkklBTeXthPOrX4LvgJy_ucnBzSuJBFR7sp2j7RPEWlx8oINodsc24XYGuIfhym Ep5s2tXMmfbu7Pn6E8nuzeA2zLo.mU.1.sJkvBvcdibYEQ9VUw3PieBRGquDBurQSJLraIboWlp6 aLfBsvAHnjEJ.w1rVu0mirN_aEuXoqWCuxBTi93LmtIm8o0TzF7138Yr1YbRHNJE6fzcjsgYn5Vw RCQK8j7g7c_aljHhCbMoZ.WDFvgWQJ4Pxm9j_yjCoaVzhmg560UeZufPo4bkPKlP_8O.RZYLl6Jz dNdkYpQwVKFro.Nwvdtkqh18FFMdoFxHSZNYqCpA4JlPkyd2Lkqv7nzF8VLthu23nz5SIREYXXwh eYyUp.jvaAsPtWfyQqFq1U.Y8n3x5LUhR.FePg2tBROnTMr6XDmHI_mFgog8l0Pt9o6e.jBcKiau TJqAkG7ZljE.6HSLWVvKM2kIgsFvAVe04pmNkN2vbTk2YWTGe3GJl.kxfgY3PeSE5IoCgHCuG8Wy rtAQcBLW8ja_etvCkoNbyxFTAcSNsJ94j1WJ58FauVtIlNS4.GUzl9FFCcUlXHdbzvQweob6MduT BojkG.jBfJ8fwh3GRK093dmqcx5iTqttswCz4VDTzDwoZtIYGOGYN.EZ7JsIPw.fYVOQ9Xz5URDc XqWzn.9hC29mAozOtlXsQB2THqoPVD37RLjNJZOMk99kJwHC48G.aEEeW52bLXu4fgj8fuH5dPav 7pJBr1iVcThQ9zpOUsLbViw5Be9ZawyODKEx4ESyRA2k6l32qjfmR.WjQ5hGryQbU6Q0Yg4SEguf fr.OjokhRO86ETPynyIgzBDqI5cCEeC1d4c1eAiYGt83AdJIp9hLkj8pD7YOQdn3k5.r6lCU3Grx D3WNCnT2aCO_KGIhNRCAOCZv_f7nFJuuIoZBpmGPShjayys_G0wi1qPozFZVZ4u33tjUhv3nHvH_ VXNny5N.MRwAeYAUpdpWsHj6Tv0aOPxlsBwnoERPNuNHepkgeogAr6fB.MqRa.gCWh.6FnTyq8Zh 6oyJv Received: from sonic.gate.mail.ne1.yahoo.com by sonic309.consmr.mail.ne1.yahoo.com with HTTP; Fri, 1 Mar 2019 20:15:37 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.113]) ([67.170.167.181]) by smtp417.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID 055c9a72be27f708d115548f28119fc1; Fri, 01 Mar 2019 20:15:36 +0000 (UTC) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] From: Mark Millard In-Reply-To: <20190302043936.A4444@besplex.bde.org> Date: Fri, 1 Mar 2019 12:15:35 -0800 Cc: Konstantin Belousov , freebsd-hackers Hackers , FreeBSD PowerPC ML Content-Transfer-Encoding: quoted-printable Message-Id: <908615FE-0638-4A80-A3C9-6FC36219ECC3@yahoo.com> References: <20190228145542.GT2420@kib.kiev.ua> <20190228150811.GU2420@kib.kiev.ua> <962D78C3-65BE-40C1-BB50-A0088223C17B@yahoo.com> <28C2BB0A-3DAA-4D18-A317-49A8DD52778F@yahoo.com> <20190301112717.GW2420@kib.kiev.ua> <20190302043936.A4444@besplex.bde.org> To: Bruce Evans X-Mailer: Apple Mail (2.3445.102.3) X-Rspamd-Queue-Id: D6B308FCD4 X-Spamd-Bar: ++ X-Spamd-Result: default: False [2.72 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FREEMAIL_TO(0.00)[optusnet.com.au]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36646, ipnet:66.163.184.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[4]; NEURAL_SPAM_SHORT(0.96)[0.962,0]; MIME_GOOD(-0.10)[text/plain]; IP_SCORE(1.06)[ip: (3.04), ipnet: 66.163.184.0/21(1.30), asn: 36646(1.04), country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.83)[0.828,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.38)[0.377,0]; RCVD_IN_DNSWL_NONE(0.00)[147.184.163.66.list.dnswl.org : 127.0.5.0] X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Mar 2019 20:15:46 -0000 On 2019-Mar-1, at 10:40, Bruce Evans wrote: > On Fri, 1 Mar 2019, Konstantin Belousov wrote: >=20 >> On Thu, Feb 28, 2019 at 11:39:06PM -0800, Mark Millard wrote: >>> [The new, trial code also has truncation occurring.] >> In fact no, I do not think it is. >>=20 >>> An example observation of diff_scaled having an overflowed >>> value was: >>>=20 >>> scale_factor =3D=3D 0x80da2067ac >>> scale_factor*freq overflows unsigned, 64 bit representation. >>> tim_offset =3D=3D 0x3da0eaeb >>> tim_cnt =3D=3D 0x42dea3c4 >>> tim_diff =3D=3D 0x53db8d9 >>> For reference: 0x1fc9d43 =3D=3D = 0xffffffffffffffffull/scale_factor >>> scaled_diff =3D=3D 0xA353A5BF3FF780CC (truncated to 64 bits) >>>=20 >>> But for the new, trail code: >>>=20 >>> 0x80da2067ac is 40 bits >>> 0x53db8d9 is 27 bits >>> So 67 bits, more than 63. Then: >>>=20 >>> x >>> =3D=3D (0x80da2067ac>>32) * 0x53db8d9 >>> =3D=3D 0x80 * 0x53db8d9 >>> =3D=3D 0x29EDC6C80 >>>=20 >>> x>>32 >>> =3D=3D 0x2 >>>=20 >>> x<<32 >>> =3D=3D 0x9EDC6C8000000000 (limited to 64 bits) >>> Note the truncation of: 0x29EDC6C8000000000. >> Right, this is how the patch is supposed to work. Note that the = overflow >> bits 'lost' due to overflow of the left shift are the same bits that = as >> used to increment bt->sec: >> bt->sec +=3D x >> 32; >> So the 2 seconds are accounted for. >>=20 >>>=20 >>> Thus the "bintime_addx(bt, x << 32)" is still >>> based on a truncated value. >>=20 >> I must admit that 2 seconds of interval where the timehands where >> not updated is too much. This might be the real cause of all ppc >> troubles. I tried to see if the overflow case is possible on amd64, >> and did not get a single case of the '> 63' branch executed during = the >> /usr/tests/lib/libc run. >=20 > The algorithm requires the update interval to be less than 1 second. > th_scale is 2**64 / tc_frequency, so whatever tc_frequency is, after > 1 second the value of the multiplication is approximately 2**64 so > it overflows about then (depending on rounding). I've not tracked down evidence below binuptime on powerpc64 yet. I'm not sure how much of the substructure I can investigate. The context here is 2 sockets, each with 2 cores. > The most useful timecounters are TSC's, and these give another = overflow > in tc_delta() after 1 second when their frequency is 4 GHz (except the > bogus TSC-low timecounter reduces the frequency to below 2 binary GHz, > so the usual case is overflow after 2 seconds). The wording suggests a amd64/i386 context but my report was for powerpc64, specifically for old PowerMac G5's. (I currently have\access to only one ut I've seen the beuavior on others in the last.) FreeBSD reports: # sysctl kern.timecounter kern.timecounter.tc.timebase.quality: 0 kern.timecounter.tc.timebase.frequency: 33333333 kern.timecounter.tc.timebase.counter: 1831468476 kern.timecounter.tc.timebase.mask: 4294967295 kern.timecounter.stepwarnings: 0 kern.timecounter.alloweddeviation: 5 kern.timecounter.hardware: timebase kern.timecounter.choice: timebase(0) dummy(-1000000) kern.timecounter.tick: 1 kern.timecounter.fast_gettime: 1 FreeBSD uses the lower 32 bits of the tbr (via mftb). As I understand there are problems with how close the tbr's can be kept. >> Actually, the same overflow-prone code exists in libc, so below is = the >> updated patch: >> - I added __predict_false() >> - libc multiplication is also done separately for high-order bits. >> (fftclock counterpart is still pending). >>=20 >> diff --git a/lib/libc/sys/__vdso_gettimeofday.c = b/lib/libc/sys/__vdso_gettimeofday.c >> index 3749e0473af..a14576988ff 100644 >> --- a/lib/libc/sys/__vdso_gettimeofday.c >> +++ b/lib/libc/sys/__vdso_gettimeofday.c >> @@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$"); >> #include >> #include >> #include >> +#include >> +#include >> #include >> #include >> #include "libc_private.h" >> @@ -62,6 +64,7 @@ binuptime(struct bintime *bt, struct vdso_timekeep = *tk, int abs) >> { >> struct vdso_timehands *th; >> uint32_t curr, gen; >> + uint64_t scale, x; >> u_int delta; >> int error; >>=20 >> @@ -78,7 +81,14 @@ binuptime(struct bintime *bt, struct vdso_timekeep = *tk, int abs) >> continue; >> if (error !=3D 0) >> return (error); >> - bintime_addx(bt, th->th_scale * delta); >> + scale =3D th->th_scale; >> + if (__predict_false(fls(scale) + fls(delta) > 63)) { >=20 > This is unnecessarily pessimal. Updates must be frequent enough to = prevent > tc_delta() overflowing, and it is even easier to arrange that this > multiplication doesn't overflow (since the necessary update interval = for > the latter is constant). >=20 > `scale' is 64 bits, so fls(scale) is broken on 32-bit arches, and > flls(scale) is an especially large pessimization. I saw this on my > calcru1() fixes -- the flls()s take almost as long as long long > divisions when the use the pessimal C versions. Unlike i386's and amd64's: static __inline __pure2 int fls(int mask) { return (mask =3D=3D 0 ? mask : (int)bsrl((u_int)mask) + 1); } powerpc64's apparently use /usr/src/sys/libkern/fls.c 's: int fls(int mask) { int bit; if (mask =3D=3D 0) return (0); for (bit =3D 1; mask !=3D 1; bit++) mask =3D (unsigned int)mask >> 1; return (bit); } (At least I did not find an alternate for the powerpc64 context.) So, if I got that right, fls does a lot of looping on powerpc64. > The algorithm requires tc_delta() to be only 32 bits, since otherwise = the > multiplication would be 64 x 64 bits so would be much slower and = harder > to write. >=20 > If tc_freqency is far above 4GHz, then th_scale is far below 4G, so = the > scaling is not so accurate. But 0.25 parts per billion is much more = than > enough. Even 1 part per million is enough for a TSC, since TSC = instability > is more than 1ppm. The overflows could be pushed off to 1024 seconds = by > dividing by 1024 at suitable places. A 32-bit scale times a 64-bit = delta > would be simple compared with both 64 bits (much like the code here). >=20 > The code here can be optimized using values calculated at = initialization > time instead of fls*(). The overflow threshold for delta is = approximately > 2**64 / tc_frequency. >=20 >> + x =3D (scale >> 32) * delta; >> + scale &=3D UINT_MAX; >> + bt->sec +=3D x >> 32; >> + bintime_addx(bt, x << 32); >> + } >> + bintime_addx(bt, scale * delta); >> if (abs) >> bintime_add(bt, &th->th_boottime); >=20 > When the timecounter is the i8254, as it often was when timecounters > were new, tc_windup() had to be called more often than every i8254 = rollover > (in practice once every hardclock tick), partly to keep tc_delta() = small > (since rollover gives a form of overflow). This was not so easy to = arrange. > It requires not losing any hardclock ticks and also not having any = with > high latency and also complications to detect the rollover when there = is > small latency. Most hardware is easier to handle now. With tickless > kernels, hardclock() is often not called for about 1 second, but it = must > be called at least that often to prevent the overflow here. Thanks for the notes. I've been wondering around in unfamiliar = territory. =3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)