From: Mark Millard
Date: Sat, 2 Mar 2019 21:20:58 -0800
To: FreeBSD PowerPC ML, Mark Millard via freebsd-hackers
Subject: powerpc64 on PowerMac G5 4-core (system total): a hack that so far seem to avoid the stuck-sleeping issue

[This note goes in a different direction compared to my
prior evidence report for overflows and the later activity
that has been happening for it. This does *not* involve
the patches associated with that report.]

I view the following as an evidence-gathering hack:
showing the change in behavior with the code changes,
not as directly what FreeBSD should do for powerpc64.
In code for defined(__powerpc64__) && defined(AIM)
I freely use knowledge of the PowerMac G5 context
instead of attempting general code.

Also: the code is set up to record some information
that I've been looking at via ddb. The recording is
not part of what changes the behavior but I decided
to show that code too.

It is preliminary, but, so far, the hack has avoided
buf*daemon* threads and pmac_thermal getting stuck
sleeping (or, at least, far less frequently).


The tbr-value hack:

From what I see, the various G5 cores each have their tbr
running at the same rate but with some offsets as far
as the base time goes.

cpu_mp_unleash does:

	ap_awake = 1;

	/* Provide our current DEC and TB values for APs */
	ap_timebase = mftb() + 10;
	__asm __volatile("msync; isync");

	/* Let APs continue */
	atomic_store_rel_int(&ap_letgo, 1);

	platform_smp_timebase_sync(ap_timebase, 0);

and machdep_ap_bootstrap does:

	/*
	 * Set timebase as soon as possible to meet an implicit rendezvous
	 * from cpu_mp_unleash(), which sets ap_letgo and then immediately
	 * sets timebase.
	 *
	 * Note that this is intrinsically racy and is only relevant on
	 * platforms that do not support better mechanisms.
	 */
	platform_smp_timebase_sync(ap_timebase, 1);

which attempts to set the tbrs appropriately.

But on small scales of differences the various tbr
values from different cpus end up not well ordered
relative to time, synchronizes-with relationships, and
the like. Only large enough differences can well
indicate an ordering of interest.

Note: tc->tc_get_timecount(tc) only provides the
least significant 32 bits of the tbr value.
th->th_offset_count is also 32 bits and based on
truncated tbr values.
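The effect of the 32-bit truncation described above can be illustrated with a minimal user-space sketch (not kernel code). It assumes a counter mask of 0xffffffff, and masked_delta() is a hypothetical helper that only mirrors the "(count - offset) & mask" step. Under those assumptions, a reading that is a few ticks behind the recorded offset does not come out as a small negative number but as a delta of nearly 2^32 ticks:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the "(count - offset) & mask" step.
 * This is an illustration only, not a FreeBSD kernel interface.
 */
static uint32_t
masked_delta(uint32_t count, uint32_t offset_count, uint32_t mask)
{
	return ((count - offset_count) & mask);
}

static void
masked_delta_demo(void)
{
	const uint32_t mask = 0xffffffffu;	/* assumed 32-bit counter mask */

	/* Normal case: the new reading is 10 ticks ahead of the offset. */
	assert(masked_delta(1000, 990, mask) == 10);

	/* The new reading is 3 ticks behind: the delta looks huge. */
	assert(masked_delta(997, 1000, mask) == 0xfffffffdu);

	/* A genuine 32-bit wrap still yields the right small delta. */
	assert(masked_delta(5, 0xfffffffbu, mask) == 10);
}
```

Calling masked_delta_demo() runs the three checks. The second case is the one relevant here: unsigned arithmetic alone cannot tell a tiny step backwards between cores from almost a full counter period going forwards.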
So I made binuptime avoid finishing when it sees
a small (<0x10) step backwards for a new
tc->tc_get_timecount(tc) value vs. the existing
th->th_offset_count value (values strongly tied
to powerpc64 tbr values):

void
binuptime(struct bintime *bt)
{
	struct timehands *th;
	u_int gen;

	struct bintime old_bt= *bt; // HACK!!!
	struct timecounter *tc; // HACK!!!
	u_int tim_cnt, tim_offset, tim_diff; // HACK!!!
	uint64_t freq, scale_factor, diff_scaled; // HACK!!!

	u_int try_cnt= 0ull; // HACK!!!

	do {
		do { // HACK!!!
			th = timehands;
			tc = th->th_counter;
			gen = atomic_load_acq_int(&th->th_generation);
			tim_cnt= tc->tc_get_timecount(tc);
			tim_offset= th->th_offset_count;
		} while (tim_cnt[...]th_offset;
		tim_diff= (tim_cnt - tim_offset) & tc->tc_counter_mask;
		scale_factor= th->th_scale;
		diff_scaled= scale_factor * tim_diff;
		bintime_addx(bt, diff_scaled);
		freq= tc->tc_frequency;
		atomic_thread_fence_acq();
		try_cnt++;
	} while (gen == 0 || gen != th->th_generation);

	if (*(volatile uint64_t*)0xc000000000000020==0u &&
	    (0xffffffffffffffffull/scale_factor)[...]

[...]tc_get_timecount(tc) not actually indicating
a useful < vs. == vs. > ordering relation uniquely.

(I make no claim that the hack is a proper way to
deal with such.)
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)


From: Konstantin Belousov
Date: Sun, 3 Mar 2019 13:19:31 +0200
To: Bruce Evans
Cc: Mark Millard, freebsd-hackers Hackers, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID: <20190303111931.GI68879@kib.kiev.ua>
In-Reply-To: <20190303041441.V4781@besplex.bde.org>

On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote:
> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
>
> > On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
> >> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> >>> ...
> >>> So I am able to reproduce it with some surprising ease on HPET running
> >>> on Haswell.
> >>
> >> So what is the cause of it? Maybe the tickless code doesn't generate
> >> fake clock ticks right. Or it is just a library bug. The kernel has
> >> to be slightly real-time to satisfy the requirement of 1 update per.
> >> Applications are further from being real-time. But isn't it enough
> >> for the kernel to ensure that the timehands cycle more than once per
> >> second?
> > No, I entered ddb as you suggested.
>
> But using ddb is not normal. It is convenient that this fixes HPET and
> ACPI timecounters after using ddb, but this method doesn't help for
> timecounters that wrap fast. TSC-low at 2GHz wraps in 2 seconds, and
> i8254 wraps in a few milliseconds.
>
> >> I don't changing this at all this. binuptime() was carefully written
> >> to not need so much 64-bit arithmetic.
> >>
> >> If this pessimization is allowed, then it can also handle a 64-bit
> >> deltas.
> >> Using the better kernel method:
> >>
> >> 	if (__predict_false(delta >= th->th_large_delta)) {
> >> 		bt->sec += (scale >> 32) * (delta >> 32);
> >> 		x = (scale >> 32) * (delta & 0xffffffff);
> >> 		bt->sec += x >> 32;
> >> 		bintime_addx(bt, x << 32);
> >> 		x = (scale & 0xffffffff) * (delta >> 32);
> >> 		bt->sec += x >> 32;
> >> 		bintime_addx(bt, x << 32);
> >> 		bintime_addx(bt, (scale & 0xffffffff) *
> >> 		    (delta & 0xffffffff));
> >> 	} else
> >> 		bintime_addx(bt, scale * (delta & 0xffffffff));
> > This only makes sense if delta is extended to uint64_t, which requires
> > the pass over timecounters.
>
> Yes, that was its point. It is a bit annoying to have a hardware
> timecounter like the TSC that doesn't wrap naturally, but then make it
> wrap by masking high bits.
>
> The masking step is also a bit wasteful. For the TSC, it is 1 step to
> discard high bids at the register level, then another step to apply the
> nask to discard th high bits again.

rdtsc-low is implemented in the natural way, after RDTSC, no register
combining into 64bit value is done, instead shrd operates on %edx:%eax
to get the final result into %eax. I am not sure what you refer to.

>
> >> I just noticed that there is a 64 x 32 -> 64 bit multiplication in the
> >> current method. This can be changed to do expicit 32 x 32 -> 64 bit
> >> multiplications and fix the overflow problem at small extra cost on
> >> 32-bit arches:
> >>
> >> 	/* 32-bit arches did the next multiplication implicitly. */
> >> 	x = (scale >> 32) * delta;
> >> 	/*
> >> 	 * And they did the following shifts and most of the adds
> >> 	 * implicitly too. Except shifting x left by 32 lost the
> >> 	 * seconds part that the next line handles. The next line
> >> 	 * is the only extra cost for them.
> >> 	 */
> >> 	bt->sec += x >> 32;
> >> 	bintime_addx(bt, (x << 32) + (scale & 0xffffffff) * delta);
> >
> > Ok, what about the following.
>
> I'm not sure that I really want this, even if the pessimization is done.
> But it avoids using fls*(), so is especially good for 32-bit systems and
> OK for 64-bit systems too, especially in userland where fls*() is in the
> fast path.

For userland I looked at the generated code, and BSR usage seems to be
good enough, for default compilation settings with clang.

>
> >
> > diff --git a/lib/libc/sys/__vdso_gettimeofday.c b/lib/libc/sys/__vdso_gettimeofday.c
> > index 3749e0473af..cfe3d96d001 100644
> > --- a/lib/libc/sys/__vdso_gettimeofday.c
> > +++ b/lib/libc/sys/__vdso_gettimeofday.c
> > @@ -32,6 +32,8 @@ __FBSDID("$FreeBSD$");
> >  #include
> >  #include
> >  #include
> > +#include
>
> Not needed with 0xffffffff instead of UINT_MAX.
>
> The userland part is otherwise little changed.

Yes, see above. If the ABI for the shared page is going to be changed in
the future, I will export th_large_delta as well.

>
> > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> > index 2656fb4d22f..2e28f872229 100644
> > --- a/sys/kern/kern_tc.c
> > +++ b/sys/kern/kern_tc.c
> > ...
> > @@ -351,17 +352,44 @@ fbclock_getmicrotime(struct timeval *tvp)
> >  	} while (gen == 0 || gen != th->th_generation);
> >  }
> >  #else /* !FFCLOCK */
> > +
> > +static void
> > +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
> > +{
> > +	uint64_t x;
> > +
> > +	x = (*scale >> 32) * delta;
> > +	*scale &= 0xffffffff;
> > +	bt->sec += x >> 32;
> > +	bintime_addx(bt, x << 32);
> > +}
>
> It is probably best to not inline the slow path, but clang tends to
> inline everything anyway.

It does not matter if it inlines it, as far as it is moved out of the
linear sequence for the fast path.

>
> I prefer my way of writing this in 3 lines. Modifying 'scale' for
> the next step is especially ugly and pessimal when the next step is
> in the caller and this function is not inlined.

Can you show exactly what do you want ?
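As a sketch of the split multiplication discussed above, the following user-space C models adding scale * delta to a (seconds, 64-bit fraction) pair in two 32 x 32 -> 64 bit halves. It is a minimal illustration under stated assumptions, not the kernel code: struct bt, bt_addx() and bt_add_scaled() are hypothetical stand-ins for struct bintime, bintime_addx() and the arithmetic in the patch, and unsigned int is assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for struct bintime: seconds plus a 64-bit fraction. */
struct bt {
	uint64_t sec;
	uint64_t frac;
};

/* Add x to the fraction and carry into seconds, in the manner of bintime_addx(). */
static void
bt_addx(struct bt *bt, uint64_t x)
{
	uint64_t old = bt->frac;

	bt->frac += x;
	if (old > bt->frac)
		bt->sec++;
}

/* Hypothetical helper: add the full (up to 96-bit) product scale * delta. */
static void
bt_add_scaled(struct bt *bt, uint64_t scale, uint32_t delta)
{
	uint64_t x;

	/* High half of scale: the product's top 32 bits are whole seconds. */
	x = (uint64_t)(uint32_t)(scale >> 32) * delta;
	bt->sec += x >> 32;
	bt_addx(bt, x << 32);
	/* Low half of scale: a 32 x 32 -> 64 bit product cannot overflow. */
	bt_addx(bt, (uint64_t)(uint32_t)scale * delta);
}

static void
bt_add_scaled_demo(void)
{
	struct bt naive = { 0, 0 }, split = { 0, 0 };
	uint64_t scale = (uint64_t)1 << 63;	/* 0.5 s per tick */
	uint32_t delta = 5;			/* 5 ticks, i.e. 2.5 s */

	bt_addx(&naive, scale * delta);		/* product wraps modulo 2^64 */
	bt_add_scaled(&split, scale, delta);

	/* The naive form loses the two whole seconds; the split form keeps them. */
	assert(naive.sec == 0 && naive.frac == ((uint64_t)1 << 63));
	assert(split.sec == 2 && split.frac == ((uint64_t)1 << 63));
}
```

The scale value here is deliberately extreme so the overflow is easy to follow by hand. With realistic scales the same loss occurs once delta covers roughly a second or more of ticks.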
>
> > +
> >  void
> >  binuptime(struct bintime *bt)
> >  {
> >  	struct timehands *th;
> > -	u_int gen;
> > +	uint64_t scale;
> > +	u_int delta, gen;
> >
> >  	do {
> >  		th = timehands;
> >  		gen = atomic_load_acq_int(&th->th_generation);
> >  		*bt = th->th_offset;
> > -		bintime_addx(bt, th->th_scale * tc_delta(th));
> > +		scale = th->th_scale;
> > +		delta = tc_delta(th);
> > +#ifdef _LP64
> > +		/* Avoid overflow for scale * delta. */
> > +		if (__predict_false(th->th_large_delta <= delta))
> > +			bintime_helper(bt, &scale, delta);
> > +		bintime_addx(bt, scale * delta);
> > +#else
> > +		/*
> > +		 * Also avoid (uint64_t, uint32_t) -> uint64_t
> > +		 * multiplication on 32bit arches.
> > +		 */
>
> "Also avoid overflow for ..."
>
> > +		bintime_helper(bt, &scale, delta);
> > +		bintime_addx(bt, (u_int)scale * delta);
>
> The cast should be to uint32_t, but better write it as & 0xffffffff as
> elsewhere.
>
> bintime_helper() already reduced 'scale' to 32 bits. The cast might be
> needed to tell the compiler this, especially when the function is not
> inlined. Better not do it in the function. The function doesn't even
> use the reduced value.

I used cast to use 32x32 multiplication. I am not sure that all (or any)
compilers are smart enough to deduce that they can use 32 bit mul.

>
> bintime_helper() is in the fast path in this case, so should be inlined.
>
> > +#endif
> >  		atomic_thread_fence_acq();
> >  	} while (gen == 0 || gen != th->th_generation);
> >  }
>
> This needs lots of testing of course.

Current kernel-only part of the change is below, see the question about
your preference for binuptime_helper().
diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 2656fb4d22f..6c41ab22288 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -72,6 +71,7 @@ struct timehands {
 	struct timecounter	*th_counter;
 	int64_t	th_adjustment;
 	uint64_t	th_scale;
+	uint64_t	th_large_delta;
 	u_int	th_offset_count;
 	struct bintime	th_offset;
 	struct bintime	th_bintime;
@@ -351,17 +351,45 @@ fbclock_getmicrotime(struct timeval *tvp)
 	} while (gen == 0 || gen != th->th_generation);
 }
 #else /* !FFCLOCK */
+
+static void
+bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
+{
+	uint64_t x;
+
+	x = (*scale >> 32) * delta;
+	*scale &= 0xffffffff;
+	bt->sec += x >> 32;
+	bintime_addx(bt, x << 32);
+}
+
 void
 binuptime(struct bintime *bt)
 {
 	struct timehands *th;
-	u_int gen;
+	uint64_t scale;
+	u_int delta, gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
 		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		scale = th->th_scale;
+		delta = tc_delta(th);
+#ifdef _LP64
+		/* Avoid overflow for scale * delta. */
+		if (__predict_false(th->th_large_delta <= delta))
+			bintime_helper(bt, &scale, delta);
+		bintime_addx(bt, scale * delta);
+#else
+		/*
+		 * Avoid both overflow as above and
+		 * (uint64_t, uint32_t) -> uint64_t
+		 * multiplication on 32bit arches.
+		 */
+		bintime_helper(bt, &scale, delta);
+		bintime_addx(bt, (uint32_t)scale * delta);
+#endif
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 }
@@ -388,13 +416,29 @@ void
 bintime(struct bintime *bt)
 {
 	struct timehands *th;
-	u_int gen;
+	uint64_t scale;
+	u_int delta, gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
 		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		scale = th->th_scale;
+		delta = tc_delta(th);
+#ifdef _LP64
+		/* Avoid overflow for scale * delta. */
+		if (__predict_false(th->th_large_delta <= delta))
+			bintime_helper(bt, &scale, delta);
+		bintime_addx(bt, scale * delta);
+#else
+		/*
+		 * Avoid both overflow as above and
+		 * (uint64_t, uint32_t) -> uint64_t
+		 * multiplication on 32bit arches.
+		 */
+		bintime_helper(bt, &scale, delta);
+		bintime_addx(bt, (uint32_t)scale * delta);
+#endif
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 }
@@ -1464,6 +1508,7 @@ tc_windup(struct bintime *new_boottimebin)
 	scale += (th->th_adjustment / 1024) * 2199;
 	scale /= th->th_counter->tc_frequency;
 	th->th_scale = scale * 2;
+	th->th_large_delta = ((uint64_t)1 << 63) / scale;
 
 	/*
 	 * Now that the struct timehands is again consistent, set the new


From: Konstantin Belousov
Date: Sun, 3 Mar 2019 18:16:35 +0200
To: Bruce Evans
Cc: Mark Millard, freebsd-hackers Hackers, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID: <20190303161635.GJ68879@kib.kiev.ua>
In-Reply-To: <20190303223100.B3572@besplex.bde.org>

On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
>
> > On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote:
> >> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> >>
> >>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
> >>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> >* ...
> >>>> I don't changing this at all this. binuptime() was carefully written
> >>>> to not need so much 64-bit arithmetic.
> >>>>
> >>>> If this pessimization is allowed, then it can also handle a 64-bit
> >>>> deltas.
> >>>> Using the better kernel method:
> >>>>
> >>>> 	if (__predict_false(delta >= th->th_large_delta)) {
> >>>> 		bt->sec += (scale >> 32) * (delta >> 32);
> >>>> 		x = (scale >> 32) * (delta & 0xffffffff);
> >>>> 		bt->sec += x >> 32;
> >>>> 		bintime_addx(bt, x << 32);
> >>>> 		x = (scale & 0xffffffff) * (delta >> 32);
> >>>> 		bt->sec += x >> 32;
> >>>> 		bintime_addx(bt, x << 32);
> >>>> 		bintime_addx(bt, (scale & 0xffffffff) *
> >>>> 		    (delta & 0xffffffff));
> >>>> 	} else
> >>>> 		bintime_addx(bt, scale * (delta & 0xffffffff));
> >>> This only makes sense if delta is extended to uint64_t, which requires
> >>> the pass over timecounters.
> >>
> >> Yes, that was its point. It is a bit annoying to have a hardware
> >> timecounter like the TSC that doesn't wrap naturally, but then make it
> >> wrap by masking high bits.
> >>
> >> The masking step is also a bit wasteful. For the TSC, it is 1 step to
> >> discard high bids at the register level, then another step to apply the
> >> nask to discard th high bits again.
> > rdtsc-low is implemented in the natural way, after RDTSC, no register
> > combining into 64bit value is done, instead shrd operates on %edx:%eax
> > to get the final result into %eax. I am not sure what you refer to.
>
> I was referring mostly to the masking step '& tc->tc_counter_mask' and
> the lack of register combining in rdtsc().
>
> However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining
> step. i386 used to be faster here -- the first masking step of discarding
> %edx doesn't take any code. amd64 has to mask out the top bits in %rax.
> Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64
> has to do a not so slow shr.

i386 cannot discard %edx after RDTSC since some bits from %edx come into
the timecounter value.
amd64 cannot either, but amd64 does not need to mask out the top bits in
%rax, since the whole shrdl calculation occurs in 32-bit registers, and
the result is in %rax, where the top word is cleared by the shrdl
instruction automatically. But the clearing is not required, since the
result is unsigned int anyway.

Disassembly of tsc_get_timecount_low() is very clear:
   0xffffffff806767e4 <+4>:	mov    0x30(%rdi),%ecx
   0xffffffff806767e7 <+7>:	rdtsc
   0xffffffff806767e9 <+9>:	shrd   %cl,%edx,%eax
...
   0xffffffff806767ed <+13>:	retq
(I removed frame manipulations).

>
> Then the '& tc->tc_counter_mask' step has no effect.

This is true.

>
> All this is wrapped in many layers of function calls which are quite slow
> but this lets the other operations run in parallel on some CPUs.
>
> >>>> 	/* 32-bit arches did the next multiplication implicitly. */
> >>>> 	x = (scale >> 32) * delta;
> >>>> 	/*
> >>>> 	 * And they did the following shifts and most of the adds
> >>>> 	 * implicitly too. Except shifting x left by 32 lost the
> >>>> 	 * seconds part that the next line handles. The next line
> >>>> 	 * is the only extra cost for them.
> >>>> 	 */
> >>>> 	bt->sec += x >> 32;
> >>>> 	bintime_addx(bt, (x << 32) + (scale & 0xffffffff) * delta);
> >>>
> >>> Ok, what about the following.
> >>
> >> I'm not sure that I really want this, even if the pessimization is done.
> >> But it avoids using fls*(), so is especially good for 32-bit systems and
> >> OK for 64-bit systems too, especially in userland where fls*() is in the
> >> fast path.
> > For userland I looked at the generated code, and BSR usage seems to be
> > good enough, for default compilation settings with clang.
>
> I use gcc-4.2.1, and it doesn't do this optimization.
>
> I already reported this in connection with fixing calcru1(). calcru1()
> is unnecessarily several times slower on i386 than on amd64 even after
> avoiding using flsll() on it.
> The main slowness is in converting 'usec'
> to tv_sec and tv_usec, due to the bad design and implementation of the
> __udivdi3 and __umoddi3 libcalls. The bad design is having to make 2
> libcalls to get the quotient and remainder. The bad implementation is
> the portable C version in libkern. libgcc provides a better implementation,
> but this is not available in the kernel.
>
> >>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> >>> index 2656fb4d22f..2e28f872229 100644
> >>> --- a/sys/kern/kern_tc.c
> >>> +++ b/sys/kern/kern_tc.c
> >>> ...
> >>> @@ -351,17 +352,44 @@ fbclock_getmicrotime(struct timeval *tvp)
> >>>  	} while (gen == 0 || gen != th->th_generation);
> >>>  }
> >>>  #else /* !FFCLOCK */
> >>> +
> >>> +static void
> >>> +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
> >>> +{
> >>> +	uint64_t x;
> >>> +
> >>> +	x = (*scale >> 32) * delta;
> >>> +	*scale &= 0xffffffff;
> >>> +	bt->sec += x >> 32;
> >>> +	bintime_addx(bt, x << 32);
> >>> +}
> >>
> >> It is probably best to not inline the slow path, but clang tends to
> >> inline everything anyway.
> > It does not matter if it inlines it, as far as it is moved out of the
> > linear sequence for the fast path.
> >>
> >> I prefer my way of writing this in 3 lines. Modifying 'scale' for
> >> the next step is especially ugly and pessimal when the next step is
> >> in the caller and this function is not inlined.
> > Can you show exactly what do you want ?
>
> Just write 'scale & 0xffffffff' for the low bits of 'scale' in callers,
> and don't pass 'scale' indirectly to bintime_helper() and don't modify
> it there.
>
> Oops, there is a problem. 'scale' must be reduced iff bintime_helper()
> was used. Duplicate some source code so as to not need a fall-through
> to the fast path. See below.

Yes, this is the reason why it is passed by pointer (C has no references).
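The constraint that 'scale' is reduced to its low 32 bits only when the helper has run can be sketched in user-space C with the helper taking scale by value. This is an illustration under stated assumptions, not the committed code: struct bt, bt_addx(), bt_helper() and bt_add_delta() are hypothetical stand-ins, large_delta is passed as a plain argument, and the numbers assume a 1 MHz counter with no adjustment term (th_scale = 2 * (2^63 / 10^6), large_delta = 10^6).

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for struct bintime: seconds plus a 64-bit fraction. */
struct bt {
	uint64_t sec;
	uint64_t frac;
};

/* Add x to the fraction and carry into seconds. */
static void
bt_addx(struct bt *bt, uint64_t x)
{
	uint64_t old = bt->frac;

	bt->frac += x;
	if (old > bt->frac)
		bt->sec++;
}

/* Handles only the high 32 bits of scale; scale itself is not modified. */
static void
bt_helper(struct bt *bt, uint64_t scale, uint32_t delta)
{
	uint64_t x = (scale >> 32) * delta;

	bt->sec += x >> 32;
	bt_addx(bt, x << 32);
}

/* Hypothetical caller: the low-bits mask is applied only on the slow path. */
static void
bt_add_delta(struct bt *bt, uint64_t scale, uint64_t large_delta,
    uint32_t delta)
{
	if (large_delta <= delta) {
		bt_helper(bt, scale, delta);
		bt_addx(bt, (scale & 0xffffffff) * delta);
	} else {
		bt_addx(bt, scale * delta);
	}
}

static void
bt_add_delta_demo(void)
{
	/* Assumed values for a 1 MHz counter; see the lead-in. */
	const uint64_t scale = 18446744073708ULL;
	const uint64_t large_delta = 1000000;
	struct bt fast = { 0, 0 }, slow = { 0, 0 };

	bt_add_delta(&fast, scale, large_delta, 500000);	/* 0.5 s */
	assert(fast.sec == 0 && fast.frac == 9223372036854000000ULL);

	bt_add_delta(&slow, scale, large_delta, 1500000);	/* 1.5 s */
	assert(slow.sec == 1 && slow.frac == 9223372036852448384ULL);
}
```

Passing scale by value costs one duplicated bintime_addx() call in the caller, which is the trade-off under discussion. The by-pointer variant avoids the duplication by letting the helper modify the caller's variable.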
>
> >>>  void
> >>>  binuptime(struct bintime *bt)
> >>>  {
> >>>  	struct timehands *th;
> >>> -	u_int gen;
> >>> +	uint64_t scale;
> >>> +	u_int delta, gen;
> >>>
> >>>  	do {
> >>>  		th = timehands;
> >>>  		gen = atomic_load_acq_int(&th->th_generation);
> >>>  		*bt = th->th_offset;
> >>> -		bintime_addx(bt, th->th_scale * tc_delta(th));
> >>> +		scale = th->th_scale;
> >>> +		delta = tc_delta(th);
> >>> +#ifdef _LP64
> >>> +		/* Avoid overflow for scale * delta. */
> >>> +		if (__predict_false(th->th_large_delta <= delta))
> >>> +			bintime_helper(bt, &scale, delta);
> >>> +		bintime_addx(bt, scale * delta);
> >>> +#else
> >>> +		/*
> >>> +		 * Also avoid (uint64_t, uint32_t) -> uint64_t
> >>> +		 * multiplication on 32bit arches.
> >>> +		 */
> >>
> >> "Also avoid overflow for ..."
> >>
> >>> +		bintime_helper(bt, &scale, delta);
> >>> +		bintime_addx(bt, (u_int)scale * delta);
> >>
> >> The cast should be to uint32_t, but better write it as & 0xffffffff as
> >> elsewhere.
>
> This is actually very broken. The cast gives a 32 x 32 -> 32 bit
> multiplication, but all 64 bits of the result are needed.

Yes, fixed in the updated version.

>
> >>
> >> bintime_helper() already reduced 'scale' to 32 bits. The cast might be
> >> needed to tell the compiler this, especially when the function is not
> >> inlined. Better not do it in the function. The function doesn't even
> >> use the reduced value.
> > I used cast to use 32x32 multiplication. I am not sure that all (or any)
> > compilers are smart enough to deduce that they can use 32 bit mul.
>
> Writing the reduction to 32 bits using a mask instead of a cast automatically
> avoids the bug, but might not give the optimization.
>
> They do do this optimization, but might need the cast as well as the mask.
> At worst, '(uint64_t)(uint32_t)(scale & 0xffffffff)', where the mask is
> now redundant but the cast back to 64 bits is needed if the cast to 32
> bits is used.
>
> You already depended on them not needing the cast for the expression
> '(*scale >> 32) * delta'. Here delta is 32 bits and the other operand
> must remain 64 bits so that after default promotions the multiplication
> is 64 x 64 -> 64 bits, but the compiler should optimize this to
> 32 x 32 -> 64 bits. (*scale >> 32) would need to be cast to 32 bits
> and then back to 64 bits if the compiler can't do this automatically.
>
> I checked what some compilers do. Both gcc-3.3.3 and gcc-4.2.1
> optimize only (uint64_t)x * y (where x and y have type uint32_t), so they
> need to be helped by casts if x and y have have a larger type even if
> their values obviously fit in 32 bits. So the expressions should be
> written as:
>
> 	(uint64_t)(uint32_t)(scale >> 32) * delta;
>
> and
>
> 	(uint64_t)(uint32_t)scale * delta;
>
> The 2 casts are always needed, but the '& 0xffffffff' operation doesn't
> need to be explicit because the cast does.

This is what I do now.

>
> >> This needs lots of testing of course.
> >
> > Current kernel-only part of the change is below, see the question about
> > your preference for binuptime_helper().
> >
> > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> > index 2656fb4d22f..6c41ab22288 100644
> > --- a/sys/kern/kern_tc.c
> > +++ b/sys/kern/kern_tc.c
> > @@ -72,6 +71,7 @@ struct timehands {
> >  	struct timecounter	*th_counter;
> >  	int64_t	th_adjustment;
> >  	uint64_t	th_scale;
> > +	uint64_t	th_large_delta;
> >  	u_int	th_offset_count;
> >  	struct bintime	th_offset;
> >  	struct bintime	th_bintime;
> > @@ -351,17 +351,45 @@ fbclock_getmicrotime(struct timeval *tvp)
> >  	} while (gen == 0 || gen != th->th_generation);
> >  }
> >  #else /* !FFCLOCK */
> > +
> > +static void
>
> Add __inline. This is in the fast path for 32-bit systems.

Compilers do not need this hand-holding, and I prefer to avoid __inline
unless really necessary. I checked with both clang 7.0 and gcc 8.3 that
autoinlining did occur.
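The difference between the two cast spellings can be checked with a small user-space sketch. It assumes a platform where unsigned int is 32 bits wide, so that multiplying two uint32_t operands is carried out in 32 bits; mul_low_narrow() and mul_low_wide() are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * The broken form: both operands are 32 bits wide, so the product is
 * computed as 32 x 32 -> 32 and its high bits are lost before the
 * result is widened for the return value.
 */
static uint64_t
mul_low_narrow(uint64_t scale, uint32_t delta)
{
	return ((uint32_t)scale * delta);
}

/*
 * The form quoted above: reduce scale to its low 32 bits, then widen
 * back to 64 bits so the multiplication is 32 x 32 -> 64.
 */
static uint64_t
mul_low_wide(uint64_t scale, uint32_t delta)
{
	return ((uint64_t)(uint32_t)scale * delta);
}

static void
mul_low_demo(void)
{
	const uint64_t scale = 0x0000000180000000ULL;	/* low half 0x80000000 */

	/* 0x80000000 * 4 = 0x200000000: needs 34 bits. */
	assert(mul_low_narrow(scale, 4) == 0);
	assert(mul_low_wide(scale, 4) == 0x200000000ULL);

	/* For small products both forms agree, which hides the bug. */
	assert(mul_low_narrow(scale, 1) == mul_low_wide(scale, 1));
}
```

Whether a given compiler then emits a single 32 x 32 -> 64 bit multiply instruction for the wide form is a separate code-generation question that this sketch does not test.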
>
> > +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
> > +{
> > +	uint64_t x;
> > +
> > +	x = (*scale >> 32) * delta;
> > +	*scale &= 0xffffffff;
>
> Remove the '*' on scale, cast (scale >> 32) to
> (uint64_t)(uint32_t)(scale >> 32), and remove the change to *scale.
>
> > +	bt->sec += x >> 32;
> > +	bintime_addx(bt, x << 32);
> > +}
> > +
> >  void
> >  binuptime(struct bintime *bt)
> >  {
> >  	struct timehands *th;
> > -	u_int gen;
> > +	uint64_t scale;
> > +	u_int delta, gen;
> >
> >  	do {
> >  		th = timehands;
> >  		gen = atomic_load_acq_int(&th->th_generation);
> >  		*bt = th->th_offset;
> > -		bintime_addx(bt, th->th_scale * tc_delta(th));
> > +		scale = th->th_scale;
> > +		delta = tc_delta(th);
> > +#ifdef _LP64
> > +		/* Avoid overflow for scale * delta. */
> > +		if (__predict_false(th->th_large_delta <= delta))
> > +			bintime_helper(bt, &scale, delta);
> > +		bintime_addx(bt, scale * delta);
>
> Change to:
>
> 	if (__predict_false(th->th_large_delta <= delta)) {
> 		bintime_helper(bt, scale, delta);
> 		bintime_addx(bt, (scale & 0xffffffff) * delta);
> 	} else
> 		bintime_addx(bt, scale * delta);

I do not like it, but ok.

>
> > +#else
> > +		/*
> > +		 * Avoid both overflow as above and
> > +		 * (uint64_t, uint32_t) -> uint64_t
> > +		 * multiplication on 32bit arches.
> > +		 */
>
> This is a bit unclear. Better emphasize avoidance of the 64 x 32 -> 64 bit
> multiplication. Something like:
>
> 	/*
> 	 * Use bintime_helper() unconditionally, since the fast
> 	 * path in the above method is not so fast here, since
> 	 * the 64 x 32 -> 64 bit multiplication is usually not
> 	 * available in hardware and emulating it using 2
> 	 * 32 x 32 -> 64 bit multiplications uses code much
> 	 * like that in bintime_helper().
> 	 */
>
> > +		bintime_helper(bt, &scale, delta);
> > +		bintime_addx(bt, (uint32_t)scale * delta);
> > +#endif
>
> Remove '&' as usual, and fix this by casting the reduced scale back to
> 64 bits.
>
> Similarly in bintime().

I merged two functions, finally.
Having to copy the same code is too annoying for this change.

So I verified that:
- there is no 64bit multiplication in the generated code, for i386 both
  for clang 7.0 and gcc 8.3;
- that everything is inlined, the only call from bintime/binuptime is
  the indirect call to get the timecounter value.

>
> Similarly in libc -- don't use the slow flsll() method in the 32-bit
> case where it is especially slow. Don't use it in the 64-bit case either,
> since this would need to be changed when th_large_delta is added to the
> API.
>
> Now I don't like my method in the kernel. It is unnecessarily
> complicated to have a special case, and not faster either.

diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 2656fb4d22f..0fd39e25058 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -72,6 +72,7 @@ struct timehands {
 	struct timecounter	*th_counter;
 	int64_t			th_adjustment;
 	uint64_t		th_scale;
+	uint64_t		th_large_delta;
 	u_int			th_offset_count;
 	struct bintime		th_offset;
 	struct bintime		th_bintime;
@@ -351,21 +352,63 @@ fbclock_getmicrotime(struct timeval *tvp)
 	} while (gen == 0 || gen != th->th_generation);
 }
 #else /* !FFCLOCK */
-void
-binuptime(struct bintime *bt)
+
+static void
+bintime_helper(struct bintime *bt, uint64_t scale, u_int delta)
+{
+	uint64_t x;
+
+	x = (scale >> 32) * delta;
+	bt->sec += x >> 32;
+	bintime_addx(bt, x << 32);
+}
+
+static void
+binnouptime(struct bintime *bt, u_int off)
 {
 	struct timehands *th;
-	u_int gen;
+	struct bintime *bts;
+	uint64_t scale;
+	u_int delta, gen;
 
 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		bts = (struct bintime *)(vm_offset_t)th + off;
+		*bt = *bts;
+		scale = th->th_scale;
+		delta = tc_delta(th);
+#ifdef _LP64
+		if (__predict_false(th->th_large_delta <= delta)) {
+			/* Avoid overflow for scale * delta. */
+			bintime_helper(bt, scale, delta);
+			bintime_addx(bt, (scale & 0xffffffff) * delta);
+		} else {
+			bintime_addx(bt, scale * delta);
+		}
+#else
+		/*
+		 * Use bintime_helper() unconditionally, since the fast
+		 * path in the above method is not so fast here, since
+		 * the 64 x 32 -> 64 bit multiplication is usually not
+		 * available in hardware and emulating it using 2
+		 * 32 x 32 -> 64 bit multiplications uses code much
+		 * like that in bintime_helper().
+		 */
+		bintime_helper(bt, scale, delta);
+		bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
+#endif
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 }
 
+void
+binuptime(struct bintime *bt)
+{
+
+	binnouptime(bt, __offsetof(struct timehands, th_offset));
+}
+
 void
 nanouptime(struct timespec *tsp)
 {
@@ -387,16 +430,8 @@ microuptime(struct timeval *tvp)
 void
 bintime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;
 
-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	binnouptime(bt, __offsetof(struct timehands, th_bintime));
 }
 
 void
@@ -1464,6 +1499,7 @@ tc_windup(struct bintime *new_boottimebin)
 	scale += (th->th_adjustment / 1024) * 2199;
 	scale /= th->th_counter->tc_frequency;
 	th->th_scale = scale * 2;
+	th->th_large_delta = ((uint64_t)1 << 63) / scale;
 
 	/*
 	 * Now that the struct timehands is again consistent, set the new

From owner-freebsd-ppc@freebsd.org Sun Mar 3 13:32:23 2019
Date: Mon, 4 Mar 2019 00:32:12 +1100 (EST)
From: Bruce Evans
To: Konstantin Belousov
cc: Mark Millard, freebsd-hackers Hackers, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <20190303111931.GI68879@kib.kiev.ua>
Message-ID: <20190303223100.B3572@besplex.bde.org>
References: <962D78C3-65BE-40C1-BB50-A0088223C17B@yahoo.com>
 <28C2BB0A-3DAA-4D18-A317-49A8DD52778F@yahoo.com>
 <20190301112717.GW2420@kib.kiev.ua> <20190302043936.A4444@besplex.bde.org>
 <20190301194217.GB68879@kib.kiev.ua> <20190302071425.G5025@besplex.bde.org>
 <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org>
 <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org>
 <20190303111931.GI68879@kib.kiev.ua>

On Sun, 3 Mar 2019, Konstantin Belousov wrote:

> On Sun, Mar 03, 2019 at
04:43:20AM +1100, Bruce Evans wrote:
>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
>>
>>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
>* ...
>>>> I don't like changing this at all. binuptime() was carefully written
>>>> to not need so much 64-bit arithmetic.
>>>>
>>>> If this pessimization is allowed, then it can also handle 64-bit
>>>> deltas. Using the better kernel method:
>>>>
>>>> 		if (__predict_false(delta >= th->th_large_delta)) {
>>>> 			bt->sec += (scale >> 32) * (delta >> 32);
>>>> 			x = (scale >> 32) * (delta & 0xffffffff);
>>>> 			bt->sec += x >> 32;
>>>> 			bintime_addx(bt, x << 32);
>>>> 			x = (scale & 0xffffffff) * (delta >> 32);
>>>> 			bt->sec += x >> 32;
>>>> 			bintime_addx(bt, x << 32);
>>>> 			bintime_addx(bt, (scale & 0xffffffff) *
>>>> 			    (delta & 0xffffffff));
>>>> 		} else
>>>> 			bintime_addx(bt, scale * (delta & 0xffffffff));
>>> This only makes sense if delta is extended to uint64_t, which requires
>>> the pass over timecounters.
>>
>> Yes, that was its point. It is a bit annoying to have a hardware
>> timecounter like the TSC that doesn't wrap naturally, but then make it
>> wrap by masking high bits.
>>
>> The masking step is also a bit wasteful. For the TSC, it is 1 step to
>> discard high bits at the register level, then another step to apply the
>> mask to discard the high bits again.
> rdtsc-low is implemented in the natural way, after RDTSC, no register
> combining into 64bit value is done, instead shrd operates on %edx:%eax
> to get the final result into %eax. I am not sure what you refer to.

I was referring mostly to the masking step '& tc->tc_counter_mask' and
the lack of register combining in rdtsc().

However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining
step. i386 used to be faster here -- the first masking step of discarding
%edx doesn't take any code. amd64 has to mask out the top bits in %rax.
Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64
has to do a not so slow shr. Then the '& tc->tc_counter_mask' step has
no effect. All this is wrapped in many layers of function calls which
are quite slow but this lets the other operations run in parallel on
some CPUs.

>>>> 		/* 32-bit arches did the next multiplication implicitly. */
>>>> 		x = (scale >> 32) * delta;
>>>> 		/*
>>>> 		 * And they did the following shifts and most of the adds
>>>> 		 * implicitly too. Except shifting x left by 32 lost the
>>>> 		 * seconds part that the next line handles. The next line
>>>> 		 * is the only extra cost for them.
>>>> 		 */
>>>> 		bt->sec += x >> 32;
>>>> 		bintime_addx(bt, (x << 32) + (scale & 0xffffffff) * delta);
>>>
>>> Ok, what about the following.
>>
>> I'm not sure that I really want this, even if the pessimization is done.
>> But it avoids using fls*(), so is especially good for 32-bit systems and
>> OK for 64-bit systems too, especially in userland where fls*() is in the
>> fast path.
> For userland I looked at the generated code, and BSR usage seems to be
> good enough, for default compilation settings with clang.

I use gcc-4.2.1, and it doesn't do this optimization.

I already reported this in connection with fixing calcru1(). calcru1()
is unnecessarily several times slower on i386 than on amd64 even after
avoiding using flsll() on it. The main slowness is in converting 'usec'
to tv_sec and tv_usec, due to the bad design and implementation of the
__udivdi3 and __umoddi3 libcalls. The bad design is having to make 2
libcalls to get the quotient and remainder. The bad implementation is
the portable C version in libkern. libgcc provides a better
implementation, but this is not available in the kernel.

>>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
>>> index 2656fb4d22f..2e28f872229 100644
>>> --- a/sys/kern/kern_tc.c
>>> +++ b/sys/kern/kern_tc.c
>>> ...
>>> @@ -351,17 +352,44 @@ fbclock_getmicrotime(struct timeval *tvp)
>>>  	} while (gen == 0 || gen != th->th_generation);
>>>  }
>>>  #else /* !FFCLOCK */
>>> +
>>> +static void
>>> +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
>>> +{
>>> +	uint64_t x;
>>> +
>>> +	x = (*scale >> 32) * delta;
>>> +	*scale &= 0xffffffff;
>>> +	bt->sec += x >> 32;
>>> +	bintime_addx(bt, x << 32);
>>> +}
>>
>> It is probably best to not inline the slow path, but clang tends to
>> inline everything anyway.
> It does not matter if it inlines it, as long as it is moved out of the
> linear sequence for the fast path.
>>
>> I prefer my way of writing this in 3 lines. Modifying 'scale' for
>> the next step is especially ugly and pessimal when the next step is
>> in the caller and this function is not inlined.
> Can you show exactly what you want?

Just write 'scale & 0xffffffff' for the low bits of 'scale' in callers,
and don't pass 'scale' indirectly to bintime_helper() and don't modify
it there.

Oops, there is a problem. 'scale' must be reduced iff bintime_helper()
was used. Duplicate some source code so as to not need a fall-through
to the fast path. See below.

>>>  void
>>>  binuptime(struct bintime *bt)
>>>  {
>>>  	struct timehands *th;
>>> -	u_int gen;
>>> +	uint64_t scale;
>>> +	u_int delta, gen;
>>>
>>>  	do {
>>>  		th = timehands;
>>>  		gen = atomic_load_acq_int(&th->th_generation);
>>>  		*bt = th->th_offset;
>>> -		bintime_addx(bt, th->th_scale * tc_delta(th));
>>> +		scale = th->th_scale;
>>> +		delta = tc_delta(th);
>>> +#ifdef _LP64
>>> +		/* Avoid overflow for scale * delta. */
>>> +		if (__predict_false(th->th_large_delta <= delta))
>>> +			bintime_helper(bt, &scale, delta);
>>> +		bintime_addx(bt, scale * delta);
>>> +#else
>>> +		/*
>>> +		 * Also avoid (uint64_t, uint32_t) -> uint64_t
>>> +		 * multiplication on 32bit arches.
>>> +		 */
>>
>> "Also avoid overflow for ..."
>>
>>> +		bintime_helper(bt, &scale, delta);
>>> +		bintime_addx(bt, (u_int)scale * delta);
>>
>> The cast should be to uint32_t, but better write it as & 0xffffffff as
>> elsewhere.

This is actually very broken. The cast gives a 32 x 32 -> 32 bit
multiplication, but all 64 bits of the result are needed.

>>
>> bintime_helper() already reduced 'scale' to 32 bits. The cast might be
>> needed to tell the compiler this, especially when the function is not
>> inlined. Better not do it in the function. The function doesn't even
>> use the reduced value.
> I used cast to use 32x32 multiplication. I am not sure that all (or any)
> compilers are smart enough to deduce that they can use 32 bit mul.

Writing the reduction to 32 bits using a mask instead of a cast
automatically avoids the bug, but might not give the optimization.

They do do this optimization, but might need the cast as well as the
mask. At worst, '(uint64_t)(uint32_t)(scale & 0xffffffff)', where the
mask is now redundant but the cast back to 64 bits is needed if the
cast to 32 bits is used.

You already depended on them not needing the cast for the expression
'(*scale >> 32) * delta'. Here delta is 32 bits and the other operand
must remain 64 bits so that after default promotions the multiplication
is 64 x 64 -> 64 bits, but the compiler should optimize this to
32 x 32 -> 64 bits. (*scale >> 32) would need to be cast to 32 bits
and then back to 64 bits if the compiler can't do this automatically.

I checked what some compilers do. Both gcc-3.3.3 and gcc-4.2.1
optimize only (uint64_t)x * y (where x and y have type uint32_t), so they
need to be helped by casts if x and y have a larger type even if
their values obviously fit in 32 bits. So the expressions should be
written as:

 	(uint64_t)(uint32_t)(scale >> 32) * delta;

and

 	(uint64_t)(uint32_t)scale * delta;

The 2 casts are always needed, but the '& 0xffffffff' operation doesn't
need to be explicit because the cast to uint32_t already does it.

>> This needs lots of testing of course.
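The difference between the cast forms discussed above can be checked in
userland. The following is only a minimal sketch, not code from the
patch: the function names are made up for illustration, and it assumes
that unsigned int is 32 bits wide, so that the buggy form really is a
32 x 32 -> 32 bit multiplication.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper names; assumes unsigned int is 32 bits wide.
 * Buggy form: both operands are 32 bits, so this is a 32 x 32 -> 32 bit
 * multiplication and the high half of the product is lost.
 */
static uint64_t
mul_low_truncated(uint64_t scale, uint32_t delta)
{
	return ((uint32_t)scale * delta);
}

/* Cast back to 64 bits: 32 x 32 -> 64 bit multiplication of the low half. */
static uint64_t
mul_low_widened(uint64_t scale, uint32_t delta)
{
	return ((uint64_t)(uint32_t)scale * delta);
}

/* The same for the high half of scale. */
static uint64_t
mul_high_widened(uint64_t scale, uint32_t delta)
{
	return ((uint64_t)(uint32_t)(scale >> 32) * delta);
}

/* Checks the three forms against hand-computed values for one input. */
static void
cast_forms_check(void)
{
	const uint64_t scale = UINT64_C(0x00000003ffffffff);
	const uint32_t delta = 0x10;

	/* 0xffffffff * 0x10 wraps modulo 2^32 in the buggy form. */
	assert(mul_low_truncated(scale, delta) == UINT64_C(0xfffffff0));
	/* The widened form keeps all 36 significant bits. */
	assert(mul_low_widened(scale, delta) == UINT64_C(0xffffffff0));
	/* The explicit mask gives the same value as the double cast. */
	assert(mul_low_widened(scale, delta) == (scale & 0xffffffff) * delta);
	/* High half: 3 * 0x10. */
	assert(mul_high_widened(scale, delta) == UINT64_C(0x30));
}
```

Whether a given compiler turns the widened forms into a single
32 x 32 -> 64 bit multiply instruction still has to be checked in the
generated code; the sketch only shows that the values differ.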
>
> Current kernel-only part of the change is below, see the question about
> your preference for bintime_helper().
>
> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> index 2656fb4d22f..6c41ab22288 100644
> --- a/sys/kern/kern_tc.c
> +++ b/sys/kern/kern_tc.c
> @@ -72,6 +71,7 @@ struct timehands {
>  	struct timecounter	*th_counter;
>  	int64_t			th_adjustment;
>  	uint64_t		th_scale;
> +	uint64_t		th_large_delta;
>  	u_int			th_offset_count;
>  	struct bintime		th_offset;
>  	struct bintime		th_bintime;
> @@ -351,17 +351,45 @@ fbclock_getmicrotime(struct timeval *tvp)
>  	} while (gen == 0 || gen != th->th_generation);
>  }
>  #else /* !FFCLOCK */
> +
> +static void

Add __inline. This is in the fast path for 32-bit systems.

> +bintime_helper(struct bintime *bt, uint64_t *scale, u_int delta)
> +{
> +	uint64_t x;
> +
> +	x = (*scale >> 32) * delta;
> +	*scale &= 0xffffffff;

Remove the '*' on scale, cast (scale >> 32) to
(uint64_t)(uint32_t)(scale >> 32), and remove the change to *scale.

> +	bt->sec += x >> 32;
> +	bintime_addx(bt, x << 32);
> +}
> +
>  void
>  binuptime(struct bintime *bt)
>  {
>  	struct timehands *th;
> -	u_int gen;
> +	uint64_t scale;
> +	u_int delta, gen;
>
>  	do {
>  		th = timehands;
>  		gen = atomic_load_acq_int(&th->th_generation);
>  		*bt = th->th_offset;
> -		bintime_addx(bt, th->th_scale * tc_delta(th));
> +		scale = th->th_scale;
> +		delta = tc_delta(th);
> +#ifdef _LP64
> +		/* Avoid overflow for scale * delta. */
> +		if (__predict_false(th->th_large_delta <= delta))
> +			bintime_helper(bt, &scale, delta);
> +		bintime_addx(bt, scale * delta);

Change to:

 		if (__predict_false(th->th_large_delta <= delta)) {
 			bintime_helper(bt, scale, delta);
 			bintime_addx(bt, (scale & 0xffffffff) * delta);
 		} else
 			bintime_addx(bt, scale * delta);

> +#else
> +		/*
> +		 * Avoid both overflow as above and
> +		 * (uint64_t, uint32_t) -> uint64_t
> +		 * multiplication on 32bit arches.
> +		 */

This is a bit unclear. Better emphasize avoidance of the 64 x 32 -> 64 bit
multiplication. Something like:

 		/*
 		 * Use bintime_helper() unconditionally, since the fast
 		 * path in the above method is not so fast here, since
 		 * the 64 x 32 -> 64 bit multiplication is usually not
 		 * available in hardware and emulating it using 2
 		 * 32 x 32 -> 64 bit multiplications uses code much
 		 * like that in bintime_helper().
 		 */

> +		bintime_helper(bt, &scale, delta);
> +		bintime_addx(bt, (uint32_t)scale * delta);
> +#endif

Remove '&' as usual, and fix this by casting the reduced scale back to
64 bits.

Similarly in bintime().

Similarly in libc -- don't use the slow flsll() method in the 32-bit
case where it is especially slow. Don't use it in the 64-bit case either,
since this would need to be changed when th_large_delta is added to the
API.

Now I don't like my method in the kernel. It is unnecessarily
complicated to have a special case, and not faster either.

Bruce

From owner-freebsd-ppc@freebsd.org Sun Mar 3 21:33:40 2019
From: Mark Millard
Subject: Re: powerpc64 on PowerMac G5 4-core (system total): a hack that so far seem to avoid the stuck-sleeping issue [self-hosted buildworld/buildkernel completed]
Date: Sun, 3 Mar 2019 13:23:04 -0800
To: FreeBSD PowerPC ML, Mark Millard via freebsd-hackers
Message-Id: <76E8BF75-A8F5-4A48-9B7C-6494F4A9520B@yahoo.com>

[So far the hack has been successful. Details given later below.]
On 2019-Mar-2, at 21:20, Mark Millard wrote:

> [This note goes in a different direction compared to my
> prior evidence report for overflows and the later activity
> that has been happening for it. This does *not* involve
> the patches associated with that report.]
>
> I view the following as an evidence-gathering hack:
> showing the change in behavior with the code changes,
> not as directly what FreeBSD should do for powerpc64.
> In code for defined(__powerpc64__) && defined(AIM)
> I freely use knowledge of the PowerMac G5 context
> instead of attempting general code.
>
> Also: the code is set up to record some information
> that I've been looking at via ddb. The recording is
> not part of what changes the behavior but I decided
> to show that code too.
>
> It is preliminary, but, so far, the hack has avoided
> buf*daemon* threads and pmac_thermal getting stuck
> sleeping (or, at least, far less frequently).
>
>
> The tbr-value hack:
>
> From what I see the G5's various cores each have their tbr running at the
> same rate but have some offsets as far as the base time
> goes. cpu_mp_unleash does:
>
>         ap_awake = 1;
>
>         /* Provide our current DEC and TB values for APs */
>         ap_timebase = mftb() + 10;
>         __asm __volatile("msync; isync");
>
>         /* Let APs continue */
>         atomic_store_rel_int(&ap_letgo, 1);
>
>         platform_smp_timebase_sync(ap_timebase, 0);
>
> and machdep_ap_bootstrap does:
>
>         /*
>          * Set timebase as soon as possible to meet an implicit rendezvous
>          * from cpu_mp_unleash(), which sets ap_letgo and then immediately
>          * sets timebase.
>          *
>          * Note that this is instrinsically racy and is only relevant on
>          * platforms that do not support better mechanisms.
>          */
>         platform_smp_timebase_sync(ap_timebase, 1);
>
>
> which attempts to set the tbrs appropriately.
>
> But on small scales of differences the various tbr
> values from different cpus end up not well ordered
> relative to time, synchronizes with, and the like.
> Only large enough differences can well indicate an
> ordering of interest.
>
> Note: tc->tc_get_timecount(tc) only provides the
> least significant 32 bits of the tbr value.
> th->th_offset_count is also 32 bits and based on
> truncated tbr values.
>
> So I made binuptime avoid finishing when it sees
> a small (<0x10) step backwards for a new
> tc->tc_get_timecount(tc) value vs. the existing
> th->th_offset_count value (values strongly tied
> to powerpc64 tbr values):
>
> void
> binuptime(struct bintime *bt)
> {
>         struct timehands *th;
>         u_int gen;
>
>         struct bintime old_bt= *bt; // HACK!!!
>         struct timecounter *tc; // HACK!!!
>         u_int tim_cnt, tim_offset, tim_diff; // HACK!!!
>         uint64_t freq, scale_factor, diff_scaled; // HACK!!!
>
>         u_int try_cnt= 0ull; // HACK!!!
>
>         do {
>             do { // HACK!!!
>                 th = timehands;
>                 tc = th->th_counter;
>                 gen = atomic_load_acq_int(&th->th_generation);
>                 tim_cnt= tc->tc_get_timecount(tc);
>                 tim_offset= th->th_offset_count;
>             } while (tim_cnt *bt = th->th_offset;
>                 tim_diff= (tim_cnt - tim_offset) & tc->tc_counter_mask;
>                 scale_factor= th->th_scale;
>                 diff_scaled= scale_factor * tim_diff;
>                 bintime_addx(bt, diff_scaled);
>                 freq= tc->tc_frequency;
>                 atomic_thread_fence_acq();
>                 try_cnt++;
>         } while (gen == 0 || gen != th->th_generation);
>
>         if (*(volatile uint64_t*)0xc000000000000020==0u && (0xffffffffffffffffull/scale_factor) *(volatile uint64_t*)0xc000000000000020= bttosbt(old_bt);
>                 *(volatile uint64_t*)0xc000000000000028= bttosbt(*bt);
>                 *(volatile uint64_t*)0xc000000000000030= freq;
>                 *(volatile uint64_t*)0xc000000000000038= scale_factor;
>                 *(volatile uint64_t*)0xc000000000000040= tim_offset;
>                 *(volatile uint64_t*)0xc000000000000048= tim_cnt;
>                 *(volatile uint64_t*)0xc000000000000050= tim_diff;
>                 *(volatile uint64_t*)0xc000000000000058= try_cnt;
>                 *(volatile uint64_t*)0xc000000000000060= diff_scaled;
>                 *(volatile uint64_t*)0xc000000000000068= scale_factor*freq;
>                 __asm__ ("sync");
>         } else if (*(volatile uint64_t*)0xc0000000000000a0==0u && (0xffffffffffffffffull/scale_factor) *(volatile uint64_t*)0xc0000000000000a0= bttosbt(old_bt);
>                 *(volatile uint64_t*)0xc0000000000000a8= bttosbt(*bt);
>                 *(volatile uint64_t*)0xc0000000000000b0= freq;
>                 *(volatile uint64_t*)0xc0000000000000b8= scale_factor;
>                 *(volatile uint64_t*)0xc0000000000000c0= tim_offset;
>                 *(volatile uint64_t*)0xc0000000000000c8= tim_cnt;
>                 *(volatile uint64_t*)0xc0000000000000d0= tim_diff;
>                 *(volatile uint64_t*)0xc0000000000000d8= try_cnt;
>                 *(volatile uint64_t*)0xc0000000000000e0= diff_scaled;
>                 *(volatile uint64_t*)0xc0000000000000e8= scale_factor*freq;
>                 __asm__ ("sync");
>         }
> }
> #else
> . . .
> #endif
>
> So far as I can tell, the FreeBSD code is not designed to deal
> with small differences in tc->tc_get_timecount(tc) not actually
> indicating a useful < vs. == vs. > ordering relation uniquely.
>
> (I make no claim that the hack is a proper way to deal with
> such.)

I did a buildworld buildkernel that took somewhat over 7 hours on the
PowerMac G5. Overall the G5 has been up over 13 hours and none of the
buf*daemon* threads have gotten stuck sleeping. Nor has pmac_thermal
gotten stuck. Similarly for vnlru and syncer: "top -HIStopid" still
shows them all as periodically active.

Previously for this usefdt=1 context (with the modern
VM_MAX_KERNEL_ADDRESS), going more than a few minutes without at
least one of those threads getting stuck sleeping was rare on the G5
(powerpc64 example).

So this hack has managed to avoid finding sbinuptime() in
sleepq_timeout being less than the earlier (by call structure/code
sequencing) sbinuptime() in timercb that led to the sleepq_timeout
callout being called in the first place.
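As a rough standalone illustration of why even a small step backwards
matters for the masked difference used above for tim_diff, here is a
userland sketch. The helper name and the sample counter values are made
up for this example and are not taken from the kernel; it only shows
the unsigned wraparound, not the later effect on the scaled time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the masked 32-bit counter difference. */
static uint32_t
masked_delta(uint32_t tim_cnt, uint32_t tim_offset, uint32_t mask)
{
	return ((tim_cnt - tim_offset) & mask);
}

/* Compares a small forward step with a small backward step. */
static void
masked_delta_check(void)
{
	const uint32_t mask = 0xffffffff;

	/* Counter is 8 ticks ahead of the recorded offset: delta is 8. */
	assert(masked_delta(0x1008, 0x1000, mask) == 8);
	/* Counter is 8 ticks behind: the difference wraps to nearly 2^32. */
	assert(masked_delta(0x1000, 0x1008, mask) == 0xfffffff8);
}
```

So a reading that is only a few ticks behind th_offset_count is not seen
as "slightly earlier" but as a delta close to the full 32-bit range.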
So in the sleepq_timeout callout's:

        if (td->td_sleeptimo > sbinuptime() || td->td_sleeptimo == 0) {
                /*
                 * The thread does not want a timeout (yet).
                 */
        } else . . .

td->td_sleeptimo > sbinuptime() ends up false now for small enough
original differences. This case does not set up another timeout, it
just leaves the thread stuck sleeping, no longer doing periodic
activities.

As it stands, what I did (presuming an appropriate definition of "small
differences in the problematical direction") should leave this
and other sbinuptime-using code with:

td->td_sleeptimo <= sbinuptime()

for what were originally "small" tbr value differences in the
problematical direction (in case other places require it in
some way).

If, instead, just sleepq_timeout's test could allow for
some slop in the ordering, it could be a cheaper hack than
looping in binuptime.

At this point I've no clue what a correct/efficient FreeBSD
design for allowing the sloppy match across tbr's for different
CPUs would be.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)

From owner-freebsd-ppc@freebsd.org Sun Mar 3 18:29:54 2019
Date: Mon, 4 Mar 2019 05:29:48 +1100 (EST)
From: Bruce Evans
To: Konstantin Belousov
cc: Bruce Evans, Mark Millard, freebsd-hackers Hackers, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <20190303161635.GJ68879@kib.kiev.ua>
Message-ID: <20190304043416.V5640@besplex.bde.org>
References: <20190301112717.GW2420@kib.kiev.ua> <20190302043936.A4444@besplex.bde.org>
 <20190301194217.GB68879@kib.kiev.ua> <20190302071425.G5025@besplex.bde.org>
 <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org>
 <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org>
 <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org>
 <20190303161635.GJ68879@kib.kiev.ua>

On Sun, 3 Mar 2019, Konstantin Belousov wrote:

> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
>> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
>>
>>> On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote:
>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
>>>>
>>>>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
>>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> * ...
>>>> Yes, that was its point. It is a bit annoying to have a hardware
>>>> timecounter like the TSC that doesn't wrap naturally, but then make it
>>>> wrap by masking high bits.
>>>>
>>>> The masking step is also a bit wasteful. For the TSC, it is 1 step to
>>>> discard high bits at the register level, then another step to apply the
>>>> mask to discard the high bits again.
>>> rdtsc-low is implemented in the natural way, after RDTSC, no register
>>> combining into 64bit value is done, instead shrd operates on %edx:%eax
>>> to get the final result into %eax. I am not sure what you refer to.
>>
>> I was referring mostly to the masking step '& tc->tc_counter_mask' and
>> the lack of register combining in rdtsc().
>>
>> However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining
>> step. i386 used to be faster here -- the first masking step of discarding
>> %edx doesn't take any code. amd64 has to mask out the top bits in %rax.
>> Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64
>> has to do a not so slow shr.
> i386 cannot discard %edx after RDTSC since some bits from %edx come into
> the timecounter value.

These bits are part of the tsc-low pessimization. The shift count should
always be 1, giving a TSC frequency of > INT32_MAX (usually) and
sometimes > UINT32_MAX.

When tsc-low was new, the shift count was often larger (as much as 8),
and it is still changeable by a read-only tunable, but now it is 1 in
almost all cases. The code only limits the timecounter frequency
to UINT_MAX, except the tunable defaults to 1 so average CPUs running
at nearly 4 GHz are usually limited to about 2 GHz. The comment about
this UINT_MAX doesn't match the code. The comment says int, but the
code says UINT.

All that a shift count of 1 does is waste time to lose 1 bit of accuracy.
This much accuracy is noise for most purposes.

The tunable is fairly undocumented. Its description is "Shift to apply
for the maximum TSC frequency". Of course, it has no effect on the TSC
frequency.
It only affects the TSC timecounter frequency.

The cputicker normally uses the TSC without even an lfence. This use
only has to be monotonic per-CPU, so this is OK. Also, any bugs hidden
by discarding low bits shouldn't show up per-CPU. However, keeping
the cputicker below 4G actually has some efficiency advantages. For
timecounters, there are no multiplications or divisions by the frequency
in the fast path, but cputicker use isn't so optimized and it does a
slow 64-bit division in cputick2usec(). Keeping cpu_tick_frequency
below UINT_MAX allows dividing by it in integer arithmetic in some cases.
This optimization is not done.

> amd64 cannot either, but amd64 does not need to mask out top bits in %rax,
> since the whole shrdl calculation occurs in 32bit registers, and the result
> is in %rax where top word is cleared by shrdl instruction automatically.
> But the clearing is not required since result is unsigned int anyway.
>
> Disassembly of tsc_get_timecount_low() is very clear:
>   0xffffffff806767e4 <+4>:     mov    0x30(%rdi),%ecx
>   0xffffffff806767e7 <+7>:     rdtsc
>   0xffffffff806767e9 <+9>:     shrd   %cl,%edx,%eax
> ...
>   0xffffffff806767ed <+13>:    retq
> (I removed frame manipulations).

It would without the shift pessimization, since the function returns uint32_t
but rdtsc() gives uint64_t. Removing the top bits is not needed since
tc_delta() removes them again, but the API doesn't allow expressing this.

Without the shift pessimization, we just do rdtsc() in all cases and don't
need this function call. I think this is about 5-10 cycles faster after
some parallelism.

>>>> I prefer my way of writing this in 3 lines. Modifying 'scale' for
>>>> the next step is especially ugly and pessimal when the next step is
>>>> in the caller and this function is not inlined.
>>> Can you show exactly what do you want ?
>>
>> Just write 'scale & 0xffffffff' for the low bits of 'scale' in callers,
>> and don't pass 'scale' indirectly to bintime_helper() and don't modify
>> it there.
>>
>> Oops, there is a problem. 'scale' must be reduced iff bintime_helper()
>> was used. Duplicate some source code so as to not need a fall-through
>> to the fast path. See below.
> Yes, this is the reason why it is passed by pointer (C has no references).

The indirection is slow no matter how it is spelled, unless it is inlined
away.

>>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
>>> index 2656fb4d22f..6c41ab22288 100644
>>> --- a/sys/kern/kern_tc.c
>>> +++ b/sys/kern/kern_tc.c
>>> @@ -72,6 +71,7 @@ struct timehands {
>>> 	struct timecounter *th_counter;
>>> 	int64_t th_adjustment;
>>> 	uint64_t th_scale;
>>> +	uint64_t th_large_delta;
>>> 	u_int th_offset_count;
>>> 	struct bintime th_offset;
>>> 	struct bintime th_bintime;
>>> @@ -351,17 +351,45 @@ fbclock_getmicrotime(struct timeval *tvp)
>>> 	} while (gen == 0 || gen != th->th_generation);
>>> }
>>> #else /* !FFCLOCK */
>>> +
>>> +static void
>>
>> Add __inline. This is in the fast path for 32-bit systems.
> Compilers do not need this hand-holding, and I prefer to avoid __inline
> unless really necessary. I checked with both clang 7.0 and gcc 8.3
> that autoinlining did occur.

But they do. I don't use either of these compilers, and turn off inlining
as much as possible anyway using -fno-inline
-fno-inline-functions-called-once (this is very broken in clang --
-fno-inline turns off inlining of even functions declared as __inline
(like curthread), and clang doesn't support -fno-inline
-fno-inline-functions-called-once).

>> ...
>> Similarly in bintime().
> I merged two functions, finally. Having to copy the same code is too
> annoying for this change.
>
> So I verified that:
> - there is no 64bit multiplication in the generated code, for i386 both
> for clang 7.0 and gcc 8.3;
> - that everything is inlined, the only call from bintime/binuptime is
> the indirect call to get the timecounter value.

I will have to fix it for compilers that I use.
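
For reference, the 'scale & 0xffffffff' idea discussed above can be tried outside the kernel. What follows is a minimal userland sketch under stated assumptions, not the patch itself: struct bintime_sketch, add_frac(), add_scaled() and split_matches_reference() are names invented here for illustration, and the high-half step only approximates what bintime_helper() is described as doing in this thread. The check uses unsigned __int128, a GCC/clang extension available on 64-bit targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct bintime. */
struct bintime_sketch {
	int64_t sec;	/* whole seconds */
	uint64_t frac;	/* fraction of a second, in units of 2^-64 s */
};

/* Add a 64-bit fraction, carrying into the seconds when it wraps. */
static void
add_frac(struct bintime_sketch *bt, uint64_t x)
{
	uint64_t old = bt->frac;

	bt->frac += x;
	if (bt->frac < old)
		bt->sec++;
}

/*
 * Add scale * delta (which may need up to 96 bits) without ever
 * forming a product wider than 64 bits.
 */
static void
add_scaled(struct bintime_sketch *bt, uint64_t scale, uint32_t delta)
{
	uint64_t hi = (scale >> 32) * delta;	/* carries weight 2^32 */

	bt->sec += hi >> 32;			/* bits at or above 2^64 */
	add_frac(bt, hi << 32);			/* rest of the high part */
	add_frac(bt, (scale & 0xffffffff) * delta);	/* low part */
}

/* Compare the split computation with a 128-bit reference product. */
static int
split_matches_reference(uint64_t scale, uint32_t delta)
{
	struct bintime_sketch bt = { 0, 0 };
	unsigned __int128 ref = (unsigned __int128)scale * delta;

	add_scaled(&bt, scale, delta);
	return (bt.sec == (int64_t)(ref >> 64) && bt.frac == (uint64_t)ref);
}
```

Calling split_matches_reference() with a large scale and a large delta exercises the case where a plain 64-bit scale * delta would overflow.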

> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> index 2656fb4d22f..0fd39e25058 100644
> --- a/sys/kern/kern_tc.c
> +++ b/sys/kern/kern_tc.c
+ ...
> +static void
> +binnouptime(struct bintime *bt, u_int off)
> {
> 	struct timehands *th;
> -	u_int gen;
> +	struct bintime *bts;
> +	uint64_t scale;
> +	u_int delta, gen;
>
> 	do {
> 		th = timehands;
> 		gen = atomic_load_acq_int(&th->th_generation);
> -		*bt = th->th_offset;
> -		bintime_addx(bt, th->th_scale * tc_delta(th));
> +		bts = (struct bintime *)(vm_offset_t)th + off;

I don't like the merging. It obscures the code with conversions like this.

> +		*bt = *bts;
> +		scale = th->th_scale;
> +		delta = tc_delta(th);
> +#ifdef _LP64
> +		if (__predict_false(th->th_large_delta <= delta)) {
> +			/* Avoid overflow for scale * delta. */
> +			bintime_helper(bt, scale, delta);
> +			bintime_addx(bt, (scale & 0xffffffff) * delta);
> +		} else {
> +			bintime_addx(bt, scale * delta);
> +		}
> +#else
> +		/*
> +		 * Use bintime_helper() unconditionally, since the fast
> +		 * path in the above method is not so fast here, since
> +		 * the 64 x 32 -> 64 bit multiplication is usually not
> +		 * available in hardware and emulating it using 2
> +		 * 32 x 32 -> 64 bit multiplications uses code much
> +		 * like that in bintime_helper().
> +		 */
> +		bintime_helper(bt, scale, delta);
> +		bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
> +#endif

Check that this method is really better. Without this, the complicated
part is about half as large and duplicating it is smaller than this
version.

> @@ -387,16 +430,8 @@ microuptime(struct timeval *tvp)
> void
> bintime(struct bintime *bt)
> {
> -	struct timehands *th;
> -	u_int gen;
>
> -	do {
> -		th = timehands;
> -		gen = atomic_load_acq_int(&th->th_generation);
> -		*bt = th->th_bintime;
> -		bintime_addx(bt, th->th_scale * tc_delta(th));
> -		atomic_thread_fence_acq();
> -	} while (gen == 0 || gen != th->th_generation);

Duplicating this loop is much better than obfuscating it using inline
functions.
This loop was almost duplicated (except for the delta calculation) in no
less than 17 functions in kern_tc.c (9 tc ones and 8 ffclock ones). Now
it is only duplicated 16 times.

> +	binnouptime(bt, __offsetof(struct timehands, th_bintime));
> }
>
> void

Bruce

From owner-freebsd-ppc@freebsd.org Mon Mar 4 08:47:24 2019
From: bugzilla-noreply@freebsd.org
To: powerpc@FreeBSD.org
Subject: [Bug 236188] devel/boost-libs and BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS
Date: Mon, 04 Mar 2019 08:47:22 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: Ports & Packages
X-Bugzilla-Component: Individual Port(s)
X-Bugzilla-Version: Latest
X-Bugzilla-Keywords: needs-patch, needs-qa
X-Bugzilla-Severity: Affects Only Me
X-Bugzilla-Who: jbeich@FreeBSD.org
X-Bugzilla-Status: New
X-Bugzilla-Resolution:
X-Bugzilla-Priority: ---
X-Bugzilla-Assigned-To: office@FreeBSD.org
X-Bugzilla-Flags: maintainer-feedback?
X-Bugzilla-Changed-Fields: cc
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236188

Jan Beich changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |powerpc@FreeBSD.org

--- Comment #2 from Jan Beich ---
Can someone on powerpc (preferably, 32bit) check the following sample?

$ cat a.cc
// from https://github.com/lballabio/QuantLib/pull/597#issuecomment-466716805
#include
#include

using namespace boost::math;

int main()
{
  const double q = 0.3142;
  std::cout << std::setprecision(16)
            << quantile(
                 non_central_chi_squared_distribution(3.0,1.0),q)
            << std::endl;
}

$ pkg install boost-libs
$ c++ a.cc -isystem/usr/local/include
$ ./a.out
2.034589723572673

--
You are receiving this mail because:
You are on the CC list for the bug.

From owner-freebsd-ppc@freebsd.org Mon Mar 4 09:40:32 2019
From: Mark Millard
Subject: Re: powerpc64 on PowerMac G5 4-core (system total): a hack that so far seem to avoid the stuck-sleeping issue [self-hosted buildworld/buildkernel completed]
Date: Mon, 4 Mar 2019 01:40:18 -0800
References: <76E8BF75-A8F5-4A48-9B7C-6494F4A9520B@yahoo.com>
To: FreeBSD PowerPC ML, Mark Millard via freebsd-hackers
In-Reply-To: <76E8BF75-A8F5-4A48-9B7C-6494F4A9520B@yahoo.com>
Message-Id: <75A8BB07-3273-423E-9436-798395BC8640@yahoo.com>

[I did some testing of other figures than testing for < 0x10.]

On 2019-Mar-3, at 13:23, Mark Millard wrote:

> [So far the hack has been successful. Details given later
> below.]
>
> On 2019-Mar-2, at 21:20, Mark Millard wrote:
>
>> [This note goes in a different direction compared to my
>> prior evidence report for overflows and the later activity
>> that has been happening for it. This does *not* involve
>> the patches associated with that report.]
>>
>> I view the following as an evidence-gathering hack:
>> showing the change in behavior with the code changes,
>> not as directly what FreeBSD should do for powerpc64.
>> In code for defined(__powerpc64__) && defined(AIM)
>> I freely use knowledge of the PowerMac G5 context
>> instead of attempting general code.
>>
>> Also: the code is set up to record some information
>> that I've been looking at via ddb. The recording is
>> not part of what changes the behavior but I decided
>> to show that code too.
>>
>> It is preliminary, but, so far, the hack has avoided
>> buf*daemon* threads and pmac_thermal getting stuck
>> sleeping (or, at least, far less frequently).
>>
>>
>> The tbr-value hack:
>>
>> From what I see the G5 various cores have each tbr running at the
>> same rate but have some offsets as far as the base time
>> goes. cpu_mp_unleash does:
>>
>> 	ap_awake = 1;
>>
>> 	/* Provide our current DEC and TB values for APs */
>> 	ap_timebase = mftb() + 10;
>> 	__asm __volatile("msync; isync");
>>
>> 	/* Let APs continue */
>> 	atomic_store_rel_int(&ap_letgo, 1);
>>
>> 	platform_smp_timebase_sync(ap_timebase, 0);
>>
>> and machdep_ap_bootstrap does:
>>
>> 	/*
>> 	 * Set timebase as soon as possible to meet an implicit rendezvous
>> 	 * from cpu_mp_unleash(), which sets ap_letgo and then immediately
>> 	 * sets timebase.
>> 	 *
>> 	 * Note that this is instrinsically racy and is only relevant on
>> 	 * platforms that do not support better mechanisms.
>> 	 */
>> 	platform_smp_timebase_sync(ap_timebase, 1);
>>
>>
>> which attempts to set the tbrs appropriately.
>>
>> But on small scales of differences the various tbr
>> values from different cpus end up not well ordered
>> relative to time, synchronizes with, and the like.
>> Only large enough differences can well indicate an
>> ordering of interest.
>>
>> Note: tc->tc_get_timecount(tc) only provides the
>> least significant 32 bits of the tbr value.
>> th->th_offset_count is also 32 bits and based on
>> truncated tbr values.
>>
>> So I made binuptime avoid finishing when it sees
>> a small (<0x10) step backwards for a new
>> tc->tc_get_timecount(tc) value vs. the existing
>> th->th_offset_count value (values strongly tied
>> to powerpc64 tbr values):
>>
>> void
>> binuptime(struct bintime *bt)
>> {
>> 	struct timehands *th;
>> 	u_int gen;
>>
>> 	struct bintime old_bt= *bt; // HACK!!!
>> 	struct timecounter *tc; // HACK!!!
>> 	u_int tim_cnt, tim_offset, tim_diff; // HACK!!!
>> 	uint64_t freq, scale_factor, diff_scaled; // HACK!!!
>>
>> 	u_int try_cnt= 0ull; // HACK!!!
>>
>> 	do {
>> 		do { // HACK!!!
>> 			th = timehands;
>> 			tc = th->th_counter;
>> 			gen = atomic_load_acq_int(&th->th_generation);
>> 			tim_cnt= tc->tc_get_timecount(tc);
>> 			tim_offset= th->th_offset_count;
>> 		} while (tim_cnt<tim_offset && tim_offset-tim_cnt<0x10);
>> 		*bt = th->th_offset;
>> 		tim_diff= (tim_cnt - tim_offset) & tc->tc_counter_mask;
>> 		scale_factor= th->th_scale;
>> 		diff_scaled= scale_factor * tim_diff;
>> 		bintime_addx(bt, diff_scaled);
>> 		freq= tc->tc_frequency;
>> 		atomic_thread_fence_acq();
>> 		try_cnt++;
>> 	} while (gen == 0 || gen != th->th_generation);
>>
>> 	if (*(volatile uint64_t*)0xc000000000000020==0u && (0xffffffffffffffffull/scale_factor)
>> 		*(volatile uint64_t*)0xc000000000000020= bttosbt(old_bt);
>> 		*(volatile uint64_t*)0xc000000000000028= bttosbt(*bt);
>> 		*(volatile uint64_t*)0xc000000000000030= freq;
>> 		*(volatile uint64_t*)0xc000000000000038= scale_factor;
>> 		*(volatile uint64_t*)0xc000000000000040= tim_offset;
>> 		*(volatile uint64_t*)0xc000000000000048= tim_cnt;
>> 		*(volatile uint64_t*)0xc000000000000050= tim_diff;
>> 		*(volatile uint64_t*)0xc000000000000058= try_cnt;
>> 		*(volatile uint64_t*)0xc000000000000060= diff_scaled;
>> 		*(volatile uint64_t*)0xc000000000000068= scale_factor*freq;
>> 		__asm__ ("sync");
>> 	} else if (*(volatile uint64_t*)0xc0000000000000a0==0u && (0xffffffffffffffffull/scale_factor)
>> 		*(volatile uint64_t*)0xc0000000000000a0= bttosbt(old_bt);
>> 		*(volatile uint64_t*)0xc0000000000000a8= bttosbt(*bt);
>> 		*(volatile uint64_t*)0xc0000000000000b0= freq;
>> 		*(volatile uint64_t*)0xc0000000000000b8= scale_factor;
>> 		*(volatile uint64_t*)0xc0000000000000c0= tim_offset;
>> 		*(volatile uint64_t*)0xc0000000000000c8= tim_cnt;
>> 		*(volatile uint64_t*)0xc0000000000000d0= tim_diff;
>> 		*(volatile uint64_t*)0xc0000000000000d8= try_cnt;
>> 		*(volatile uint64_t*)0xc0000000000000e0= diff_scaled;
>> 		*(volatile uint64_t*)0xc0000000000000e8= scale_factor*freq;
>> 		__asm__ ("sync");
>> 	}
>> }
>> #else
>> . . .
>> #endif
>>
>> So far as I can tell, the FreeBSD code is not designed to deal
>> with small differences in tc->tc_get_timecount(tc) not actually
>> indicating a useful < vs. == vs. > ordering relation uniquely.
>>
>> (I make no claim that the hack is a proper way to deal with
>> such.)
>
> I did a somewhat over 7 hours buildworld buildkernel on the
> PowerMac G5. Overall the G5 has been up over 13 hours and
> none of the buf*daemon* threads have gotten stuck sleeping.
> Nor has pmac_thermal gotten stuck. Similarly for vnlru
> and syncer: "top -HIStopid" still shows them all as
> periodically active.
>
> Previously for this usefdt=1 context (with the modern
> VM_MAX_KERNEL_ADDRESS), going more than a few minutes
> without at least one of those threads getting stuck
> sleeping was rare on the G5 (powerpc64 example).
>
> So this hack has managed to avoid finding sbinuptime()
> in sleepq_timeout being less than the earlier (by call
> structure/code sequencing) sbinuptime() in timercb that
> led to the sleepq_timeout callout being called in the
> first place.
>
> So in the sleepq_timeout callout's:
>
> 	if (td->td_sleeptimo > sbinuptime() || td->td_sleeptimo == 0) {
> 		/*
> 		 * The thread does not want a timeout (yet).
> 		 */
> 	} else . . .
>
> td->td_sleeptimo > sbinuptime() ends up false now for small
> enough original differences.
>
> This case does not set up another timeout, it just leaves the
> thread stuck sleeping, no longer doing periodic activities.
>
> As stands what I did (presuming an appropriate definition
> of "small differences in the problematical direction") should
> leave this and other sbinuptime-using code with:
>
> td->td_sleeptimo <= sbinuptime()
>
> for what were originally "small" tbr value differences in the
> problematical direction (in case other places require it in
> some way).
>
> If, instead, just sleepq_timeout's test could allow for
> some slop in the ordering, it could be a cheaper hack than
> looping in binuptime.
>
> At this point I've no clue what a correct/efficient FreeBSD
> design for allowing the sloppy match across tbr's for different
> CPUs would be.

Instead of 0x10 in "&& tim_offset-tim_cnt<0x10" I tried each of
the following and they all failed:

&& tim_offset-tim_cnt<0x2
&& tim_offset-tim_cnt<0x4
&& tim_offset-tim_cnt<0x8
&& tim_offset-tim_cnt<0xc

0x2, 0x4, and 0x8 failed for the first boot attempt, almost
immediately having stuck-in-sleep threads.

0xc seemed to be working for the first boot (including a
buildworld buildkernel that did not have to rebuild much).
But the 2nd boot attempt had a stuck-in-sleep thread by
the time I logged in.

By contrast, for:

&& tim_offset-tim_cnt<0x10

I've not seen it fail so far, after many reboots, a full buildworld
buildkernel, and running over 24 hours (that included the
somewhat over 7 hours for buildworld buildkernel). But it
might be that some boots would need a bigger figure.
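
As an illustration of the comparison being tuned above, the "small step backwards" test can be isolated in a few lines of C. This is a sketch under stated assumptions rather than the hack's exact condition: the message quotes only the "&& tim_offset-tim_cnt<0x10" half, and the function name small_step_back() and its slop parameter are invented here. The variant below relies on unsigned 32-bit subtraction, so it also behaves sensibly when the truncated counter wraps between the two readings.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return nonzero when 'cnt' is behind 'offset' by at least 1 and by
 * fewer than 'slop' counts. Both values are treated as 32-bit counters
 * that may wrap, so the difference is taken modulo 2^32.
 * The name and the 'slop' parameter are illustrative only.
 */
static int
small_step_back(uint32_t cnt, uint32_t offset, uint32_t slop)
{
	uint32_t behind = offset - cnt;	/* modulo 2^32 */

	return (behind != 0 && behind < slop);
}
```

With slop set to 0x10, a reading 8 counts behind the stored offset is flagged, while a reading exactly 0x10 counts behind, equal to, or ahead of the offset is not.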

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)

From owner-freebsd-ppc@freebsd.org Mon Mar 4 11:42:00 2019
Date: Mon, 4 Mar 2019 13:41:50 +0200
From: Konstantin Belousov
To: Bruce Evans
Cc: Mark Millard, freebsd-hackers Hackers, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID: <20190304114150.GM68879@kib.kiev.ua>
References: <20190301194217.GB68879@kib.kiev.ua> <20190302071425.G5025@besplex.bde.org> <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org> <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org> <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org> <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org>
In-Reply-To: <20190304043416.V5640@besplex.bde.org>

On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote:
> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
>
> > On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
> >> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
> >>
> >>> On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote:
> >>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> >>>>
> >>>>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
> >>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> > * ...
> >>>> Yes, that was its point. It is a bit annoying to have a hardware
> >>>> timecounter like the TSC that doesn't wrap naturally, but then make it
> >>>> wrap by masking high bits.
> >>>>
> >>>> The masking step is also a bit wasteful. For the TSC, it is 1 step to
> >>>> discard high bits at the register level, then another step to apply the
> >>>> mask to discard the high bits again.
> >>> rdtsc-low is implemented in the natural way, after RDTSC, no register
> >>> combining into 64bit value is done, instead shrd operates on %edx:%eax
> >>> to get the final result into %eax. I am not sure what you refer to.
> >>
> >> I was referring mostly to the masking step '& tc->tc_counter_mask' and
> >> the lack of register combining in rdtsc().
> >>
> >> However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining
> >> step. i386 used to be faster here -- the first masking step of discarding
> >> %edx doesn't take any code. amd64 has to mask out the top bits in %rax.
> >> Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64
> >> has to do a not so slow shr.
> > i386 cannot discard %edx after RDTSC since some bits from %edx come into
> > the timecounter value.
>
> These bits are part of the tsc-low pessimization. The shift count should
> always be 1, giving a TSC frequency of > INT32_MAX (usually) and > UINT32_MAX
> sometimes.
>
> When tsc-low was new, the shift count was often larger (as much as 8),
> and it is still changeable by a read-only tunable, but now it is 1 in
> almost all cases. The code only limits the timecounter frequency
> to UINT_MAX, except the tunable defaults to 1 so average CPUs running
> at nearly 4 GHz are usually limited to about 2 GHz. The comment about
> this UINT_MAX doesn't match the code. The comment says int, but the
> code says UINT.
>
> All that a shift count of 1 does is waste time to lose 1 bit of accuracy.
> This much accuracy is noise for most purposes.
>
> The tunable is fairly undocumented. Its description is "Shift to apply
> for the maximum TSC frequency". Of course, it has no effect on the TSC
> frequency. It only affects the TSC timecounter frequency.

I suspect that the shift of 1 (at least) hides cross-socket inaccuracy.
Otherwise, I think, some multi-socket machines would start showing the
detectable backward-counting bintime(). At frequencies of 4GHz and
above (Intel has 5GHz part numbers) I do not think that the stability of
a 100MHz crystal and on-board traces is enough to avoid that.

We can try to set the tsc-low shift count to 0 (but keep lfence) and see
what is going on in HEAD, but I am afraid that the HEAD users population
is not representative enough to catch the issue with certainty.
More, it is unclear to me how to diagnose the cause, e.g. I would expect
the sleeps to hang on timeouts, as was reported from the very beginning
of this thread. How would we root-cause it ?

>
> The cputicker normally uses the TSC without even an lfence. This use
> only has to be monotonic per-CPU, so this is OK. Also, any bugs hidden
> by discarding low bits shouldn't show up per-CPU. However, keeping
> the cputicker below 4G actually has some efficiency advantages. For
> timecounters, there are no multiplications or divisions by the frequency
> in the fast path, but cputicker use isn't so optimized and it does a
> slow 64-bit division in cputick2usec(). Keeping cpu_tick_frequency
> below UINT_MAX allows dividing by it in integer arithmetic in some cases.
> This optimization is not done.
>
> > amd64 cannot either, but amd64 does not need to mask out top bits in %rax,
> > since the whole shrdl calculation occurs in 32bit registers, and the result
> > is in %rax where top word is cleared by shrdl instruction automatically.
> > But the clearing is not required since result is unsigned int anyway.
> >
> > Disassembly of tsc_get_timecount_low() is very clear:
> > 0xffffffff806767e4 <+4>: mov 0x30(%rdi),%ecx
> > 0xffffffff806767e7 <+7>: rdtsc
> > 0xffffffff806767e9 <+9>: shrd %cl,%edx,%eax
> > ...
> > 0xffffffff806767ed <+13>: retq
> > (I removed frame manipulations).
>
> It would without the shift pessimization, since the function returns uint32_t
> but rdtsc() gives uint64_t. Removing the top bits is not needed since
> tc_delta() removes them again, but the API doesn't allow expressing this.
>
> Without the shift pessimization, we just do rdtsc() in all cases and don't
> need this function call. I think this is about 5-10 cycles faster after
> some parallelism.
>
> >>>> I prefer my way of writing this in 3 lines. Modifying 'scale' for
> >>>> the next step is especially ugly and pessimal when the next step is
> >>>> in the caller and this function is not inlined.
> >>> Can you show exactly what you want ?
> >>
> >> Just write 'scale & 0xffffffff' for the low bits of 'scale' in callers,
> >> and don't pass 'scale' indirectly to bintime_helper() and don't modify
> >> it there.
> >>
> >> Oops, there is a problem. 'scale' must be reduced iff bintime_helper()
> >> was used. Duplicate some source code so as to not need a fall-through
> >> to the fast path. See below.
> > Yes, this is the reason why it is passed by pointer (C has no references).
>
> The indirection is slow no matter how it is spelled, unless it is inlined
> away.
>
> >>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> >>> index 2656fb4d22f..6c41ab22288 100644
> >>> --- a/sys/kern/kern_tc.c
> >>> +++ b/sys/kern/kern_tc.c
> >>> @@ -72,6 +71,7 @@ struct timehands {
> >>> 	struct timecounter *th_counter;
> >>> 	int64_t th_adjustment;
> >>> 	uint64_t th_scale;
> >>> +	uint64_t th_large_delta;
> >>> 	u_int th_offset_count;
> >>> 	struct bintime th_offset;
> >>> 	struct bintime th_bintime;
> >>> @@ -351,17 +351,45 @@ fbclock_getmicrotime(struct timeval *tvp)
> >>> 	} while (gen == 0 || gen != th->th_generation);
> >>> }
> >>> #else /* !FFCLOCK */
> >>> +
> >>> +static void
> >>
> >> Add __inline. This is in the fast path for 32-bit systems.
> > Compilers do not need this hand-holding, and I prefer to avoid __inline
> > unless really necessary. I checked with both clang 7.0 and gcc 8.3
> > that autoinlining did occur.
>
> But they do. I don't use either of these compilers, and turn off inlining
> as much as possible anyway using -fno-inline
> -fno-inline-functions-called-once (this is very broken in clang --
> -fno-inline turns off inlining of even functions declared as __inline
> (like curthread), and clang doesn't support -fno-inline
> -fno-inline-functions-called-once).
>
> >> ...
> >> Similarly in bintime().
> > I merged two functions, finally. Having to copy the same code is too
> > annoying for this change.
> >
> > So I verified that:
> > - there is no 64bit multiplication in the generated code, for i386 both
> > for clang 7.0 and gcc 8.3;
> > - that everything is inlined, the only call from bintime/binuptime is
> > the indirect call to get the timecounter value.
>
> I will have to fix it for compilers that I use.

Ok, I will add __inline.

>
> > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> > index 2656fb4d22f..0fd39e25058 100644
> > --- a/sys/kern/kern_tc.c
> > +++ b/sys/kern/kern_tc.c
> + ...
> > +static void
> > +binnouptime(struct bintime *bt, u_int off)
> > {
> > 	struct timehands *th;
> > -	u_int gen;
> > +	struct bintime *bts;
> > +	uint64_t scale;
> > +	u_int delta, gen;
> >
> > 	do {
> > 		th = timehands;
> > 		gen = atomic_load_acq_int(&th->th_generation);
> > -		*bt = th->th_offset;
> > -		bintime_addx(bt, th->th_scale * tc_delta(th));
> > +		bts = (struct bintime *)(vm_offset_t)th + off;
>
> I don't like the merging. It obscures the code with conversions like this.
>
> > +		*bt = *bts;
> > +		scale = th->th_scale;
> > +		delta = tc_delta(th);
> > +#ifdef _LP64
> > +		if (__predict_false(th->th_large_delta <= delta)) {
> > +			/* Avoid overflow for scale * delta. */
> > +			bintime_helper(bt, scale, delta);
> > +			bintime_addx(bt, (scale & 0xffffffff) * delta);
> > +		} else {
> > +			bintime_addx(bt, scale * delta);
> > +		}
> > +#else
> > +		/*
> > +		 * Use bintime_helper() unconditionally, since the fast
> > +		 * path in the above method is not so fast here, since
> > +		 * the 64 x 32 -> 64 bit multiplication is usually not
> > +		 * available in hardware and emulating it using 2
> > +		 * 32 x 32 -> 64 bit multiplications uses code much
> > +		 * like that in bintime_helper().
> > +		 */
> > +		bintime_helper(bt, scale, delta);
> > +		bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
> > +#endif
>
> Check that this method is really better. Without this, the complicated
> part is about half as large and duplicating it is smaller than this
> version.

Better in what sense ?
I am fine with the C code, and asm code looks good.

>
> > @@ -387,16 +430,8 @@ microuptime(struct timeval *tvp)
> > void
> > bintime(struct bintime *bt)
> > {
> > - struct timehands *th;
> > - u_int gen;
> >
> > - do {
> > - th = timehands;
> > - gen = atomic_load_acq_int(&th->th_generation);
> > - *bt = th->th_bintime;
> > - bintime_addx(bt, th->th_scale * tc_delta(th));
> > - atomic_thread_fence_acq();
> > - } while (gen == 0 || gen != th->th_generation);
>
> Duplicating this loop is much better than obfuscating it using inline
> functions. This loop was almost duplicated (except for the delta
> calculation) in no less than 17 functions in kern_tc.c (9 tc ones and
> 8 ffclock ones). Now it is only duplicated 16 times.
How did you count the 16? I can see only 4 instances in the unpatched
kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not
touch ffclock until the patch is finalized. After that, it would be
1 instance for kernel and 1 for userspace.

>
> > + binnouptime(bt, __offsetof(struct timehands, th_bintime));
> > }
> >
> > void
>
> Bruce

From owner-freebsd-ppc@freebsd.org Mon Mar 4 13:38:19 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 16A711512067 for ; Mon, 4 Mar 2019 13:38:19 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from freefall.freebsd.org (freefall.freebsd.org [96.47.72.132]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "freefall.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B14CE6B51B for ; Mon, 4 Mar 2019 13:38:18 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: by freefall.freebsd.org (Postfix) id 69FC2B69B; Mon, 4 Mar 2019 13:38:18 +0000 (UTC)
Delivered-To: powerpc@localmail.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client CN "mx1.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by freefall.freebsd.org (Postfix) with ESMTPS id 61C74B69A for ; Mon, 4 Mar 2019 13:38:18 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from mxrelay.ysv.freebsd.org (mxrelay.ysv.freebsd.org [IPv6:2001:1900:2254:206a::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.ysv.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 245966B518 for ; Mon, 4 Mar 2019 13:38:18 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.ysv.freebsd.org (Postfix) with ESMTPS id 64C2AD236 for ; Mon, 4 Mar 2019 13:38:17 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id x24DcHZ8073948 for ; Mon, 4 Mar 2019 13:38:17 GMT (envelope-from bugzilla-noreply@freebsd.org) Received: (from www@localhost) by kenobi.freebsd.org (8.15.2/8.15.2/Submit) id x24DcHx1073947 for powerpc@FreeBSD.org; Mon, 4 Mar 2019 13:38:17 GMT (envelope-from bugzilla-noreply@freebsd.org) X-Authentication-Warning: kenobi.freebsd.org: www set sender to bugzilla-noreply@freebsd.org using -f From: bugzilla-noreply@freebsd.org To: powerpc@FreeBSD.org Subject: [Bug 236188] devel/boost-libs and BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS Date: Mon, 04 Mar 2019 13:38:17 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed 
X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Ports & Packages X-Bugzilla-Component: Individual Port(s) X-Bugzilla-Version: Latest X-Bugzilla-Keywords: needs-patch, needs-qa X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: danfe@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: office@FreeBSD.org X-Bugzilla-Flags: maintainer-feedback? X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-Rspamd-Queue-Id: B14CE6B51B X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-2.99 / 15.00]; local_wl_from(0.00)[freebsd.org]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; NEURAL_HAM_SHORT(-0.99)[-0.989,0]; ASN(0.00)[asn:11403, ipnet:96.47.64.0/20, country:US]; NEURAL_HAM_LONG(-1.00)[-1.000,0] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2019 13:38:19 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236188

Alexey Dokuchaev changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |danfe@FreeBSD.org

--- Comment #3 from Alexey Dokuchaev ---
c++ won't work on FreeBSD/powerpc, because it is gcc 4.2.1. I've built fresh
devel/boost-libs using g++7, so it goes like this:

$ g++7 a.cc -isystem/usr/local/include -Wl,-rpath=/usr/local/lib/gcc7
$ ./a.out
2.034589723572673

This is on Mac mini G4, 32-bit, ~r302710 12.0-CURRENT.
--=20 You are receiving this mail because: You are on the CC list for the bug.= From owner-freebsd-ppc@freebsd.org Mon Mar 4 17:36:05 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 701DC151BA19 for ; Mon, 4 Mar 2019 17:36:05 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2610:1c1:1:6074::16:84]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "freefall.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 104E37495A for ; Mon, 4 Mar 2019 17:36:05 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: by freefall.freebsd.org (Postfix) id C9371EFD2; Mon, 4 Mar 2019 17:36:04 +0000 (UTC) Delivered-To: powerpc@localmail.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [96.47.72.80]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client CN "mx1.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by freefall.freebsd.org (Postfix) with ESMTPS id C5360EFD1 for ; Mon, 4 Mar 2019 17:36:04 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from mxrelay.ysv.freebsd.org (mxrelay.ysv.freebsd.org [IPv6:2001:1900:2254:206a::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.ysv.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 70F9174954 for ; Mon, 4 Mar 2019 17:36:04 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using 
TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.ysv.freebsd.org (Postfix) with ESMTPS id B8D34F568 for ; Mon, 4 Mar 2019 17:36:03 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id x24Ha3OP090560 for ; Mon, 4 Mar 2019 17:36:03 GMT (envelope-from bugzilla-noreply@freebsd.org) Received: (from www@localhost) by kenobi.freebsd.org (8.15.2/8.15.2/Submit) id x24Ha3UY090559 for powerpc@FreeBSD.org; Mon, 4 Mar 2019 17:36:03 GMT (envelope-from bugzilla-noreply@freebsd.org) X-Authentication-Warning: kenobi.freebsd.org: www set sender to bugzilla-noreply@freebsd.org using -f From: bugzilla-noreply@freebsd.org To: powerpc@FreeBSD.org Subject: [Bug 236188] devel/boost-libs and BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS Date: Mon, 04 Mar 2019 17:36:03 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Ports & Packages X-Bugzilla-Component: Individual Port(s) X-Bugzilla-Version: Latest X-Bugzilla-Keywords: needs-patch, needs-qa X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: dclarke@blastwave.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: office@FreeBSD.org X-Bugzilla-Flags: maintainer-feedback? 
X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-Rspamd-Queue-Id: 104E37495A X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-2.98 / 15.00]; local_wl_from(0.00)[freebsd.org]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; NEURAL_HAM_SHORT(-0.98)[-0.984,0]; ASN(0.00)[asn:11403, ipnet:2610:1c1:1::/48, country:US]; NEURAL_HAM_LONG(-1.00)[-1.000,0] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2019 17:36:05 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236188

Dennis Clarke changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dclarke@blastwave.org

--- Comment #4 from Dennis Clarke ---
(In reply to Alexey Dokuchaev from comment #3)

I have 'current' head at the moment here and would have to swap around hard
disks and re-install 12-RELEASE to test.

hydra# uname -a
FreeBSD hydra 13.0-CURRENT FreeBSD 13.0-CURRENT r344744 GENERIC powerpc

I'll make up a post-it note todo list for this.
--=20 You are receiving this mail because: You are on the CC list for the bug.= From owner-freebsd-ppc@freebsd.org Mon Mar 4 18:17:26 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 75BCC151DB25; Mon, 4 Mar 2019 18:17:26 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 6A52577842; Mon, 4 Mar 2019 18:17:25 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 41B5B43A329; Tue, 5 Mar 2019 05:17:15 +1100 (AEDT) Date: Tue, 5 Mar 2019 05:17:14 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: Bruce Evans , Mark Millard , freebsd-hackers Hackers , FreeBSD PowerPC ML Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] In-Reply-To: <20190304114150.GM68879@kib.kiev.ua> Message-ID: <20190305031010.I4610@besplex.bde.org> References: <20190301194217.GB68879@kib.kiev.ua> <20190302071425.G5025@besplex.bde.org> <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org> <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org> <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org> <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=P6RKvmIu c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=2apI1eGbhsv_kSbrP38A:9 a=CjuIK1q_8ugA:10 X-Rspamd-Queue-Id: 
6A52577842 X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-6.99 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; NEURAL_HAM_SHORT(-0.99)[-0.994,0]; REPLY(-4.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2019 18:17:26 -0000 On Mon, 4 Mar 2019, Konstantin Belousov wrote: > On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote: >> On Sun, 3 Mar 2019, Konstantin Belousov wrote: >> >>> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote: >>>> On Sun, 3 Mar 2019, Konstantin Belousov wrote: >>>> >>>>> On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote: >>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote: >>>>>> >>>>>>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote: >>>>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote: >>> * ... >>>> However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining >>>> step. i386 used to be faster here -- the first masking step of discarding >>>> %edx doesn't take any code. amd64 has to mask out the top bits in %rax. >>>> Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64 >>>> has to do a not so slow shr. >>> i386 cannot discard %edx after RDTSC since some bits from %edx come into >>> the timecounter value. >> >> These bits are part of the tsc-low pessimization. The shift count should >> always be 1, giving a TSC frequency of > INT32_MAX (usually) and > UINT32_MAX >> sometimes. >> >> When tsc-low was new, the shift count was often larger (as much as 8), >> and it is still changeable by a read-only tunable, but now it is 1 in >> almost all cases. 
The code only limits the timecounter frequency
>> to UINT_MAX, except the tunable defaults to 1 so average CPUs running
>> at nearly 4 GHz are usually limited to about 2 GHz. The comment about
>> this UINT_MAX doesn't match the code. The comment says int, but the
>> code says UINT.
>>
>> All that a shift count of 1 does is waste time to lose 1 bit of accuracy.
>> This much accuracy is noise for most purposes.
>>
>> The tunable is fairly undocumented. Its description is "Shift to apply
>> for the maximum TSC frequency". Of course, it has no effect on the TSC
>> frequency. It only affects the TSC timecounter frequency.
> I suspect that the shift of 1 (at least) hides cross-socket inaccuracy.
> Otherwise, I think, some multi-socket machines would start showing the
> detectable backward-counting bintime(). At the frequencies at 4GHz and
> above (Intel has 5GHz part numbers) I do not think that stability of
> 100MHz crystal and on-board traces is enough to avoid that.

I think it is just a kludge that reduced the problem before it was fixed
properly using fences.

Cross-socket latency is over 100 cycles according to jhb's tscskew
benchmark: on Haswell 4x2:

CPU | TSC skew (min/avg/max/stddev)
----+------------------------------
  0 |     0     0     0    0.000
  1 |    24    49    84   14.353
  2 |   164   243   308   47.811
  3 |   164   238   312   47.242
  4 |   168   242   332   49.593
  5 |   168   243   324   48.722
  6 |   172   242   320   52.596
  7 |   172   240   316   53.014

freefall is similar. Latency is apparently measured relative to CPU 0.
It is much lower to CPU 1 since that is on the same core.

I played with this program a lot 3 and a half years ago, but forgot most of
what I learned :-(. I tried different fencing in it. This seems to make
little difference when the program is rerun. With the default TESTS = 1024,
the min skew sometimes goes negative on freefall, but with TESTS = 1024000
that doesn't happen. This is the opposite of what I would expect. freefall
has load average about 1.
Removing the only fencing in it reduces average latency by 10-20 cycles
and minimum latency by over 100 cycles, except on freefall it is reduced
from 33 to 6. On Haswell it is 24 with fencing and I didn't test it with
no fencing.

I think tscskew doesn't really measure tsc skew. What it measures is the
time taken for a locking protocol, using the TSCs on different CPUs to
make the start and end timestamps. If the TSCs have a lot of skew or
jitter, then this will show up indirectly as inconsistent and possibly
negative differences.

A shift of just 1 can't hide latencies of hundreds of cycles on
single-socket machines. Even a shift of 8 only works sometimes, by reducing
the chance of observing the TSC going backwards by a factor of 256. E.g.,
assume for simplicity that all instructions and IPCs take 0-1 cycles, and
that unfenced rdtsc's differ by at most +-5 cycles (with the 11 values
between -5 and 5 uniformly distributed). Then with a shift of 0 and no
fences, a CPU that updates the timehands is ahead of another CPU that spins
reading the timehands about 5/11 of the time. With a shift of 8, the CPUs
are close enough when the first one reads at least 5 above and at least 5
below a 256-boundary. The chance of seeing a negative difference is reduced
by at least a factor of 10/256.

> I suspect that the shift of 1 (at least) hides cross-socket inaccuracy.
> Otherwise, I think, some multi-socket machines would start showing the
> detectable backward-counting bintime(). At the frequencies at 4GHz and
> above (Intel has 5GHz part numbers) I do not think that stability of
> 100MHz crystal and on-board traces is enough to avoid that.

Why would losing just 1 bit fix that?

Fences for rdtsc of course only serialize it for the CPU that runs it.
The locking (ordering) protocol (for the generation count) orders the
CPUs too. It takes longer than we would like, much more than the 1-cycle
error that might be hidden by ignoring the low bit.
Surely the ordering protocol must work across sockets? It then gives
ordering of rdtsc's.

TSC-low was added in 2011. That was long before the ordering was fixed.
You added fences in 2012 and memory ordering for the generation count in
2016. Fences slowed everything down by 10-20+ cycles and probably hide
bugs in the memory ordering better than TSC-low. Memory ordering plus
fences slow down the cross-core case by more than 100 cycles according
to tscskew. That is enough to hide large hardware bugs.

> We can try to set the tsc-low shift count to 0 (but keep lfence) and see
> what is going on in HEAD, but I am afraid that the HEAD users population
> is not representative enough to catch the issue with the certainty.
> More, it is unclear to me how to diagnose the cause, e.g. I would expect
> the sleeps to hang on timeouts, as was reported from the very beginning
> of this thread. How would we root-cause it?

Negative time differences cause lots of overflows so break the timecounter.
The fix under discussion actually gives larger overflows in the positive
direction. E.g., a delta of -1 first overflows to 0xffffffff. The fix
prevents overflow on multiplication by that. When the timecounter frequency
is small, say 1 MHz, 0xffffffff means 4294 seconds, so the timecounter
advances by that.

>>> amd64 cannot either, but amd64 does not need to mask out top bits in %rax,
>>> since the whole shrdl calculation occurs in 32bit registers, and the result
>>> is in %rax where top word is cleared by shrdl instruction automatically.
>>> But the clearing is not required since result is unsigned int anyway.
>>>
>>> Disassembly of tsc_get_timecount_low() is very clear:
>>> 0xffffffff806767e4 <+4>: mov 0x30(%rdi),%ecx
>>> 0xffffffff806767e7 <+7>: rdtsc
>>> 0xffffffff806767e9 <+9>: shrd %cl,%edx,%eax
>>> ...
>>> 0xffffffff806767ed <+13>: retq
>>> (I removed frame manipulations).
I checked that all compilers still produce horrible code for the better
source code 'return (rdtsc() >> (intptr_t)tc->tc_priv);'. 64-bit shifts
are apparently pessimal for compatibility. The above is written mostly
in asm to avoid 2-5 extra instructions.

>>>> ...
>>>> Similarly in bintime().
>>> I merged two functions, finally. Having to copy the same code is too
>>> annoying for this change.

I strongly dislike the merge.

>>> So I verified that:
>>> - there is no 64bit multiplication in the generated code, for i386 both
>>>   for clang 7.0 and gcc 8.3;
>>> - that everything is inlined, the only call from bintime/binuptime is
>>>   the indirect call to get the timecounter value.
>>
>> I will have to fix it for compilers that I use.
> Ok, I will add __inline.

That will make it fast enough, but still hard to read.

>>> + *bt = *bts;
>>> + scale = th->th_scale;
>>> + delta = tc_delta(th);
>>> +#ifdef _LP64
>>> + if (__predict_false(th->th_large_delta <= delta)) {
>>> + /* Avoid overflow for scale * delta. */
>>> + bintime_helper(bt, scale, delta);
>>> + bintime_addx(bt, (scale & 0xffffffff) * delta);
>>> + } else {
>>> + bintime_addx(bt, scale * delta);
>>> + }
>>> +#else
>>> + /*
>>> + * Use bintime_helper() unconditionally, since the fast
>>> + * path in the above method is not so fast here, since
>>> + * the 64 x 32 -> 64 bit multiplication is usually not
>>> + * available in hardware and emulating it using 2
>>> + * 32 x 32 -> 64 bit multiplications uses code much
>>> + * like that in bintime_helper().
>>> + */
>>> + bintime_helper(bt, scale, delta);
>>> + bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
>>> +#endif
>>
>> Check that this method is really better. Without this, the complicated
>> part is about half as large and duplicating it is smaller than this
>> version.
> Better in what sense? I am fine with the C code, and asm code looks
> good.

Better in terms of actually running significantly faster.
I fear the 32-bit method is actually slightly slower for the fast path.

>>> - do {
>>> - th = timehands;
>>> - gen = atomic_load_acq_int(&th->th_generation);
>>> - *bt = th->th_bintime;
>>> - bintime_addx(bt, th->th_scale * tc_delta(th));
>>> - atomic_thread_fence_acq();
>>> - } while (gen == 0 || gen != th->th_generation);
>>
>> Duplicating this loop is much better than obfuscating it using inline
>> functions. This loop was almost duplicated (except for the delta
>> calculation) in no less than 17 functions in kern_tc.c (9 tc ones and
>> 8 ffclock ones). Now it is only duplicated 16 times.
> How did you count the 16? I can see only 4 instances in the unpatched
> kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not
> touch ffclock until the patch is finalized. After that, it would be
> 1 instance for kernel and 1 for userspace.

Grep for the end condition in this loop. There are actually 20 of these.
I'm counting the loops and not the previously-simple scaling operation in
it. The scaling is indeed only done for 4 cases. I prefer the 20
duplications (except I only want about 6 of the functions). Duplication
works even better for only 4 cases.

This should be written as a function call to 1 new function to replace
the line with the overflowing multiplication. The line is always the
same, so the new function call can look like bintime_xxx(bt, th).
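
A minimal sketch of what such a single helper might look like, written as
standalone userland C rather than kernel code. The names bt_sketch, bt_addx
and bt_add_scaled are hypothetical stand-ins (for struct bintime,
bintime_addx() and the bintime_xxx() placeholder above), and the helper
takes scale and delta directly instead of a timehands pointer. It only
illustrates the split multiplication from the quoted patch and is not the
code that was eventually committed.

```c
#include <stdint.h>

/* Hypothetical userland stand-in for the kernel's struct bintime. */
struct bt_sketch {
	int64_t  sec;	/* whole seconds */
	uint64_t frac;	/* fraction of a second, in units of 2^-64 s */
};

/* Add a 64-bit fraction, carrying into sec when frac wraps around. */
static inline void
bt_addx(struct bt_sketch *bt, uint64_t x)
{
	uint64_t old = bt->frac;

	bt->frac += x;
	if (old > bt->frac)
		bt->sec++;
}

/*
 * Add scale * delta to *bt without forming a product wider than 64 bits.
 * scale is split into 32-bit halves; each half times a 32-bit delta
 * always fits in 64 bits, so neither multiplication can overflow.
 */
static inline void
bt_add_scaled(struct bt_sketch *bt, uint64_t scale, uint32_t delta)
{
	uint64_t hi = (scale >> 32) * delta;		/* units of 2^-32 s */
	uint64_t lo = (scale & 0xffffffffULL) * delta;	/* units of 2^-64 s */

	bt->sec += hi >> 32;	/* whole seconds contained in the high part */
	bt_addx(bt, hi << 32);	/* rest of the high part as a fraction */
	bt_addx(bt, lo);	/* low part is already a 2^-64 fraction */
}
```

With scale = 2^64 / 10^6 (a 1 MHz timecounter) and delta = 0xffffffff, the
sketch adds 4294 whole seconds, which matches the figure given above for a
delta of -1 that wrapped, whereas the plain 64-bit product scale * delta
would have overflowed.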
Bruce From owner-freebsd-ppc@freebsd.org Mon Mar 4 18:41:30 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 91358151E80F for ; Mon, 4 Mar 2019 18:41:30 +0000 (UTC) (envelope-from dclarke@blastwave.org) Received: from atl4mhfb04.myregisteredsite.com (atl4mhfb04.myregisteredsite.com [209.17.115.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B35F1810DD for ; Mon, 4 Mar 2019 18:41:27 +0000 (UTC) (envelope-from dclarke@blastwave.org) Received: from atl4mhob23.registeredsite.com (atl4mhob23.registeredsite.com [209.17.115.117]) by atl4mhfb04.myregisteredsite.com (8.14.4/8.14.4) with ESMTP id x24IetSD011136 for ; Mon, 4 Mar 2019 13:40:55 -0500 Received: from mailpod.hostingplatform.com (atl4qobmail02pod2.registeredsite.com [10.30.77.36]) by atl4mhob23.registeredsite.com (8.14.4/8.14.4) with ESMTP id x24Iemtv047408 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL) for ; Mon, 4 Mar 2019 13:40:49 -0500 Received: (qmail 19035 invoked by uid 0); 4 Mar 2019 18:40:48 -0000 X-TCPREMOTEIP: 174.118.245.214 X-Authenticated-UID: dclarke@blastwave.org Received: from unknown (HELO ?172.16.35.3?) 
(dclarke@blastwave.org@174.118.245.214) by 0 with ESMTPA; 4 Mar 2019 18:40:48 -0000 To: FreeBSD PowerPC ML From: Dennis Clarke Subject: r344744 ppc64 still needs kern.smp.disabled=1 Openpgp: preference=signencrypt Autocrypt: addr=dclarke@blastwave.org; keydata= mQINBFxoSrYBEAC1M5KicBVclSHf6d81rxTQYgFhIMhNxekNQgNsB39lCWcq3zSZi75Rflb0 Q74b+lIjBi7a5XygweXgFINPNVLpknrG8y7jA/8jrKqVy5qQ/7Mw/uVou4culndNOkXwNyW9 WTNoAzAtKlDEmzIX/pfaqrulAP8se3ci9vqXInIHpRHZithrrvAsWQWuhC200PYvBlA/Vmv6 3UxV26LVa1uNYgJSgiBbCI9VTv14YSnFRG6WWXTRmVksJMiNY7fZnKGNhFkrcnGxVqVKnCgj enG67ms6uwzhkfa/F1C3BPljb5WcApJwph/Iaq+7EpVD6DmE1xYP6pgqFX4yW5MVRMn6XaIR rbkP90CodrCOTedyrB1E7N8xNZKX+sUwWBnfqv7n8rBGnlNzo2GOBHVxqw7EGYoQItlHDmhx deOOgq6VmmL1kZn4D+5BLUw/w2SljDqXpdF/Gnm3WXGe+ooBGcoMXeiqv+4PM5k11CIBLjRK p2cD51upwccFILPDF8Wipy8t6Oc+ToLz80zb5kiBR9dggORbPr4WHCt7VS4s24mAX7wBQ/EB ePRUykvES3WJLuRBdFAPtXBc9m/q0gzU9iPx3eIm8u2SbO7kUMBESexeBpJ8cIfJ7/LX2LV8 UoWxfJieklheUPZtOA06pyMcb37/A/HZNMOUYh83TKVCnv7FxwARAQABtCVEZW5uaXMgQ2xh cmtlIDxkY2xhcmtlQGJsYXN0d2F2ZS5vcmc+iQJOBBMBCAA4FiEE1j0Rv6qd1s9jGqtWj5Fg Cl9xztwFAlxoSrYCGwMFCwkIBwIGFQoJCAsCBBYCAwECHgECF4AACgkQj5FgCl9xztz1Bg/+ KIyWqzrWfTexJ0+9S0EhCNwkb8aCaGKde+dqiqTFFobS5UWphhAtMtLnU4tZG2K+GPIBnMpC 6tC5gxB4TppgcGzqRNle4CjY4Lt7SQs23V+hbTZJLDwlBWbbuqDIvkNiO1pFuaHGNJVYaQ5y qlm156/Y+GmarfVGbVhjelRq3DjDwTcdo1J36UUo3GS8/g1uXX84Va71nAeyivtzwNbU18F4 Bcbmo7fMS0nBUmEqJJWftjmz2ihP1opz2HOEzv9q7uU8q3yfg1pweT8Zscx+Y5dtUd3d4dRL iXJxm2Z2dVcWabMmlhOnLqhPaf39WjKkxr2mHiYN2sUJ5S6yKUM6HKVM7ZE/1HRYo1OZgsEC PQka65hK36ezldtQplKcGlG7DjIW3Vi1BK6o70/7Hvdyfqdeft3qY1bs8BcHfNyan/DBGgTe 34eGnqqU+YY0mRTCpukbC2/MYYEYdeS9/RYiwCf1Tn8x232iVpX6wYx8+L8Nb3QEkTNM3VP0 ArAoF1EE9RZ2jLBV9g+vKRRiatPN8pGMv9on0pO6HhAp19Db4owW/pcgsAXsLS/mjjkxo1Br Gu0shJZ6o6SqDfMpfdNyUVdzvAgAUwWtdSXlgXpn6oCn7B7YhEkj+jQ9p8Y398o9YAybe70v 7GLkZqcPkCv9GQ3Cw5a+i/FNm4JCDeD99ZC5Ag0EXGhKtgEQAMZCBzuT2z/PWurlNcc/ChFy 4sRHrDXL/pwGOy9Ue0s/busdKxPWomOMbFA4PIILaxrT0L1w6xb1Svj2CgYbhSDsW12SdqsA 
C5MrqQi/j5S/H4rEsZt8nsSbSx6JF+tP5x0i14zG2GXv7+DjxrDMfFThejeEeIcHU//Ip1MQ CF7uGv4ug3WUSKHR7wVTceq5T3oR9kLguszBhavyJZrYte6r0TDG0GdFAGQMAau4FcHsOHyf 46Gx66rGoWmgH+938kodF71d7a0FXpUUI9RAhL1MepR78QkyjGTocBKRbrcXZPO8ya9/Tcmp fRxlJNeMM9TQKND3GYSzZrsYWdmXPdx18R0rzfBOCdDPUjVJhcV9AbeH4EApDPxjDSADQ0X9 SmSoMd27MjU8rFG+Mfu0gbK/OG4kPga/2MO5lU3sublv0PMYcsQqYOcqSBDxBdkAZMDFt376 lCSxau0Ijj2bb49ippjjH6gQU5iA6ASLSFN8AWs80dVeIUt964RAc/XY8QAW621Qe6OaSqh3 M+Umdf38Cc6qySjphSEF6i+YQ1FlbmK09yyEEpDuaFejgRXXaMxj6sF+b/g4JTqxlHDEc9Nd 8+L/zrtPkUXWAss9a8jtm5hGquc37EjyZyLr+35dtyEJBJ2o0G9Len2F9+mfDdRRKJAiqqLL 3JxHKFTZ4cShABEBAAGJAjYEGAEIACAWIQTWPRG/qp3Wz2Maq1aPkWAKX3HO3AUCXGhKtgIb DAAKCRCPkWAKX3HO3MYdEACW614cKJJT9/M2wPyYecKj+KR5tv+oTdGdcZl87mG47XWn4fKI kpyTR9EGVHGbSbrCyG8qMvz+vhe+Aj9SbJ4ccr+1KIaNkBcACOSJdU2UC2sqOBxckki0ArbB ds3efHBaAEKCZv4Qfj5sHILLkImaCtR+FjvP0fr5ankJkbOeucqgxPmkKJxFBgiotWQxPp59 Sl5uzNGeLPBmkleYQMQFAOK6Yhrgsh35AmYNgNoPR6KWsfaIh9BPgEOOxc3Zl99fsZogbt1U 2YUj7L0nCa5s1AMTftZDTBsqZyotDO8/TpwSEC0EOHvcg/GAj+ocMgVPTHaTrgCV2Yy2lCVG u1Mu2T7zsCRMDJNvhC7LA3Qo8Fdc7SFJekr7TllTWB4mbQyYj9/vjQINxoKZV6v7Yfw/rYcm xY2fVsSdxZFvDIM/VRryQpoqzPv9YQrDVWDEb139NtvrNEeUnIXv+cRBKFMBxQ0PIHDkwNAb cmXY5/R58QiqnGE23je0WQNg+iBrbJN9P7inp178m6j9SFor+5pW567vYakASRQn5GPqHqt9 fRQvz5E3aa8xDscR6Gs9HQAhsA5kDqvH/XxQRD7Y1jG9T73WMlS6j928qHfMwQ6EvNuIQwqN PToVd6cMhrTJKE5gUVLVs9Oa81zr/5pNCKJ9upm6cU349JNDO/SDKSTtLA== Message-ID: <3a521768-278e-2530-da57-0c31f953b6de@blastwave.org> Date: Mon, 4 Mar 2019 13:40:40 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: B35F1810DD X-Spamd-Bar: +++++ Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [5.16 / 15.00]; ARC_NA(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; RCVD_COUNT_FIVE(0.00)[5]; FROM_HAS_DN(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; NEURAL_SPAM_SHORT(0.98)[0.976,0]; MIME_GOOD(-0.10)[text/plain]; 
PREVIOUSLY_DELIVERED(0.00)[freebsd-ppc@freebsd.org]; DMARC_NA(0.00)[blastwave.org]; AUTH_NA(1.00)[]; RCPT_COUNT_ONE(0.00)[1]; NEURAL_SPAM_MEDIUM(1.00)[1.000,0]; RCVD_TLS_LAST(0.00)[]; TO_DN_ALL(0.00)[]; MX_GOOD(-0.01)[mx1.netsolmail.net]; NEURAL_SPAM_LONG(1.00)[1.000,0]; RCVD_IN_DNSWL_NONE(0.00)[120.115.17.209.list.dnswl.org : 127.0.5.0]; R_SPF_NA(0.00)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; ASN(0.00)[asn:19871, ipnet:209.17.112.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; IP_SCORE(1.30)[ip: (4.03), ipnet: 209.17.112.0/21(1.40), asn: 19871(1.12), country: US(-0.07)] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2019 18:41:30 -0000

Merely a note that r344744 hangs at the starting CPU3 message and then the
fans roar as usual. Of course usefdt=1 is being used here also.

If there is a svn co that I can do from some other hackery location :-)
then I would be happy to build and test; however, current head doesn't
allow for all cores to come online.
-- Dennis Clarke RISC-V/SPARC/PPC/ARM/CISC UNIX and Linux spoken GreyBeard and suspenders optional From owner-freebsd-ppc@freebsd.org Mon Mar 4 20:58:26 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 69C361522B62 for ; Mon, 4 Mar 2019 20:58:26 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic317-34.consmr.mail.ne1.yahoo.com (sonic317-34.consmr.mail.ne1.yahoo.com [66.163.184.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 18517879B2 for ; Mon, 4 Mar 2019 20:58:24 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: Av7ZzPMVM1l7UlZBbiUW8idImHi0nHz4ITc3Jyo21R27rl_HGqXYs41UrIuhVHR BH0YrPEQ9Fm0UeS6UuNHdJ9Enbwsx0MyVCLaZVgg10n5hzowe3_n3FiKUbWi30GFxlIWSKPOufWa mzYT6yxrjrYX3C9HQZYWM6M.87.FH1lGBa8CkzSIj0tCwJB4UUcwg9OWmwT6aInR7bK_.qMYiQHl vZL5v6PAzqhAqvTRQcTrUh4lHaVYj2.wbhjMSG9Cgz7xRHTEzK1ERYYOZaAkCrAlXQNj1wC7iVvh PUJDJ_2iurCK0M_HL_LcDOEf5P3pwH9HhBHq8wIp4Tx1VrhHmtgRM0F61A0NdPP.7nBw1hg3HKrj RggP9i6Tjb.NvwL1o_lmo8pqT3AoNkomzm66.H1HqzyHh7BeXG8RN7miZOY6_nhzZ4geXrU6l4tI 3_ypiTtgepCy1ogdW.BqjFwG7Ds0o1OZ5ESDA0cAkkPgjA.i_6PLz_pfcSmFvB42iA4mK_ORiry5 A82qjPSCJWJmlFH2BV2FOiFbhMSSgaB85Sq2XoqAYnQwba0aVZRL_vo0xAU_GKqrgDRC8HEcZaeY QSXAPNKSjXdv8UzKdFi37fVCwPn8GpYXkyZA7tpiCsTO1oN6LgSlC3MYxz73qCkqaaWSLHoBLFqv vsHoau68oZDuuN8_preWIrBlzxOEMqTfJ3AzaItnQubExOZHzVcvO_ZBxZIj.dBKzuSO_JAGR87V Rtma0p_xcGI9pUKysOMuWUUdqaLReL_YOBCgaMSpFVupNAc3SKRdbkU2jb2StJ_vnyGPhUwNajjW aqKitxLmSrKAGNyRH0Hz.eKfhw1YiwlVgq37FZE3VPgKSB6KBRfQbsj7MMgZWvthI.OOX7DvoUC8 yQfCKPczTar5VcW0S1BeZw_Vn6nFGx.Fj9w8Nb1IgGckb1GBH64.2tcLxo3bFQJsOdJdZIAkhwgd s8g7C4dbBozuTbGFs8w_7a.W7IKHEMrcobqYm1O5pdyTI7TRDJAksNSnEDtD3Dhl3Qme6dVg3tdZ igPjHHwaTKd853_QYquCMKOfTPILL8VEgxMpgRJA3Vv90BpZUvARBYhpK8ETy6ja0Els- Received: from sonic.gate.mail.ne1.yahoo.com by 
sonic317.consmr.mail.ne1.yahoo.com with HTTP; Mon, 4 Mar 2019 20:58:23 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.113]) ([67.170.167.181]) by smtp417.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID d6e36c6b74fad01663fd179bddcbc796; Mon, 04 Mar 2019 20:58:16 +0000 (UTC) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] From: Mark Millard In-Reply-To: <20190305031010.I4610@besplex.bde.org> Date: Mon, 4 Mar 2019 12:58:14 -0800 Cc: Konstantin Belousov , freebsd-hackers Hackers , FreeBSD PowerPC ML Content-Transfer-Encoding: 7bit Message-Id: References: <20190301194217.GB68879@kib.kiev.ua> <20190302071425.G5025@besplex.bde.org> <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org> <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org> <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org> <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org> To: Bruce Evans X-Mailer: Apple Mail (2.3445.102.3) X-Rspamd-Queue-Id: 18517879B2 X-Spamd-Bar: ++ X-Spamd-Result: default: False [2.06 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; DKIM_TRACE(0.00)[yahoo.com:+]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FREEMAIL_TO(0.00)[optusnet.com.au]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_TLS_LAST(0.00)[]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36646, ipnet:66.163.184.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; ARC_NA(0.00)[]; 
R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[4]; NEURAL_SPAM_SHORT(0.62)[0.622,0]; MIME_GOOD(-0.10)[text/plain]; IP_SCORE(1.22)[ip: (3.86), ipnet: 66.163.184.0/21(1.29), asn: 36646(1.03), country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.13)[0.130,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.60)[0.598,0]; RCVD_IN_DNSWL_NONE(0.00)[45.184.163.66.list.dnswl.org : 127.0.5.0]; RWL_MAILSPIKE_POSSIBLE(0.00)[45.184.163.66.rep.mailspike.net : 127.0.0.17]; FREEMAIL_CC(0.00)[gmail.com] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2019 20:58:26 -0000 On 2019-Mar-4, at 10:17, Bruce Evans wrote: >> . . . > > I think it is just a kludge that reduced the problem before it was fixed > properly using fences. > > Cross-socket latency is over 100 cycles according to jhb's tscskew > benchmark: on Haswell 4x2: > > CPU | TSC skew (min/avg/max/stddev) > ----+------------------------------ > 0 | 0 0 0 0.000 > 1 | 24 49 84 14.353 > 2 | 164 243 308 47.811 > 3 | 164 238 312 47.242 > 4 | 168 242 332 49.593 > 5 | 168 243 324 48.722 > 6 | 172 242 320 52.596 > 7 | 172 240 316 53.014 > > freefall is similar. Latency is apparently measured relative to CPU 0. > It is much lower to CPU 1 since that is on the same core. > You may want to look at: https://lists.freebsd.org/pipermail/freebsd-hackers/2019-March/054218.html for cruder, but somewhat related, information for the old Powermac G5 2-socket with 2 cores each, given how FreeBSD tries to synchronize the tbr's across cores as it starts up the CPUs. It may give some idea of a ball-park scale involved for such context, especially the reports of what happened for varying one figure in the source code. As stands, I've only done the experiments with a debug kernel build. 
I built using devel/powerpc64-xtoolchain-gcc related infrastructure, not gcc 4.2.1 . (This is typical for me.) === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) From owner-freebsd-ppc@freebsd.org Mon Mar 4 22:07:05 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 139911524B88 for ; Mon, 4 Mar 2019 22:07:05 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic306-21.consmr.mail.gq1.yahoo.com (sonic306-21.consmr.mail.gq1.yahoo.com [98.137.68.84]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 286388B033 for ; Mon, 4 Mar 2019 22:07:04 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: Jpi66lcVM1mPId35a4V4bN1l9XpvzO2Th9UP.uvhdY564hu5GhrsKb5w8uac59g 6dWAf7ei4IoPBRNtNuhvW6coNSa8NaczbYxWI9Cr4rW38AZ6nYfi7ZqWfvjmug3_8ZB89OXlojfI UlS5l2sNZJdo6Z2yytsZ_YpiTVjaPLCaSHR6fId_6LRsMZ4acV7k5mbP2Ee07ziNp02.JvLfAnVg PZAkx9pmY6tKgEHyXGaFKpSnPXH2zVoEYpmUHGt71e69v3DwRUyBJjUOl57YAFHcbfNEpy_xmdMS neA49nIcD7bWgjdQYJJwsEErlXL9fLqLG1kMmKzJjL3saWDKW6b7YAbBmkIazWlgb0Mm4i9wvJXU CmlgS7F2bVf7yceXstW8T5IFYo9pmH9KqQmMK23cpT0imwrGe4itD7mNrtEYsWS297zc3DvN1UdI DyRrkQMgvwIUuUTvUUzOkztBgynmXGGgT1CHEnpMquup4Lb2.f2Rd1dFb5j7Yj8sdY2IwSFPBs_k Xt3wddZvSLq_8Q9B5vgGOgpNPGILoryb4Z6gVYwHolCVz8BU8NnBOL9cNp0P3dGfW4PYK2raCwdx quwmFzr2jPAadSOLv4EP15R.Llpa3DYqn2_3szhLhmx3oCiGvS9do5NF.YyHMqTCTbGrXR42oZSZ QkBQfFfmPZOraxz0AgeuWkYC3cbuy.gfXc6wsX7e_4PnPh8Gyu_OUiWLjurxD8CewXrLHL7.u9ws Ea2Rf0dAs8hIB.UrEtg3wy76GmP2ABdP92NLYywSJZuJ5QnvtS6DM2vELJGSOcm4BKX6P9kOQ3Mt BCqxb7wtSowF56kTSUzeOksGADqrWV.ojcpkR2jllUwhiA9Vft_fPr9KWOQzgNHr6WnqGR3tOLPi jL_Cb63DFgL0zqPQUXi33mah56ZvDWCrFmirZvQG0Jv00Q3_8zyPP6zUiEz3Dpha_8suMKC1min1 GuW1EwXCR0tTFi9._mBvZ_HT.2Nds1kRPacadImqUl51JS1QWUYxV4KlIg.sBBhXb4x.P8y72bZc 
8cK7XeqoeBS.THsa0yLeYCa_J0LQZZcvByAOiRQjM3PE0mXL.kff9ka__ Received: from sonic.gate.mail.ne1.yahoo.com by sonic306.consmr.mail.gq1.yahoo.com with HTTP; Mon, 4 Mar 2019 22:06:56 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.113]) ([67.170.167.181]) by smtp405.mail.gq1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID add0220742a25add56059db18fccd464; Mon, 04 Mar 2019 22:06:51 +0000 (UTC) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: Re: r344744 ppc64 still needs kern.smp.disabled=1 From: Mark Millard In-Reply-To: <3a521768-278e-2530-da57-0c31f953b6de@blastwave.org> Date: Mon, 4 Mar 2019 14:06:50 -0800 Cc: FreeBSD PowerPC ML Content-Transfer-Encoding: 7bit Message-Id: References: <3a521768-278e-2530-da57-0c31f953b6de@blastwave.org> To: Dennis Clarke X-Mailer: Apple Mail (2.3445.102.3) X-Rspamd-Queue-Id: 286388B033 X-Spamd-Bar: ++++ X-Spamd-Result: default: False [4.18 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36647, ipnet:98.137.64.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; NEURAL_SPAM_SHORT(0.96)[0.964,0]; MIME_GOOD(-0.10)[text/plain]; IP_SCORE(2.13)[ip: (8.91), ipnet: 98.137.64.0/21(1.01), asn: 36647(0.81), country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.65)[0.655,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.94)[0.938,0]; RCVD_IN_DNSWL_NONE(0.00)[84.68.137.98.list.dnswl.org : 127.0.5.0] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 
Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2019 22:07:05 -0000

On 2019-Mar-4, at 10:40, Dennis Clarke wrote:

> Merely a note that r344744 hangs at starting CPU3 message and then the
> fans roar as usual. Of course usefdt=1 is being used here also.
>
> If there is a svn co that I can do from some other hackary location :-)
> then I would be happy to build and test however current head doesn't
> allow for all cores to come online.
>

Justin H. has said he will work on the issue.

My hack to head -r344018 is:

Index: /usr/src/sys/powerpc/aim/slb.c
===================================================================
--- /usr/src/sys/powerpc/aim/slb.c (revision 344018)
+++ /usr/src/sys/powerpc/aim/slb.c (working copy)
@@ -464,6 +464,28 @@
     critical_exit();
 }

+void hack_into_slb_if_needed(void* vap); // HACK!!!
+void hack_into_slb_if_needed(void* vap) // HACK!!!
+{ // HACK!!!
+    struct slb *cache= PCPU_GET(aim.slb);
+    vm_offset_t va= (vm_offset_t)vap;
+    uint64_t slbv= kernel_va_to_slbv(va);
+    uint64_t esid= va>>ADDR_SR_SHFT;
+    uint64_t slbe= (esid<pc_curthread = pcpup->pc_idlethread;
+
 #ifdef __powerpc64__
     __asm __volatile("mr 13,%0" :: "r"(pcpup->pc_curthread));
 #else
     __asm __volatile("mr 2,%0" :: "r"(pcpup->pc_curthread));
 #endif
+    pcpup->pc_curpcb = pcpup->pc_curthread->td_pcb;
+
+    hack_into_slb_if_needed(pcpup->pc_curpcb); // HACK!!!
+
     sp = pcpup->pc_curpcb->pcb_sp;

     return (sp);

But it is still possible for a replacement of the slb entry after slb_insert_kernel(slbe,slbv) but before pcpup->pc_curpcb->pcb_sp . Still, the hack makes booting far more reliable than the original code.

I no longer revert the VM_MAX_KERNEL_ADDRESS value and instead use the above.

I also have a hack for the stuck-sleeping problems seen for buf*daemon* threads, pmac_thermal, and the like.
I've been able to run a more than 24 hour test with no sleep hang-ups, where the test included an over 7 hour buildworld buildkernel. This was with a debug version of the kernel. I've not tested non-debug yet for the stuck-sleeping issue. I've also observed no problems across a fair number of reboots. But, like the racy slb replacement issue, what I've done may not guarantee to never fail: a constant might have to be increased, for example. (Several somewhat smaller constant values were observed to be not sufficient when tested.) There is a significant effort going on by a few folks that are looking at improving the official code that is related to the stuck-sleeping issue, for all architectures. Hopefully, multi-socket PowerMac G5's will not need any hacks once they are done. === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) From owner-freebsd-ppc@freebsd.org Tue Mar 5 03:43:14 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 492DC150EE86 for ; Tue, 5 Mar 2019 03:43:14 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic316-21.consmr.mail.ne1.yahoo.com (sonic316-21.consmr.mail.ne1.yahoo.com [66.163.187.147]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 53D777467C for ; Tue, 5 Mar 2019 03:43:13 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: o5oIhPkVM1mMEt6LbSlq7NxZGZu0AHjxITDCEVH0.LOwXOw75sfmL9WzjuHBTWu Y_g2b.mzPgvMMz8_lfasPSmf5ZCqTW0_He_KpI.CrktFG241w6wwFAnOYbfZAes12PSGKYj24ZCU yLChdDPzs0YkWa3DHkb5wlHSyvK4CrvFpQ6MifJCw0s5aqbHX_joeiY5cogERECO.qLN3fXYhCB_ P_JHjGKfotvb7Avf6.X0em9CMVltseEtigQC78doao9ce1mlTkGAmwxlJvHRTgvpEqbUZfZaHrL8 2ixJEoeDAjCR0iT0oN0UFkSS4vi.N5ZX_Avo5oyGp.GHxFAJ5V9rrXtXvphnM0oru3RrS6c9dRY_ 
pc1N.RXMmHvSH7f9p5vg4zxEhku8j57SMHlDRC8BUDN3SSjBrA7ZTFYL3K52VhUjoC50_jVt7S9X aDkuRwV6X2iQs6L89zEqJgc2la9_Tp_u5_A73UHGDvIfxx0R1aVmk5gg3oeeOBpuwSDKL9Ygm3rQ aFD1VgCRqD.xqt6FoFk.bOZDuZPbKn_hUuiz3SYPHzthBaoDmVuPCL7ONcTKDni.sOgtHKOku3Wk Q69F4Mgc_HSHuX7htrmT_yswRdF69Z8FFv7A5u7rlxPelDY8DCU4TzcCyRDOL8kdbRmD7DV4Akng z7e_sCrCwy34TGVdArm9RXaEfrkRrVbv5FENEPbQFGZlK3q1M60TMj4J2cTErISlpL779nu3LcZ5 KfKMhxWyxePPk4An7y2vtd40.x_YUT4Pw9VQrwvSS7.tA7sX35H2gUSckxpDxRVXaY51ExbN2WwV IOm_.PHECYzRWgJo5HBPTtggr2lTv8JFz6JmGBaNVTHnES1Dv55QcQX1CvQ9gRaTl3ShxxOwfOid 4mmIyAaGD8bJCn3jqOSll_1vEYb9sivik1t2i6hyymLe3ePZnjaoreGjdCq4MiU5CFUdOiLDgFtU bLNqTR41.8N1w8Qfs4AuccTXwmnnX.BNK2ZE.M5VEPXiwxmvc7QXPen6Ch1r.xII.RnADkCDJwLT gsOWKZNsUaV0Jef7pSnPaZPwDetShUED.k4S6KDAoyk2Oob0- Received: from sonic.gate.mail.ne1.yahoo.com by sonic316.consmr.mail.ne1.yahoo.com with HTTP; Tue, 5 Mar 2019 03:43:11 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.115]) ([67.170.167.181]) by smtp424.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID aced078fc5d192f968d617b862f84c56 for ; Tue, 05 Mar 2019 03:43:11 +0000 (UTC) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: head -r344018 powerpc64 variant on Powermac G5 (2 sockets, 2 cores each): [*buffer arena] shows up more . . .? 
Message-Id: Date: Mon, 4 Mar 2019 19:43:09 -0800 To: FreeBSD PowerPC ML X-Mailer: Apple Mail (2.3445.102.3) X-Rspamd-Queue-Id: 53D777467C X-Spamd-Bar: ++++ X-Spamd-Result: default: False [4.19 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; SUBJECT_ENDS_QUESTION(1.00)[]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36646, ipnet:66.163.184.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; NEURAL_SPAM_SHORT(0.98)[0.985,0]; MIME_GOOD(-0.10)[text/plain]; PREVIOUSLY_DELIVERED(0.00)[freebsd-ppc@freebsd.org]; NEURAL_SPAM_MEDIUM(0.86)[0.857,0]; RCPT_COUNT_ONE(0.00)[1]; RCVD_TLS_LAST(0.00)[]; NEURAL_SPAM_LONG(0.90)[0.901,0]; RCVD_IN_DNSWL_NONE(0.00)[147.187.163.66.list.dnswl.org : 127.0.5.0]; IP_SCORE(0.95)[ip: (2.58), ipnet: 66.163.184.0/21(1.26), asn: 36646(1.00), country: US(-0.07)] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2019 03:43:14 -0000 [It is possible that the following is tied to my hack to avoid threads ending up stuck-sleeping. But I ask about an alternative that I see in the code.] Context: using the modern powerpc64 VM_MAX_KERNEL_ADDRESS and using usefdt=3D1 on an old Powermac G5 (2 sockets, 2 cores each). Hacks are in use to provide fairly reliable booting and to avoid threads getting stuck sleeping. Before the modern VM_MAX_KERNEL_ADDRESS figure there were only 2 or 3 bufspacedaemon-* threads as I remember. 
Now there are 8 (plus bufdaemon and its worker), for example:

root   23   0.0  0.0     0   288  -  DL   15:48     0:00.39 [bufdaemon/bufdaemon]
root   23   0.0  0.0     0   288  -  DL   15:48     0:00.05 [bufdaemon/bufspaced]
root   23   0.0  0.0     0   288  -  DL   15:48     0:00.05 [bufdaemon/bufspaced]
root   23   0.0  0.0     0   288  -  DL   15:48     0:00.05 [bufdaemon/bufspaced]
root   23   0.0  0.0     0   288  -  DL   15:48     0:00.05 [bufdaemon/bufspaced]
root   23   0.0  0.0     0   288  -  DL   15:48     0:00.05 [bufdaemon/bufspaced]
root   23   0.0  0.0     0   288  -  DL   15:48     0:00.07 [bufdaemon/bufspaced]
root   23   0.0  0.0     0   288  -  DL   15:48     0:00.05 [bufdaemon/bufspaced]
root   23   0.0  0.0     0   288  -  DL   15:48     0:00.56 [bufdaemon// worker]

I'm sometimes seeing processes showing [*buffer arena] that seemed to wait for a fairly long time with that status, not something I'd seen historically for those same types of processes for a similar overall load (not much). During such times trying to create processes to look around at what is going on seems to also wait. (Probably with the same status?)

/usr/src/sys/vm/vm_init.c has:

        /*
         * Allocate the buffer arena.
         *
         * Enable the quantum cache if we have more than 4 cpus. This
         * avoids lock contention at the expense of some fragmentation.
         */
        size = (long)nbuf * BKVASIZE;
        kmi->buffer_sva = firstaddr;
        kmi->buffer_eva = kmi->buffer_sva + size;
        vmem_init(buffer_arena, "buffer arena", kmi->buffer_sva, size,
            PAGE_SIZE, (mp_ncpus > 4) ? BKVASIZE * 8 : 0, 0);
        firstaddr += size;

I wonder if the use of "BKVASIZE * 8" should track the bufspacedaemon-* thread count and not just the mp_ncpus count --or if the bufspacedaemon-* thread count should track the mp_ncpus count (and so be smaller for only 4 "cpus" in FreeBSD terms.)

Or maybe [*buffer arena] is inherent in having:

(Not from the time frame of having the [*buffer arena] showing up, not even from after such. I've not managed to see such figures during and I've not recorded any after.)
real memory = 17134088192 (16340 MB)
avail memory = 16385716224 (15626 MB)

hw.physmem: 17134088192
hw.usermem: 15232425984
hw.realmem: 17134088192

Virtual Memory: (Total: 455052K Active: 413888K)
Real Memory: (Total: 64736K Active: 62508K)
Shared Virtual Memory: (Total: 56264K Active: 15232K)
Shared Real Memory: (Total: 16416K Active: 14204K)
Free Memory: 14022736K

vm.kmem_size: 5482692608
vm.kmem_zmax: 65536
vm.kmem_size_min: 12582912
vm.kmem_size_max: 13743895347
vm.kmem_size_scale: 3
vm.kmem_map_size: 414158848
vm.kmem_map_free: 5068533760

vfs.bufspace: 1397690368
vfs.bufkvaspace: 559185920
vfs.bufmallocspace: 0
vfs.bufspacethresh: 1680538825
vfs.buffreekvacnt: 1007
vfs.bufdefragcnt: 0
vfs.buf_pager_relbuf: 0

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)

From owner-freebsd-ppc@freebsd.org Tue Mar 5 13:19:45 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1BAEF1529D22; Tue, 5 Mar 2019 13:19:45 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id 3E5966A676; Tue, 5 Mar 2019 13:19:42 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 9CDA243BF06; Wed, 6 Mar 2019 00:19:39 +1100 (AEDT) Date: Wed, 6 Mar 2019 00:19:38 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: Konstantin Belousov , Mark Millard , freebsd-hackers Hackers , FreeBSD PowerPC ML Subject: TSC "skew" (was: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]) In-Reply-To: <20190305031010.I4610@besplex.bde.org> Message-ID:
<20190305223415.U1563@besplex.bde.org> References: <20190301194217.GB68879@kib.kiev.ua> <20190302071425.G5025@besplex.bde.org> <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org> <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org> <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org> <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=UJetJGXy c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=aZ2SpzNVlL9aNEeq27IA:9 a=CjuIK1q_8ugA:10 X-Rspamd-Queue-Id: 3E5966A676 X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org; spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates 211.29.132.246 as permitted sender) smtp.mailfrom=brde@optusnet.com.au X-Spamd-Result: default: False [-6.13 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; RCVD_IN_DNSWL_LOW(-0.10)[246.132.29.211.list.dnswl.org : 127.0.5.1]; FROM_HAS_DN(0.00)[]; FREEMAIL_FROM(0.00)[optusnet.com.au]; R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23]; MIME_GOOD(-0.10)[text/plain]; MIME_TRACE(0.00)[0:+]; DMARC_NA(0.00)[optusnet.com.au]; RCPT_COUNT_FIVE(0.00)[5]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; TO_DN_ALL(0.00)[]; MX_GOOD(-0.01)[cached: extmail.optusnet.com.au]; NEURAL_HAM_SHORT(-0.74)[-0.739,0]; IP_SCORE(-3.08)[ip: (-8.06), ipnet: 211.28.0.0/14(-4.06), asn: 4804(-3.24), country: AU(-0.04)]; FREEMAIL_TO(0.00)[optusnet.com.au]; RCVD_NO_TLS_LAST(0.10)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[optusnet.com.au]; ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU]; FREEMAIL_CC(0.00)[gmail.com]; RCVD_COUNT_TWO(0.00)[2] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: 
Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2019 13:19:45 -0000

On Tue, 5 Mar 2019, Bruce Evans wrote:

> On Mon, 4 Mar 2019, Konstantin Belousov wrote:
>* [... shift for bogus TSC-low timecounter]
>> I suspect that the shift of 1 (at least) hides cross-socket inaccuracy.
>> Otherwise, I think, some multi-socket machines would start showing the
>> detectable backward-counting bintime(). At the frequencies at 4GHz and
>> above (Intel has 5GHz part numbers) I do not think that stability of
>> 100MHz crystal and on-board traces is enough to avoid that.
>
> I think it is just a kludge that reduced the problem before it was fixed
> properly using fences.
>
> Cross-socket latency is over 100 cycles according to jhb's tscskew
> benchmark: on Haswell 4x2:
>
> CPU | TSC skew (min/avg/max/stddev)
> ----+------------------------------
>   0 |     0     0     0    0.000
>   1 |    24    49    84   14.353
>   2 |   164   243   308   47.811
>   3 |   164   238   312   47.242
>   4 |   168   242   332   49.593
>   5 |   168   243   324   48.722
>   6 |   172   242   320   52.596
>   7 |   172   240   316   53.014
>
> freefall is similar. Latency is apparently measured relative to CPU 0.
> It is much lower to CPU 1 since that is on the same core.
>
> I played with this program a lot 3 and a half years ago, but forgot
> most of what I learned :-(. I tried different fencing in it. This
> seems to make little difference when the program is rerun. With the
> default TESTS = 1024, the min skew sometimes goes negative on freefall,
> but with TESTS = 1024000 that doesn't happen. This is the opposite
> of what I would expect. freefall has load average about 1.

I understand this program again. First, its name is actually tscdrift. I tested the 2015 version, and this version is still in /usr/src/tools/tools/tscdrift/tscdrift.c, with no changes except to the copyright (rgrimes wouldn't like this) and to $FreeBSD$.
The program doesn't actually measure either TSC drift or TSC skew, except indirectly. What it actually measures is the IPC (Inter-Process-Communication) time for synchronizing the drift and skew measurements, except bugs or intentional sloppiness in its synchronization also make it give an indirect measurement of similar bugs or sloppiness in normal use.

After changing TESTS from 1024 to 1024000, it shows large errors in the negative direction, as expected from either large negative skew or program bugs: this is on freefall:

XX CPU | TSC skew (min/avg/max/stddev)
XX ----+------------------------------
XX   0 |     0     0      0    0.000
XX   1 | -6148   108  10232   46.871
XX   2 |   114   209  95676  163.359
XX   3 |    96   202  47835  101.250
XX   4 | -2223   207  34017  117.257
XX   5 | -2349   206  33837  106.259
XX   6 | -2664   213  33579   96.048
XX   7 | -2451   212  49242  126.428

The negative "skews" occur because the server and the clients (1 client at a time) read the TSC with uncontrolled timing after the server opens the gate for this read (gate = 2). The IPC time is about 200 cycles to CPUs on different cores. So when neither thread is preempted, the TSC on the server is about 200 cycles in advance. Sometimes the server is preempted, so it reads its TSC later than the client (a maximum of about 6148 cycles later in this test). More often the client is preempted, since the IPC time is much larger than the time between the server opening the gate and the server reading its TSC.

The server is also missing fencing for its TSC read, so this read may appear to occur several cycles before opening the gate. This gives an error in the positive direction for the reported "skew" (the error is actually in the positive direction for the reported IPC time). It would be useful to measure this error by intentionally omitting fencing, but currently it is just a small amount of noise on top of the noise from preemption.
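To make the preemption argument above concrete, here is a toy timing model in C. It is only a sketch and is not taken from tscdrift.c: the function names, the parameters and the delay values are all hypothetical, and the two TSCs are assumed to be perfectly in sync so that any nonzero result comes from timing alone. It compares the order used by the unpatched server (open the gate, then read the TSC) with the opposite order (read the TSC, then open the gate).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy timing model (an assumption for illustration, not code from
 * tscdrift.c).  All values are in TSC cycles and both TSCs are assumed
 * to be perfectly synchronized.
 *
 *   ipc          - cycles until the gate store is visible to the client
 *   server_delay - extra cycles the server loses (e.g. to preemption)
 *                  between its two steps
 *   client_delay - extra cycles the client loses before reading its TSC
 *
 * Each function returns the reported "skew": client TSC minus server TSC.
 */

/* Server opens the gate at time 0, then reads its TSC. */
static int64_t
delta_gate_first(int64_t ipc, int64_t server_delay, int64_t client_delay)
{
	assert(ipc >= 0 && server_delay >= 0 && client_delay >= 0);

	int64_t server_read = server_delay;
	int64_t client_read = ipc + client_delay;

	return (client_read - server_read);
}

/* Server reads its TSC at time 0, then opens the gate. */
static int64_t
delta_read_first(int64_t ipc, int64_t server_delay, int64_t client_delay)
{
	assert(ipc >= 0 && server_delay >= 0 && client_delay >= 0);

	int64_t server_read = 0;
	int64_t gate_open = server_delay;
	int64_t client_read = gate_open + ipc + client_delay;

	return (client_read - server_read);
}
```

In this model, delta_gate_first(200, 6348, 0) is -6148, the same magnitude as the worst minimum in the table (the 6348 is picked to reproduce that figure, it is not a measurement). With the read-first order the same delays can only make the result larger than the IPC time, so the minimum stays at the best-case IPC time.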
After fixing the synchronization:

XX CPU | TSC skew (min/avg/max/stddev)
XX ----+------------------------------
XX   0 |     0     0       0    0.000
XX   1 |    33    62   49161   57.652
XX   2 |   108   169   33678   73.456
XX   3 |   108   171   43053  119.256
XX   4 |   141   169   41289  114.567
XX   5 |   141   169   40035  112.755
XX   6 |   132   186  147099  269.449
XX   7 |   153   183  431526  436.689

Synchronization apparently takes a long time, especially to other cores. The minimum and average now give the best-case IPC time very accurately. The average is 20-30 cycles smaller than before, probably because I fixed the fencing. The maximum and standard deviation are garbage noise from preemption. Preemption should be disabled somehow. Large drifts and skews would show up as nonsense values for the minimum IPC time. Small drifts would soon give large skews. To measure small skews, change the CPU of the server to measure the minimum IPC time in the opposite direction.

Fixes:

XX --- tscdrift.c 2015-07-10 06:22:36.505493000 +0000
XX +++ w.c 2019-03-05 11:22:22.232341000 +0000
XX @@ -32,6 +32,15 @@
XX  #include
XX  #include
XX  #include
XX +/*
XX + * XXX: atomic.h is not used. Instead we depend on x86 memory ordering and
XX + * do direct assignments to and comparisons of 'gate', and sometimes add
XX + * memory barriers. The correct atomic ops would do much the same with
XX + * clearer spelling. The 'lock' prefix is never needed and the barriers are
XX + * only to get program order so as to give acq or rel semantics for ether
XX + * the loads, the stores or for buggy unfenced rdtsc's. Fences also give
XX + * program order, so some of the explicit barriers are redundant.
XX + */
XX  #include
XX  #include
XX  #include
XX @@ -45,7 +54,7 @@
XX
XX  #define barrier() __asm __volatile("" ::: "memory")
XX
XX -#define TESTS 1024
XX +#define TESTS 1024000
XX
XX  static volatile int gate;
XX  static volatile uint64_t thread_tsc;
XX @@ -74,12 +83,12 @@
XX          gate = 1;
XX          while (gate == 1)
XX              cpu_spinwait();
XX -        barrier();
XX
XX +        barrier();
XX          __asm __volatile("lfence");
XX          thread_tsc = rdtsc();
XX -
XX          barrier();
XX +
XX          gate = 3;
XX          while (gate == 3)
XX              cpu_spinwait();

This is the client. The explicit barriers are confusing, and the blank lines are in all the wrong places. All the accesses to 'gate' need to be in program order. x86 memory ordering gives this automatically at the hardware level. 'gate' being volatile gives it at the compiler level. Both rdtsc() and storing the result to thread_tsc need to be in program order. lfence() in cpufunc.h has a memory clobber which gives the former, but we use a direct asm and need a barrier() before it to do the same thing. Then we need another barrier() after the assignment to thread_tsc so that the store for this is before the store to 'gate' (I think gate being volatile doesn't give this). This also keeps the rdtsc() in program order (the asm for rdtsc() doesn't have a memory clobber. I haven't noticed care about this being taken anywhere else).

Summary: only style changes in this section.

XX @@ -139,12 +148,13 @@
XX          for (j = 0; j < TESTS; j++) {
XX              while (gate != 1)
XX                  cpu_spinwait();
XX -            gate = 2;
XX -            barrier();

Move down opening the gate so that it is not opened until after reading the TSC on the server.

XX
XX +            barrier();
XX +            __asm __volatile("lfence");

Fencing is not critical here. Using an early TSC value just gives a larger reported IPC time. The barrier is important for getting program order of rdtsc().

XX              tsc = rdtsc();
XX -
XX              barrier();

This barrier is still associated with the TSC read, and the blank line is moved to reflect this.
Here rdtsc() must occur in program order, but storing to tsc can be after storing to 'gate'. The barrier gives ordering for the store to tsc too.

XX +
XX +            gate = 2;
XX              while (gate != 3)
XX                  cpu_spinwait();
XX              gate = 4;

I tried some locked atomic ops (on 'gate') and mfence instead of lfence to try to speed up the IPC. Nothing helped. We noticed long ago that fence instructions tend to be even slower than locked atomic ops for mutexes, and jhb guessed that this might be because fence instructions don't do so much to force out stores.

Similar IPC is needed for updating timecounters. This benchmark indicates that after an update, the change usually won't be visible on other CPUs for 100+ cycles. Since updates are rare, this isn't much of a problem.

Similar IPC is needed for comparing timecounters across CPUs. Any activity on different CPUs is incomparable without synchronization to establish an ordering. Since fences give ordering relative to memory and timecounters don't use anything except fences and memory order for the generation count to establish their order, the synchronization for comparing timecounters (or clock_gettime() at higher levels) must also use memory order.

If the synchronization takes over 100 cycles, then smaller TSC skews don't matter much (they never break monotonicity, and only show up as time differences varying by 100 or so cycles depending on which CPU measures the start and end events). Small differences don't matter at all.

Skews may be caused by the TSCs actually being out of sync, or hardware only syncing them on average (hopefully with small jitter) or bugs like missing fences. Missing fences don't matter much provided unserialized TSC reads aren't too far in the past. E.g., if we had a guarantee of only 10 cycles in the past for the TSC and 160 cycles for IPCs to other CPUs, then we could omit the fences. But IPCs to the same core are 100 cycles faster so the margin is too close for omitting fences in all cases.
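The arithmetic behind these margins can be sketched in C as follows. This is only an illustration under stated assumptions, not part of tscdrift.c or of the timecounter code: the struct and function names are hypothetical, the reported minimum is modelled as IPC time plus skew, and the IPC time is assumed to be the same in both directions, so that measuring with the server and client CPUs swapped lets the two be separated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type; all values are in TSC cycles. */
struct tsc_estimate {
	int64_t ipc;	/* estimated one-way IPC time */
	int64_t skew;	/* estimated client-minus-server TSC offset */
};

/*
 * min_fwd: minimum reported delta with CPU A as server, CPU B as client.
 * min_rev: minimum reported delta with the roles swapped.
 *
 * Model (an assumption): min_fwd = ipc + skew and min_rev = ipc - skew.
 * Odd sums are truncated toward zero by the integer division.
 */
static struct tsc_estimate
estimate_ipc_and_skew(int64_t min_fwd, int64_t min_rev)
{
	struct tsc_estimate e;

	e.ipc = (min_fwd + min_rev) / 2;
	e.skew = (min_fwd - min_rev) / 2;
	return (e);
}

/*
 * Cycles of slack left before a second CPU, ordered after the first one
 * only through memory, could see an earlier TSC value.  'stale' is how far
 * in the past an unserialized TSC read may be.  A result that is small or
 * negative means fences cannot safely be omitted.
 */
static int64_t
monotonic_margin(int64_t ipc, int64_t skew, int64_t stale)
{
	int64_t abs_skew = skew < 0 ? -skew : skew;

	assert(ipc >= 0 && stale >= 0);
	return (ipc - abs_skew - stale);
}
```

With the figures from the paragraph above, monotonic_margin(160, 0, 10) leaves 150 cycles of slack for other cores, while the same-core case monotonic_margin(60, 0, 10) leaves only 50, which is the "too close" margin.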
Similarly for imperfect hardware. Hopefully its skew is in the +-1 cycle range, but even +-10 isn't a problem if the IPC time is a bit larger than 10 and even +-100 if the IPC time is a bit larger than 100. And the problem scales nicely with the distance of the CPUs -- when they are further apart so that hardware synchronization of their TSCs is more difficult, the IPC time is large too. Hmm, that is only with physical IPCs. Since timecounters use physical IPCs for everything, they can't work right with virtual synchronization. Something like ntpd is needed to compare times across even small local networks. It does virtual synchronization by compensating for delays. Bruce From owner-freebsd-ppc@freebsd.org Tue Mar 5 21:12:17 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9930815137E0 for ; Tue, 5 Mar 2019 21:12:17 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic301-32.consmr.mail.ne1.yahoo.com (sonic301-32.consmr.mail.ne1.yahoo.com [66.163.184.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6299E88EDB for ; Tue, 5 Mar 2019 21:12:16 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: 7BcSr1UVM1l4HMa2snRHTvOc8ztBLaoY1RgHPgj3Z9QOo3rT5NOEtfB87by0We0 os8QV74ZsAahQWlqnc9_aWodnA.OWNX3APXnKgLiQcVCn7O24mPkRhISUrHEmvVltw.wL66jo9j6 S3klFNcfBqWYU6z1PLSCatdS.fJ3jhx087y21qNII1kdV7SUKvcvfdXovizLnVRTLYChxDYpEsCm wt6JG8J7dL84rsWXIr6WTiWIsmcAT65lXA.4NLcsX_vyiR9xV9gkSxmIHRYoFYfGHhNRxLIhq.Hi baLqKadrQLbCFGokn9xbYqwlxD0L6u4mh5bImSzqEJdH9iP8bsUkSkNeGDDNwGygnzghQAlAgHur iUUwOg7lGF69mVaKg.EVSUh.jAez_Oc.8jYXpEAc0qpR812boVlwmR3gQeYai4lAaGQCRkiUMXVu OlxQThBcEzF83JYKuw.2LXj5KTZE_MEgw511E.DsCyL5rhOe6gm8vk7vgkkgtLAeYOxA2MX6osoK USRJUtsQD73hem2aKEGFehwN7tcUULPbjpvULt4Tg8t.hrS54icJam9LMIrblLXaJ48JdlEToA2u 
hmRqQEAlwJiEqgc1sFVQLcQrMhqxxKiLEXBU1NrPuvQQ1Vbbwe82eDrAUDwHHvUwE_z.bxLH3peJ grecAUaw7OGizeYhLRH8UnEWto2vRd8wpkJtlq9_LgrsFYdPZI8JHaEEjsD8dcPoY6t_5pDPKFcR x5c6oEvpdUIfCcCxrEZg.nq0XgVVD8DcDoC56.2_yqHLs6Oo3kzh7HtnINc4ZBI338f4hXFyQ2b. vXjkwacR6ibfheBgtu6C9V71d5R9dUHaH5FTbOO5m2r1MHtdiDeE8y4RF7Hvy8B2g4odMoPe6d3n DJmP2y3HVn3P8fHMH9aBvJ7xiNaR7uxB3p1IZ2iKkXQRZ.chYHDbrTw6hM_fI4o56ZSAI_ASHRZ4 _LTYdf_bCgKT.hTfOWJg_1OfnKW.loJejLKbe45C9Xit3EVkFDjLAQ5I.vmL921qInzQIZ6bgi1b bDYx1AVK2b2J8Ld4Eqd22RzkeFxLQI0b8LRhlrs2gHkNa0VwxQIXoftj5Zvo2a4I- Received: from sonic.gate.mail.ne1.yahoo.com by sonic301.consmr.mail.ne1.yahoo.com with HTTP; Tue, 5 Mar 2019 21:12:14 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.115]) ([67.170.167.181]) by smtp426.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID cb764fd4e90ffe12eeee0268693c5fc5; Tue, 05 Mar 2019 21:12:12 +0000 (UTC) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: FYI on powerpc64: an apparent ctfmerge unbounded recursion error on kernel.full built by system clang (7) and devel/powerpc64-binutils ? 
Message-Id: <6277E1EE-59C6-4A84-8714-9E0BFFB02A75@yahoo.com> Date: Tue, 5 Mar 2019 13:12:10 -0800 To: FreeBSD PowerPC ML , FreeBSD Toolchain X-Mailer: Apple Mail (2.3445.102.3) X-Rspamd-Queue-Id: 6299E88EDB X-Spamd-Bar: ++++ X-Spamd-Result: default: False [4.61 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_TLS_LAST(0.00)[]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; SUBJECT_ENDS_QUESTION(1.00)[]; MID_RHS_MATCH_FROM(0.00)[]; ASN(0.00)[asn:36646, ipnet:66.163.184.0/21, country:US]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; NEURAL_SPAM_SHORT(0.96)[0.956,0]; MIME_GOOD(-0.10)[text/plain]; IP_SCORE(1.32)[ip: (4.41), ipnet: 66.163.184.0/21(1.25), asn: 36646(1.00), country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.90)[0.899,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.95)[0.947,0]; RCVD_IN_DNSWL_NONE(0.00)[201.184.163.66.list.dnswl.org : 127.0.5.0]; RWL_MAILSPIKE_POSSIBLE(0.00)[201.184.163.66.rep.mailspike.net : 127.0.0.17] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2019 21:12:17 -0000 The context here is head -r344018 based that was built via devel/powerpc64-xtoolchain-gcc facilities (that included building system clang 7). I attempted to have a system-clang based buildworld buildkernel that used /usr/local/powerpc64-unknown-freebsd13.0/bin/ld and such. 
Until I can synchronize and test a more modern head (including an updated clang) and ports, the below is probably just informational in case someone else runs into something similar.

The:

ctfmerge -L VERSION -g -o kernel.full locore.o cam.o cam_compat.o cam_iosched.o

got a segmentation fault but gdb shows over 13,000 levels of subroutine calls and a stack-access related failure:

# gdb /usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/tmp/usr/bin/ctfmerge /usr/obj/powerpc64vtsc_clang_altbinutils/powerpc.powerpc64/usr/src/powerpc.powerpc64/sys/GENERIC64vtsc-NODBG/ctfmerge.57350.core
. . .
Core was generated by `ctfmerge -L VERSION -g -o kernel.full locore.o cam.o cam_compat.o cam_iosched.o'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000010006588 in .hash_find_first_cb ()
[Current thread is 1 (LWP 100604)]
(gdb) info reg r1
r1             0x3fffffffdf9fc000  4611686017884209152
(gdb) disass
Dump of assembler code for function .hash_find_first_cb:
   0x0000000010006584 <+0>:  mflr    r0
=> 0x0000000010006588 <+4>:  std     r31,-8(r1)
   0x000000001000658c <+8>:  std     r0,16(r1)
. . .
(gdb) info threads
  Id   Target Id         Frame
* 1    LWP 100604 0x0000000010006588 in .hash_find_first_cb ()
  2    LWP 100220 0x0000000010045f88 in .__sys.umtx_op ()
  3    LWP 100602 0x0000000010045f84 in .__sys.umtx_op ()
  4    LWP 100603 0x0000000010045f84 in .__sys.umtx_op ()
  5    LWP 100605 0x0000000010045f84 in .__sys.umtx_op ()
  6    LWP 100606 0x0000000010045f84 in .__sys.umtx_op ()
(gdb) bt
#0  0x0000000010006588 in hash_find_first_cb (node=0x81cfc28b0, arg=0x3fffffffdf9fc128) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/hash.c:187
#1  0x0000000010008250 in list_iter (list=0x81cfc28b0, func=, private=0x3fffffffdf9fc128) at /usr/src/cddl/contrib/opensolaris/tools/ctf/common/list.c:127
#2  0x0000000010006538 in hash_match (hash=0x81711dc40, key=, fun=@0x100f5360: 0x10006584 , private=0x81711dc40) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/hash.c:149
#3  hash_find (hash=, key=, value=) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/hash.c:207
#4  0x0000000010000cb8 in alist_find (alist=, name=, value=0x3fffffffdf9fc280) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/alist.c:130
#5  0x000000001000a290 in get_mapping (ta=, srcid=) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:195
#6  equiv_node (ctdp=0x81c109180, mtdp=0x81df6dfc0, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:365
#7  0x0000000010009290 in equiv_su (stdp=, ttdp=, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:291
#8  0x000000001000a34c in equiv_node (ctdp=0x81c10b740, mtdp=0x81df7a1c0, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:384
#9  0x0000000010009290 in equiv_su (stdp=, ttdp=, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:291
#10 0x000000001000a34c in equiv_node (ctdp=0x81c1091c0, mtdp=0x81df7a000, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:384
#11 0x000000001000a34c in equiv_node (ctdp=0x81c109180, mtdp=0x81df6dfc0, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:384
#12 0x0000000010009290 in equiv_su (stdp=, ttdp=, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:291
#13 0x000000001000a34c in equiv_node (ctdp=0x81c10b740, mtdp=0x81df7a1c0, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:384
#14 0x0000000010009290 in equiv_su (stdp=, ttdp=, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:291
#15 0x000000001000a34c in equiv_node (ctdp=0x81c1091c0, mtdp=0x81df7a000, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:384
#16 0x000000001000a34c in equiv_node (ctdp=0x81c109180, mtdp=0x81df6dfc0, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:384
#17 0x0000000010009290 in equiv_su (stdp=, ttdp=, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:291
#18 0x000000001000a34c in equiv_node (ctdp=0x81c10b740, mtdp=0x81df7a1c0, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:384
#19 0x0000000010009290 in equiv_su (stdp=, ttdp=, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:291
#20 0x000000001000a34c in equiv_node (ctdp=0x81c1091c0, mtdp=0x81df7a000, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:384
#21 0x000000001000a34c in equiv_node (ctdp=0x81c109180, mtdp=0x81df6dfc0, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:384
#22 0x0000000010009290 in equiv_su (stdp=, ttdp=, ed=0x3fffffffdfbfa1f8) at /usr/src/cddl/contrib/opensolaris/tools/ctf/cvt/merge.c:291
. . .
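
A side note on reading the excerpt above: frames #6 through #22 appear to repeat with a fixed period, which is what one would expect from recursion that never terminates. The following is only a sketch in Python, not anything that gdb or ctfmerge provides. The helper name smallest_period and the hand-transcribed frame list are assumptions made for illustration.

```python
# Hypothetical helper (not part of gdb or ctfmerge): find the shortest
# repeating period in a list of backtrace frames.

def smallest_period(frames):
    """Return the smallest p > 0 such that frames[i] == frames[i + p] for
    every valid i, requiring at least two full repetitions. Return None
    if no such period exists."""
    n = len(frames)
    for p in range(1, n // 2 + 1):
        if all(frames[i] == frames[i + p] for i in range(n - p)):
            return p
    return None


# Frames #6..#22 from the backtrace, transcribed by hand as
# (function, ctdp) pairs. The equiv_su frames show no usable argument
# values in the backtrace, so None stands in for them.
A, B, C = "0x81c109180", "0x81c10b740", "0x81c1091c0"
cycle = [
    ("equiv_node", A),
    ("equiv_su", None),
    ("equiv_node", B),
    ("equiv_su", None),
    ("equiv_node", C),
]
frames = cycle * 3 + cycle[:2]  # 17 frames: #6 through #22

period = smallest_period(frames)
print(period)
```

For the frames listed, the period comes out as 5 (three equiv_node calls and two equiv_su calls per cycle), with the same ctdp values recurring each time around.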
Going the other way:

. . .
#13082 0x0000000000000000 in ?? ()
(gdb) down
#13081 0x0000000010035010 in .thread_start ()
(gdb)
#13080 0x0000000010005864 in .worker_thread ()
(gdb)
#13079 0x0000000010008900 in .merge_into_master ()
(gdb)
#13078 0x00000000100060c8 in .hash_iter ()
(gdb)
#13077 0x0000000010008250 in .list_iter ()
(gdb)
#13076 0x0000000010009794 in .merge_type_cb ()
(gdb)
#13075 0x000000001000d500 in .iitraverse ()
(gdb)
#13074 0x000000001000d320 in .tdtraverse ()
(gdb)
#13073 0x000000001000d654 in .tdtrav_plain ()
(gdb)
#13072 0x000000001000d320 in .tdtraverse ()
(gdb)
#13071 0x000000001000d654 in .tdtrav_plain ()
(gdb)
#13070 0x000000001000d320 in .tdtraverse ()
(gdb)
#13069 0x000000001000d764 in .tdtrav_func ()
(gdb)
#13068 0x000000001000d320 in .tdtraverse ()
(gdb)
#13067 0x000000001000d654 in .tdtrav_plain ()
(gdb)
#13066 0x000000001000d320 in .tdtraverse ()
(gdb)
#13065 0x000000001000d7f8 in .tdtrav_su ()
(gdb)
#13064 0x000000001000d320 in .tdtraverse ()
(gdb)
#13063 0x000000001000d654 in .tdtrav_plain ()
(gdb)
#13062 0x000000001000d320 in .tdtraverse ()
(gdb)
#13061 0x000000001000d7f8 in .tdtrav_su ()
(gdb)
#13060 0x000000001000d368 in .tdtraverse ()
(gdb)
#13059 0x000000001000a468 in .map_td_tree_post ()
(gdb)
#13058 0x00000000100063dc in .hash_find_iter ()
(gdb)
#13057 0x0000000010008250 in .list_iter ()
(gdb)
#13056 0x0000000010006494 in .hash_find_list_cb ()
(gdb)
#13055 0x000000001000a0dc in .equiv_cb ()
(gdb)
#13054 0x000000001000a34c in .equiv_node ()
(gdb)
#13053 0x000000001000a34c in .equiv_node ()
(gdb)
#13052 0x0000000010009290 in .equiv_su ()
(gdb)
#13051 0x000000001000a34c in .equiv_node ()
(gdb)
#13050 0x0000000010009290 in .equiv_su ()
(gdb)
#13049 0x000000001000a34c in .equiv_node ()
(gdb)
#13048 0x000000001000a34c in .equiv_node ()
(gdb)
#13047 0x0000000010009290 in .equiv_su ()
(gdb)
#13046 0x000000001000a34c in .equiv_node ()
. . . =3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) From owner-freebsd-ppc@freebsd.org Tue Mar 5 21:39:48 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9F84B15144DA for ; Tue, 5 Mar 2019 21:39:48 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from freefall.freebsd.org (freefall.freebsd.org [96.47.72.132]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "freefall.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4438689E34 for ; Tue, 5 Mar 2019 21:39:48 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: by freefall.freebsd.org (Postfix) id 1FEC0110BD; Tue, 5 Mar 2019 21:39:48 +0000 (UTC) Delivered-To: powerpc@localmail.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [96.47.72.80]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client CN "mx1.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by freefall.freebsd.org (Postfix) with ESMTPS id 1AC32110BC for ; Tue, 5 Mar 2019 21:39:48 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from mxrelay.ysv.freebsd.org (mxrelay.ysv.freebsd.org [IPv6:2001:1900:2254:206a::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.ysv.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CB71A89E30 for ; Tue, 5 Mar 2019 21:39:47 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using 
TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.ysv.freebsd.org (Postfix) with ESMTPS id F0F5F1EBC5 for ; Tue, 5 Mar 2019 21:39:46 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id x25LdkCa084005 for ; Tue, 5 Mar 2019 21:39:46 GMT (envelope-from bugzilla-noreply@freebsd.org) Received: (from www@localhost) by kenobi.freebsd.org (8.15.2/8.15.2/Submit) id x25LdkXd084004 for powerpc@FreeBSD.org; Tue, 5 Mar 2019 21:39:46 GMT (envelope-from bugzilla-noreply@freebsd.org) X-Authentication-Warning: kenobi.freebsd.org: www set sender to bugzilla-noreply@freebsd.org using -f From: bugzilla-noreply@freebsd.org To: powerpc@FreeBSD.org Subject: [Bug 236188] devel/boost-libs and BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS Date: Tue, 05 Mar 2019 21:39:46 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Ports & Packages X-Bugzilla-Component: Individual Port(s) X-Bugzilla-Version: Latest X-Bugzilla-Keywords: needs-patch, needs-qa X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: mi@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: office@FreeBSD.org X-Bugzilla-Flags: maintainer-feedback? 
X-Bugzilla-Changed-Fields: blocked Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-Rspamd-Queue-Id: 4438689E34 X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-2.97 / 15.00]; local_wl_from(0.00)[freebsd.org]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; NEURAL_HAM_SHORT(-0.97)[-0.970,0]; ASN(0.00)[asn:11403, ipnet:96.47.64.0/20, country:US]; NEURAL_HAM_LONG(-1.00)[-1.000,0] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2019 21:39:48 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D236188 Mikhail Teterin changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks|233864 | Referenced Bugs: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D233864 [Bug 233864] finance/quantlib: update 1.13 -> 1.14, fix stage-qa --=20 You are receiving this mail because: You are on the CC list for the bug.= From owner-freebsd-ppc@freebsd.org Tue Mar 5 22:07:25 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A1296151824D for ; Tue, 5 Mar 2019 22:07:25 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic314-21.consmr.mail.ne1.yahoo.com (sonic314-21.consmr.mail.ne1.yahoo.com [66.163.189.147]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id CE7438B287 for ; Tue, 5 Mar 2019 22:07:23 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: 
qQ2oWJgVM1lUO35L5CJiMTkawtOl2BE8TBB63u05b0d1UmA7LsAop2b2vxtF691 nOjeX86iQFLdL5Xh.z562QhEijzdvvFNQYI4.xgnq40jyIm04ALEE7mNXDqZC4G.yxL8czG8sXH5 1MNWpA8zqIOiWCkT9cd3i0lE1q3OpHSYi6YCzGeTd1jdAIoYSH0Htq6k_fQXQX72h8e7Sz9gY3eW yVFyx8Aw0MZyLPQSewUE4xv8csuHuizh556byayPT4SAjrcekBvdvzvMexbFY7qgbRgy_rYXBivi oRFZBFYe7hElGgGam6pZK6TLF4v9whurnBu5T_fsSkY7MVL6BeZ8PEsO.W4MMghFhFgilFnXNbvO V82map4G5ZuHVVPBmEghtFbHBgZyCCRS48Ujaj7pBIEya5AIUq4VsoDhKnq0m8Cd_za0H5YJL6Vz IV3LYuV9GH.8pghy1aMvo9X5O6t02gOb5fNrwZrkhWdqOc0W0HxP4yIDCL1VVIVENu5oNowe5f2n m2AIr6y3wBzIRH_XNI5OU_.pIilqoJoYoZyGhjdUgUTLPRbZMqkc3t5MKvnOOJ_dkNcu.AOLyZrX TcoZpIem.YR_HEy2u.RrHVdKE6DLyLdhy2kulv5b3XYbO.srjVwzWurlqYSsui5enRN.WimR3p1s wDvS3hahlEALBz6qILL_Q6APuRdNDJ2Lb__NsXZoHHBAfqzE784i_TP2EEdGlBm8kjTGWMHMIL62 A48QDBZ0Tdela3e1JNSxngW6Bl2o64UNTke9I5Fy_XfylQqZITL0qjEYzz.TAGETwzRfG336l4j7 y9fOW6TvvVxUSyFGGL4hHQm8buF63qt8ERDBKd_Pl3j6VYFF7brUFgRmIywOsgYeqUMvYrHPZNOm YeYLFCVoE0DEwNCfhhFpRj4bS2FnI1tLjUo9XZTN1ytvuyjdlhHaspO69YH1xsoA7MdkIkELo2fV foNNj0nU625xPwfFNBwUwrMaIVRu93ov_3B2wvxkHpL4E3xT2jUD3R2g2P55uFCkDpdJHkh47KQP tV6J8pm9C3WKVGuktt27gPKOUfrwA7INtwT1c7VuryG3K9NZsBWHl1xaqpuome.61r5eyTgzEsiC 1sCmhP4tTeQKQBkViqJfQO1k2x.O1EaLSukh7Q7XoKQ-- Received: from sonic.gate.mail.ne1.yahoo.com by sonic314.consmr.mail.ne1.yahoo.com with HTTP; Tue, 5 Mar 2019 22:07:17 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.115]) ([67.170.167.181]) by smtp401.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID b984ec35996857788cf96da299e4bc56; Tue, 05 Mar 2019 22:07:14 +0000 (UTC) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: Looks like head's clang 8 update broke compiling at least llvm/tools/lldb/source/API/SBMemoryRegionInfo*.cpp via gcc Message-Id: <962986D0-7F78-4DB4-87A5-3C10A93AC067@yahoo.com> Date: Tue, 5 Mar 2019 14:07:12 -0800 Cc: Dimitry Andric To: FreeBSD PowerPC ML , FreeBSD Toolchain 
X-Mailer: Apple Mail (2.3445.102.3) X-Rspamd-Queue-Id: CE7438B287 X-Spamd-Bar: ++ X-Spamd-Result: default: False [2.43 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36646, ipnet:66.163.184.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3]; NEURAL_SPAM_SHORT(0.50)[0.495,0]; MIME_GOOD(-0.10)[text/plain]; IP_SCORE(1.31)[ip: (4.36), ipnet: 66.163.184.0/21(1.25), asn: 36646(1.00), country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.44)[0.443,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.69)[0.690,0]; RCVD_IN_DNSWL_NONE(0.00)[147.189.163.66.list.dnswl.org : 127.0.5.0] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2019 22:07:25 -0000 For example, = https://ci.freebsd.org/job/FreeBSD-head-amd64-gcc/9167/console shows: 13:07:44 --- API/SBMemoryRegionInfo.o --- 13:07:44 In file included from = /workspace/src/contrib/llvm/tools/lldb/source/API/SBMemoryRegionInfo.cpp:1= 4:0: 13:07:44 = /workspace/src/contrib/llvm/tools/lldb/include/lldb/Target/MemoryRegionInf= o.h:128:54: error: 'template = lldb_private::MemoryRegionInfos::MemoryRegionInfos(_InputIterator, = _InputIterator, const allocator_type&)' inherited from = 'std::__1::vector' 13:07:44 using std::vector::vector; 13:07:44 ^~~~~~ 13:07:44 = /workspace/src/contrib/llvm/tools/lldb/include/lldb/Target/MemoryRegionInf= o.h:128:54: 
error: conflicts with version inherited from 'std::__1::vector'
13:07:44 *** [API/SBMemoryRegionInfo.o] Error code 1
13:07:44
13:07:44 make[6]: stopped in /workspace/src/lib/clang/liblldb
13:07:44 --- API/SBMemoryRegionInfoList.o ---
13:07:44 In file included from /workspace/src/contrib/llvm/tools/lldb/source/API/SBMemoryRegionInfoList.cpp:13:0:
13:07:44 /workspace/src/contrib/llvm/tools/lldb/include/lldb/Target/MemoryRegionInfo.h:128:54: error: 'template lldb_private::MemoryRegionInfos::MemoryRegionInfos(_InputIterator, _InputIterator, const allocator_type&)' inherited from 'std::__1::vector'
13:07:44 using std::vector::vector;
13:07:44 ^~~~~~
13:07:44 /workspace/src/contrib/llvm/tools/lldb/include/lldb/Target/MemoryRegionInfo.h:128:54: error: conflicts with version inherited from 'std::__1::vector'
13:07:45 *** [API/SBMemoryRegionInfoList.o] Error code 1

(I would directly notice such things more via powerpc64 experiments [native builds and cross builds], not via amd64 self-hosted.)

Of course, if lldb is not built, the build might complete overall. But the status for such is not obvious.
=3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) From owner-freebsd-ppc@freebsd.org Tue Mar 5 23:02:53 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5BBFA151A435 for ; Tue, 5 Mar 2019 23:02:53 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from freefall.freebsd.org (freefall.freebsd.org [96.47.72.132]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "freefall.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id EF8F18D212 for ; Tue, 5 Mar 2019 23:02:52 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: by freefall.freebsd.org (Postfix) id DC9EB124E3; Tue, 5 Mar 2019 23:02:52 +0000 (UTC) Delivered-To: powerpc@localmail.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [96.47.72.80]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client CN "mx1.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by freefall.freebsd.org (Postfix) with ESMTPS id D79E6124E1 for ; Tue, 5 Mar 2019 23:02:52 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from mxrelay.ysv.freebsd.org (mxrelay.ysv.freebsd.org [IPv6:2001:1900:2254:206a::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.ysv.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 9ACBD8D20F for ; Tue, 5 Mar 2019 23:02:52 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 
with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.ysv.freebsd.org (Postfix) with ESMTPS id E41781F8F0 for ; Tue, 5 Mar 2019 23:02:51 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id x25N2pWx025899 for ; Tue, 5 Mar 2019 23:02:51 GMT (envelope-from bugzilla-noreply@freebsd.org) Received: (from www@localhost) by kenobi.freebsd.org (8.15.2/8.15.2/Submit) id x25N2p7s025898 for powerpc@FreeBSD.org; Tue, 5 Mar 2019 23:02:51 GMT (envelope-from bugzilla-noreply@freebsd.org) X-Authentication-Warning: kenobi.freebsd.org: www set sender to bugzilla-noreply@freebsd.org using -f From: bugzilla-noreply@freebsd.org To: powerpc@FreeBSD.org Subject: [Bug 236188] devel/boost-libs and BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS Date: Tue, 05 Mar 2019 23:02:50 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Ports & Packages X-Bugzilla-Component: Individual Port(s) X-Bugzilla-Version: Latest X-Bugzilla-Keywords: needs-patch, needs-qa X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: sbruno@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: office@FreeBSD.org X-Bugzilla-Flags: maintainer-feedback? 
X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-Rspamd-Queue-Id: EF8F18D212 X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-2.97 / 15.00]; local_wl_from(0.00)[freebsd.org]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; NEURAL_HAM_SHORT(-0.97)[-0.974,0]; ASN(0.00)[asn:11403, ipnet:96.47.64.0/20, country:US]; NEURAL_HAM_LONG(-1.00)[-1.000,0] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 05 Mar 2019 23:02:53 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236188

Sean Bruno changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sbruno@FreeBSD.org

--- Comment #5 from Sean Bruno ---
ref12-ppc64.freebsd.org does exist and is a jail on real ppc64 hardware if you need to test things (for freebsd.org committers only).
--=20 You are receiving this mail because: You are on the CC list for the bug.= From owner-freebsd-ppc@freebsd.org Wed Mar 6 04:19:59 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 38C8C1525712 for ; Wed, 6 Mar 2019 04:19:59 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic308-12.consmr.mail.ne1.yahoo.com (sonic308-12.consmr.mail.ne1.yahoo.com [66.163.187.35]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 067A9724A8 for ; Wed, 6 Mar 2019 04:19:57 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: hi_4JoMVM1n56TumW9rN2aUU3WVBj0MygTFNr90CDtV6JQ5kGXMeKmYNs6dPhHs 3sQNwBcE03r7CcSATnyEgdvFTptb2thpTte6f5q.4EzwXYnrXXkhL5EzyYaG_naI90W7tWjfYRqp XDDVWCo.FU7Cm4i1LZOPqet42CMi4ChSWHlZ3Q61._b99.2ax1yVL._4vMDKitDISvwIc0q8.J5P ymN_TpP8uaaZ0jD91GGrDBwRLTtDrGAq.Frh9pkslSNt3jP8WoV7TQZKR4iIZDOlaHU4aFlKZJBS 2FzsfazUTR8_HEAyB64tY.F2ANqQA7d8d8nJacX5Dds2s42UKxIh9jo69D3cKWu6ssA1cf5x_Zsz bsF0OSoX_SZN0iukweDOh8r0Wm1_S_Tr85Zx9lyQMDgomS43lmEjlR6ELnKYhvuJbzTRaJmuTM2O teY1WtaR1qgahx4QOKydH2M2gnqwPMVxt9s2dWPgbkYi68gsxyexxjJkwbOReSfv1___o0jPTdJ5 lmp_ZeOvnA0h6_VmXcciQ2LdWKcwiJDKszMLlVuldv.DaeRc0KCWPfYDFQ.xmYGPks2ZNdhyerQ8 _BH3gCWGfyyjPKIQBYEV1l8jEmvzqvopZmdziMW4qDU407DtOAHdyd8rTgo1.sCdUywclg3I5HTI VblPiIatm83PdXV8ITTUNexZaqYoCyqPJrJal9bD2BkoX6wZ0o.ieVwk8pbUsl9TCw6h6Sq1l0K5 STYII0a2CemU2.7Lm9XYoBl6YlCx2DFdfPRgr286_ubXFhy680mYyBqhDSN1P6fnYGHhyh136C4j xetFn0RTxhKtcaZx.l5T.6.DOjlsdiOHK9bNoJr4Ga39KwWp5nEQSIS27TzY4BAnuOE5DAoPsv0d UYbdGSB7ySkpcHuK5kPsh9VsfBorOKhlANlBDlKgSXOgmzo7dCyLtLWUy0jotInPkiIYZF5eLpg9 A9ws7l.86JQM8y4x4WiVbw6Iuyo7KM7GSr.v5OOyjXoc8MUERF9lWBOwA_tlEWOdxek78bIPQU_l s3WJHRBe43UCp__kuSpyFWh4A5igSICwr7oOyvD2FfSRFe9srYv45t9.J8g-- Received: from sonic.gate.mail.ne1.yahoo.com by sonic308.consmr.mail.ne1.yahoo.com with 
HTTP; Wed, 6 Mar 2019 04:19:51 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.115]) ([67.170.167.181]) by smtp409.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID 3957fa08fc24e321790eb19928b57284; Wed, 06 Mar 2019 04:19:46 +0000 (UTC) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: amd64 -> powerpc64 port base/gcc use: "checking whether the C compiler works... Unable to load interpreter" and "If you meant to cross compile, use `--host'" Message-Id: <71E6ECD7-76B4-4C5A-9071-3CA0A20B4F24@yahoo.com> Date: Tue, 5 Mar 2019 20:19:45 -0800 To: John Baldwin , FreeBSD PowerPC ML , FreeBSD Toolchain X-Mailer: Apple Mail (2.3445.102.3) X-Rspamd-Queue-Id: 067A9724A8 X-Spamd-Bar: ++ X-Spamd-Result: default: False [2.30 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36646, ipnet:66.163.184.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3]; NEURAL_SPAM_SHORT(0.96)[0.962,0]; MIME_GOOD(-0.10)[text/plain]; IP_SCORE(1.16)[ip: (3.62), ipnet: 66.163.184.0/21(1.25), asn: 36646(1.00), country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.25)[0.253,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.44)[0.438,0]; RCVD_IN_DNSWL_NONE(0.00)[35.187.163.66.list.dnswl.org : 127.0.5.0] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2019 04:19:59 -0000

In trying to update powerpc64 from head -r344018 based to -r344825 based context via amd64->powerpc64 cross builds: base/binutils worked okay but base/gcc failed. This report is for base/gcc.

(I actually only use base/binutils normally but I try base/gcc in case it turns out that I need it.)

# svnlite info /usr/ports | grep "Re[plv]"
Relative URL: ^/head
Repository Root: svn://svn.freebsd.org/ports
Repository UUID: 35697150-7ecd-e111-bb59-0022644237b5
Revision: 494751
Last Changed Rev: 494751

# pwd
/usr/ports/base/gcc

# make CROSS_TOOLCHAIN=powerpc64-gcc CROSS_SYSROOT=/usr/obj/DESTDIRs/xtcgcc-powerpc64-installworld package
===> License GPLv3 GPLv3RLE accepted by the user
===> freebsd-gcc-6.4.0_2 depends on file: /usr/local/sbin/pkg - found
===> Fetching all distfiles required by freebsd-gcc-6.4.0_2 for building
===> Extracting for freebsd-gcc-6.4.0_2
=> SHA256 Checksum OK for gcc-6.4.0.tar.xz.
=> SHA256 Checksum OK for mpfr-3.1.6.tar.xz.
=> SHA256 Checksum OK for gmp-6.1.2.tar.xz.
=> SHA256 Checksum OK for mpc-1.0.3.tar.gz.
cd /wrkdirs/usr/ports/base/gcc/work/gcc-6.4.0; /bin/ln -sf ../mpfr-3.1.6 mpfr ; /bin/ln -sf ../gmp-6.1.2 gmp ; /bin/ln -sf ../mpc-1.0.3 mpc
===> Patching for freebsd-gcc-6.4.0_2
===> Applying extra patch /usr/ports/base/gcc/../../devel/powerpc64-gcc/files/freebsd-format-extensions
===> Applying extra patch /usr/ports/base/gcc/../../devel/powerpc64-gcc/files/freebsd-libdir
===> Applying extra patch /usr/ports/base/gcc/../../devel/powerpc64-gcc/files/patch-gcc-freebsd-mips
===> Applying FreeBSD patches for freebsd-gcc-6.4.0_2
===> freebsd-gcc-6.4.0_2 depends on executable: gmake - found
===> freebsd-gcc-6.4.0_2 depends on executable: makeinfo - found
===> Configuring for freebsd-gcc-6.4.0_2
configure: loading site script /usr/ports/Templates/config.site
checking build system type... powerpc64-unknown-freebsd13.0
checking host system type... powerpc64-unknown-freebsd13.0
checking target system type... powerpc64-unknown-freebsd13.0
checking for a BSD-compatible install... /usr/bin/install -c
checking whether ln works... yes
checking whether ln -s works... yes
checking for a sed that does not truncate output... (cached) /usr/bin/sed
checking for gawk... (cached) /usr/bin/awk
checking for libatomic support... yes
checking for libcilkrts support... no
checking for libitm support... yes
checking for libsanitizer support... no
checking for libvtv support... no
checking for libmpx support... no
checking for powerpc64-unknown-freebsd13.0-gcc... /usr/local/bin/powerpc64-unknown-freebsd13.0-gcc --sysroot=/usr/obj/DESTDIRs/xtcgcc-powerpc64-installworld
checking for C compiler default output file name... a.out
checking whether the C compiler works... Unable to load interpreter
configure: error: in `/wrkdirs/usr/ports/base/gcc/work/.build':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details.
===> Script "configure" failed unexpectedly.
Please report the problem to bapt@FreeBSD.org [maintainer] and attach the "/wrkdirs/usr/ports/base/gcc/work/.build/config.log" including the output of the failure of your make command. Also, it might be a good idea to provide an overview of all packages installed on your system (e.g. a /usr/local/sbin/pkg-static info -g -Ea).
*** Error code 1

Stop.
make: stopped in /usr/ports/base/gcc

The config.log has:

. . .
configure:4413: checking for C compiler default output file name
configure:4435: /usr/local/bin/powerpc64-unknown-freebsd13.0-gcc --sysroot=/usr/obj/DESTDIRs/xtcgcc-powerpc64-installworld -O2 -pipe -g -fno-strict-aliasing conftest.c >&5
configure:4439: $? = 0
configure:4476: result: a.out
configure:4492: checking whether the C compiler works
configure:4501: ./a.out
configure:4505: $? = 255
configure:4512: error: in `/wrkdirs/usr/ports/base/gcc/work/.build':
configure:4516: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details.
. . .
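
One way to read the failure above: configure reports the same triple for build, host, and target, so it appears to treat the build as native and tries to execute the freshly compiled a.out, which the amd64 build machine cannot run. The sketch below models that decision in Python. It is a simplified reading of how autoconf-generated configure scripts usually behave, not code taken from autoconf or from the base/gcc port, and the function names are made up for illustration.

```python
def initial_cross_state(build_alias, host_alias):
    """Simplified model of the starting value of cross_compiling.

    Assumption: the aliases are what --build and --host supplied
    (empty string when not given). The log above does not show
    which options were actually passed."""
    if host_alias:
        if not build_alias:
            return "maybe"
        if build_alias != host_alias:
            return "yes"
    return "no"


def compiler_works_check(state, test_program_runs):
    """Simplified model of the 'checking whether the C compiler works' step."""
    if state == "yes":
        # Known cross build: the test program is not executed at all.
        return "ok (cross compiling)"
    if test_program_runs:
        return "ok"
    if state == "maybe":
        # The script would switch to cross-compiling mode and carry on.
        return "ok (now cross compiling)"
    return "error: cannot run C compiled programs"


# The situation in the log: build and host are the same powerpc64 triple,
# but ./a.out cannot be executed on the amd64 machine doing the build.
triple = "powerpc64-unknown-freebsd13.0"
state = initial_cross_state(triple, triple)
result = compiler_works_check(state, test_program_runs=False)
print(state, "->", result)
```

Under this model, a build triple that differs from the host triple (or a host given without a build) would avoid the error path, which matches the hint in the message about `--host'.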
=3D=3D=3D Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) From owner-freebsd-ppc@freebsd.org Wed Mar 6 05:54:11 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C9D0D1529B01 for ; Wed, 6 Mar 2019 05:54:11 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2610:1c1:1:6074::16:84]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "freefall.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 70BDE7565D for ; Wed, 6 Mar 2019 05:54:11 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: by freefall.freebsd.org (Postfix) id 5C75018746; Wed, 6 Mar 2019 05:54:11 +0000 (UTC) Delivered-To: powerpc@localmail.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client CN "mx1.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by freefall.freebsd.org (Postfix) with ESMTPS id 59A5D18745 for ; Wed, 6 Mar 2019 05:54:11 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from mxrelay.ysv.freebsd.org (mxrelay.ysv.freebsd.org [IPv6:2001:1900:2254:206a::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.ysv.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id EDFEA75658 for ; Wed, 6 Mar 2019 05:54:10 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org 
[IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.ysv.freebsd.org (Postfix) with ESMTPS id 36F8A36D0 for ; Wed, 6 Mar 2019 05:54:10 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.15.2/8.15.2) with ESMTP id x265sA5X048518 for ; Wed, 6 Mar 2019 05:54:10 GMT (envelope-from bugzilla-noreply@freebsd.org) Received: (from www@localhost) by kenobi.freebsd.org (8.15.2/8.15.2/Submit) id x265sABj048517 for powerpc@FreeBSD.org; Wed, 6 Mar 2019 05:54:10 GMT (envelope-from bugzilla-noreply@freebsd.org) X-Authentication-Warning: kenobi.freebsd.org: www set sender to bugzilla-noreply@freebsd.org using -f From: bugzilla-noreply@freebsd.org To: powerpc@FreeBSD.org Subject: [Bug 236188] devel/boost-libs and BOOST_MATH_NO_LONG_DOUBLE_MATH_FUNCTIONS Date: Wed, 06 Mar 2019 05:54:10 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Ports & Packages X-Bugzilla-Component: Individual Port(s) X-Bugzilla-Version: Latest X-Bugzilla-Keywords: needs-patch, needs-qa X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: danfe@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Resolution: X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: office@FreeBSD.org X-Bugzilla-Flags: maintainer-feedback? 
X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-Rspamd-Queue-Id: 70BDE7565D X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-3.00 / 15.00]; local_wl_from(0.00)[freebsd.org]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; NEURAL_HAM_SHORT(-1.00)[-0.996,0]; ASN(0.00)[asn:11403, ipnet:2610:1c1:1::/48, country:US]; NEURAL_HAM_LONG(-1.00)[-1.000,0] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2019 05:54:12 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236188

--- Comment #6 from Alexey Dokuchaev ---
Great! Can you guys document its (ppc64 reference box) existence at the
https://www.freebsd.org/internal/machines.html page?
--=20 You are receiving this mail because: You are on the CC list for the bug.= From owner-freebsd-ppc@freebsd.org Wed Mar 6 09:54:56 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F1131150A5FC for ; Wed, 6 Mar 2019 09:54:55 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic305-19.consmr.mail.gq1.yahoo.com (sonic305-19.consmr.mail.gq1.yahoo.com [98.137.64.82]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7D0998562E for ; Wed, 6 Mar 2019 09:54:54 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: GUpR_BUVM1nvV1PANbcozE5EFgRT1TqXcQtu8GVjBlg7yhKbPcHC_GnkO8eicrl w2paqrF7QwpC4Q4_f4MX3E2y1cpnte8NwkznH96YO1iE_NB4vPwkRkOcGkudiNYL.KOqI.xIDXZc NeJcYR900lzGcEQVQpe3GksUDJYAEuI_jh2bDKOJg9rw7SaTFklhlsfgaUSMcor0N.SXFu5fZcPj aHEyIskdtREX5GfRNktBS9VnRNSqjQBN3DQ8nrLqKJcOqYQiS9PARxE8wJqhrnBNn35pkTQ2NaWM LovBdjWAM3G9A.yDzMHW9kuUNZqOVT9duUKifpCDZ3BXbr4yRTAt1robwYe9Otl.B1XCNY9KD_nt A1PL0j3X6mrqFXH.gCtkSlTiO7B2JBJRb87W0A7o2iyiHwQfLPN9p3MXlO_k_HIfr9Ska2EUMvGP 8i_s0nsB8153k17prFAIpa9TspXA6TV8Nw.qL4B5A5n.w8Wwu6t5AhzpoEAtMMN1eVts_VSjr.Kl XSelFVkuNsM_IgTHm.oCYmodGDXYIBsIAFtLhRTEyGO.q0Jej1nGUKUKrgr0HdcD1RHgY0W2qUQx Ng9xCndqZ6ZpEdxfgu1DhYvQ6KXrVxdx.ofCTm9o29K8TAB.toUoUNDk1RuFIL6HuIn08tEQs3x2 jOWjvgZBZ2hSbA8_7xctR5I6dt.aXwk65Zz8aEZrWVswBOiXgY_8Tam.8lvZSOs.ADJjq5qCzRpF Kum23pyHRDQPX0I_Zb78Vh4oOl2ZBNtpUbErlJIPCmLHvW.4QE7YoT1a9tQif4wczDf5InxRE62O dMc6oA5TTdfZEiPPlu3dY7ffnWj1tsQzS.8CS1jSr8x3c2UHt4g3CXe8siyLeNi4eCmN7S7GlEYS RK.ch_yy2bP4fS_1kB6oqbTn38fhBKvYavjk.Cab.GIlDQD5bQKm8IBiahBhAh2jE5YmfIgM6kBy 2PAlOAqg7oHkg1FUizkdtqpcSVb2LB29EKaMFNOBQxx6negBQNgGiQBenQTt29PNjYV899v78bna wtraNvKDDxbYvsQiLH.Pqkn_4O3b2BqX9HJCtHrDtDtEZvr10ul9VoVgOkg-- Received: from sonic.gate.mail.ne1.yahoo.com by sonic305.consmr.mail.gq1.yahoo.com with 
HTTP; Wed, 6 Mar 2019 09:54:46 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.115]) ([67.170.167.181]) by smtp411.mail.gq1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID 382b0bff05cc32c7520c73bed8835798; Wed, 06 Mar 2019 09:54:46 +0000 (UTC) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: powerpc64: devel/qt5-core build fails with: "Q_ATOMIC_INT64_IS_SUPPORTED must be defined on a 64-bit platform", more Message-Id: <8E6E5DBC-9935-4D88-BDBF-F7C52429019D@yahoo.com> Date: Wed, 6 Mar 2019 01:54:45 -0800 To: ports-list freebsd , FreeBSD PowerPC ML X-Mailer: Apple Mail (2.3445.102.3) X-Rspamd-Queue-Id: 7D0998562E X-Spamd-Bar: +++ X-Spamd-Result: default: False [3.54 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36647, ipnet:98.137.64.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; NEURAL_SPAM_SHORT(0.80)[0.799,0]; MIME_GOOD(-0.10)[text/plain]; IP_SCORE(1.63)[ip: (6.40), ipnet: 98.137.64.0/21(1.01), asn: 36647(0.81), country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.79)[0.789,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.83)[0.834,0]; RCVD_IN_DNSWL_NONE(0.00)[82.64.137.98.list.dnswl.org : 127.0.5.0] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Wed, 06 Mar 2019 09:54:56 -0000

The below is from a ports-mgmt/poudriere-devel run under FreeBSD
head -r344825 on an old PowerMac G5 (2 sockets, 2 cores each, powerpc64).
The /usr/ports is from head -r494751 . buildworld buildkernel was via
devel/powerpc64-xtoolchain-gcc materials and system-clang (8.0.0) was
built (and installed as cc) as part of that. /usr/ports/base/binutils
was used to supply the system binutils, including ld. (Running the
PowerPC G5 for this context does require some hacks in /usr/src/
currently.)

--- .obj/qatomic.o ---
g++8 -c -O2 -pipe -g -Wl,-rpath=/usr/local/lib/gcc8 -Wl,-rpath=/usr/local/lib/gcc8 -Og -std=c++1z -fvisibility=hidden -fvisibility-inlines-hidden -Wall -W -pthread -fPIC -DQT_GLIB -DQT_NO_USING_NAMESPACE -DQT_NO_FOREACH -DQT_NO_NARROWING_CONVERSIONS_IN_CONNECT -DQT_BUILD_CORE_LIB -DQT_BUILDING_QT -DQT_NO_CAST_TO_ASCII -DQT_ASCII_CAST_WARNINGS -DQT_MOC_COMPAT -DQT_USE_QSTRINGBUILDER -DQT_DEPRECATED_WARNINGS -DQT_DISABLE_DEPRECATED_BEFORE=0x050000 -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE -I. -I../3rdparty/zlib/src -Iglobal -I../3rdparty/harfbuzz/src -I../3rdparty/md5 -I../3rdparty/md4 -I../3rdparty/sha3 -I../3rdparty -I../3rdparty/double-conversion/include -I../3rdparty/double-conversion/include/double-conversion -I../3rdparty/forkfd -I../3rdparty/tinycbor/src -I../../include -I../../include/QtCore -I../../include/QtCore/5.12.1 -I../../include/QtCore/5.12.1/QtCore -I.moc -I.tracegen -isystem /usr/local/include/glib-2.0 -I/usr/local/lib/glib-2.0/include -isystem /usr/local/include -I/usr/local/lib/qt5/mkspecs/freebsd-g++ -o .obj/qatomic.o thread/qatomic.cpp
thread/qatomic.cpp:1624:4: error: #error "Q_ATOMIC_INT64_IS_SUPPORTED must be defined on a 64-bit platform"
 # error "Q_ATOMIC_INT64_IS_SUPPORTED must be defined on a 64-bit platform"
   ^~~~~
In file included from ../../include/QtCore/qglobal.h:1,
                 from thread/qatomic.h:41,
                 from thread/qatomic.cpp:41:
../../include/QtCore/../../src/corelib/thread/qbasicatomic.h: In instantiation of 'class QBasicAtomicInteger':
../../include/QtCore/../../src/corelib/thread/qatomic.h:55:7: required from 'class QAtomicInteger'
thread/qatomic.cpp:1631:1: required from here
../../include/QtCore/../../src/corelib/global/qglobal.h:121:63: error: static assertion failed: template parameter is an integral of a size not supported on this platform
 # define Q_STATIC_ASSERT_X(Condition, Message) static_assert(bool(Condition), Message)
                                                              ^~~~~~~~~~~~~~~
../../include/QtCore/../../src/corelib/thread/qbasicatomic.h:97:5: note: in expansion of macro 'Q_STATIC_ASSERT_X'
     Q_STATIC_ASSERT_X(QAtomicOpsSupport::IsSupported, "template parameter is an integral of a size not supported on this platform");
     ^~~~~~~~~~~~~~~~~
../../include/QtCore/../../src/corelib/thread/qbasicatomic.h: In instantiation of 'class QBasicAtomicInteger':
../../include/QtCore/../../src/corelib/thread/qatomic.h:55:7: required from 'class QAtomicInteger'
thread/qatomic.cpp:1632:1: required from here
../../include/QtCore/../../src/corelib/global/qglobal.h:121:63: error: static assertion failed: template parameter is an integral of a size not supported on this platform
 # define Q_STATIC_ASSERT_X(Condition, Message) static_assert(bool(Condition), Message)
                                                              ^~~~~~~~~~~~~~~
../../include/QtCore/../../src/corelib/thread/qbasicatomic.h:97:5: note: in expansion of macro 'Q_STATIC_ASSERT_X'
     Q_STATIC_ASSERT_X(QAtomicOpsSupport::IsSupported, "template parameter is an integral of a size not supported on this platform");
     ^~~~~~~~~~~~~~~~~
../../include/QtCore/../../src/corelib/thread/qbasicatomic.h: In instantiation of 'class QBasicAtomicInteger':
../../include/QtCore/../../src/corelib/thread/qatomic.h:55:7: required from 'class QAtomicInteger'
thread/qatomic.cpp:1633:1: required from here
../../include/QtCore/../../src/corelib/global/qglobal.h:121:63: error: static assertion failed: template parameter is an integral of a size not supported on this platform
 # define Q_STATIC_ASSERT_X(Condition, Message) static_assert(bool(Condition), Message)
                                                              ^~~~~~~~~~~~~~~
../../include/QtCore/../../src/corelib/thread/qbasicatomic.h:97:5: note: in expansion of macro 'Q_STATIC_ASSERT_X'
     Q_STATIC_ASSERT_X(QAtomicOpsSupport::IsSupported, "template parameter is an integral of a size not supported on this platform");
     ^~~~~~~~~~~~~~~~~
../../include/QtCore/../../src/corelib/thread/qbasicatomic.h: In instantiation of 'class QBasicAtomicInteger':
../../include/QtCore/../../src/corelib/thread/qatomic.h:55:7: required from 'class QAtomicInteger'
thread/qatomic.cpp:1634:1: required from here
../../include/QtCore/../../src/corelib/global/qglobal.h:121:63: error: static assertion failed: template parameter is an integral of a size not supported on this platform
 # define Q_STATIC_ASSERT_X(Condition, Message) static_assert(bool(Condition), Message)
                                                              ^~~~~~~~~~~~~~~
../../include/QtCore/../../src/corelib/thread/qbasicatomic.h:97:5: note: in expansion of macro 'Q_STATIC_ASSERT_X'
     Q_STATIC_ASSERT_X(QAtomicOpsSupport::IsSupported, "template parameter is an integral of a size not supported on this platform");
     ^~~~~~~~~~~~~~~~~
*** [.obj/qatomic.o] Error code 1

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)

From owner-freebsd-ppc@freebsd.org Wed Mar 6 09:59:29 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id ECC65150A843 for ; Wed, 6 Mar 2019 09:59:28 +0000 (UTC) (envelope-from carlavilla@mailbox.org) Received: from mx1.mailbox.org (mx1.mailbox.org [IPv6:2001:67c:2050:104:0:1:25:1]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (Client CN "*.mailbox.org", Issuer "SwissSign Server Silver CA 2014 - G22" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3A68D858DD for ; Wed, 6 Mar 2019 09:59:28 +0000 (UTC) (envelope-from carlavilla@mailbox.org) Received: from smtp1.mailbox.org (smtp1.mailbox.org [IPv6:2001:67c:2050:105:465:1:1:0]) (using TLSv1.2 with cipher ECDHE-RSA-CHACHA20-POLY1305 (256/256 bits)) (No client certificate requested) by mx1.mailbox.org (Postfix) with ESMTPS id 3B1F14C055 for ; Wed, 6 Mar 2019 10:59:25 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=mailbox.org; h= content-transfer-encoding:content-type:content-type:mime-version :subject:subject:message-id:from:from:date:date:received; s= mail20150812; t=1551866362; bh=H2JzV/2L0UhHZ/tlJQntLqSfGQmkRThcW FyQa5IoZm8=; b=dDAbCOG2UuWLKVefNxHrToDazr9/VAPro1RnYcbEtp5+PoOTy Of1jUirDbuyQWVMy1PFFkoqqK9Vb9Jk0l3nFTVdHnUQDj2G26p5YTjpKaCo2XxGI lDQLh3Bxwmux01TXcU0eDDkdOifrBHgaHWAaHIP+E36pkipKgm5+7pt3wsMPMSmc az8UsQIwrTySlTS1hDmQqIBKeKHtbcDL/U4bjP5ptYrniEK0R6oBR+BOyiuGpbfn HxhoUipAHJoBYiDT3ZgMDziwfRVcUR71XQSFDdXSxKFu8q5aLb43/BgLm8wnm1ZR azKKhogvicPTTc1PVETO4lw+5JZOPiv8jYxLw== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=mailbox.org; s=mail20150812; t=1551866363; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding:in-reply-to: references; bh=H2JzV/2L0UhHZ/tlJQntLqSfGQmkRThcWFyQa5IoZm8=; b=FdQ+coLdGUseBVAhzJinqVQWCwEzJJgsOXTZLqYRfr3M5btcp8oL/356jW3KzfKqPaEQZr sy+xORoUft7Q1jAXchV3gXuA6xD6ykkNIgRyxY3Qpz4b25K3ZbWAnZAoJdlwERwcq6QUeU EevjkUd7bfGISJfbD32xTL/K5sgyT3B8JyAmf/eaot574M645CznyMjHL7NC/am4vCo+98 IlSDGGjfxyeoAtZuWDsaMXDn22rZ6no79LlrS0b2m7DPiXTU3u1xuKXh2HJTKpmLJOeKAe p6CbJleSmf4Y8p7e/BbbF03spUoTtSANBQKeRfM2Auo+wp0jkSWapsGVrCVgsw== X-Virus-Scanned: amavisd-new at heinlein-support.de Received: from smtp1.mailbox.org ([80.241.60.240]) by gerste.heinlein-support.de (gerste.heinlein-support.de [91.198.250.173]) (amavisd-new, port 10030) with ESMTP id kCE3mVYtv_3o for ; Wed, 6 Mar 2019 10:59:22 +0100 (CET) Date: Wed, 6 Mar 2019 10:59:22 +0100 (CET) From: Sergio Carlavilla To: freebsd-ppc@freebsd.org Message-ID: <966736134.9036.1551866362184@office.mailbox.org> Subject: Mac G5 and XServe G5 support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Priority: 3 Importance: Medium X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2019 09:59:29 -0000

Hi,

I would like to know the state of FreeBSD on the Mac G5 and XServe G5.
Is all of the hardware supported?

I checked the wiki page https://wiki.freebsd.org/powerpc but I didn't
find anything specific to Mac G5 and XServe G5 hardware support.

Bye.
From owner-freebsd-ppc@freebsd.org Wed Mar 6 17:20:15 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id EF9CF1520067; Wed, 6 Mar 2019 17:20:14 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4883494D6E; Wed, 6 Mar 2019 17:20:14 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kib@localhost [127.0.0.1]) by kib.kiev.ua (8.15.2/8.15.2) with ESMTPS id x26HK4Km092433 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Wed, 6 Mar 2019 19:20:07 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.10.3 kib.kiev.ua x26HK4Km092433 Received: (from kostik@localhost) by tom.home (8.15.2/8.15.2/Submit) id x26HK3r1092419; Wed, 6 Mar 2019 19:20:03 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 6 Mar 2019 19:20:03 +0200 From: Konstantin Belousov To: Bruce Evans Cc: Mark Millard , freebsd-hackers Hackers , FreeBSD PowerPC ML Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] Message-ID: <20190306172003.GD2492@kib.kiev.ua> References: <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org> <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org> <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org> <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii 
Content-Disposition: inline In-Reply-To: <20190305031010.I4610@besplex.bde.org> User-Agent: Mutt/1.11.3 (2019-02-01) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FORGED_GMAIL_RCVD,FREEMAIL_FROM, NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on tom.home X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2019 17:20:15 -0000

On Tue, Mar 05, 2019 at 05:17:14AM +1100, Bruce Evans wrote:
> On Mon, 4 Mar 2019, Konstantin Belousov wrote:
>
> > On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote:
> >> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
> >>
> >>> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
> >>>> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
> >>>>
> >>>>> On Sun, Mar 03, 2019 at 04:43:20AM +1100, Bruce Evans wrote:
> >>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> >>>>>>
> >>>>>>> On Sun, Mar 03, 2019 at 12:03:18AM +1100, Bruce Evans wrote:
> >>>>>>>> On Sat, 2 Mar 2019, Konstantin Belousov wrote:
> >>> * ...
> >>>> However, shrd in rdtsc-low (tsc_get_timecount_low()) does a slow combining
> >>>> step. i386 used to be faster here -- the first masking step of discarding
> >>>> %edx doesn't take any code. amd64 has to mask out the top bits in %rax.
> >>>> Now for the tsc-low pessimization, i386 has to do a slow shrd, and amd64
> >>>> has to do a not so slow shr.
> >>> i386 cannot discard %edx after RDTSC since some bits from %edx come into
> >>> the timecounter value.
> >>
> >> These bits are part of the tsc-low pessimization. The shift count should
> >> always be 1, giving a TSC frequency of > INT32_MAX (usually) and > UINT32_MAX
> >> sometimes.
> >>
> >> When tsc-low was new, the shift count was often larger (as much as 8),
> >> and it is still changeable by a read-only tunable, but now it is 1 in
> >> almost all cases. The code only limits the timecounter frequency
> >> to UINT_MAX, except the tunable defaults to 1 so average CPUs running
> >> at nearly 4 GHz are usually limited to about 2 GHz. The comment about
> >> this UINT_MAX doesn't match the code. The comment says int, but the
> >> code says UINT.
> >>
> >> All that a shift count of 1 does is waste time to lose 1 bit of accuracy.
> >> This much accuracy is noise for most purposes.
> >>
> >> The tunable is fairly undocumented. Its description is "Shift to apply
> >> for the maximum TSC frequency". Of course, it has no effect on the TSC
> >> frequency. It only affects the TSC timecounter frequency.
> > I suspect that the shift of 1 (at least) hides cross-socket inaccuracy.
> > Otherwise, I think, some multi-socket machines would start showing the
> > detectable backward-counting bintime(). At the frequencies at 4GHz and
> > above (Intel has 5GHz part numbers) I do not think that stability of
> > 100MHz crystal and on-board traces is enough to avoid that.
>
> I think it is just a kludge that reduced the problem before it was fixed
> properly using fences.
>
> Cross-socket latency is over 100 cycles according to jhb's tscskew
> benchmark: on Haswell 4x2:
>
> CPU | TSC skew (min/avg/max/stddev)
> ----+------------------------------
>   0 |     0     0     0    0.000
>   1 |    24    49    84   14.353
>   2 |   164   243   308   47.811
>   3 |   164   238   312   47.242
>   4 |   168   242   332   49.593
>   5 |   168   243   324   48.722
>   6 |   172   242   320   52.596
>   7 |   172   240   316   53.014
>
> freefall is similar. Latency is apparently measured relative to CPU 0.
> It is much lower to CPU 1 since that is on the same core.
>
> I played with this program a lot 3 and a half years ago, but forgot
> most of what I learned :-(. I tried different fencing in it. This
> seems to make little difference when the program is rerun. With the
> default TESTS = 1024, the min skew sometimes goes negative on freefall,
> but with TESTS = 1024000 that doesn't happen. This is the opposite
> of what I would expect. freefall has load average about 1.
>
> Removing the only fencing in it reduces average latency by 10-20 cycles
> and minimum latency by over 100 cycles, except on freefall it is
> reduced from 33 to 6. On Haswell it is 24 with fencing and I didn't
> test it with no fencing.
>
> I think tscskew doesn't really measure tsc skew. What it measures is
> the time taken for a locking protocol, using the TSCs on different
> CPUs to make the start and end timestamps. If the TSCs have a lot of
> skew or jitter, then this will show up indirectly as inconsistent and
> possibly negative differences.
>
> A shift of just 1 can't hide latencies of hundreds of cycles on
> single-socket machines. Even a shift of 8 only works sometimes, by reducing
> the chance of observing the TSC going backwards by a factor of 256.
> E.g., assume for simplicity that all instructions and IPCs take 0-1
> cycles, and that unfenced rdtsc's differ by at most +-5 cycles (with
> the 11 values between -5 and 5 uniformly distributed). Then with a
> shift of 0 and no fences, a CPU that updates the timehands is ahead of
> another CPU that spins reading the timehands about 5/11 of the time.
> With a shift of 8, the CPUs are close enough when the first one reads
> at least 5 above and at least 5 below a 256-boundary. The chance of
> seeing a negative difference is reduced by at least a factor of 10/256.
>
> > I suspect that the shift of 1 (at least) hides cross-socket inaccuracy.
> > Otherwise, I think, some multi-socket machines would start showing the
> > detectable backward-counting bintime(). At the frequencies at 4GHz and
> > above (Intel has 5GHz part numbers) I do not think that stability of
> > 100MHz crystal and on-board traces is enough to avoid that.
>
> Why would losing just 1 bit fix that?
>
> Fences for rdtsc of course only serialize it for the CPU that runs it.
> The locking (ordering) protocol (for the generation count) orders the
> CPUs too. It takes longer than we would like, much more than the
> 1-cycle error that might be hidden by ignoring the low bit. Surely the
> ordering protocol must work across sockets? It then gives ordering of
> rdtsc's.
>
> TSC-low was added in 2011. That was long before the ordering was fixed.
> You added fences in 2012 and memory ordering for the generation count in
> 2016. Fences slowed everything down by 10-20+ cycles and probably hide
> bugs in the memory ordering better than TSC-low. Memory ordering plus
> fences slow down the cross-core case by more than 100 cycles according
> to tscskew. That is enough to hide large hardware bugs.
>
> > We can try to set the tsc-low shift count to 0 (but keep lfence) and see
> > what is going on in HEAD, but I am afraid that the HEAD users population
> > is not representative enough to catch the issue with certainty.
> > More, it is unclear to me how to diagnose the cause, e.g. I would expect
> > the sleeps to hang on timeouts, as was reported from the very beginning
> > of this thread. How would we root-cause it?
>
> Negative time differences cause lots of overflows so break the timecounter.
> The fix under discussion actually gives larger overflows in the positive
> direction. E.g., a delta of -1 first overflows to 0xffffffff. The fix
> prevents overflow on multiplication by that. When the timecounter
> frequency is small, say 1 MHz, 0xffffffff means 4294 seconds, so the
> timecounter advances by that.
>
> >>> amd64 cannot either, but amd64 does not need to mask out top bits in %rax,
> >>> since the whole shrdl calculation occurs in 32bit registers, and the result
> >>> is in %rax where top word is cleared by shrdl instruction automatically.
> >>> But the clearing is not required since result is unsigned int anyway.
> >>>
> >>> Disassembly of tsc_get_timecount_low() is very clear:
> >>> 0xffffffff806767e4 <+4>: mov 0x30(%rdi),%ecx
> >>> 0xffffffff806767e7 <+7>: rdtsc
> >>> 0xffffffff806767e9 <+9>: shrd %cl,%edx,%eax
> >>> ...
> >>> 0xffffffff806767ed <+13>: retq
> >>> (I removed frame manipulations).
>
> I checked that all compilers still produce horrible code for the better
> source code 'return (rdtsc() >> (intptr_t)tc->tc_priv);'. 64-bit shifts
> are apparently pessimal for compatibility. The above is written mostly
> in asm to avoid 2-5 extra instructions.
>
> >>>> ...
> >>>> Similarly in bintime().
> >>> I merged two functions, finally. Having to copy the same code is too
> >>> annoying for this change.
>
> I strongly dislike the merge.
>
> >>> So I verified that:
> >>> - there is no 64bit multiplication in the generated code, for i386 both
> >>> for clang 7.0 and gcc 8.3;
> >>> - that everything is inlined, the only call from bintime/binuptime is
> >>> the indirect call to get the timecounter value.
> >>
> >> I will have to fix it for compilers that I use.
> > Ok, I will add __inline.
>
> That will make it fast enough, but still hard to read.
>
> >>> +		*bt = *bts;
> >>> +		scale = th->th_scale;
> >>> +		delta = tc_delta(th);
> >>> +#ifdef _LP64
> >>> +		if (__predict_false(th->th_large_delta <= delta)) {
> >>> +			/* Avoid overflow for scale * delta. */
> >>> +			bintime_helper(bt, scale, delta);
> >>> +			bintime_addx(bt, (scale & 0xffffffff) * delta);
> >>> +		} else {
> >>> +			bintime_addx(bt, scale * delta);
> >>> +		}
> >>> +#else
> >>> +		/*
> >>> +		 * Use bintime_helper() unconditionally, since the fast
> >>> +		 * path in the above method is not so fast here, since
> >>> +		 * the 64 x 32 -> 64 bit multiplication is usually not
> >>> +		 * available in hardware and emulating it using 2
> >>> +		 * 32 x 32 -> 64 bit multiplications uses code much
> >>> +		 * like that in bintime_helper().
> >>> +		 */
> >>> +		bintime_helper(bt, scale, delta);
> >>> +		bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
> >>> +#endif
> >>
> >> Check that this method is really better. Without this, the complicated
> >> part is about half as large and duplicating it is smaller than this
> >> version.
> > Better in what sense? I am fine with the C code, and asm code looks
> > good.
>
> Better in terms of actually running significantly faster. I fear the
> 32-bit method is actually slightly slower for the fast path.
>
> >>> -	do {
> >>> -		th = timehands;
> >>> -		gen = atomic_load_acq_int(&th->th_generation);
> >>> -		*bt = th->th_bintime;
> >>> -		bintime_addx(bt, th->th_scale * tc_delta(th));
> >>> -		atomic_thread_fence_acq();
> >>> -	} while (gen == 0 || gen != th->th_generation);
> >>
> >> Duplicating this loop is much better than obfuscating it using inline
> >> functions. This loop was almost duplicated (except for the delta
> >> calculation) in no less than 17 functions in kern_tc.c (9 tc ones and
> >> 8 ffclock ones). Now it is only duplicated 16 times.
> > How did you count the 16? I can see only 4 instances in the unpatched
> > kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not
> > touch ffclock until the patch is finalized. After that, it would be
> > 1 instance for kernel and 1 for userspace.
>
> Grep for the end condition in this loop. There are actually 20 of these.
> I'm counting the loops and not the previously-simple scaling operation in
> it. The scaling is indeed only done for 4 cases. I prefer the 20
> duplications (except I only want about 6 of the functions). Duplication
> works even better for only 4 cases.

Ok, I merged these as well. Now there are only four loops left in the
kernel. I do not think that merging them is beneficial, since they have
sufficiently different bodies.
I disagree with your characterization of it as obfuscation; IMO it
improves the maintainability of the code by reducing the number of places
which need careful inspection of the lock-less algorithm.

>
> This should be written as a function call to 1 new function to replace
> the line with the overflowing multiplication. The line is always the
> same, so the new function call can look like bintime_xxx(bt, th).

Again, please provide at least pseudocode of your preference.

The current patch has become too large already; I want to test/commit
what I already have, and I will need to split it for the commit.

diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
index 2656fb4d22f..7114a0e5219 100644
--- a/sys/kern/kern_tc.c
+++ b/sys/kern/kern_tc.c
@@ -72,6 +72,7 @@ struct timehands {
 	struct timecounter	*th_counter;
 	int64_t			th_adjustment;
 	uint64_t		th_scale;
+	uint64_t		th_large_delta;
 	u_int			th_offset_count;
 	struct bintime		th_offset;
 	struct bintime		th_bintime;
@@ -200,22 +201,77 @@ tc_delta(struct timehands *th)
  * the comment in <sys/time.h> for a description of these 12 functions.
  */

-#ifdef FFCLOCK
-void
-fbclock_binuptime(struct bintime *bt)
+static __inline void
+bintime_helper(struct bintime *bt, uint64_t scale, u_int delta)
+{
+	uint64_t x;
+
+	x = (scale >> 32) * delta;
+	bt->sec += x >> 32;
+	bintime_addx(bt, x << 32);
+}
+
+static __inline void
+binnouptime(struct bintime *bt, u_int off)
 {
 	struct timehands *th;
-	unsigned int gen;
+	struct bintime *bts;
+	uint64_t scale;
+	u_int delta, gen;

 	do {
 		th = timehands;
 		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
+		bts = (struct bintime *)((vm_offset_t)th + off);
+		*bt = *bts;
+		scale = th->th_scale;
+		delta = tc_delta(th);
+#ifdef _LP64
+		if (__predict_false(th->th_large_delta <= delta)) {
+			/* Avoid overflow for scale * delta. */
+			bintime_helper(bt, scale, delta);
+			bintime_addx(bt, (scale & 0xffffffff) * delta);
+		} else {
+			bintime_addx(bt, scale * delta);
+		}
+#else
+		/*
+		 * Use bintime_helper() unconditionally, since the fast
+		 * path in the above method is not so fast here, since
+		 * the 64 x 32 -> 64 bit multiplication is usually not
+		 * available in hardware and emulating it using 2
+		 * 32 x 32 -> 64 bit multiplications uses code much
+		 * like that in bintime_helper().
+		 */
+		bintime_helper(bt, scale, delta);
+		bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
+#endif
 		atomic_thread_fence_acq();
 	} while (gen == 0 || gen != th->th_generation);
 }

+static __inline void
+getbinnouptime(void *out, size_t out_size, u_int off)
+{
+	struct timehands *th;
+	u_int gen;
+
+	do {
+		th = timehands;
+		gen = atomic_load_acq_int(&th->th_generation);
+		memcpy(out, (char *)th + off, out_size);
+		atomic_thread_fence_acq();
+	} while (gen == 0 || gen != th->th_generation);
+}
+
+#ifdef FFCLOCK
+void
+fbclock_binuptime(struct bintime *bt)
+{
+
+	binnouptime(bt, __offsetof(struct timehands, th_offset));
+}
+
 void
 fbclock_nanouptime(struct timespec *tsp)
 {
@@ -237,16 +293,8 @@ fbclock_microuptime(struct timeval *tvp)
 void
 fbclock_bintime(struct bintime *bt)
 {
-	struct timehands *th;
-	unsigned int gen;

-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		bintime_addx(bt, th->th_scale * tc_delta(th));
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	binnouptime(bt, __offsetof(struct timehands, th_bintime));
 }

 void
@@ -270,100 +318,61 @@ fbclock_microtime(struct timeval *tvp)
 void
 fbclock_getbinuptime(struct bintime *bt)
 {
-	struct timehands *th;
-	unsigned int gen;

-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_offset;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(bt, sizeof(*bt), __offsetof(struct timehands,
+	    th_offset));
 }

 void
 fbclock_getnanouptime(struct timespec *tsp)
 {
-	struct timehands *th;
-	unsigned int gen;
+	struct bintime bt;

-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timespec(&th->th_offset, tsp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(&bt, sizeof(bt), __offsetof(struct timehands,
+	    th_offset));
+	bintime2timespec(&bt, tsp);
 }

 void
 fbclock_getmicrouptime(struct timeval *tvp)
 {
-	struct timehands *th;
-	unsigned int gen;
+	struct bintime bt;

-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		bintime2timeval(&th->th_offset, tvp);
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(&bt, sizeof(bt), __offsetof(struct timehands,
+	    th_offset));
+	bintime2timeval(&bt, tvp);
 }

 void
 fbclock_getbintime(struct bintime *bt)
 {
-	struct timehands *th;
-	unsigned int gen;

-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*bt = th->th_bintime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(bt, sizeof(*bt), __offsetof(struct timehands,
+	    th_bintime));
 }

 void
 fbclock_getnanotime(struct timespec *tsp)
 {
-	struct timehands *th;
-	unsigned int gen;

-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tsp = th->th_nanotime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(tsp, sizeof(*tsp), __offsetof(struct timehands,
+	    th_nanotime));
 }

 void
 fbclock_getmicrotime(struct timeval *tvp)
 {
-	struct timehands *th;
-	unsigned int gen;

-	do {
-		th = timehands;
-		gen = atomic_load_acq_int(&th->th_generation);
-		*tvp = th->th_microtime;
-		atomic_thread_fence_acq();
-	} while (gen == 0 || gen != th->th_generation);
+	getbinnouptime(tvp, sizeof(*tvp), __offsetof(struct timehands,
+	    th_microtime));
 }
 #else /* !FFCLOCK */
+
 void
 binuptime(struct bintime *bt)
 {
-	struct timehands *th;
-	u_int gen;

-	do {
- th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_offset; - bintime_addx(bt, th->th_scale * tc_delta(th)); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + binnouptime(bt, __offsetof(struct timehands, th_offset)); } void @@ -387,16 +396,8 @@ microuptime(struct timeval *tvp) void bintime(struct bintime *bt) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_bintime; - bintime_addx(bt, th->th_scale * tc_delta(th)); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + binnouptime(bt, __offsetof(struct timehands, th_bintime)); } void @@ -420,85 +421,53 @@ microtime(struct timeval *tvp) void getbinuptime(struct bintime *bt) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_offset; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(bt, sizeof(*bt), __offsetof(struct timehands, + th_offset)); } void getnanouptime(struct timespec *tsp) { - struct timehands *th; - u_int gen; + struct bintime bt; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - bintime2timespec(&th->th_offset, tsp); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(&bt, sizeof(bt), __offsetof(struct timehands, + th_offset)); + bintime2timespec(&bt, tsp); } void getmicrouptime(struct timeval *tvp) { - struct timehands *th; - u_int gen; + struct bintime bt; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - bintime2timeval(&th->th_offset, tvp); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(&bt, sizeof(bt), __offsetof(struct timehands, + th_offset)); + bintime2timeval(&bt, tvp); } void getbintime(struct bintime *bt) { - struct timehands *th; - u_int gen; - do { - th = timehands; 
- gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_bintime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(bt, sizeof(*bt), __offsetof(struct timehands, + th_bintime)); } void getnanotime(struct timespec *tsp) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *tsp = th->th_nanotime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(tsp, sizeof(*tsp), __offsetof(struct timehands, + th_nanotime)); } void getmicrotime(struct timeval *tvp) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *tvp = th->th_microtime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(tvp, sizeof(*tvp), __offsetof(struct timehands, + th_microtime)); } #endif /* FFCLOCK */ @@ -514,15 +483,9 @@ getboottime(struct timeval *boottime) void getboottimebin(struct bintime *boottimebin) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *boottimebin = th->th_boottime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(boottimebin, sizeof(*boottimebin), + __offsetof(struct timehands, th_boottime)); } #ifdef FFCLOCK @@ -1038,15 +1001,9 @@ getmicrotime(struct timeval *tvp) void dtrace_getnanotime(struct timespec *tsp) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *tsp = th->th_nanotime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getbinnouptime(tsp, sizeof(*tsp), __offsetof(struct timehands, + th_nanotime)); } /* @@ -1464,6 +1421,7 @@ tc_windup(struct bintime *new_boottimebin) scale += (th->th_adjustment / 1024) * 2199; scale /= th->th_counter->tc_frequency; th->th_scale = scale * 2; + th->th_large_delta = 
((uint64_t)1 << 63) / scale; /* * Now that the struct timehands is again consistent, set the new From owner-freebsd-ppc@freebsd.org Wed Mar 6 20:39:20 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 537BD152660F for ; Wed, 6 Mar 2019 20:39:20 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic307-3.consmr.mail.bf2.yahoo.com (sonic307-3.consmr.mail.bf2.yahoo.com [74.6.134.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id AF7EF6F332 for ; Wed, 6 Mar 2019 20:39:18 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: b2nPwJEVM1naM5CDcx49O5_b4.f6q5iW36iQLGx4VfyRU0wzkt7V_6RmE0ODrc0 m1WiN2i1bryBIqWN_zfdAXRRF7lo6E_XTAxO936Sv_EN4Zaj.SLSEjB8xRjsJfuXV.MqdwLmd3Gq udmEndhxmJ3r4k.RhU2I8DmFsR0bXsHXPzlcb6C96pu4dOv7j_bpGLeOgJyBM4n1LYkfSSd9NdGT m0A70ErM2L8RQ8JqLS7udIGtJQ2J6W4D7Ik96hl3mA_5.4DwpkHA4mZriXMK3kPqlcacPOXqQDym 7Gf3v0eJkdXEJ93tV4avATvZdaaFb87z.fygyLwsuPnF.97tQzFJWPiLwmfZ5AsEogUEG.3oJmM9 pt0RwIp6gDAyjRGxtK5d3dp797lzllcHC1KRm_gHtFg_OaMxc8NNToO3o4t0CWcFEk3XRJ0ZAszl 2TGqIsZ9qSuD8qbOeLNLM854Hstt_..WnJRpih6MReMLe66kytP0dwvCJG93qdjjbVXhpIDNks2Q gTk0N_EAPU5nNxgmCNuw0QiQr9nbE.b6hoxvRZeBb0TqoX57ioZbmS00gMhkeGd33xNswsO4zN.F ppgR_aFEzIjbzLGS5OOmZLg_2lTHf6MD.Ynp.qNRrezJVuq8zxF3B_Ffod72YDgubldbd7YOxjLd vpP0yukJG3_okyTgwuWGggIMrcE9RXet3BQ68fuXkcDobPiaemn8TKtVNUeEmuKeO7lRwrWBpIRe RZXZOCWOeii9JyC5kDXhpT6e3PFRnVntA2WsC0QkZovQkfPoWcAtf0aQgGc1Ewn7laQxUr9bSAh2 jKwpM74rfHTK48ki6c_L56wUZIexs9OdVadqqt7O0o7I1bjWngIIAyjQKtmzWmrA2GugNuozRsVj vR1J3EDu1Ut0ibYRqtcnNL5I38YlzGLsEN68KDTvDPQUDxg7hCgHCMy1y63GQh3YHixRPgmOW9j0 I25E_jw1nAgn_eFjrt4Csv2Vs4xOw175KD6iGRHS8_bGLOjEo60g8X0dU98magBroprF5cZprgVe STpwPodlIoo2JfpmH_9tpp1q60n5KtkSBPMyYhBe00kJ7D6GWZpcwYDoBTtfRHYreAg-- Received: from sonic.gate.mail.ne1.yahoo.com by 
sonic307.consmr.mail.bf2.yahoo.com with HTTP; Wed, 6 Mar 2019 20:39:11 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.115]) ([67.170.167.181]) by smtp428.mail.bf1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID 830708bc055ebb3c45b5947a406df10d; Wed, 06 Mar 2019 20:39:09 +0000 (UTC) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: powerpc64 head -r344825: system-clang (8.0.0) asserts compiling mesa-dri-18.3.2_2's glsl/ir_clone.cpp: "Target supports vector op, but scalar requires expansion?" Message-Id: <5D54B918-22AE-4395-82D2-569678C01835@yahoo.com> Date: Wed, 6 Mar 2019 12:39:06 -0800 Cc: FreeBSD PowerPC ML , ports-list freebsd To: Dimitry Andric , FreeBSD Toolchain X-Mailer: Apple Mail (2.3445.102.3) X-Rspamd-Queue-Id: AF7EF6F332 X-Spamd-Bar: ++ X-Spamd-Result: default: False [2.79 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_TLS_LAST(0.00)[]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:26101, ipnet:74.6.128.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; SUBJECT_HAS_QUESTION(0.00)[]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[4]; NEURAL_SPAM_SHORT(0.77)[0.768,0]; MIME_GOOD(-0.10)[text/plain]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; NEURAL_SPAM_MEDIUM(0.80)[0.800,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.53)[0.531,0]; RCVD_IN_DNSWL_NONE(0.00)[42.134.6.74.list.dnswl.org : 127.0.5.0]; IP_SCORE(1.20)[ip: (3.45), ipnet: 74.6.128.0/21(1.46), asn: 26101(1.17), country: US(-0.07)]; 
RWL_MAILSPIKE_POSSIBLE(0.00)[42.134.6.74.rep.mailspike.net : 127.0.0.17]
X-BeenThere: freebsd-ppc@freebsd.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Porting FreeBSD to the PowerPC
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 06 Mar 2019 20:39:20 -0000

The below is from a ports-mgmt/poudriere-devel run under FreeBSD head -r344825 on an old PowerMac G5 (2 sockets, 2 cores each, powerpc64). The /usr/ports is from head -r494751 . buildworld buildkernel was via devel/powerpc64-xtoolchain-gcc materials and system-clang (8.0.0) was built (and installed as cc/c++) as part of that. /usr/ports/base/binutils was used to supply the system binutils, including ld. (Running the PowerPC G5 for this context does require some hacks in /usr/src/ currently.)

Being a poudriere-devel run, the /tmp/nir_constant_expressions-be5a21.* are not available (not recorded in the tar of the failure). (Too bad there is a mismatch between poudriere's capture and where such files are placed.) But I do have a gdb based backtrace from the:

work/mesa-18.3.2/src/compiler/cc.15701.core

The assert was the one in:

  case ISD::FTRUNC: {
    // We're going to widen this vector op to a legal type by padding with undef
    // elements. If the wide vector op is eventually going to be expanded to
    // scalar libcalls, then unroll into scalar ops now to avoid unnecessary
    // libcalls on the undef elements. We are assuming that if the scalar op
    // requires expanding, then the vector op needs expanding too.
    EVT VT = N->getValueType(0);
    if (TLI.isOperationExpand(N->getOpcode(), VT.getScalarType())) {
      EVT WideVecVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
      assert(!TLI.isOperationLegalOrCustom(N->getOpcode(), WideVecVT) &&
             "Target supports vector op, but scalar requires expansion?");
      Res = DAG.UnrollVectorOp(N, WideVecVT.getVectorNumElements());
      break;
    }

(gdb) info threads
  Id   Target Id         Frame
* 1    LWP 100119 0x00000000137251e8 in .__sys_thr_kill () at thr_kill.S:3

(gdb) bt
#0  0x00000000137251e8 in .__sys_thr_kill () at thr_kill.S:3
#1  0x00000000137247bc in __raise (s=) at /usr/src/lib/libc/gen/raise.c:52
#2  0x00000000136e6410 in abort () at /usr/src/lib/libc/stdlib/abort.c:79
#3  0x0000000013712fd8 in __assert (func=, file=, line=, failedexpr=) at /usr/src/lib/libc/gen/assert.c:51
#4  0x000000001351f20c in llvm::DAGTypeLegalizer::WidenVectorResult () at /usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:2531
#5  0x0000000012f1d7d8 in llvm::DAGTypeLegalizer::run () at /usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:281
#6  0x0000000012f1eab4 in llvm::SelectionDAG::LegalizeTypes () at /usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp:1115
#7  0x0000000012db9e50 in llvm::SelectionDAGISel::CodeGenAndEmitDAG () at /usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:776
#8  0x0000000012dbf934 in llvm::SelectionDAGISel::SelectAllBasicBlocks () at /usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1784
#9  0x0000000012dc2668 in llvm::SelectionDAGISel::runOnMachineFunction () at /usr/src/contrib/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:471
#10 0x00000000120edfc8 in runOnMachineFunction () at /usr/src/contrib/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:155
#11 0x00000000130514b4 in llvm::MachineFunctionPass::runOnFunction () at /usr/src/contrib/llvm/lib/CodeGen/MachineFunctionPass.cpp:74
#12 0x000000001296d640 in llvm::FPPassManager::runOnFunction () at /usr/src/contrib/llvm/lib/IR/LegacyPassManager.cpp:1644
#13 0x000000001296d908 in llvm::FPPassManager::runOnModule () at /usr/src/contrib/llvm/lib/IR/LegacyPassManager.cpp:1679
#14 0x000000001296e780 in runOnModule () at /usr/src/contrib/llvm/lib/IR/LegacyPassManager.cpp:1744
#15 llvm::legacy::PassManagerImpl::run () at /usr/src/contrib/llvm/lib/IR/LegacyPassManager.cpp:1857
#16 0x0000000010bbac7c in EmitAssembly () at /usr/src/contrib/llvm/tools/clang/lib/CodeGen/BackendUtil.cpp:882
#17 0x0000000010bbc9a8 in clang::EmitBackendOutput () at /usr/src/contrib/llvm/tools/clang/lib/CodeGen/BackendUtil.cpp:1318
#18 0x00000000103cd0b0 in clang::BackendConsumer::HandleTranslationUnit () at /usr/src/contrib/llvm/tools/clang/lib/CodeGen/CodeGenAction.cpp:295
#19 0x0000000010821d40 in clang::ParseAST () at /usr/src/contrib/llvm/tools/clang/lib/Parse/ParseAST.cpp:170
#20 0x000000001080d528 in clang::ASTFrontendAction::ExecuteAction () at /usr/src/contrib/llvm/tools/clang/lib/Frontend/FrontendAction.cpp:1037
#21 0x00000000103cc108 in clang::CodeGenAction::ExecuteAction () at /usr/src/contrib/llvm/tools/clang/lib/CodeGen/CodeGenAction.cpp:1048
#22 0x0000000010811eb0 in clang::FrontendAction::Execute () at /usr/src/contrib/llvm/tools/clang/lib/Frontend/FrontendAction.cpp:935
#23 0x00000000111b1960 in clang::CompilerInstance::ExecuteAction () at /usr/src/contrib/llvm/tools/clang/lib/Frontend/CompilerInstance.cpp:955
#24 0x00000000103b602c in clang::ExecuteCompilerInvocation () at /usr/src/contrib/llvm/tools/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:268
#25 0x00000000103a47c8 in cc1_main () at /usr/src/contrib/llvm/tools/clang/tools/driver/cc1_main.cpp:219
#26 0x0000000010346908 in ExecuteCC1Tool () at /usr/src/contrib/llvm/tools/clang/tools/driver/driver.cpp:310
#27 main () at /usr/src/contrib/llvm/tools/clang/tools/driver/driver.cpp:382

For reference:

libtool: compile: c++ -DPACKAGE_NAME=\"Mesa\" -DPACKAGE_TARNAME=\"mesa\" -DPACKAGE_VERSION=\"18.3.2\" "-DPACKAGE_STRING=\"Mesa 18.3.2\"" "-DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\"" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesa\" -DVERSION=\"18.3.2\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DYYTEXT_POINTER=1 -DHAVE___BUILTIN_BSWAP32=1 -DHAVE___BUILTIN_BSWAP64=1 -DHAVE___BUILTIN_CLZ=1 -DHAVE___BUILTIN_CLZLL=1 -DHAVE___BUILTIN_CTZ=1 -DHAVE___BUILTIN_EXPECT=1 -DHAVE___BUILTIN_FFS=1 -DHAVE___BUILTIN_FFSLL=1 -DHAVE___BUILTIN_POPCOUNT=1 -DHAVE___BUILTIN_POPCOUNTLL=1 -DHAVE___BUILTIN_UNREACHABLE=1 -DHAVE_FUNC_ATTRIBUTE_CONST=1 -DHAVE_FUNC_ATTRIBUTE_FLATTEN=1 -DHAVE_FUNC_ATTRIBUTE_FORMAT=1 -DHAVE_FUNC_ATTRIBUTE_MALLOC=1 -DHAVE_FUNC_ATTRIBUTE_PACKED=1 -DHAVE_FUNC_ATTRIBUTE_PURE=1 -DHAVE_FUNC_ATTRIBUTE_RETURNS_NONNULL=1 -DHAVE_FUNC_ATTRIBUTE_UNUSED=1 -DHAVE_FUNC_ATTRIBUTE_VISIBILITY=1 -DHAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT=1 -DHAVE_FUNC_ATTRIBUTE_WEAK=1 -DHAVE_FUNC_ATTRIBUTE_ALIAS=1 -DHAVE_FUNC_ATTRIBUTE_NORETURN=1 -DHAVE_DLADDR=1 -DHAVE_CLOCK_GETTIME=1 -DHAVE_CLOCK_NANOSLEEP=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_SYS_UMTX_H=1 -DENABLE_ST_OMX_BELLAGIO=0 -DENABLE_ST_OMX_TIZONIA=0 -I. -I../../include -I../../src -I../../src/mapi -I../../src/mesa/ -I../../src/compiler/glsl -I../../src/compiler/glsl -I../../src/compiler/glsl/glcpp -I../../src/compiler/nir -I../../src/compiler/nir -I../../src/compiler/spirv -I../../src/gallium/include -I../../src/gallium/auxiliary -I../../src/gtest/include -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DUSE_GCC_ATOMIC_BUILTINS -DNDEBUG -DHAVE_XLOCALE_H -DHAVE_SYS_SYSCTL_H -DHAVE_DLFCN_H -DHAVE_STRTOF -DHAVE_MKOSTEMP -DHAVE_TIMESPEC_GET -DHAVE_STRTOD_L -DHAVE_DL_ITERATE_PHDR -DHAVE_POSIX_MEMALIGN -DHAVE_ZLIB -DHAVE_PTHREAD_SETAFFINITY -DHAVE_LINUX_FUTEX_H -DHAVE_LIBDRM -DGLX_USE_DRM -DGLX_INDIRECT_RENDERING -DGLX_DIRECT_RENDERING -DHAVE_X11_PLATFORM -DHAVE_SURFACELESS_PLATFORM -DHAVE_DRM_PLATFORM -DHAVE_WAYLAND_PLATFORM -DWL_HIDE_DEPRECATED -DHAVE_DRI3 -DHAVE_DRI3_MODIFIERS -DENABLE_SHADER_CACHE -DHAVE_MINCORE -DHAVE_LLVM=0x0600 -DMESA_LLVM_VERSION_PATCH=1 -isystem /usr/local/include -fvisibility=hidden -Werror=pointer-arith -Werror=vla -O2 -pipe -g -isystem /usr/local/include -fno-strict-aliasing -isystem /usr/local/include -Wall -fno-math-errno -fno-trapping-math -Wno-missing-field-initializers -Qunused-arguments -MT glsl/ir_builder.lo -MD -MP -MF glsl/.deps/ir_builder.Tpo -c glsl/ir_builder.cpp -fPIC -DPIC -o glsl/.libs/ir_builder.o
libtool: compile: c++ -DPACKAGE_NAME=\"Mesa\" -DPACKAGE_TARNAME=\"mesa\" -DPACKAGE_VERSION=\"18.3.2\" "-DPACKAGE_STRING=\"Mesa 18.3.2\"" "-DPACKAGE_BUGREPORT=\"https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa\"" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesa\" -DVERSION=\"18.3.2\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DYYTEXT_POINTER=1 -DHAVE___BUILTIN_BSWAP32=1 -DHAVE___BUILTIN_BSWAP64=1 -DHAVE___BUILTIN_CLZ=1 -DHAVE___BUILTIN_CLZLL=1 -DHAVE___BUILTIN_CTZ=1 -DHAVE___BUILTIN_EXPECT=1 -DHAVE___BUILTIN_FFS=1 -DHAVE___BUILTIN_FFSLL=1 -DHAVE___BUILTIN_POPCOUNT=1 -DHAVE___BUILTIN_POPCOUNTLL=1 -DHAVE___BUILTIN_UNREACHABLE=1 -DHAVE_FUNC_ATTRIBUTE_CONST=1 -DHAVE_FUNC_ATTRIBUTE_FLATTEN=1 -DHAVE_FUNC_ATTRIBUTE_FORMAT=1 -DHAVE_FUNC_ATTRIBUTE_MALLOC=1 -DHAVE_FUNC_ATTRIBUTE_PACKED=1 -DHAVE_FUNC_ATTRIBUTE_PURE=1 -DHAVE_FUNC_ATTRIBUTE_RETURNS_NONNULL=1 -DHAVE_FUNC_ATTRIBUTE_UNUSED=1 -DHAVE_FUNC_ATTRIBUTE_VISIBILITY=1 -DHAVE_FUNC_ATTRIBUTE_WARN_UNUSED_RESULT=1 -DHAVE_FUNC_ATTRIBUTE_WEAK=1 -DHAVE_FUNC_ATTRIBUTE_ALIAS=1 -DHAVE_FUNC_ATTRIBUTE_NORETURN=1 -DHAVE_DLADDR=1 -DHAVE_CLOCK_GETTIME=1 -DHAVE_CLOCK_NANOSLEEP=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 -DHAVE_SYS_UMTX_H=1 -DENABLE_ST_OMX_BELLAGIO=0 -DENABLE_ST_OMX_TIZONIA=0 -I. -I../../include -I../../src -I../../src/mapi -I../../src/mesa/ -I../../src/compiler/glsl -I../../src/compiler/glsl -I../../src/compiler/glsl/glcpp -I../../src/compiler/nir -I../../src/compiler/nir -I../../src/compiler/spirv -I../../src/gallium/include -I../../src/gallium/auxiliary -I../../src/gtest/include -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -DUSE_GCC_ATOMIC_BUILTINS -DNDEBUG -DHAVE_XLOCALE_H -DHAVE_SYS_SYSCTL_H -DHAVE_DLFCN_H -DHAVE_STRTOF -DHAVE_MKOSTEMP -DHAVE_TIMESPEC_GET -DHAVE_STRTOD_L -DHAVE_DL_ITERATE_PHDR -DHAVE_POSIX_MEMALIGN -DHAVE_ZLIB -DHAVE_PTHREAD_SETAFFINITY -DHAVE_LINUX_FUTEX_H -DHAVE_LIBDRM -DGLX_USE_DRM -DGLX_INDIRECT_RENDERING -DGLX_DIRECT_RENDERING -DHAVE_X11_PLATFORM -DHAVE_SURFACELESS_PLATFORM -DHAVE_DRM_PLATFORM -DHAVE_WAYLAND_PLATFORM -DWL_HIDE_DEPRECATED -DHAVE_DRI3 -DHAVE_DRI3_MODIFIERS -DENABLE_SHADER_CACHE -DHAVE_MINCORE -DHAVE_LLVM=0x0600 -DMESA_LLVM_VERSION_PATCH=1 -isystem /usr/local/include -fvisibility=hidden -Werror=pointer-arith -Werror=vla -O2 -pipe -g -isystem /usr/local/include -fno-strict-aliasing -isystem /usr/local/include -Wall -fno-math-errno -fno-trapping-math -Wno-missing-field-initializers -Qunused-arguments -MT glsl/ir_clone.lo -MD -MP -MF glsl/.deps/ir_clone.Tpo -c glsl/ir_clone.cpp -fPIC -DPIC -o glsl/.libs/ir_clone.o
cc: error: unable to execute command: Abort trap (core dumped)
cc: error: clang frontend command failed due to signal (use -v to see invocation)
FreeBSD clang version 8.0.0 (branches/release_80 355313) (based on LLVM 8.0.0)
Target: powerpc64-unknown-freebsd13.0
Thread model: posix
InstalledDir: /usr/bin
cc: note: diagnostic msg: PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and include the crash backtrace, preprocessed source, and associated run script.
cc: note: diagnostic msg:
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
cc: note: diagnostic msg: /tmp/nir_constant_expressions-be5a21.c
cc: note: diagnostic msg: /tmp/nir_constant_expressions-be5a21.sh
cc: note: diagnostic msg:

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)

From owner-freebsd-ppc@freebsd.org Wed Mar 6 21:03:53 2019
Return-Path:
Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 99BE815273FD for ; Wed, 6 Mar 2019 21:03:53 +0000 (UTC) (envelope-from marklmi@yahoo.com)
Received: from sonic305-22.consmr.mail.ne1.yahoo.com (sonic305-22.consmr.mail.ne1.yahoo.com [66.163.185.148]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id BFBCE70561 for ; Wed, 6 Mar 2019 21:03:52 +0000 (UTC) (envelope-from marklmi@yahoo.com)
X-YMail-OSG:
ok9IFkgVM1mDn1NyROc6pAsNPoJjWLIQ1eZslNsDEbrtjuz3QthYjLnuT9m_imE x8kHcG0P3LT0gM_jJzyZfpMf_hLJKNikukaSwy1_JgGOmitIXNwkKV9MUYSJibh7_zNic79Ik_wH AsIejzs2Qvx17ShWTM43j5R3Z48XajS8.WZ4BR3rrnhu86bxqHfH89ssV3gXQcoqFUUMse4BOoEy 2wnWtLflah9DxhzKkYynVsa8Hyc7zaRLti3OTgvI5D.oW8_flNX7gRWXITj6crJb3_yTadMYYHP1 _YnQLgHpLTePtvbgrJmgAWzyQhtVctI1HHWtTZBlmiyq795aohDShVi0WML9z1aUnDczI2BhgIeJ WR7n1iVU8Pu1CJ7LVAYgpe5CNtuR1BQxXfwwCEhePGqMDxdqUSZ2rDXYynvNILC5M.QMrZQl4eJK FJoUwKdHfIefHQ2LpRxauxlI6wW9TSsN5eASnZsQGFJMNArscbxaZaBnMZBBcbNNec7o1nNhzi71 g0HmscP41eaFZLmAUMqSZWC0tkmhomHm1ej30..gaVcO6lz.5AYV9ix7UOMNoH1XCiG_0Ksn6MFg y2TrGQRA5QgzfU6B7vAicY4KKN2ojD27hf19gDvhzp8niI8.uusz.bPknnt1ID6WBEQTY_cdUzck Jbt0D6FmbTPZEKo5_5Se1d5F3I4ZwLUhex38sw.h75AK.URMKakP5M36z5rPERZq6oKtadg_Yfh2 sQq6.oV97g3HY6s3u2MQcRTlX6B83ztQ4.mwNeV8qOHhE_HQ3r.h_2CegvO9JwKbTAN9NmY8kqaR Qa9CLX5.hpuVQbmTFIA71rve6anB7DHQROaYsuAwPg8HjVfxN2lsI.qAZkJdtk7O1h8OjpNk6GI_ 0JHiOfJLR0UkGVuYRLSh51WGEaTsa4iylXSDbvFFAof7AlYUHAXjSO7.p4GRAunkU6V55oeap0S0 vTVHZ87aN36JLhrIJyWArJVSjrcvF.mn6JCS4TeGnZpNfXMKBXRnJ6oANhyBmk1mR6vkg68b2WhZ 0YwHYLpzFNh3C8Ean2uWb4G40WYMjFMa7xfQHO9K1mdd2_9vIeLgb99ckKkkOzCQl.Q-- Received: from sonic.gate.mail.ne1.yahoo.com by sonic305.consmr.mail.ne1.yahoo.com with HTTP; Wed, 6 Mar 2019 21:03:46 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.115]) ([67.170.167.181]) by smtp407.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID 989e6db6d298809b3619be81032a35c7; Wed, 06 Mar 2019 21:03:43 +0000 (UTC) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: Re: powerpc64 on PowerMac G5 4-core (system total): a hack that so far seem to avoid the stuck-sleeping issue [self-hosted buildworld/buildkernel completed] Date: Wed, 6 Mar 2019 13:03:42 -0800 References: <76E8BF75-A8F5-4A48-9B7C-6494F4A9520B@yahoo.com> <75A8BB07-3273-423E-9436-798395BC8640@yahoo.com> To: FreeBSD PowerPC ML , Mark 
Millard via freebsd-hackers In-Reply-To: <75A8BB07-3273-423E-9436-798395BC8640@yahoo.com> Message-Id: <23683875-418E-4E48-BE26-01221EABC906@yahoo.com> X-Mailer: Apple Mail (2.3445.102.3) X-Rspamd-Queue-Id: BFBCE70561 X-Spamd-Bar: +++ X-Spamd-Result: default: False [3.11 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; DKIM_TRACE(0.00)[yahoo.com:+]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FROM_EQ_ENVFROM(0.00)[]; RCVD_TLS_LAST(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; ASN(0.00)[asn:36646, ipnet:66.163.184.0/21, country:US]; MID_RHS_MATCH_FROM(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; NEURAL_SPAM_SHORT(0.79)[0.791,0]; MIME_GOOD(-0.10)[text/plain]; IP_SCORE(1.31)[ip: (4.36), ipnet: 66.163.184.0/21(1.25), asn: 36646(1.00), country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.81)[0.808,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.71)[0.709,0]; RCVD_IN_DNSWL_NONE(0.00)[148.185.163.66.list.dnswl.org : 127.0.5.0] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2019 21:03:53 -0000 [I have a new observed maximum difference, having changed the code record such.] On 2019-Mar-4, at 01:40, Mark Millard wrote: > [I did some testing of other figures than testing for < 0x10.] >=20 > On 2019-Mar-3, at 13:23, Mark Millard wrote: >=20 >> [So far the hack has been successful. Details given later >> below.] 
>>=20 >> On 2019-Mar-2, at 21:20, Mark Millard wrote: >>=20 >>> [This note goes in a different direction compared to my >>> prior evidence report for overflows and the later activity >>> that has been happening for it. This does *not* involve >>> the patches associated with that report.] >>>=20 >>> I view the following as an evidence-gathering hack: >>> showing the change in behavior with the code changes, >>> not as directly what FreeBSD should do for powerpc64. >>> In code for defined(__powerpc64__) && defined(AIM) >>> I freely use knowledge of the PowerMac G5 context >>> instead of attempting general code. >>>=20 >>> Also: the code is set up to record some information >>> that I've been looking at via ddb. The recording is >>> not part of what changes the behavior but I decided >>> to show that code too. >>>=20 >>> It is preliminary, but, so far, the hack has avoided >>> buf*daemon* threads and pmac_thermal getting stuck >>> sleeping (or, at least, far less frequently). >>>=20 >>>=20 >>> The tbr-value hack: >>>=20 >>> =46rom what I see the G5 various cores have each tbr running at the >>> same rate but have some some offsets as far as the base time >>> goes. cpu_mp_unleash does: >>>=20 >>> ap_awake =3D 1; >>>=20 >>> /* Provide our current DEC and TB values for APs */ >>> ap_timebase =3D mftb() + 10; >>> __asm __volatile("msync; isync"); >>>=20 >>> /* Let APs continue */ >>> atomic_store_rel_int(&ap_letgo, 1); >>>=20 >>> platform_smp_timebase_sync(ap_timebase, 0); >>>=20 >>> and machdep_ap_bootstrap does: >>>=20 >>> /* >>> * Set timebase as soon as possible to meet an implicit = rendezvous >>> * from cpu_mp_unleash(), which sets ap_letgo and then = immediately >>> * sets timebase. >>> * >>> * Note that this is instrinsically racy and is only relevant = on >>> * platforms that do not support better mechanisms. >>> */ >>> platform_smp_timebase_sync(ap_timebase, 1); >>>=20 >>>=20 >>> which attempts to set the tbrs appropriately. 
>>>=20 >>> But on small scales of differences the various tbr >>> values from different cpus end up not well ordered >>> relative to time, synchronizes with, and the like. >>> Only large enough differences can well indicate an >>> ordering of interest. >>>=20 >>> Note: tc->tc_get_timecount(tc) only provides the >>> least signficant 32 bits of the tbr value. >>> th->th_offset_count is also 32 bits and based on >>> truncated tbr values. >>>=20 >>> So I made binuptime avoid finishing when it sees >>> a small (<0x10) step backwards for a new >>> tc->tc_get_timecount(tc) value vs. the existing >>> th->th_offset_count value (values strongly tied >>> to powerpc64 tbr values): >>>=20 >>> . . . [old code omitted] . . . >>>=20 >>> So far as I can tell, the FreeBSD code is not designed to deal >>> with small differences in tc->tc_get_timecount(tc) not actually >>> indicating a useful < vs. =3D=3D vs. > ordering relation uniquely. >>>=20 >>> (I make no claim that the hack is a proper way to deal with >>> such.) >>=20 >> I did a somewhat over 7 hours buildworld buildkernel on the >> PowerMac G5. Overall the G5 has been up over 13 hours and >> none of the buf*daemon* threads have gotten stuck sleeping. >> Nor has pmac_thermal gotten stuck. Similarly for vnlru >> and syncer: "top -HIStopid" still shows them all as >> periodically active. >>=20 >> Previously for this usefdt=3D1 context (with the modern >> VM_MAX_KERNEL_ADDRESS), going more than a few minutes >> without at least one of those threads getting stuck >> sleeping was rare on the G5 (powerpc64 example). >>=20 >> So this hack has managed to avoid finding sbinuptime() >> in sleepq_timeout being less than the earlier (by call >> structure/code sequencing) sbinuptime() in timercb that >> lead to the sleepq_timeout callout being called in the >> first place. 
>>=20 >> So in the sleepq_timeout callout's: >>=20 >> if (td->td_sleeptimo > sbinuptime() || td->td_sleeptimo =3D=3D = 0) { >> /* >> * The thread does not want a timeout (yet). >> */ >> } else . . . >>=20 >> td->td_sleeptimo > sbinuptime() ends up false now for small >> enough original differences. >>=20 >> This case does not set up another timeout, it just leaves the >> thread stuck sleeping, no longer doing periodic activities. >>=20 >> As stands what I did (presuming an appropriate definition >> of "small differences in the problematical direction") should >> leave this and other sbinuptime-using code with: >>=20 >> td->td_sleeptimo <=3D sbinuptime() >>=20 >> for what were originally "small" tbr value differences in the >> problematical direction (in case other places require it in >> some way). >>=20 >> If, instead, just sleepq_timeout's test could allow for >> some slop in the ordering, it could be a cheaper hack then >> looping in binuptime . >>=20 >> At this point I've no clue what a correct/efficient FreeBSD >> design for allowing the sloppy match across tbr's for different >> CPUs would be. >=20 > Instead of 0x10 in "&& tim_offset-tim_cnt<0x10" I tried > the each of following and they all failed: >=20 > && tim_offset-tim_cnt<0x2 > && tim_offset-tim_cnt<0x4 > && tim_offset-tim_cnt<0x8 > && tim_offset-tim_cnt<0xc I've now seen a difference of 0x11 that lead to hung up threads, hung waiting for sleep. > 0x2, 0x4, and 0x8 failed for the first boot attempt, > almost mediately having stuck-in-sleep threads. >=20 > 0xc seemed to be working for the first boot (including > a buildworld buildkernel that did not have to rebuild > much). But the 2nd boot attempt had a stuck-in-sleep > thread by the time I logged in. >=20 > By contrast, for: >=20 > && tim_offset-tim_cnt<0x10 >=20 > I've not it fail so far, after many reboots, a full > buildworld buildkernel, and running over 24 hours > (that included the somewhat over 7 hours for build > world buildkernel). 
> But it might be that some boots
> would need a bigger figure.
>

During a ports-mgmt/poudriere-devel run I had some threads hang in
sleep when the code was based on less than 0x10 differences. But I'd
changed to be recording the maximum "small difference in the
problematical direction" observed and so was able to see that it
got a 0x11 difference.

The below is the newer code structure as far as what is recorded. It
already has 0x14 instead of 0x10 for the bound it uses to control the
loop. I omitted #if 0 . . . #endif code that I'm not currently using.

#if defined(__powerpc64__) && defined(AIM)
void
binuptime(struct bintime *bt)
{
	struct timehands *th;
	u_int gen;

	u_int tim_cnt, tim_offset; // HACK!!! (for "small difference in problem direction loop")
	struct timecounter *tc; // HACK!!! (for recording other data for inspection via ddb)
	u_int tim_diff; // HACK!!!
	uint64_t scale_factor, diff_scaled; // HACK!!!
#if 1
	u_int tim_wrong_order_diff= 0u; // HACK!!!
	u_int max_wrong_order_diff= 0u; // HACK!!!
	u_int wrong_order_cnt= 0u; // HACK!!!
	u_int wrong_order_offset= 0u; // HACK!!!
#endif

	do {
		do { // HACK!!!
			th= timehands;
			tc= th->th_counter;
			gen= atomic_load_acq_int(&th->th_generation);
			tim_cnt= tc->tc_get_timecount(tc);
			tim_offset= th->th_offset_count;
#if 1
			tim_wrong_order_diff= tim_offset-tim_cnt;
			if ( tim_cntth_offset;
		tim_diff= (tim_cnt - tim_offset) & tc->tc_counter_mask;
		scale_factor= th->th_scale;
		diff_scaled= scale_factor * tim_diff;
		bintime_addx(bt, diff_scaled);
		atomic_thread_fence_acq();
	} while (gen == 0 || gen != th->th_generation);

#if 1
	// Uses direct-map addresses (mapping to the most significant c being masked off).
	// Justin H. reported that some of the 0x0..0xff addresses were unused
	// and available. The 2 larger ranges that I observed to stay at zero
	// were 0x20..0x7f and 0xa..0xff --so that is what I limited myself to.
if (*(volatile = uint64_t*)0xc0000000000000b0 Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7A4A81527B87 for ; Wed, 6 Mar 2019 21:19:24 +0000 (UTC) (envelope-from chmeeedalf@gmail.com) Received: from mail-it1-x130.google.com (mail-it1-x130.google.com [IPv6:2607:f8b0:4864:20::130]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5AFB170FD8 for ; Wed, 6 Mar 2019 21:19:23 +0000 (UTC) (envelope-from chmeeedalf@gmail.com) Received: by mail-it1-x130.google.com with SMTP id z124so12484007itc.2 for ; Wed, 06 Mar 2019 13:19:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:in-reply-to:references :mime-version; bh=lSs5w0nLyJhPzq7wVd3HFkCHporylXGoqeKvi0Ux2rc=; b=Kllq7l6N9CdjGS0BJfqfTMfR4r+chOIFfy1XlsDp504Yi2jlfjampuEiX6KP3R5aeh SAvz8NVOVtyDevTZWMhxJjpE+4gDdUHJAJKRmNCcOrZac6UDlWFO7OmLhhqqEIHozkpD gFVhQcbn6vGM1PCIpuKa+KZQwvrpCA4j3WpOzeSq3P5HhAUK8gdPjSxzcrCSCDtoFzdR +P9v438p22RnWmatjU7+QaWKyM6PBMHOLt2P1keRKmV0AGTAcSxVzuO/ly8AQiQF1ylP D6Bm6fnMdfKA+174fznI4WefjldAse+2FL20lnYfcfm4x7d8VpFJGQ4mltJ00if6QqAZ 4UpQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:mime-version; bh=lSs5w0nLyJhPzq7wVd3HFkCHporylXGoqeKvi0Ux2rc=; b=L7gPKhCdnGyYNzW1W00ukr9241Src8hgqCjny5hm/oi8/y2foIECvg6N8L3Fukhroa MeSmInyDsQdN8ZPgL6kyW2yrmhelT6ywA0aJPcau+Je0iY1jIw/HdiIQHQExJXfQpniJ G2IOdez5yGX4yC0Lo+ZqNllv6AIWJV9mVsJ4eLn4rri/boRZPHqG4157QKHHPJOyO8zZ 8noWuTjvyVYouBGT+Wf0z+p1UOto5RgLaMBVPsob1548fp1pFNAcjXN1zJh19xtwTmNh VgUpOtL2/+FaUUHPN8qYZJI70q0iQYTt9kJj/3KSKybG+kMLoeG35yPEKQCOlXgeQSbb aFPw== X-Gm-Message-State: 
APjAAAVyn50dj6pnC32rfAFduTJJtk6Go09k8gc7///gsT2Q2vFxknXl TWfaaxScSDOH3dbsaD0E6Ruc7WXN X-Google-Smtp-Source: APXvYqzWdp6Qde6XkOh8TUPO+kWBFSPmyPpIvtBHwkoAjju1XjbKPJX6bxLe+hamGMHnzRB65WrWoA== X-Received: by 2002:a24:7c46:: with SMTP id a67mr3151684itd.171.1551907162293; Wed, 06 Mar 2019 13:19:22 -0800 (PST) Received: from titan.knownspace (173-25-245-129.client.mchsi.com. [173.25.245.129]) by smtp.gmail.com with ESMTPSA id r2sm1266822itk.5.2019.03.06.13.19.21 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 06 Mar 2019 13:19:21 -0800 (PST) Date: Wed, 6 Mar 2019 15:19:14 -0600 From: Justin Hibbits To: freebsd-ppc Subject: Re: head -r344018 powerpc64 variant on Powermac G5 (2 sockets, 2 cores each): [*buffer arena] shows up more . . .? Message-ID: <20190306151914.44ea831c@titan.knownspace> In-Reply-To: References: X-Mailer: Claws Mail 3.17.3 (GTK+ 2.24.32; powerpc64-portbld-freebsd13.0) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="MP_/95kzJ17J+e.fHE.d+h6njCd" X-Rspamd-Queue-Id: 5AFB170FD8 X-Spamd-Bar: ----- Authentication-Results: mx1.freebsd.org; dkim=pass header.d=gmail.com header.s=20161025 header.b=Kllq7l6N; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (mx1.freebsd.org: domain of chmeeedalf@gmail.com designates 2607:f8b0:4864:20::130 as permitted sender) smtp.mailfrom=chmeeedalf@gmail.com X-Spamd-Result: default: False [-5.36 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ip6:2607:f8b0:4000::/36]; FREEMAIL_FROM(0.00)[gmail.com]; HAS_ATTACHMENT(0.00)[]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; MX_GOOD(-0.01)[cached: alt3.gmail-smtp-in.l.google.com]; DKIM_TRACE(0.00)[gmail.com:+]; RCPT_COUNT_TWO(0.00)[2]; NEURAL_HAM_SHORT(-0.88)[-0.881,0]; DMARC_POLICY_ALLOW(-0.50)[gmail.com,none]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+,1:+,2:~]; SUBJECT_ENDS_QUESTION(1.00)[]; FREEMAIL_ENVFROM(0.00)[gmail.com]; ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US]; DWL_DNSWL_NONE(0.00)[gmail.com.dwl.dnswl.org : 
127.0.5.0]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; R_DKIM_ALLOW(-0.20)[gmail.com:s=20161025]; FROM_HAS_DN(0.00)[]; MIME_UNKNOWN(0.10)[text/x-patch]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[multipart/mixed,text/plain]; PREVIOUSLY_DELIVERED(0.00)[freebsd-ppc@freebsd.org]; TO_MATCH_ENVRCPT_SOME(0.00)[]; RCVD_IN_DNSWL_NONE(0.00)[0.3.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.2.0.0.4.6.8.4.0.b.8.f.7.0.6.2.list.dnswl.org : 127.0.5.0]; RCVD_TLS_LAST(0.00)[]; FREEMAIL_CC(0.00)[yahoo.com]; IP_SCORE(-2.57)[ip: (-8.03), ipnet: 2607:f8b0::/32(-2.70), asn: 15169(-2.04), country: US(-0.07)] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2019 21:19:24 -0000 --MP_/95kzJ17J+e.fHE.d+h6njCd Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Content-Disposition: inline On Mon, 4 Mar 2019 19:43:09 -0800 Mark Millard via freebsd-ppc wrote: > [It is possible that the following is tied to my hack to > avoid threads ending up stuck-sleeping. But I ask about > an alternative that I see in the code.] > > Context: using the modern powerpc64 VM_MAX_KERNEL_ADDRESS > and using usefdt=1 on an old Powermac G5 (2 sockets, 2 cores > each). Hacks are in use to provide fairly reliable booting > and to avoid threads getting stuck sleeping. > > Before the modern VM_MAX_KERNEL_ADDRESS figure there were only > 2 or 3 bufspacedaemon-* threads as I remember. 
> Now there are 8
> (plus bufdaemon and its worker), for example:
>
> root   23   0.0  0.0     0   288  -  DL   15:48     0:00.39 [bufdaemon/bufdaemon]
> root   23   0.0  0.0     0   288  -  DL   15:48     0:00.05 [bufdaemon/bufspaced]
> root   23   0.0  0.0     0   288  -  DL   15:48     0:00.05 [bufdaemon/bufspaced]
> root   23   0.0  0.0     0   288  -  DL   15:48     0:00.05 [bufdaemon/bufspaced]
> root   23   0.0  0.0     0   288  -  DL   15:48     0:00.05 [bufdaemon/bufspaced]
> root   23   0.0  0.0     0   288  -  DL   15:48     0:00.05 [bufdaemon/bufspaced]
> root   23   0.0  0.0     0   288  -  DL   15:48     0:00.07 [bufdaemon/bufspaced]
> root   23   0.0  0.0     0   288  -  DL   15:48     0:00.05 [bufdaemon/bufspaced]
> root   23   0.0  0.0     0   288  -  DL   15:48     0:00.56 [bufdaemon// worker]
>
> I'm sometimes seeing processes showing [*buffer arena] that
> seemed to wait for a fairly long time with that status, not
> something I'd seen historically for those same types of
> processes for a similar overall load (not much). During such
> times trying to create processes to look around at what is
> going on seems to also wait. (Probably with the same status?)
>

Hi Mark,

Can you try the attached patch? It might be overkill in the
synchronization, and I might be using the wrong barriers to be
considered correct, but I think this should narrow the race down, and
synchronize the timebases to within a very small margin. The real
correct fix would be to suspend the timebase on all cores, which is
feasible (there's a GPIO for the G4s, and i2c for G5s), but that's
non-trivial extra work.

Be warned, I haven't tested it, I've only compiled it (I don't have a
G5 to test with anymore).
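
For readers following the attached patch, the rendezvous it describes
can be sketched in portable C11. This is only an illustrative userspace
model under stated assumptions, not the kernel code: the struct and the
function names (tb_rendezvous, tb_arrive, tb_try_release, tb_released)
are hypothetical, C11 atomics stand in for the kernel's atomic(9)
calls, and the spin loops are replaced by non-blocking checks so the
logic can be exercised from a single thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the patch's two static counters. */
struct tb_rendezvous {
	atomic_int arrived;	/* participants that reached the sync point */
	atomic_int unleash;	/* set once by the boot CPU to release all */
};

/* Every participant (boot CPU or AP) announces itself exactly once. */
static void
tb_arrive(struct tb_rendezvous *r)
{
	atomic_fetch_add(&r->arrived, 1);
}

/*
 * Boot-CPU side: release the waiters only once all ncpus participants
 * have arrived. Returns false while it would still have to spin.
 */
static bool
tb_try_release(struct tb_rendezvous *r, int ncpus)
{
	assert(ncpus > 0);
	if (atomic_load(&r->arrived) != ncpus)
		return (false);
	/* Release store pairs with the acquire load in tb_released(). */
	atomic_store_explicit(&r->unleash, 1, memory_order_release);
	return (true);
}

/* AP side: true once the boot CPU has released the rendezvous. */
static bool
tb_released(struct tb_rendezvous *r)
{
	return (atomic_load_explicit(&r->unleash,
	    memory_order_acquire) != 0);
}
```

In the patch, each AP would spin on the equivalent of tb_released() and
the boot CPU on the equivalent of tb_try_release(); both sides then
execute mttb(tb) as close together in time as the release allows.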
- Justin

--MP_/95kzJ17J+e.fHE.d+h6njCd
Content-Type: text/x-patch
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=powermac_tb_sync.diff

diff --git a/sys/powerpc/powermac/platform_powermac.c b/sys/powerpc/powermac/platform_powermac.c
index fe818829dc7..b5d34ef90c3 100644
--- a/sys/powerpc/powermac/platform_powermac.c
+++ b/sys/powerpc/powermac/platform_powermac.c
@@ -41,6 +41,7 @@ __FBSDID("$FreeBSD$");
 #include
 #include /* For save_vec() */
+#include
 #include
 #include
 #include /* For save_fpu() */
@@ -396,6 +397,19 @@ powermac_smp_start_cpu(platform_t plat, struct pcpu *pc)
 static void
 powermac_smp_timebase_sync(platform_t plat, u_long tb, int ap)
 {
+	static int cpus;
+	static int unleash;
+
+	if (ap) {
+		atomic_add_int(&cpus, 1);
+		while (!atomic_load_acq_int(&unleash))
+			;
+	} else {
+		atomic_add_int(&cpus, 1);
+		while (atomic_load_int(&cpus) != mp_ncpus)
+			;
+		atomic_store_rel_int(&unleash, 1);
+	}
 	mttb(tb);
 }

--MP_/95kzJ17J+e.fHE.d+h6njCd--

From owner-freebsd-ppc@freebsd.org Wed Mar 6 23:28:47 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 91437152B036 for ; Wed, 6 Mar 2019 23:28:47 +0000 (UTC) (envelope-from dclarke@blastwave.org) Received: from atl4mhfb02.myregisteredsite.com (atl4mhfb02.myregisteredsite.com [209.17.115.118]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0517576638 for ; Wed, 6 Mar 2019 23:28:46 +0000 (UTC) (envelope-from dclarke@blastwave.org) Received: from atl4mhob14.registeredsite.com (atl4mhob14.registeredsite.com [209.17.115.52]) by atl4mhfb02.myregisteredsite.com (8.14.4/8.14.4) with ESMTP id x26NS0cs023142 for ; Wed, 6 Mar 2019 18:28:00 -0500 Received: from mailpod.hostingplatform.com (atl4qobmail02pod2.registeredsite.com [10.30.77.36]) by
atl4mhob14.registeredsite.com (8.14.4/8.14.4) with ESMTP id x26NRrEp026952 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL) for ; Wed, 6 Mar 2019 18:27:53 -0500 Received: (qmail 12497 invoked by uid 0); 6 Mar 2019 23:27:53 -0000 X-TCPREMOTEIP: 174.118.245.214 X-Authenticated-UID: dclarke@blastwave.org Received: from unknown (HELO ?172.16.35.3?) (dclarke@blastwave.org@174.118.245.214) by 0 with ESMTPA; 6 Mar 2019 23:27:53 -0000 Subject: Re: Mac G5 and XServe G5 support To: freebsd-ppc@freebsd.org References: <966736134.9036.1551866362184@office.mailbox.org> From: Dennis Clarke Openpgp: preference=signencrypt Message-ID: Date: Wed, 6 Mar 2019 18:27:52 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.3 MIME-Version: 1.0 In-Reply-To:
<966736134.9036.1551866362184@office.mailbox.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 0517576638 X-Spamd-Bar: ------ Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-6.97 / 15.00]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; REPLY(-4.00)[]; NEURAL_HAM_SHORT(-0.97)[-0.972,0] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Mar 2019 23:28:47 -0000

On 3/6/19 4:59 AM, Sergio Carlavilla wrote:
> Hi,
>
> I want to know what the state of FreeBSD on the Mac G5 and XServe G5 is.
> Is all the hardware supported?
>
> I checked the wiki page https://wiki.freebsd.org/powerpc but I didn't find anything specific to Mac G5 and XServe G5 hardware support.
>

You can install FreeBSD 12 on a PowerMac G5 quad unit without too much
trouble if (and only if) you know how to tell the loader to insert a
few variables such as usefdt=1 and kern.smp.disabled=1 just to get it
to boot. After install you may build the current sources for the kernel
and the whole buildworld for that matter. That all works and I have
done it over and over. You may even get all four cores running.

However this is all somewhat similar to having a 1937 Plymouth car in
your garage and tinkering on it. Hardly modern in even the thinnest
sense of the word. But it works fine.
-- Dennis Clarke RISC-V/SPARC/PPC/ARM/CISC UNIX and Linux spoken GreyBeard and suspenders optional From owner-freebsd-ppc@freebsd.org Thu Mar 7 00:39:43 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 361DF152CB2C for ; Thu, 7 Mar 2019 00:39:43 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic303-22.consmr.mail.ne1.yahoo.com (sonic303-22.consmr.mail.ne1.yahoo.com [66.163.188.148]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 74C4F80FED for ; Thu, 7 Mar 2019 00:39:41 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: zu8mKcgVM1lK5W0KqjY4NncAhmEnGjNw8C_14mD.UYNaemmLh3v22Q4adXvu6So 9MFnjesOW2Xxls4ona3pUZ5zA0o8Vu893P.S2f8SEysMuHN9ZAFMMKwvbMI3_q2IYxyPDAenAC.F yFLqkOWRvxVL5A.JaZaeaEz.r5bclhcbXWKnEbSUUnUPNfRzz3PNMc0CWamUV19Ebx1eIy_wdwyq 9XpeNFxj9tjqqyulM1l9G48Z5ZWXE72Krt3JEJ2QUoP99ZYj88mHZ6wHb8e0PKdCz_ndxNhp0ogd DAxwYV7q9BJYHbhzEyW0YcY6Up..uFUD38ckoGURG6r5PP6CBYNOtrj8ZxnCfiPgZQR3SiHOis4s OChAchFCXApEPhKkZE2YcA5O480EpjcBrj1rXP1QPBxvfWBHAMm_k6BcvSMlP7LXPCsQawT3DtBb pTFclWqFnc7BeAEAeiPZbEZekKDK4qdhJJbTNodMDAZn0shoIywJyTH0gTbIz5rgiziePht8GGSq qQLmkR5tGQL5ZvyaEVBJ_c22Jsm8MLbe.CULV7nYp0__mymcsHg.Cw2a_8N1lhTwHjYlmd3ld0zB xdBMg3MiUUtyevlmuHyW1ARP0dN7jopeQ_jtYNB47SgxR4Ak5_Y54iNw8Lx.y2XU6KFUQZUCg37P 02ZF3DTgTTVd2NMI43he_3qqb7.AggZicYZmhWKtoyGXgXIyfkCVGycRoYAP9QZwEicf.1zB08Jg wqAHXIKC2kDhfP45h5uSF2ia27GVprm8wWJ1DzRGfp5IGNBv4ahO60S7mjrutXMS84nRswxJLzJc Ds_sl80Pw_Bg3R_QpiSm21StXVGl9lD.7qdn_.nW4.1HuUug7heUTGy5FTdR0FkFW6t98_I2jHsU mP2nY1UikwxHsE5JxydamjK7jvfxwuvG6Btd03ScxS4CZ5oO07ceG8rZGGYZpvTUMdFj18oxZ2Wv nq5LdHYna4nvGGurK9I0cnrZRSqpOZS3qzxkajEiTvJBiU5PF1NEfmjbhCy0Rb0MP4VpedMTuKRY DeEB9sMKHpfnamRQY_SlxsG4MKtwYjVkHm0SjW1po.9b0yi0ZRduE_VIh1HCHhCxspsy3cdxKXAz nEdym3w-- Received: from 
sonic.gate.mail.ne1.yahoo.com by sonic303.consmr.mail.ne1.yahoo.com with HTTP; Thu, 7 Mar 2019 00:39:34 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.113]) ([67.170.167.181]) by smtp409.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID 0e675305e94033af7ef2499a82435e18; Thu, 07 Mar 2019 00:39:33 +0000 (UTC) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: Re: head -r344018 powerpc64 variant on Powermac G5 (2 sockets, 2 cores each): [*buffer arena] shows up more . . .? From: Mark Millard In-Reply-To: <20190306151914.44ea831c@titan.knownspace> Date: Wed, 6 Mar 2019 16:39:31 -0800 Cc: freebsd-ppc Content-Transfer-Encoding: 7bit Message-Id: <8668AAF7-9E6A-4278-9D1B-2ECDBD3804AA@yahoo.com> References: <20190306151914.44ea831c@titan.knownspace> To: Justin Hibbits X-Mailer: Apple Mail (2.3445.102.3) X-Rspamd-Queue-Id: 74C4F80FED X-Spamd-Bar: +++ X-Spamd-Result: default: False [3.82 / 15.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_ALLOW(-0.20)[+ptr:yahoo.com]; MV_CASE(0.50)[]; FREEMAIL_FROM(0.00)[yahoo.com]; RCVD_COUNT_THREE(0.00)[3]; TO_DN_ALL(0.00)[]; DKIM_TRACE(0.00)[yahoo.com:+]; MX_GOOD(-0.01)[cached: mta6.am0.yahoodns.net]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[yahoo.com,reject]; FREEMAIL_TO(0.00)[gmail.com]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_TLS_LAST(0.00)[]; FREEMAIL_ENVFROM(0.00)[yahoo.com]; SUBJECT_ENDS_QUESTION(1.00)[]; MID_RHS_MATCH_FROM(0.00)[]; ASN(0.00)[asn:36646, ipnet:66.163.184.0/21, country:US]; ARC_NA(0.00)[]; R_DKIM_ALLOW(-0.20)[yahoo.com:s=s2048]; FROM_HAS_DN(0.00)[]; DWL_DNSWL_NONE(0.00)[yahoo.com.dwl.dnswl.org : 127.0.5.0]; NEURAL_SPAM_SHORT(0.93)[0.927,0]; MIME_GOOD(-0.10)[text/plain]; IP_SCORE(0.93)[ip: (2.48), ipnet: 66.163.184.0/21(1.25), asn: 36646(1.00), country: US(-0.07)]; NEURAL_SPAM_MEDIUM(0.69)[0.686,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; NEURAL_SPAM_LONG(0.78)[0.782,0]; 
RCVD_IN_DNSWL_NONE(0.00)[148.188.163.66.list.dnswl.org : 127.0.5.0]; RWL_MAILSPIKE_POSSIBLE(0.00)[148.188.163.66.rep.mailspike.net : 127.0.0.17] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2019 00:39:43 -0000 On 2019-Mar-6, at 13:19, Justin Hibbits wrote: > On Mon, 4 Mar 2019 19:43:09 -0800 > Mark Millard via freebsd-ppc wrote: > >> [It is possible that the following is tied to my hack to >> avoid threads ending up stuck-sleeping. But I ask about >> an alternative that I see in the code.] >> >> Context: using the modern powerpc64 VM_MAX_KERNEL_ADDRESS >> and using usefdt=1 on an old Powermac G5 (2 sockets, 2 cores >> each). Hacks are in use to provide fairly reliable booting >> and to avoid threads getting stuck sleeping. >> >> Before the modern VM_MAX_KERNEL_ADDRESS figure there were only >> 2 or 3 bufspacedaemon-* threads as I remember. Now there are 8 >> (plus bufdaemon and its worker), for example: >> >> root 23 0.0 0.0 0 288 - DL 15:48 0:00.39 >> [bufdaemon/bufdaemon] root 23 0.0 0.0 0 288 - DL >> 15:48 0:00.05 [bufdaemon/bufspaced] root 23 0.0 >> 0.0 0 288 - DL 15:48 0:00.05 [bufdaemon/bufspaced] >> root 23 0.0 0.0 0 288 - DL 15:48 0:00.05 >> [bufdaemon/bufspaced] root 23 0.0 0.0 0 288 - DL >> 15:48 0:00.05 [bufdaemon/bufspaced] root 23 0.0 >> 0.0 0 288 - DL 15:48 0:00.05 [bufdaemon/bufspaced] >> root 23 0.0 0.0 0 288 - DL 15:48 0:00.07 >> [bufdaemon/bufspaced] root 23 0.0 0.0 0 288 - DL >> 15:48 0:00.05 [bufdaemon/bufspaced] root 23 0.0 >> 0.0 0 288 - DL 15:48 0:00.56 [bufdaemon// worker] >> >> I'm sometimes seeing processes showing [*buffer arena] that >> seemed to wait for a fairly long time with that status, not >> something I'd seen historically for those same types of >> processes for a similar overall load (not much). 
>> During such
>> times trying to create processes to look around at what is
>> going on seems to also wait. (Probably with the same status?)
>>
>
> Hi Mark,
>
> Can you try the attached patch? It might be overkill in the
> synchronization, and I might be using the wrong barriers to be
> considered correct, but I think this should narrow the race down, and
> synchronize the timebases to within a very small margin. The real
> correct fix would be to suspend the timebase on all cores, which is
> feasible (there's a GPIO for the G4s, and i2c for G5s), but that's
> non-trivial extra work.
>
> Be warned, I haven't tested it, I've only compiled it (I don't have a
> G5 to test with anymore).
>

Sure, I'll try it when the G5 is again available: it is doing
a time-consuming build.

I do see one possible oddity: tracing another
platform_smp_timebase_sync use in the code . . .

DEVMETHOD(cpufreq_drv_set, pmufreq_set)

static int
pmufreq_set(device_t dev, const struct cf_setting *set)
{
. . .
	error = pmu_set_speed(speed_sel);
. . .
}

int
pmu_set_speed(int low_speed)
{
. . .
	platform_sleep();
. . .
}

PLATFORMMETHOD(platform_sleep, powermac_sleep),

void
powermac_sleep(platform_t platform)
{

	*(unsigned long *)0x80 = 0x100;
	cpu_sleep();
}

void
cpu_sleep()
{
. . .
	platform_smp_timebase_sync(timebase, 0);
. . .
}

PLATFORMMETHOD(platform_smp_timebase_sync, powermac_smp_timebase_sync),

The issue:

I do not see any matching platform_smp_timebase_sync(timebase, 1)
or other CPUs doing a powermac_smp_timebase_sync in this sequence.

(If this makes testing the patch inappropriate, let me know.)
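
The concern can be illustrated with a rough single-threaded model. It
is only a sketch, assuming the two counters behave as in the patch
quoted in this thread (function-local statics that are incremented and
never reset) and that no other CPU enters the function on the
cpu_sleep path. The struct and function names are hypothetical, and
each "would the wait end?" check stands in for one of the patch's spin
loops.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the patch's two function-local statics. */
struct sync_counters {
	int cpus;	/* arrivals; never reset in the patch as posted */
	int unleash;	/* release flag; stays 1 after the first release */
};

/*
 * Model one entry into the AP (ap != 0) branch: count the arrival,
 * then report whether the wait on the release flag would end.
 */
static bool
ap_wait_would_end(struct sync_counters *s)
{
	s->cpus++;
	return (s->unleash != 0);
}

/*
 * Model one entry into the non-AP (ap == 0) branch with no concurrent
 * arrivals: count the arrival, then report whether the wait for
 * "cpus == mp_ncpus" would end.
 */
static bool
bsp_wait_would_end(struct sync_counters *s, int mp_ncpus)
{
	assert(mp_ncpus > 0);
	s->cpus++;
	if (s->cpus != mp_ncpus)
		return (false);	/* would spin until the counts match */
	s->unleash = 1;
	return (true);
}
```

With four CPUs, three AP entries followed by one non-AP entry complete
the boot-time rendezvous in this model. A later lone non-AP entry, as
in the cpu_sleep trace above, raises the count to five, which never
equals four, so under these assumptions that caller would spin without
end.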
=== Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) From owner-freebsd-ppc@freebsd.org Thu Mar 7 00:57:24 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 36F08152D174; Thu, 7 Mar 2019 00:57:24 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from smtp.freebsd.org (smtp.freebsd.org [IPv6:2610:1c1:1:606c::24b:4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "smtp.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B8C4581FD8; Thu, 7 Mar 2019 00:57:23 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from John-Baldwins-MacBook-Pro-3.local (ralph.baldwin.cx [66.234.199.215]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) (Authenticated sender: jhb) by smtp.freebsd.org (Postfix) with ESMTPSA id 4A19E803F; Thu, 7 Mar 2019 00:57:23 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Subject: Re: amd64 -> powerpc64 port base/gcc use: "checking whether the C compiler works... 
Unable to load interpreter" and "If you meant to cross compile, use `--host'" To: Mark Millard , FreeBSD PowerPC ML , FreeBSD Toolchain References: <71E6ECD7-76B4-4C5A-9071-3CA0A20B4F24@yahoo.com> From: John Baldwin Openpgp: preference=signencrypt Message-ID: <3c4f5708-a7e7-7f2e-2e97-0efad5e04e1d@FreeBSD.org> Date: Wed, 6 Mar 2019 16:57:09 -0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac
OS X 10.12; rv:60.0) Gecko/20100101 Thunderbird/60.5.1 MIME-Version: 1.0 In-Reply-To: <71E6ECD7-76B4-4C5A-9071-3CA0A20B4F24@yahoo.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: B8C4581FD8 X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-2.98 / 15.00]; local_wl_from(0.00)[FreeBSD.org]; NEURAL_HAM_MEDIUM(-1.00)[-0.999,0]; NEURAL_HAM_SHORT(-0.98)[-0.980,0]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; ASN(0.00)[asn:11403, ipnet:2610:1c1:1::/48, country:US] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 07 Mar 2019 00:57:24 -0000 On 3/5/19 8:19 PM, Mark Millard wrote: > In trying to update powerpc64 from head -r344018 based to -r344825 based > context via amd64->powerpc64 cross builds: base/binutils worked okay > but base/gcc failed. This reports for base/gcc . > > (I actually only use base/binutils normally but I try base/gcc in case it > turns out that I need it.) Patch pending review at https://reviews.freebsd.org/D19484 Thanks. 
-- John Baldwin

From owner-freebsd-ppc@freebsd.org Thu Mar 7 02:35:51 2019
Subject: Re: head -r344018 powerpc64 variant on Powermac G5 (2 sockets, 2 cores each): [*buffer arena] shows up more . . .?
From: Mark Millard
In-Reply-To: <8668AAF7-9E6A-4278-9D1B-2ECDBD3804AA@yahoo.com>
Date: Wed, 6 Mar 2019 18:35:42 -0800
Cc: freebsd-ppc
Message-Id: <99AD89F8-0F90-48BE-A060-DA12FD7129E6@yahoo.com>
References: <20190306151914.44ea831c@titan.knownspace>
 <8668AAF7-9E6A-4278-9D1B-2ECDBD3804AA@yahoo.com>
To: Justin Hibbits
List-Id: Porting FreeBSD to the PowerPC

[The patch is definitely wrong via a 3rd use of
platform_smp_timebase_sync that I'd not noted before.
Details at the end.]

On 2019-Mar-6, at 16:39, Mark Millard wrote:

> On 2019-Mar-6, at 13:19, Justin Hibbits wrote:
>
>> On Mon, 4 Mar 2019 19:43:09 -0800
>> Mark Millard via freebsd-ppc wrote:
>>
>>> [It is possible that the following is tied to my hack to
>>> avoid threads ending up stuck-sleeping. But I ask about
>>> an alternative that I see in the code.]
>>>
>>> Context: using the modern powerpc64 VM_MAX_KERNEL_ADDRESS
>>> and using usefdt=1 on an old Powermac G5 (2 sockets, 2 cores
>>> each). Hacks are in use to provide fairly reliable booting
>>> and to avoid threads getting stuck sleeping.
>>>
>>> Before the modern VM_MAX_KERNEL_ADDRESS figure there were only
>>> 2 or 3 bufspacedaemon-* threads as I remember. Now there are 8
>>> (plus bufdaemon and its worker), for example:
>>>
>>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.39 [bufdaemon/bufdaemon]
>>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.05 [bufdaemon/bufspaced]
>>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.05 [bufdaemon/bufspaced]
>>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.05 [bufdaemon/bufspaced]
>>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.05 [bufdaemon/bufspaced]
>>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.05 [bufdaemon/bufspaced]
>>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.07 [bufdaemon/bufspaced]
>>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.05 [bufdaemon/bufspaced]
>>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.56 [bufdaemon// worker]
>>>
>>> I'm sometimes seeing processes showing [*buffer arena] that
>>> seemed to wait for a fairly long time with that status, not
>>> something I'd seen historically for those same types of
>>> processes for a similar overall load (not much). During such
>>> times trying to create processes to look around at what is
>>> going on seems to also wait. (Probably with the same status?)
>>>
>>
>> Hi Mark,
>>
>> Can you try the attached patch? It might be overkill in the
>> synchronization, and I might be using the wrong barriers to be
>> considered correct, but I think this should narrow the race down, and
>> synchronize the timebases to within a very small margin. The real
>> correct fix would be to suspend the timebase on all cores, which is
>> feasible (there's a GPIO for the G4s, and i2c for G5s), but that's a
>> non-trivial extra work.
>>
>> Be warned, I haven't tested it, I've only compiled it (I don't have a
>> G5 to test with anymore).
>>
>
> Sure, I'll try it when the G5 is again available: it is doing
> a time consuming build.
>
> I do see one possible oddity: tracing another
> platform_smp_timebase_sync use in the code . . .
>
> DEVMETHOD(cpufreq_drv_set, pmufreq_set)
>
> static int
> pmufreq_set(device_t dev, const struct cf_setting *set)
> {
> . . .
>         error = pmu_set_speed(speed_sel);
> . . .
> }
>
> int
> pmu_set_speed(int low_speed)
> {
> . . .
>         platform_sleep();
> . . .
> }
>
> PLATFORMMETHOD(platform_sleep, powermac_sleep),
>
> void
> powermac_sleep(platform_t platform)
> {
>
>         *(unsigned long *)0x80 = 0x100;
>         cpu_sleep();
> }
>
> void
> cpu_sleep()
> {
> . . .
>         platform_smp_timebase_sync(timebase, 0);
> . . .
> }
>
> PLATFORMMETHOD(platform_smp_timebase_sync, powermac_smp_timebase_sync),
>
> The issue:
>
> I do not see any matching platform_smp_timebase_sync(timebase, 1)
> or other CPUs doing a powermac_smp_timebase_sync in this sequence.
>
> (If this makes testing the patch inappropriate, let me know.)
>

More important: There is also a use of:

        /* The following is needed for restoring from sleep. */
        platform_smp_timebase_sync(0, 1);

in cpudep_ap_setup . That in turn happens during cpu_reset_handler
before machdep_ap_bootstrap is called (which does
platform_smp_timebase_sync as well) :

cpu_reset_handler:
        GET_TOCBASE(%r2)

        ld      %r1,TOC_REF(tmpstk)(%r2)        /* get new SP */
        addi    %r1,%r1,(TMPSTKSZ-48)

        bl      CNAME(cpudep_ap_early_bootstrap) /* Set PCPU */
        nop
        lis     %r3,1@l
        bl      CNAME(pmap_cpu_bootstrap)       /* Turn on virtual memory */
        nop
        bl      CNAME(cpudep_ap_bootstrap)      /* Set up PCPU and stack */
        nop
        mr      %r1,%r3                         /* Use new stack */
        bl      CNAME(cpudep_ap_setup)
        nop
        GET_CPUINFO(%r5)
        ld      %r3,(PC_RESTORE)(%r5)
        cmpldi  %cr0,%r3,0
        beq     %cr0,2f
        nop
        li      %r4,1
        bl      CNAME(longjmp)
        nop
2:
#ifdef SMP
        bl      CNAME(machdep_ap_bootstrap)     /* And away! */
        nop
#endif

Thus overall for ap's there is the sequence:

        platform_smp_timebase_sync(0, 1);
. . .
        while (ap_letgo == 0)
                __asm __volatile("or 31,31,31");
        __asm __volatile("or 6,6,6");

        /*
         * Set timebase as soon as possible to meet an implicit rendezvous
         * from cpu_mp_unleash(), which sets ap_letgo and then immediately
         * sets timebase.
         *
         * Note that this is instrinsically racy and is only relevant on
         * platforms that do not support better mechanisms.
         */
        platform_smp_timebase_sync(ap_timebase, 1);

for each ap . So the (ap) case in powermac_smp_timebase_sync
will wait with tb==0 (from cpudep_ap_setup) and the later calls
from machdep_ap_bootstrap will not wait and will be after the
unleash but not just local to powermac_smp_timebase_sync:

static void
powermac_smp_timebase_sync(platform_t plat, u_long tb, int ap)
{
        static int cpus;
        static int unleash;

        if (ap) {
                atomic_add_int(&cpus, 1);
                while (!atomic_load_acq_int(&unleash))
                        ;
        } else {
                atomic_add_int(&cpus, 1);
                while (atomic_load_int(&cpus) != mp_ncpus)
                        ;
                atomic_store_rel_int(&unleash, 1);
        }

        mttb(tb);
}

In the end cpus will have double counts of the ap cpus instead
of matching mp_ncpus.

cpufreq_drv_set activity is a separate, additional issue from this.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

From owner-freebsd-ppc@freebsd.org Thu Mar 7 04:36:17 2019
Date: Wed, 6 Mar 2019 22:36:11 -0600
From: Justin Hibbits
To: Mark Millard
Cc: freebsd-ppc
Subject: Re: head -r344018 powerpc64 variant on Powermac G5 (2 sockets, 2 cores each): [*buffer arena] shows up more . . .?
Message-ID: <20190306223611.75c8a87e@titan.knownspace>
In-Reply-To: <99AD89F8-0F90-48BE-A060-DA12FD7129E6@yahoo.com>
References: <20190306151914.44ea831c@titan.knownspace>
 <8668AAF7-9E6A-4278-9D1B-2ECDBD3804AA@yahoo.com>
 <99AD89F8-0F90-48BE-A060-DA12FD7129E6@yahoo.com>
List-Id: Porting FreeBSD to the PowerPC

On Wed, 6 Mar 2019 18:35:42 -0800
Mark Millard wrote:

> [The patch is definitely wrong via a 3rd use of
> platform_smp_timebase_sync that I'd not noted before. Details at the
> end.]
>
> On 2019-Mar-6, at 16:39, Mark Millard wrote:
>
> > On 2019-Mar-6, at 13:19, Justin Hibbits wrote:
> >
> >> On Mon, 4 Mar 2019 19:43:09 -0800
> >> Mark Millard via freebsd-ppc wrote:
> >>
> >>> [It is possible that the following is tied to my hack to
> >>> avoid threads ending up stuck-sleeping. But I ask about
> >>> an alternative that I see in the code.]
> >>>
> >>> Context: using the modern powerpc64 VM_MAX_KERNEL_ADDRESS
> >>> and using usefdt=1 on an old Powermac G5 (2 sockets, 2 cores
> >>> each). Hacks are in use to provide fairly reliable booting
> >>> and to avoid threads getting stuck sleeping.
> >>>
> >>> Before the modern VM_MAX_KERNEL_ADDRESS figure there were only
> >>> 2 or 3 bufspacedaemon-* threads as I remember. Now there are 8
> >>> (plus bufdaemon and its worker), for example:
> >>>
> >>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.39 [bufdaemon/bufdaemon]
> >>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.05 [bufdaemon/bufspaced]
> >>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.05 [bufdaemon/bufspaced]
> >>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.05 [bufdaemon/bufspaced]
> >>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.05 [bufdaemon/bufspaced]
> >>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.05 [bufdaemon/bufspaced]
> >>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.07 [bufdaemon/bufspaced]
> >>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.05 [bufdaemon/bufspaced]
> >>> root 23 0.0 0.0 0 288 - DL 15:48 0:00.56 [bufdaemon// worker]
> >>>
> >>> I'm sometimes seeing processes showing [*buffer arena] that
> >>> seemed to wait for a fairly long time with that status, not
> >>> something I'd seen historically for those same types of
> >>> processes for a similar overall load (not much). During such
> >>> times trying to create processes to look around at what is
> >>> going on seems to also wait. (Probably with the same status?)
> >>>
> >>
> >> Hi Mark,
> >>
> >> Can you try the attached patch? It might be overkill in the
> >> synchronization, and I might be using the wrong barriers to be
> >> considered correct, but I think this should narrow the race down,
> >> and synchronize the timebases to within a very small margin. The
> >> real correct fix would be to suspend the timebase on all cores,
> >> which is feasible (there's a GPIO for the G4s, and i2c for G5s),
> >> but that's a non-trivial extra work.
> >>
> >> Be warned, I haven't tested it, I've only compiled it (I don't
> >> have a G5 to test with anymore).
> >>
> >
> > Sure, I'll try it when the G5 is again available: it is doing
> > a time consuming build.
> >
> > I do see one possible oddity: tracing another
> > platform_smp_timebase_sync use in the code . . .
> >
> > DEVMETHOD(cpufreq_drv_set, pmufreq_set)
> >
> > static int
> > pmufreq_set(device_t dev, const struct cf_setting *set)
> > {
> > . . .
> >         error = pmu_set_speed(speed_sel);
> > . . .
> > }
> >
> > int
> > pmu_set_speed(int low_speed)
> > {
> > . . .
> >         platform_sleep();
> > . . .
> > }
> >
> > PLATFORMMETHOD(platform_sleep, powermac_sleep),
> >
> > void
> > powermac_sleep(platform_t platform)
> > {
> >
> >         *(unsigned long *)0x80 = 0x100;
> >         cpu_sleep();
> > }
> >
> > void
> > cpu_sleep()
> > {
> > . . .
> >         platform_smp_timebase_sync(timebase, 0);
> > . . .
> > }
> >
> > PLATFORMMETHOD(platform_smp_timebase_sync,
> > powermac_smp_timebase_sync),
> >
> > The issue:
> >
> > I do not see any matching platform_smp_timebase_sync(timebase, 1)
> > or other CPUs doing a powermac_smp_timebase_sync in this sequence.
> >
> > (If this makes testing the patch inappropriate, let me know.)
> >
>
> More important: There is also a use of:
>
>         /* The following is needed for restoring from sleep. */
>         platform_smp_timebase_sync(0, 1);
>
> in cpudep_ap_setup . That in turn happens during cpu_reset_handler
> before machdep_ap_bootstrap is called (which does
> platform_smp_timebase_sync as well) :
>
> cpu_reset_handler:
>         GET_TOCBASE(%r2)
>
>         ld      %r1,TOC_REF(tmpstk)(%r2)        /* get new SP */
>         addi    %r1,%r1,(TMPSTKSZ-48)
>
>         bl      CNAME(cpudep_ap_early_bootstrap) /* Set PCPU */
>         nop
>         lis     %r3,1@l
>         bl      CNAME(pmap_cpu_bootstrap)       /* Turn on virtual memory */
>         nop
>         bl      CNAME(cpudep_ap_bootstrap)      /* Set up PCPU and stack */
>         nop
>         mr      %r1,%r3                         /* Use new stack */
>         bl      CNAME(cpudep_ap_setup)
>         nop
>         GET_CPUINFO(%r5)
>         ld      %r3,(PC_RESTORE)(%r5)
>         cmpldi  %cr0,%r3,0
>         beq     %cr0,2f
>         nop
>         li      %r4,1
>         bl      CNAME(longjmp)
>         nop
> 2:
> #ifdef SMP
>         bl      CNAME(machdep_ap_bootstrap)     /* And away! */
>         nop
> #endif
>
> Thus overall for ap's there is the sequence:
>
>         platform_smp_timebase_sync(0, 1);
> . . .
>         while (ap_letgo == 0)
>                 __asm __volatile("or 31,31,31");
>         __asm __volatile("or 6,6,6");
>
>         /*
>          * Set timebase as soon as possible to meet an implicit rendezvous
>          * from cpu_mp_unleash(), which sets ap_letgo and then immediately
>          * sets timebase.
>          *
>          * Note that this is instrinsically racy and is only relevant on
>          * platforms that do not support better mechanisms.
>          */
>         platform_smp_timebase_sync(ap_timebase, 1);
>
> for each ap . So the (ap) case in powermac_smp_timebase_sync
> will wait with tb==0 (from cpudep_ap_setup) and the later calls
> from machdep_ap_bootstrap will not wait and will be after the
> unleash but not just local to powermac_smp_timebase_sync:
>
> static void
> powermac_smp_timebase_sync(platform_t plat, u_long tb, int ap)
> {
>         static int cpus;
>         static int unleash;
>
>         if (ap) {
>                 atomic_add_int(&cpus, 1);
>                 while (!atomic_load_acq_int(&unleash))
>                         ;
>         } else {
>                 atomic_add_int(&cpus, 1);
>                 while (atomic_load_int(&cpus) != mp_ncpus)
>                         ;
>                 atomic_store_rel_int(&unleash, 1);
>         }
>
>         mttb(tb);
> }
>
> In the end cpus will have double counts of the ap cpus instead
> of matching mp_ncpus.
>
> cpufreq_drv_set activity is a separate, additional issue from this.
>
> ===
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)
>

As mentioned, I had only compiled it. Your examination of the code
path demonstrates that the patch is insufficient, and would hang at
unleash anyway. The sleep/wake logic probably needs to be updated
anyway. It was written for a G4 powerbook primarily for the PMU-based
cpufreq driver, so some bits might need to be moved around. Orthogonal
to this issue, though.

- Justin

From owner-freebsd-ppc@freebsd.org Thu Mar 7 08:07:57 2019
Subject: Re: head -r344018 powerpc64 variant on Powermac G5 (2 sockets, 2 cores each): [*buffer arena] shows up more . . .?
From: Mark Millard
In-Reply-To: <20190306223611.75c8a87e@titan.knownspace>
Date: Wed, 6 Mar 2019 23:37:29 -0800
Cc: freebsd-ppc
Message-Id: <34E7AD29-2616-4E49-BABA-FB5B5713F338@yahoo.com>
References: <20190306151914.44ea831c@titan.knownspace>
 <8668AAF7-9E6A-4278-9D1B-2ECDBD3804AA@yahoo.com>
 <99AD89F8-0F90-48BE-A060-DA12FD7129E6@yahoo.com>
 <20190306223611.75c8a87e@titan.knownspace>
To: Justin Hibbits
List-Id: Porting FreeBSD to the PowerPC

On 2019-Mar-6, at 20:36, Justin Hibbits wrote:

> On Wed, 6 Mar 2019 18:35:42 -0800
> Mark Millard wrote:
>
>> . . .
>>
>
> As mentioned, I had only compiled it. Your examination of the code
> path demonstrates that the patch is insufficient, and would hang at
> unleash anyway. The sleep/wake logic probably needs to be updated
> anyway. It was written for a G4 powerbook primarily for the PMU-based
> cpufreq driver, so some bits might need to be moved around. Orthogonal
> to this issue, though.

No problem. I looked before I leaped in this case. I had time
to do so because of the on-going build.

While I do not have access right now, I do sometimes have access
to a couple of 2-processor PowerMac G4s, but they are desktop
machines. For all I know they might also have the tbr mis-match
problems with th->th_offset_count in binuptime as well. So at some
point I may be able to test such a context.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)

From owner-freebsd-ppc@freebsd.org Thu Mar 7 14:31:41 2019
Date: Fri, 8 Mar 2019 01:31:30 +1100 (EST)
From: Bruce Evans
To: Konstantin Belousov
cc: Bruce Evans , Mark Millard , freebsd-hackers Hackers , FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
In-Reply-To: <20190306172003.GD2492@kib.kiev.ua>
Message-ID: <20190308001005.M2756@besplex.bde.org>
References: <20190302105140.GC68879@kib.kiev.ua> <20190302225513.W3408@besplex.bde.org>
 <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org>
 <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org>
 <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org>
 <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org>
 <20190306172003.GD2492@kib.kiev.ua>
List-Id: Porting FreeBSD to the PowerPC

On Wed, 6 Mar 2019, Konstantin Belousov wrote:

> On Tue, Mar 05, 2019 at 05:17:14AM +1100, Bruce Evans wrote:
>> On Mon, 4 Mar 2019, Konstantin Belousov wrote:
>>
>>> On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote:
>>>> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
>>>>
>>>>> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
>* ...
>> I strongly dislike the merge.
>>
>>>>> So I verified that:
>>>>> - there is no 64bit multiplication in the generated code, for i386 both
>>>>>   for clang 7.0 and gcc 8.3;
>>>>> - that everything is inlined, the only call from bintime/binuptime is
>>>>>   the indirect call to get the timecounter value.
>>>>
>>>> I will have to fix it for compilers that I use.
>>> Ok, I will add __inline.
>>
>> That will make it fast enough, but still hard to read.
>>
>>>>> +        *bt = *bts;
>>>>> +        scale = th->th_scale;
>>>>> +        delta = tc_delta(th);
>>>>> +#ifdef _LP64
>>>>> +        if (__predict_false(th->th_large_delta <= delta)) {
>>>>> +                /* Avoid overflow for scale * delta. */
>>>>> +                bintime_helper(bt, scale, delta);
>>>>> +                bintime_addx(bt, (scale & 0xffffffff) * delta);
>>>>> +        } else {
>>>>> +                bintime_addx(bt, scale * delta);
>>>>> +        }
>>>>> +#else
>>>>> +        /*
>>>>> +         * Use bintime_helper() unconditionally, since the fast
>>>>> +         * path in the above method is not so fast here, since
>>>>> +         * the 64 x 32 -> 64 bit multiplication is usually not
>>>>> +         * available in hardware and emulating it using 2
>>>>> +         * 32 x 32 -> 64 bit multiplications uses code much
>>>>> +         * like that in bintime_helper().
>>>>> +         */
>>>>> +        bintime_helper(bt, scale, delta);
>>>>> +        bintime_addx(bt, (uint64_t)(uint32_t)scale * delta);
>>>>> +#endif
>>>>
>>>> Check that this method is really better. Without this, the complicated
>>>> part is about half as large and duplicating it is smaller than this
>>>> version.
>>> Better in what sense ? I am fine with the C code, and asm code looks
>>> good.
>>
>> Better in terms of actually running significantly faster. I fear the
>> 32-bit method is actually slightly slower for the fast path.

I checked that it is just worse. Significantly slower and more
complicated.

I wrote and ran a lot of timing benchmarks of various versions. All
times in cycles on Haswell @4.08 GHz. On i386 except where noted:
- the fastest case is when compiled by clang with the default of -O2.
  binuptime() in a loop then takes 34 cycles. This is faster than
  possible for latency, since rdtsc alone has a latency of 24 cycles.
  There must be several iterations of the loop running in parallel.
- the slowest case is when compiled by gcc-4.2.1 with my config of -Os.
  binuptime() in a loop then takes 116 cycles. -Os does at least the
  following pessimization: use memcpy() for copying the 12-byte struct
  bintime.
- gcc-4.2.1 -O2 takes 74 cycles. -O2 still does the following
  pessimization: do a 64 x 32 -> 64 bit multiplication after not
  noticing that the first operand has been reduced to 32 bits by a
  shift or mask.

The above tests were done with the final version. The version which
tested alternatives used switch (method) and takes about 20 cycles
longer for the fastest version, presumably by defeating parallelism.
Times for various methods:
- with clang -Os, about 54 cycles for the old method that allowed
  overflow, and the same for the version with the check of the overflow
  threshold (but with the threshold never reached), and 59 cycles for
  the branch-free method. 100-116 cycles with gcc-4.2.1 -Os, with the
  branch-free method taking 5-10 cycles longer.
- on amd64, only a couple of cycles faster (49-50 cycles in best cases),
  and gcc-4.2.1 only taking a couple of cycles longer. The branch-free
  method still takes about 59 cycles so it is relatively worse.

In userland, using the syscall for clock_gettime(), the extra 5-10
cycles for the branch-free method is relatively insignificant. It is
about 2 nanoseconds. Other pessimizations are more significant. Times
for this syscall:
- amd64 now: 224 nsec (with gcc-4.2.1 -Os)
- i386 4+4 nopae: 500-580 nsec (depending on clang/gcc and -O2/-Os)
  even getpid(2) takes 280 nsec. Add at least 140 more nsec for pae.
- i386 3+1: 224 nsec (with gcc 4.2.1 -Os)
- i386 FreeBSD-5 UP: 193 nsec (with gcc-3.3.3 -O).
- i386 4+4 nopae old library version of clock_gettime() compiled by
  clang: 29 nsec.

In some tests, the version with the branch was even a cycle or two
faster. In the tests, the branch was always perfectly predicted, so
costs nothing except possibly by changing scheduling in an accidentally
good way. The tests were too small to measure the cost of using branch
prediction resources. I've never noticed a case where 1 more branch
causes thrashing.

>>>>> -        do {
>>>>> -                th = timehands;
>>>>> -                gen = atomic_load_acq_int(&th->th_generation);
>>>>> -                *bt = th->th_bintime;
>>>>> -                bintime_addx(bt, th->th_scale * tc_delta(th));
>>>>> -                atomic_thread_fence_acq();
>>>>> -        } while (gen == 0 || gen != th->th_generation);
>>>>
>>>> Duplicating this loop is much better than obfuscating it using inline
>>>> functions. This loop was almost duplicated (except for the delta
>>>> calculation) in no less than 17 functions in kern_tc.c (9 tc ones and
>>>> 8 ffclock ones). Now it is only duplicated 16 times.
>>> How did you counted the 16 ? I can see only 4 instances in the unpatched
>>> kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not
>>> touch ffclock until the patch is finalized. After that, it would be
>>> 1 instance for kernel and 1 for userspace.
>>
>> Grep for the end condition in this loop. There are actually 20 of these.
>> I'm counting the loops and not the previously-simple scaling operation in
>> it. The scaling is indeed only done for 4 cases. I prefer the 20
>> duplications (except I only want about 6 of the functions). Duplication
>> works even better for only 4 cases.
> Ok, I merged these as well. Now there are only four loops left in kernel.
> I do not think that merging them is beneficial, since they have sufficiently
> different bodies.

This is exactly what I don't want.

>
> I disagree with your characterization of it as obfuscation, IMO it improves
> the maintainability of the code by reducing number of places which need
> careful inspection of the lock-less algorithm.

It makes the inspection and changes more difficult for each instance.
General functions are more difficult to work with since they need more
args to control them and can't be changed without affecting all callers.

In another thread, you didn't like similar churn for removing td args.

Here there isn't even a bug, since overflow only occurs when an invariant
is violated.
>> This should be written as a function call to 1 new function to replace >> the line with the overflowing multiplication. The line is always the >> same, so the new function call can look like bintime_xxx(bt, th). > Again, please provide at least of a pseudocode of your preference. The following is a complete tested and benchmarked implementation, with a couple more minor fixes: XX Index: kern_tc.c XX =================================================================== XX --- kern_tc.c (revision 344852) XX +++ kern_tc.c (working copy) XX @@ -72,6 +72,7 @@ XX struct timecounter *th_counter; XX int64_t th_adjustment; XX uint64_t th_scale; XX + u_int th_large_delta; XX u_int th_offset_count; XX struct bintime th_offset; XX struct bintime th_bintime; Improvement not already discussed: use a u_int limit for the u_int variable. XX @@ -90,6 +91,7 @@ XX static struct timehands th0 = { XX .th_counter = &dummy_timecounter, XX .th_scale = (uint64_t)-1 / 1000000, XX + .th_large_delta = 1000000, XX .th_offset = { .sec = 1 }, XX .th_generation = 1, XX .th_next = &th1 Fix not already discussed: th_large_delta was used in the dummy timehands before it was initialized. Static initialization to 0 gives fail-safe behaviour and unintended exercizing of the slow path. The dummy timecounter has a low frequency, so its overflow threshold is quite low. I think it is not used even 1000000 times unless there is a bug in the boot code, so it doesn't overflow in practice. I did see some strange crashes at boot time while testing this. XX @@ -351,6 +353,26 @@ XX } while (gen == 0 || gen != th->th_generation); XX } XX #else /* !FFCLOCK */ XX + XX +static __inline void XX +bintime_adddelta(struct bintime *bt, struct timehands *th) Only 1 utility function now. XX +{ XX + uint64_t scale, x; XX + u_int delta; XX + XX + scale = th->th_scale; XX + delta = tc_delta(th); XX + if (__predict_false(delta < th->th_large_delta)) { XX + /* Avoid overflow for scale * delta. 
*/ XX + x = (scale >> 32) * delta; XX + bt->sec += x >> 32; XX + bintime_addx(bt, x << 32); XX + bintime_addx(bt, (scale & 0xffffffff) * delta); This is clearer with all the scaling code together. I thought of renaming x to x95_32 to sort of document that it holds bits 95..32 in a component of the product. XX + } else { XX + bintime_addx(bt, scale * delta); XX + } XX +} XX + XX void XX binuptime(struct bintime *bt) XX { XX @@ -361,7 +383,7 @@ XX th = timehands; XX gen = atomic_load_acq_int(&th->th_generation); XX *bt = th->th_offset; XX - bintime_addx(bt, th->th_scale * tc_delta(th)); XX + bintime_adddelta(bt, th); XX atomic_thread_fence_acq(); XX } while (gen == 0 || gen != th->th_generation); XX } This is the kind of non-churning change that I like. The function name bintime_adddelta() isn't so good, but it is in the same style as bintime_addx() where the names are worse. bintime_addx() is global so it needs a descriptive name more. 'delta' is more descriptive than 'x' (x means a scalar and not a bintime). The 'bintime' prefix is verbose. It should be bt, especially in non-global APIs. XX @@ -394,7 +416,7 @@ XX th = timehands; XX gen = atomic_load_acq_int(&th->th_generation); XX *bt = th->th_bintime; XX - bintime_addx(bt, th->th_scale * tc_delta(th)); XX + bintime_adddelta(bt, th); XX atomic_thread_fence_acq(); XX } while (gen == 0 || gen != th->th_generation); XX } XX @@ -1464,6 +1486,7 @@ XX scale += (th->th_adjustment / 1024) * 2199; XX scale /= th->th_counter->tc_frequency; XX th->th_scale = scale * 2; XX + th->th_large_delta = MIN(((uint64_t)1 << 63) / scale, UINT_MAX); XX XX /* XX * Now that the struct timehands is again consistent, set the new Clamp this to UINT_MAX now that it is stored in a u_int. > The current patch becomes to large already, I want to test/commit what > I already have, and I will need to split it for the commit. It was already too large. 
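To make the threshold in the tc_windup() hunk concrete, the following sketch works through the arithmetic for a given counter frequency. It is a simplification under stated assumptions: th_adjustment is taken as 0, and scale_for(), large_delta_for() and mul_overflows() are hypothetical helper names, not functions in kern_tc.c.

```c
#include <assert.h>
#include <stdint.h>

/* th_scale as tc_windup() derives it when th_adjustment is 0 (assumption). */
static uint64_t
scale_for(uint64_t freq)
{
	return ((((uint64_t)1 << 63) / freq) * 2);
}

/* Threshold computed as in the patch, clamped to 32 bits like MIN(..., UINT_MAX). */
static uint32_t
large_delta_for(uint64_t freq)
{
	uint64_t half_scale, ld;

	half_scale = ((uint64_t)1 << 63) / freq;
	ld = ((uint64_t)1 << 63) / half_scale;
	return (ld > UINT32_MAX ? UINT32_MAX : (uint32_t)ld);
}

/* Nonzero if scale * delta does not fit in 64 bits. */
static int
mul_overflows(uint64_t scale, uint32_t delta)
{
	return (delta != 0 && scale > UINT64_MAX / delta);
}
```

Under these assumptions the threshold comes out equal to the counter frequency, that is, about one second's worth of counts: for a 33,333,333 Hz timebase the product still fits at delta = 33333333 and overflows at delta = 33333334. The same calculation gives 1000000 for a 1 MHz counter, which matches the static initializer proposed for th0.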
> > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c > index 2656fb4d22f..7114a0e5219 100644 > --- a/sys/kern/kern_tc.c > +++ b/sys/kern/kern_tc.c > ... > @@ -200,22 +201,77 @@ tc_delta(struct timehands *th) > * the comment in for a description of these 12 functions. > */ > > -#ifdef FFCLOCK > -void > -fbclock_binuptime(struct bintime *bt) > +static __inline void > +bintime_helper(struct bintime *bt, uint64_t scale, u_int delta) This name is not descriptive. > +static __inline void > +binnouptime(struct bintime *bt, u_int off) This name is an example of further problems with the naming scheme. The bintime_ prefix used above is verbose, but it is at least a prefix and is in the normal bintime_ namespace. Here the prefix is 'bin', which is neither of these. It means bintime_ again, but this duplicates 'time'. If I liked churn, then I would have changed all names here long ago. E.g.: - bintime_ -> bt_, and use it consistently - timecounter -> tc except for the timecounter public variable - fb_ -> facebook_ -> /dev/null. Er, fb_ -> fbt_ or -> ft_. - bt -> btp when bt is a pointer. You used bts for a struct in this patch - unsigned int -> u_int. I policed this in early timecounter code. You fixed some instances of this too. - th_generation -> th_gen. 
Bruce

From owner-freebsd-ppc@freebsd.org Thu Mar 7 22:22:35 2019
Date: Fri, 8 Mar 2019 00:22:20 +0200
From: Konstantin Belousov
To: Bruce Evans
Cc: Mark Millard, freebsd-hackers Hackers, FreeBSD PowerPC ML
Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed]
Message-ID: <20190307222220.GK2492@kib.kiev.ua>
In-Reply-To: <20190308001005.M2756@besplex.bde.org>

On Fri, Mar 08, 2019 at 01:31:30AM +1100, Bruce Evans wrote:
> On Wed, 6 Mar 2019, Konstantin Belousov wrote:
>
> > On Tue, Mar 05, 2019 at 05:17:14AM +1100, Bruce Evans wrote:
> >> On Mon, 4 Mar 2019, Konstantin Belousov wrote:
> >>
> >>> On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote:
> >>>> On Sun, 3 Mar 2019, Konstantin Belousov wrote:
> >>>>
> >>>>> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote:
> >* ...
> >> I strongly dislike the merge.
> >>
> >>>>> So I verified that:
> >>>>> - there is no 64bit multiplication in the generated code, for i386 both
> >>>>>   for clang 7.0 and gcc 8.3;
> >>>>> - that everything is inlined, the only call from bintime/binuptime is
> >>>>>   the indirect call to get the timecounter value.
> >>>>
> >>>> I will have to fix it for compilers that I use.
> >>> Ok, I will add __inline.
> >>
> >> That will make it fast enough, but still hard to read.
> >>
> >>>>> +       *bt = *bts;
> >>>>> +       scale = th->th_scale;
> >>>>> +       delta = tc_delta(th);
> >>>>> +#ifdef _LP64
> >>>>> +       if (__predict_false(th->th_large_delta <= delta)) {
> >>>>> +               /* Avoid overflow for scale * delta.
*/ > >>>>> + bintime_helper(bt, scale, delta); > >>>>> + bintime_addx(bt, (scale & 0xffffffff) * delta); > >>>>> + } else { > >>>>> + bintime_addx(bt, scale * delta); > >>>>> + } > >>>>> +#else > >>>>> + /* > >>>>> + * Use bintime_helper() unconditionally, since the fast > >>>>> + * path in the above method is not so fast here, since > >>>>> + * the 64 x 32 -> 64 bit multiplication is usually not > >>>>> + * available in hardware and emulating it using 2 > >>>>> + * 32 x 32 -> 64 bit multiplications uses code much > >>>>> + * like that in bintime_helper(). > >>>>> + */ > >>>>> + bintime_helper(bt, scale, delta); > >>>>> + bintime_addx(bt, (uint64_t)(uint32_t)scale * delta); > >>>>> +#endif > >>>> > >>>> Check that this method is really better. Without this, the complicated > >>>> part is about half as large and duplicating it is smaller than this > >>>> version. > >>> Better in what sence ? I am fine with the C code, and asm code looks > >>> good. > >> > >> Better in terms of actually running significantly faster. I fear the > >> 32-bit method is actually slightly slower for the fast path. > > I checked that it is just worse. Significantly slower and more complicated. > > I wrote and run a lot of timing benchmarks of various versions. All > times in cycles on Haswell @4.08 GHz. On i386 except where noted: > > - the fastest case is when compiled by clang with the default of -O2. > binuptime() in a loop then takes 34 cycles. This is faster than possible > for latency, since rdtsc alone has a latency of 24 cycles. There must be > several iterations of the loop running in parallel. > > - the slowest case is when compiled by gcc-4.2.1 with my config of -Os. > binuptime() in a loop then takes 116 cycles. -Os does at least the > following pessimization: use memcpy() for copying the 12-byte struct > bitime. > > - gcc-4.2.1 -O2 takes 74 cycles. 
-O2 still does the following pessimization:
> do a 64 x 32 -> 64 bit multiplication after not noticing that the first
> operand has been reduced to 32 bits by a shift or mask.
>
> The above tests were done with the final version.  The version which tested
> alternatives used switch (method) and takes about 20 cycles longer for the
> fastest version, presumably by defeating parallelism.  Times for various
> methods:
>
> - with clang -Os, about 54 cycles for the old method that allowed overflow,
>   and the same for the version with the check of the overflow threshold
>   (but with the threshold never reached), and 59 cycles for the
>   branch-free method.  100-116 cycles with gcc-4.2.1 -Os, with the
>   branch-free method taking 5-10 cycles longer.
>
> - on amd64, only a couple of cycles faster (49-50 cycles in best cases),
>   and gcc-4.2.1 only taking a couple of cycles longer.  The branch-free
>   method still takes about 59 cycles so it is relatively worse.
>
> In userland, using the syscall for clock_gettime(), the
> extra 5-10 cycles for the branch-free method is relatively insignificant.
> It is about 2 nanoseconds.  Other pessimizations are more significant.
> Times for this syscall:
> - amd64 now: 224 nsec (with gcc-4.2.1 -Os)
> - i386 4+4 nopae: 500-580 nsec (depending on clang/gcc and -O2/-Os)
>   even getpid(2) takes 280 nsec.  Add at least 140 more nsec for pae.
> - i386 3+1: 224 nsec (with gcc 4.2.1 -Os)
> - i386 FreeBSD-5 UP: 193 nsec (with gcc-3.3.3 -O).
> - i386 4+4 nopae old library version of clock_gettime() compiled by
>   clang: 29 nsec.
>
> In some tests, the version with the branch was even a cycle or two faster.
> In the tests, the branch was always perfectly predicted, so costs nothing
> except possibly by changing scheduling in an accidentally good way.  The
> tests were too small to measure the cost of using branch prediction
> resources.  I've never noticed a case where 1 more branch causes thrashing.

About testing such tight loops.
There is a known phenomenon where Intel CPUs give absurd times when code
in the loop has unsuitable alignment.  The manifestation of the phenomenon
is very surprising and hard to control.  It is due to the way the CPU
front-end prefetches blocks of bytes for instruction decoding, and to the
locations of jumps within those blocks.

The only source I know of that explains it is
https://www.youtube.com/watch?v=IX16gcX4vDQ
a talk by an Intel engineer.

>
> >>>>> -       do {
> >>>>> -               th = timehands;
> >>>>> -               gen = atomic_load_acq_int(&th->th_generation);
> >>>>> -               *bt = th->th_bintime;
> >>>>> -               bintime_addx(bt, th->th_scale * tc_delta(th));
> >>>>> -               atomic_thread_fence_acq();
> >>>>> -       } while (gen == 0 || gen != th->th_generation);
> >>>>
> >>>> Duplicating this loop is much better than obfuscating it using inline
> >>>> functions.  This loop was almost duplicated (except for the delta
> >>>> calculation) in no less than 17 functions in kern_tc.c (9 tc ones and
> >>>> 8 ffclock ones).  Now it is only duplicated 16 times.
> >>> How did you count the 16?  I can see only 4 instances in the unpatched
> >>> kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not
> >>> touch ffclock until the patch is finalized.  After that, it would be
> >>> 1 instance for kernel and 1 for userspace.
> >>
> >> Grep for the end condition in this loop.  There are actually 20 of these.
> >> I'm counting the loops and not the previously-simple scaling operation in
> >> it.  The scaling is indeed only done for 4 cases.  I prefer the 20
> >> duplications (except I only want about 6 of the functions).  Duplication
> >> works even better for only 4 cases.
> > Ok, I merged these as well.  Now there are only four loops left in kernel.
> > I do not think that merging them is beneficial, since they have sufficiently
> > different bodies.
>
> This is exactly what I don't want.
> >
> > I disagree with your characterization of it as obfuscation, IMO it improves
> > the maintainability of the code by reducing number of places which need
> > careful inspection of the lock-less algorithm.
>
> It makes the inspection and changes more difficult for each instance.
> General functions are more difficult to work with since they need more
> args to control them and can't be changed without affecting all callers.
>
> In another thread, you didn't like similar churn for removing td args.

It is not similar.  I do valid refactoring there (in terms of that
thread; I do not like the term refactoring).  I eliminate a dozen
instances of a very intricate loop which implements a quite delicate
lockless algorithm.  Its trickiness can be illustrated by the fact that
it is the only valid use of thread_fence_acq() which cannot be replaced
by load_acq() (a similar case is present in sys/seq.h).

> Here there isn't even a bug, since overflow only occurs when an invariant
> is violated.
>
> >> This should be written as a function call to 1 new function to replace
> >> the line with the overflowing multiplication.  The line is always the
> >> same, so the new function call can look like bintime_xxx(bt, th).
> > Again, please provide at least a pseudocode of your preference.
>
> The following is a complete tested and benchmarked implementation, with a
> couple more minor fixes:
>
> XX Index: kern_tc.c
> XX ===================================================================
> XX --- kern_tc.c (revision 344852)
> XX +++ kern_tc.c (working copy)
> XX @@ -72,6 +72,7 @@
> XX      struct timecounter *th_counter;
> XX      int64_t th_adjustment;
> XX      uint64_t th_scale;
> XX +    u_int th_large_delta;
> XX      u_int th_offset_count;
> XX      struct bintime th_offset;
> XX      struct bintime th_bintime;
>
> Improvement not already discussed: use a u_int limit for the u_int variable.
> > XX @@ -90,6 +91,7 @@ > XX static struct timehands th0 = { > XX .th_counter = &dummy_timecounter, > XX .th_scale = (uint64_t)-1 / 1000000, > XX + .th_large_delta = 1000000, > XX .th_offset = { .sec = 1 }, > XX .th_generation = 1, > XX .th_next = &th1 > > Fix not already discussed: th_large_delta was used in the dummy timehands > before it was initialized. Static initialization to 0 gives fail-safe > behaviour and unintended exercizing of the slow path. > > The dummy timecounter has a low frequency, so its overflow threshold is > quite low. I think it is not used even 1000000 times unless there is a > bug in the boot code, so it doesn't overflow in practice. I did see > some strange crashes at boot time while testing this. > > XX @@ -351,6 +353,26 @@ > XX } while (gen == 0 || gen != th->th_generation); > XX } > XX #else /* !FFCLOCK */ > XX + > XX +static __inline void > XX +bintime_adddelta(struct bintime *bt, struct timehands *th) > > Only 1 utility function now. And in my patch this helper function is called only once, so I inlined it manually. > > XX +{ > XX + uint64_t scale, x; > XX + u_int delta; > XX + > XX + scale = th->th_scale; > XX + delta = tc_delta(th); > XX + if (__predict_false(delta < th->th_large_delta)) { > XX + /* Avoid overflow for scale * delta. */ > XX + x = (scale >> 32) * delta; > XX + bt->sec += x >> 32; > XX + bintime_addx(bt, x << 32); > XX + bintime_addx(bt, (scale & 0xffffffff) * delta); > > This is clearer with all the scaling code together. > > I thought of renaming x to x95_32 to sort of document that it holds bits > 95..32 in a component of the product. 
> > XX + } else { > XX + bintime_addx(bt, scale * delta); > XX + } > XX +} > XX + > XX void > XX binuptime(struct bintime *bt) > XX { > XX @@ -361,7 +383,7 @@ > XX th = timehands; > XX gen = atomic_load_acq_int(&th->th_generation); > XX *bt = th->th_offset; > XX - bintime_addx(bt, th->th_scale * tc_delta(th)); > XX + bintime_adddelta(bt, th); > XX atomic_thread_fence_acq(); > XX } while (gen == 0 || gen != th->th_generation); > XX } > > This is the kind of non-churning change that I like. Ok. I made all cases where timehands are read, more uniform by moving calculations after the generation loop. This makes the atomic part of the functions easier to see, and loop body has lower chance to hit generation reset. > > The function name bintime_adddelta() isn't so good, but it is in the same > style as bintime_addx() where the names are worse. bintime_addx() is global > so it needs a descriptive name more. 'delta' is more descriptive than 'x' > (x means a scalar and not a bintime). The 'bintime' prefix is verbose. It > should be bt, especially in non-global APIs. > > XX @@ -394,7 +416,7 @@ > XX th = timehands; > XX gen = atomic_load_acq_int(&th->th_generation); > XX *bt = th->th_bintime; > XX - bintime_addx(bt, th->th_scale * tc_delta(th)); > XX + bintime_adddelta(bt, th); > XX atomic_thread_fence_acq(); > XX } while (gen == 0 || gen != th->th_generation); > XX } > XX @@ -1464,6 +1486,7 @@ > XX scale += (th->th_adjustment / 1024) * 2199; > XX scale /= th->th_counter->tc_frequency; > XX th->th_scale = scale * 2; > XX + th->th_large_delta = MIN(((uint64_t)1 << 63) / scale, UINT_MAX); > XX > XX /* > XX * Now that the struct timehands is again consistent, set the new > > Clamp this to UINT_MAX now that it is stored in a u_int. > > > The current patch becomes to large already, I want to test/commit what > > I already have, and I will need to split it for the commit. > > It was already too large. 
> >
> > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c
> > index 2656fb4d22f..7114a0e5219 100644
> > --- a/sys/kern/kern_tc.c
> > +++ b/sys/kern/kern_tc.c
> > ...
> > @@ -200,22 +201,77 @@ tc_delta(struct timehands *th)
> >   * the comment in for a description of these 12 functions.
> >   */
> >
> > -#ifdef FFCLOCK
> > -void
> > -fbclock_binuptime(struct bintime *bt)
> > +static __inline void
> > +bintime_helper(struct bintime *bt, uint64_t scale, u_int delta)
>
> This name is not descriptive.
>
> > +static __inline void
> > +binnouptime(struct bintime *bt, u_int off)
>
> This name is an example of further problems with the naming scheme.
> The bintime_ prefix used above is verbose, but it is at least a prefix
> and is in the normal bintime_ namespace.  Here the prefix is 'bin',
> which is neither of these.  It means bintime_ again, but this duplicates
> 'time'.

I agree, and I chose the name getthmember for the other function, which
clearly reflects its operation.  For this one, I ended up with
bintime_off().

>
> If I liked churn, then I would have changed all names here long ago.
> E.g.:
> - bintime_ -> bt_, and use it consistently
> - timecounter -> tc except for the timecounter public variable
> - fb_ -> facebook_ -> /dev/null.  Er, fb_ -> fbt_ or -> ft_.
> - bt -> btp when bt is a pointer.  You used bts for a struct in this patch
> - unsigned int -> u_int.  I policed this in early timecounter code.
>   You fixed some instances of this too.
> - th_generation -> th_gen.
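As a standalone illustration of the overflow-avoiding scaling step that this discussion keeps returning to, here is a small userland sketch. It is not the kernel code: struct bt, bt_addx() and bt_add_scaled() are hypothetical stand-ins for struct bintime, bintime_addx() and the helper under discussion. It also assumes that the slow path is meant to be taken when delta is at or above the threshold; the patches as posted in this thread compare in the other direction, which a later message questions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bintime: seconds plus a 64-bit binary fraction. */
struct bt {
	int64_t	 sec;
	uint64_t frac;
};

/* Add a 64-bit fraction, carrying into the seconds on wraparound. */
static void
bt_addx(struct bt *bt, uint64_t x)
{
	uint64_t u;

	u = bt->frac;
	bt->frac += x;
	if (u > bt->frac)
		bt->sec++;
}

/* Add scale * delta to *bt without losing bits above 2^64. */
static void
bt_add_scaled(struct bt *bt, uint64_t scale, uint32_t delta,
    uint32_t large_delta)
{
	uint64_t x;

	if (delta >= large_delta) {	/* assumed direction of the check */
		/* Split scale into 32-bit halves; each partial product fits. */
		x = (scale >> 32) * delta;
		bt->sec += x >> 32;
		bt_addx(bt, x << 32);
		bt_addx(bt, (scale & 0xffffffff) * delta);
	} else {
		/* The product is known to fit in 64 bits. */
		bt_addx(bt, scale * delta);
	}
}
```

The split works because (hi * 2^32 + lo) * delta equals (hi * delta) * 2^32 + lo * delta, and with all three factors below 2^32 neither partial product can exceed 64 bits; the bits of hi * delta that land above 2^64 after the shift are whole seconds.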
diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c index 2656fb4d22f..8d12847f2cd 100644 --- a/sys/kern/kern_tc.c +++ b/sys/kern/kern_tc.c @@ -72,6 +72,7 @@ struct timehands { struct timecounter *th_counter; int64_t th_adjustment; uint64_t th_scale; + u_int th_large_delta; u_int th_offset_count; struct bintime th_offset; struct bintime th_bintime; @@ -90,6 +91,7 @@ static struct timehands th1 = { static struct timehands th0 = { .th_counter = &dummy_timecounter, .th_scale = (uint64_t)-1 / 1000000, + .th_large_delta = 1000000, .th_offset = { .sec = 1 }, .th_generation = 1, .th_next = &th1 @@ -200,20 +202,56 @@ tc_delta(struct timehands *th) * the comment in for a description of these 12 functions. */ -#ifdef FFCLOCK -void -fbclock_binuptime(struct bintime *bt) +static __inline void +bintime_off(struct bintime *bt, u_int off) { struct timehands *th; - unsigned int gen; + struct bintime *btp; + uint64_t scale, x; + u_int delta, gen, large_delta; do { th = timehands; gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_offset; - bintime_addx(bt, th->th_scale * tc_delta(th)); + btp = (struct bintime *)((vm_offset_t)th + off); + *bt = *btp; + scale = th->th_scale; + delta = tc_delta(th); + large_delta = th->th_large_delta; atomic_thread_fence_acq(); } while (gen == 0 || gen != th->th_generation); + + if (__predict_false(delta < large_delta)) { + /* Avoid overflow for scale * delta. 
*/ + x = (scale >> 32) * delta; + bt->sec += x >> 32; + bintime_addx(bt, x << 32); + bintime_addx(bt, (scale & 0xffffffff) * delta); + } else { + bintime_addx(bt, scale * delta); + } +} + +static __inline void +getthmember(void *out, size_t out_size, u_int off) +{ + struct timehands *th; + u_int gen; + + do { + th = timehands; + gen = atomic_load_acq_int(&th->th_generation); + memcpy(out, (char *)th + off, out_size); + atomic_thread_fence_acq(); + } while (gen == 0 || gen != th->th_generation); +} + +#ifdef FFCLOCK +void +fbclock_binuptime(struct bintime *bt) +{ + + bintime_off(bt, __offsetof(struct timehands, th_offset)); } void @@ -237,16 +275,8 @@ fbclock_microuptime(struct timeval *tvp) void fbclock_bintime(struct bintime *bt) { - struct timehands *th; - unsigned int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_bintime; - bintime_addx(bt, th->th_scale * tc_delta(th)); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + bintime_off(bt, __offsetof(struct timehands, th_bintime)); } void @@ -270,100 +300,61 @@ fbclock_microtime(struct timeval *tvp) void fbclock_getbinuptime(struct bintime *bt) { - struct timehands *th; - unsigned int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_offset; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getthmember(bt, sizeof(*bt), __offsetof(struct timehands, + th_offset)); } void fbclock_getnanouptime(struct timespec *tsp) { - struct timehands *th; - unsigned int gen; + struct bintime bt; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - bintime2timespec(&th->th_offset, tsp); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getthmember(&bt, sizeof(bt), __offsetof(struct timehands, + th_offset)); + bintime2timespec(&bt, tsp); } void fbclock_getmicrouptime(struct timeval *tvp) { - struct timehands *th; - unsigned int gen; 
+ struct bintime bt; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - bintime2timeval(&th->th_offset, tvp); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getthmember(&bt, sizeof(bt), __offsetof(struct timehands, + th_offset)); + bintime2timeval(&bt, tvp); } void fbclock_getbintime(struct bintime *bt) { - struct timehands *th; - unsigned int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_bintime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getthmember(bt, sizeof(*bt), __offsetof(struct timehands, + th_bintime)); } void fbclock_getnanotime(struct timespec *tsp) { - struct timehands *th; - unsigned int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *tsp = th->th_nanotime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getthmember(tsp, sizeof(*tsp), __offsetof(struct timehands, + th_nanotime)); } void fbclock_getmicrotime(struct timeval *tvp) { - struct timehands *th; - unsigned int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *tvp = th->th_microtime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getthmember(tvp, sizeof(*tvp), __offsetof(struct timehands, + th_microtime)); } #else /* !FFCLOCK */ + void binuptime(struct bintime *bt) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_offset; - bintime_addx(bt, th->th_scale * tc_delta(th)); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + bintime_off(bt, __offsetof(struct timehands, th_offset)); } void @@ -387,16 +378,8 @@ microuptime(struct timeval *tvp) void bintime(struct bintime *bt) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_bintime; - 
bintime_addx(bt, th->th_scale * tc_delta(th)); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + bintime_off(bt, __offsetof(struct timehands, th_bintime)); } void @@ -420,85 +403,53 @@ microtime(struct timeval *tvp) void getbinuptime(struct bintime *bt) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_offset; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getthmember(bt, sizeof(*bt), __offsetof(struct timehands, + th_offset)); } void getnanouptime(struct timespec *tsp) { - struct timehands *th; - u_int gen; + struct bintime bt; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - bintime2timespec(&th->th_offset, tsp); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getthmember(&bt, sizeof(bt), __offsetof(struct timehands, + th_offset)); + bintime2timespec(&bt, tsp); } void getmicrouptime(struct timeval *tvp) { - struct timehands *th; - u_int gen; + struct bintime bt; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - bintime2timeval(&th->th_offset, tvp); - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getthmember(&bt, sizeof(bt), __offsetof(struct timehands, + th_offset)); + bintime2timeval(&bt, tvp); } void getbintime(struct bintime *bt) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *bt = th->th_bintime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getthmember(bt, sizeof(*bt), __offsetof(struct timehands, + th_bintime)); } void getnanotime(struct timespec *tsp) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *tsp = th->th_nanotime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getthmember(tsp, sizeof(*tsp), 
__offsetof(struct timehands, + th_nanotime)); } void getmicrotime(struct timeval *tvp) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *tvp = th->th_microtime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getthmember(tvp, sizeof(*tvp), __offsetof(struct timehands, + th_microtime)); } #endif /* FFCLOCK */ @@ -514,15 +465,9 @@ getboottime(struct timeval *boottime) void getboottimebin(struct bintime *boottimebin) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *boottimebin = th->th_boottime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getthmember(boottimebin, sizeof(*boottimebin), + __offsetof(struct timehands, th_boottime)); } #ifdef FFCLOCK @@ -1038,15 +983,9 @@ getmicrotime(struct timeval *tvp) void dtrace_getnanotime(struct timespec *tsp) { - struct timehands *th; - u_int gen; - do { - th = timehands; - gen = atomic_load_acq_int(&th->th_generation); - *tsp = th->th_nanotime; - atomic_thread_fence_acq(); - } while (gen == 0 || gen != th->th_generation); + getthmember(tsp, sizeof(*tsp), __offsetof(struct timehands, + th_nanotime)); } /* @@ -1464,6 +1403,7 @@ tc_windup(struct bintime *new_boottimebin) scale += (th->th_adjustment / 1024) * 2199; scale /= th->th_counter->tc_frequency; th->th_scale = scale * 2; + th->th_large_delta = MIN(((uint64_t)1 << 63) / scale, UINT_MAX); /* * Now that the struct timehands is again consistent, set the new From owner-freebsd-ppc@freebsd.org Fri Mar 8 01:30:04 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DC172152DFD1 for ; Fri, 8 Mar 2019 01:30:03 +0000 (UTC) (envelope-from marklmi@yahoo.com) Received: from sonic309-22.consmr.mail.ne1.yahoo.com 
(sonic309-22.consmr.mail.ne1.yahoo.com [66.163.184.148]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6A52A847A5 for ; Fri, 8 Mar 2019 01:30:03 +0000 (UTC) (envelope-from marklmi@yahoo.com) X-YMail-OSG: z7zvHfUVM1mQmMyjx4YL2r07_97JyYrstZve3Kprd7hdb8Zf3FX97lkWdYwxOem K2rTM9be1lSJH0yuEcOwo5Y_k8ea01xm6t23GwJ7ygwcqU2OX7MfiRkd0BE_855EnA9xkvZzrzC_ hZIEB7PU9iMioc9RmA06Q51xjPHwK8HRwodgvDtbmjrOrSHA9hYMdbQO_leLQhqk_3mnAPRhCUK9 rg0TX3rXjrlUFb0xy32FJ2ta2UO4Zhlal63JsUZlhIqwr6YrZY8R7L2xM7bWPioYv8i59NaZtYAz kRMLHnqYtjS.U0qigzQ6Lwvf0_w37ZXjOcGhiOes9KFNj96rc4bE3_XQs9MqaRLzeVWYSAatz4tQ wca8vd6IZzAMs3xkUv1Ul4v7gcZzMiC_95BAYGo6kA9sN7z6AgsfTx_lxt7neFgthSyc81OZ2EqI lDsLQOT3Tl3LLqW87xR0kXAUdE9NTZyW7y1B4NNK8RMGnwFeXJec1guAf7RQ.55K89Do.zv9dUEA kBuOK_QvLZ3JkhKpISbVSEOzSFdL5mzHaOsVGMYwBgfGgAVZ.6ZMHDoMVnf5pYY7P_eEv9.XNNsG 8b3QGI7KlOksdMqNie3QfnFb_SMMWVAcSgqWxxpY4f2cYErrY4XJETCph4K4nIzplxe9fgYj30_7 QjhqetzjHOmYVqy4cx8S4LTDT57GfoBd3kRLY.6YVXSjEb1WvjY6LHnnFgggVrhAvSXA.U24EJxE F8VRxFRy04MVERSirC7mn6I0X0kBZ6mjrfRV9ZFdWJFpJ5hSavjOOR6s4.9nebI9iBjy2C51Y3i9 NmA0sT6_1315MSiC156DRUL1MU0t7GGSUCbWyvv8__d0wPatERnPbZqpMzbWvV1IfxJd06pFEvFx fr0X30.4Ga8AeYR_g6jg7F3r2_Wuf7yEz9Sl7536KbCPGZFLnUR.fttF1p6yrL0pz_PzJzSZIe1R VCkXhztBy3VMjKLyGCPFQ8U4MjMTT3B34beonXdJXa_44xixvLGSGrNnVt.7moNBKXiRGrRkzDOO F6Wgpyj565EcsXqeCr8E_H5H7qF5jcT1zzv9ZeZxaTu6vPZNEk03Yp9L4NxU5U8sbmOHTfg-- Received: from sonic.gate.mail.ne1.yahoo.com by sonic309.consmr.mail.ne1.yahoo.com with HTTP; Fri, 8 Mar 2019 01:29:56 +0000 Received: from c-67-170-167-181.hsd1.or.comcast.net (EHLO [192.168.1.113]) ([67.170.167.181]) by smtp428.mail.ne1.yahoo.com (Oath Hermes SMTP Server) with ESMTPA ID f665bae4c52bbab61751dd46e593eb0e; Fri, 08 Mar 2019 01:29:53 +0000 (UTC) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) 
overflows unsigned 64 bits sometimes [patched failed]
From: Mark Millard
In-Reply-To: <20190307222220.GK2492@kib.kiev.ua>
Date: Thu, 7 Mar 2019 17:29:51 -0800
Cc: Bruce Evans, freebsd-hackers Hackers, FreeBSD PowerPC ML
Message-Id: <5EED3352-2E8C-4BEE-B281-4AC8DE9570C2@yahoo.com>
To: Konstantin Belousov

A basic question and a small note.

Context for the question, regarding tc->tc_get_timecount(tc) values:

In the powerpc64 context tc->tc_get_timecount(tc) is the lower
32 bits of the tbr, in my context having a 33,333,333 Hz or so
increment rate for a machine with a 2.5 GHz or so clock rate.
The truncated 32 bit tbr value wraps every 128 seconds or so.
2 sockets, 2 cores per socket, so 4 separate tbr values.

The question is . . .
In tc_delta's: tc->tc_get_timecount(tc) - th->th_offset_count, is observing tc->tc_get_timecount(tc) < th->th_offset_count ever supposed to be possible in correct operation, other than tc->tc_get_timecount(tc) having wrapped around (and so being newly 0 or "near" 0, with no evidence of it having been near 128 seconds or more in my context)? The note: On 2019-Mar-7, at 14:22, Konstantin Belousov wrote: > . . . > + > + if (__predict_false(delta < large_delta)) { I thought that delta=large_delta . > + /* Avoid overflow for scale * delta. */ > + x = (scale >> 32) * delta; > + bt->sec += x >> 32; > + bintime_addx(bt, x << 32); > + bintime_addx(bt, (scale & 0xffffffff) * delta); > + } else { > + bintime_addx(bt, scale * delta); > + } > . . . === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar) From owner-freebsd-ppc@freebsd.org Sat Mar 9 07:00:28 2019 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2067F153A1F7; Sat, 9 Mar 2019 07:00:28 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 22F7F71414; Sat, 9 Mar 2019 07:00:25 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 84AD9105AD5E; Sat, 9 Mar 2019 18:00:15 +1100 (AEDT) Date: Sat, 9 Mar 2019 18:00:14 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov cc: Bruce Evans , Mark Millard , freebsd-hackers Hackers , FreeBSD PowerPC ML Subject: Re: powerpc64 head -r344018 stuck sleeping problems: th->th_scale * tc_delta(th) overflows unsigned 64 bits sometimes [patched failed] In-Reply-To: <20190307222220.GK2492@kib.kiev.ua> Message-ID: 
<20190309144844.K1166@besplex.bde.org> References: <20190302142521.GE68879@kib.kiev.ua> <20190303041441.V4781@besplex.bde.org> <20190303111931.GI68879@kib.kiev.ua> <20190303223100.B3572@besplex.bde.org> <20190303161635.GJ68879@kib.kiev.ua> <20190304043416.V5640@besplex.bde.org> <20190304114150.GM68879@kib.kiev.ua> <20190305031010.I4610@besplex.bde.org> <20190306172003.GD2492@kib.kiev.ua> <20190308001005.M2756@besplex.bde.org> <20190307222220.GK2492@kib.kiev.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=UJetJGXy c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=vnREMb7VAAAA:8 a=ClMc5Of-GfaXbdAZ3JQA:9 a=f8I4eRmMFRTVFEQH:21 a=DjpI8WK0P_VDdg0N:21 a=CjuIK1q_8ugA:10 X-Rspamd-Queue-Id: 22F7F71414 X-Spamd-Bar: ----- Authentication-Results: mx1.freebsd.org; spf=pass (mx1.freebsd.org: domain of brde@optusnet.com.au designates 211.29.132.249 as permitted sender) smtp.mailfrom=brde@optusnet.com.au X-Spamd-Result: default: False [-6.00 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; RCVD_IN_DNSWL_LOW(-0.10)[249.132.29.211.list.dnswl.org : 127.0.5.1]; FROM_HAS_DN(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:211.29.132.0/23]; FREEMAIL_FROM(0.00)[optusnet.com.au]; MIME_GOOD(-0.10)[text/plain]; MIME_TRACE(0.00)[0:+]; DMARC_NA(0.00)[optusnet.com.au]; RCPT_COUNT_FIVE(0.00)[5]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; TO_MATCH_ENVRCPT_SOME(0.00)[]; TO_DN_ALL(0.00)[]; MX_GOOD(-0.01)[cached: extmail.optusnet.com.au]; NEURAL_HAM_SHORT(-0.83)[-0.826,0]; IP_SCORE(-2.86)[ip: (-7.21), ipnet: 211.28.0.0/14(-3.92), asn: 4804(-3.13), country: AU(-0.04)]; FREEMAIL_TO(0.00)[gmail.com]; RCVD_NO_TLS_LAST(0.10)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[optusnet.com.au]; ASN(0.00)[asn:4804, ipnet:211.28.0.0/14, country:AU]; FREEMAIL_CC(0.00)[optusnet.com.au]; RCVD_COUNT_TWO(0.00)[2] X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.29 
Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Mar 2019 07:00:28 -0000 On Fri, 8 Mar 2019, Konstantin Belousov wrote: > On Fri, Mar 08, 2019 at 01:31:30AM +1100, Bruce Evans wrote: >> On Wed, 6 Mar 2019, Konstantin Belousov wrote: >> >>> On Tue, Mar 05, 2019 at 05:17:14AM +1100, Bruce Evans wrote: >>>> On Mon, 4 Mar 2019, Konstantin Belousov wrote: >>>> >>>>> On Mon, Mar 04, 2019 at 05:29:48AM +1100, Bruce Evans wrote: >>>>>> On Sun, 3 Mar 2019, Konstantin Belousov wrote: >>>>>> >>>>>>> On Mon, Mar 04, 2019 at 12:32:12AM +1100, Bruce Evans wrote: >>> * ... >>>> I strongly disklike the merge. I more strongly disclike (sic) the more complete merge. The central APIs have even more parameters and reduced type safety to describe objects as (offset, size) pairs. >* ... >>>>>>> +#else >>>>>>> + /* >>>>>>> + * Use bintime_helper() unconditionally, since the fast >>>>>>> + * path in the above method is not so fast here, since >>>>>>> + * the 64 x 32 -> 64 bit multiplication is usually not >>>>>>> + * available in hardware and emulating it using 2 >>>>>>> + * 32 x 32 -> 64 bit multiplications uses code much >>>>>>> + * like that in bintime_helper(). >>>>>>> + */ >>>>>>> + bintime_helper(bt, scale, delta); >>>>>>> + bintime_addx(bt, (uint64_t)(uint32_t)scale * delta); >>>>>>> +#endif >>>>>> >>>>>> Check that this method is really better. Without this, the complicated >>>>>> part is about half as large and duplicating it is smaller than this >>>>>> version. >>>>> Better in what sence ? I am fine with the C code, and asm code looks >>>>> good. >>>> >>>> Better in terms of actually running significantly faster. I fear the >>>> 32-bit method is actually slightly slower for the fast path. >> >> I checked that it is just worse. Significantly slower and more complicated. >> >> I wrote and run a lot of timing benchmarks of various versions. 
All >> times in cycles on Haswell @4.08 GHz. On i386 except where noted: >> ... >> The above tests were done with the final version. The version which tested >> alternatives used switch (method) and takes about 20 cycles longer for the >> fastest version, presumably by defeating parallelism. Times for various >> methods: >> >> - with clang -Os, about 54 cycles for the old method that allowed overflow, >> and the same for the version with the check of the overflow threshold >> (but with the threshold never reached), and 59 cycles for the >> branch-free method. 100-116 cycles with gcc-4.2.1 -Os, with the branch-free >> method taking 5-10 cycles longer. >> >> - on amd64, only a couple of cycles faster (49-50 cycles in best cases), >> and gcc-4.2.1 only taking a couple of cycles longer. The branch-free >> method still takes about 59 cycles so it is relatively worse. >> >> In userland, using the syscall for clock_gettime(), the >> extra 5-10 cycles for the branch-free method is relatively insignificant. >> It is about 2 nanoseconds. Other pessimizations are more significant. >> Times for this syscall: >> - amd64 now: 224 nsec (with gcc-4.2.1 -Os) >> - i386 4+4 nopae: 500-580 nsec (depending on clang/gcc and -O2/-Os) >> even getpid(2) takes 280 nsec. Add at least 140 more nsec for pae. >> - i386 3+1: 224 nsec (with gcc 4.2.1 -Os) >> - i386 FreeBSD-5 UP: 193 nsec (with gcc-3.3.3 -O). >> - i386 4+4 nopae old library version of clock_gettime() compiled by >> clang: 29 nsec. >> >> In some tests, the version with the branch was even a cycle or two faster. >> In the tests, the branch was always perfectly predicted, so costs nothing >> except possibly by changing scheduling in an accidentally good way. The >> tests were too small to measure the cost of using branch prediction >> resources. I've never noticed a case where 1 more branch causes thrashing. > About testing such tight loops. 
There is a known phenomenon where Intel > CPUs give absurd times when code in the loop has unsuitable alignment. > The manifestation of the phenomenon is very surprising and hardly > controllable. It is due to the way the CPU front-end prefetches blocks > of bytes for instruction decoding and jmps locations in the blocks. > > The only source explaining it is https://www.youtube.com/watch?v=IX16gcX4vDQ > a talk by an Intel engineer. I know a little about such tests since I have written thousands and interpreted millions of them (mostly automatically). There are a lot of other side effects of caching resources that usually make more difference than alignment. The most mysterious one that I noticed was apparently due to alignment, but in a makeworld macro-benchmark. Minor changes even in unused functions or data gave differences of about 2% in real time and many more % in system time. This only showed up on an old Turion2 (early Athlon64) system. I think it is due to limited cache associativity causing many cache misses by lining up unrelated far apart code or data addresses mod some power of 2. Padding to give the same alignment as the best case was too hard, but I eventually found a configuration accidentally giving nearly the best case even with its alignments changed by small modifications to the areas that I was working on. >* ... >>>>>>> - do { >>>>>>> - th = timehands; >>>>>>> - gen = atomic_load_acq_int(&th->th_generation); >>>>>>> - *bt = th->th_bintime; >>>>>>> - bintime_addx(bt, th->th_scale * tc_delta(th)); >>>>>>> - atomic_thread_fence_acq(); >>>>>>> - } while (gen == 0 || gen != th->th_generation); >>>>>> >>>>>> Duplicating this loop is much better than obfuscating it using inline >>>>>> functions. This loop was almost duplicated (except for the delta >>>>>> calculation) in no less than 17 functions in kern_tc.c (9 tc ones and >>>>>> 8 ffclock ones). Now it is only duplicated 16 times. >>>>> How did you count the 16? 
I can see only 4 instances in the unpatched >>>>> kern_tc.c, and 3 in patched, but it is 3 and not 1 only because I do not >>>>> touch ffclock until the patch is finalized. After that, it would be >>>>> 1 instance for kernel and 1 for userspace. >>>> >>>> Grep for the end condition in this loop. There are actually 20 of these. >>>> I'm counting the loops and not the previously-simple scaling operation in >>>> it. The scaling is indeed only done for 4 cases. I prefer the 20 >>>> duplications (except I only want about 6 of the functions). Duplication >>>> works even better for only 4 cases. >>> Ok, I merged these as well. Now there are only four loops left in kernel. >>> I do not think that merging them is beneficial, since they have sufficiently >>> different bodies. >> >> This is exactly what I don't want. >>> >>> I disagree with your characterization of it as obfuscation, IMO it improves >>> the maintainability of the code by reducing the number of places which need >>> careful inspection of the lock-less algorithm. >> >> It makes the inspection and changes more difficult for each instance. >> General functions are more difficult to work with since they need more >> args to control them and can't be changed without affecting all callers. >> >> In another thread, you didn't like similar churn for removing td args. > It is not similar. I do valid refactoring there (in terms of that > thread, I do not like the term refactoring). I eliminate a dozen instances > of a very intricate loop which implements a quite delicate lockless algorithm. > Its trickiness can be illustrated by the fact that it is the only valid > use of thread_fence_acq() which cannot be replaced by load_acq() (a similar > case is present in sys/seq.h). Small delicate loops are ideal for duplicating. They are easier to understand individually and short enough to compare without using diff to see gratuitous and substantive differences. Multiple instances are only hard to write and maintain. 
Since these multiple instances are already written, they are only harder to maintain. >> XX void >> XX binuptime(struct bintime *bt) >> XX { >> XX @@ -361,7 +383,7 @@ >> XX th = timehands; >> XX gen = atomic_load_acq_int(&th->th_generation); >> XX *bt = th->th_offset; >> XX - bintime_addx(bt, th->th_scale * tc_delta(th)); >> XX + bintime_adddelta(bt, th); >> XX atomic_thread_fence_acq(); >> XX } while (gen == 0 || gen != th->th_generation); >> XX } >> >> This is the kind of non-churning change that I like. > Ok. I made all cases where timehands are read more uniform by > moving calculations after the generation loop. This makes the > atomic part of the functions easier to see, and the loop body has a lower > chance to hit generation reset. I think this change is slightly worse: - it increases register pressure. 'scale' and 'delta' must be read in almost program order before the loop exit test. The above order uses them and stores the results to memory, so more registers are free for the exit test. i386 certainly runs out of registers. IIRC, i386 now spills 'gen'. It would have to spill something to load 'gen' or 'th' for the test. - it enlarges the window between reading 'scale' and 'delta' and the caller seeing the results. Preemption in this window gives results that may be far in the past. >>> diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c >>> index 2656fb4d22f..7114a0e5219 100644 >>> --- a/sys/kern/kern_tc.c >>> +++ b/sys/kern/kern_tc.c >>> ... >>> @@ -200,22 +201,77 @@ tc_delta(struct timehands *th) >>> * the comment in for a description of these 12 functions. >>> */ >>> >>> -#ifdef FFCLOCK >>> -void >>> -fbclock_binuptime(struct bintime *bt) >>> +static __inline void >>> +bintime_helper(struct bintime *bt, uint64_t scale, u_int delta) >> >> This name is not descriptive. >> >>> +static __inline void >>> +binnouptime(struct bintime *bt, u_int off) >> >> This name is an example of further problems with the naming scheme. 
>> The bintime_ prefix used above is verbose, but it is at least a prefix >> and is in the normal bintime_ namespace. Here the prefix is 'bin', >> which is neither of these. It means bintime_ again, but this duplicates >> 'time'. > I agree, and I made a name getthmember for the other function which clearly > reflects its operation. For this one, I ended with bintime_off(). The 'get' name is another problem. I would like to remove all the get*time functions and not add new names starting with 'get'. The library implementation already doesn't bother optimizing the get*time functions, but always uses the hardware timecounter. getfoo() is a more natural name than foo_get() for the action of getting foo, but the latter is better for consistency, especially in code that puts the subsystem name first in nearby code. The get*time functions would be better if they were more like time_second. Note that time_second is racy if time_t is too large for the arch so that accesses to it are not atomic, as happens on 32-bit arches with premature 64-bit time_t. However, in this 32/64 case, the race is only run every 136 years, with the next event scheduled in 2038, so this race is even less important now than other events scheduled in 2038. Bintimes are 96 or 128 bits, so directly copying a global like time_second for them would race every 1/2**32 second on 32-bit arches or every 1 second on 64-bit arches. Most of the loops on the generation count are for fixing these races, but perhaps a simpler method would work. On 64-bit arches with atomic 64-bit accesses on 32-bit boundaries, the following would work: - set the lower 32 bits of the fraction to 0, or ignore them - load the higher 32 bits of the fraction and the lower 32 bits of the seconds - race once every 136 years starting in 2038 reading the higher 32 bits of the seconds non-atomically. - alternatively, break instead of racing in 2038 by setting the higher 32 bits to 0. This is the same as using sbintimes instead of bintimes. 
- drop a few more lower bits by storing a right-shifted value. Right shifting by just 1 gives a race frequency of once per 272 years, with the next one in 2006. > diff --git a/sys/kern/kern_tc.c b/sys/kern/kern_tc.c > index 2656fb4d22f..8d12847f2cd 100644 > --- a/sys/kern/kern_tc.c > +++ b/sys/kern/kern_tc.c > @@ -200,20 +202,56 @@ tc_delta(struct timehands *th) > * the comment in for a description of these 12 functions. > */ > > -#ifdef FFCLOCK > -void > -fbclock_binuptime(struct bintime *bt) > +static __inline void > +bintime_off(struct bintime *bt, u_int off) > { > struct timehands *th; > - unsigned int gen; > + struct bintime *btp; > + uint64_t scale, x; > + u_int delta, gen, large_delta; > > do { > th = timehands; > gen = atomic_load_acq_int(&th->th_generation); > - *bt = th->th_offset; > - bintime_addx(bt, th->th_scale * tc_delta(th)); You didn't fully obfuscate this by combining this function with getthmember() so as to deduplicate the loop. > + btp = (struct bintime *)((vm_offset_t)th + off); Ugly conversion to share code. This is technically incorrect. Improving the casts gives: btp = (void *)(uintptr_t)((uintptr_t)(void *)th + off); but this assumes that arithmetic on the intermediate integer does what is expected. uintptr_t is only guaranteed to work when the intermediate representation held in it is not adjusted. Fixing the API gives static __inline void bintime_off(struct bintime *btp, struct bintime *base_btp) where base_btp is &th->th_bintime or &th->th_offset. (th_offset and th_bintime are badly named. th_offset is really a base time and the offset is tc_delta(). th_bintime is also a base time. It is the same as th_offset with another actual offset (the difference between UTC and local time) already added to it as an optimization. In old versions, th_bintime didn't exist, but the related struct members th_nanotime and th_microtime existed, since these benefit more from not converting on every call. 
My old version even documents the struct members, while -current still has no comments. The comments were lost to staticization. My version mostly adds "duh" to the banal comments after recovering them: XX /* XX * XXX rotted comment cloned from . XX * XX * th_counter is undocumented (duh). XX * XX * th_adjustment [PPM << 16] which means that the smallest unit of correction XX * you can apply amounts to 481.5 usec/year. XX * XX * th_scale is undocumented (duh). XX * XX * th_offset_count is the contents of the counter which corresponds to the XX * XX * rest of the offset_* values. XX * XX * th_offset is undocumented (duh). XX * XX * th_microtime is undocumented (duh). XX * XX * th_nanotime is undocumented (duh). XX * XX * XXX especially massive bitrot here. "three" is now "many"... XX * Each timecounter must supply an array of three timecounters. This is needed XX * to guarantee atomicity in the code. Index zero is used to transport XX * modifications, for instance done with sysctl, into the timecounter being XX * used in a safe way. Such changes may be adopted with a delay of up to 1/HZ. XX * Index one and two are used alternately for the actual timekeeping. XX * XX * th_generation is undocumented (duh). XX * XX * th_next is undocumented (duh). XX */ > + *bt = *btp; > + scale = th->th_scale; > + delta = tc_delta(th); > + large_delta = th->th_large_delta; I had forgotten that th_scale is so volatile (it may be adjusted on every windup). th_large_delta is equally volatile. So moving the calculation outside of the loop gives even more register pressure than I noticed above. > atomic_thread_fence_acq(); > } while (gen == 0 || gen != th->th_generation); > + > + if (__predict_false(delta < large_delta)) { > + /* Avoid overflow for scale * delta. 
*/ > + x = (scale >> 32) * delta; > + bt->sec += x >> 32; > + bintime_addx(bt, x << 32); > + bintime_addx(bt, (scale & 0xffffffff) * delta); > + } else { > + bintime_addx(bt, scale * delta); > + } > +} > + > +static __inline void > +getthmember(void *out, size_t out_size, u_int off) > +{ > + struct timehands *th; > + u_int gen; > + > + do { > + th = timehands; > + gen = atomic_load_acq_int(&th->th_generation); > + memcpy(out, (char *)th + off, out_size); This isn't so ugly or technically incorrect. Now the object is generic, so the reference to it should be passed as (void *objp, size_t objsize) instead of the type-safe (struct bintime *base_btp). > + atomic_thread_fence_acq(); > + } while (gen == 0 || gen != th->th_generation); > +} I can see a good use of copying methods like this for sysctls. All sysctl accesses except possibly for aligned register_t's were originally racy, but we sprinkled mutexes for large objects and reduced race windows for smaller objects. E.g., sysctl_handle_long() still makes a copy with no locking, but this has no effect except on my i386-with-64-bit-longs since longs have the same size as ints so are as atomic as ints on 32-bit arches. sysctl_handle_64() uses the same method. It works to reduce the race window on 32-bit arches. sysctl_handle_string() makes a copy to malloc()ed storage. memcpy() to that risks losing the NUL terminator, and subsequent strlen() on the copy gives buffer overrun if the result has no terminators. sysctl_handle_opaque() uses a generation count method, like the one used by timecounters before the ordering bugs were fixed, but even more primitive and probably even more in need of ordering fixes. It would be good to fix all sysctls using the same generation count method as above. A loop at the top level might work. I wouldn't like a structure like the above where the top level calls individual sysctl functions which do nothing except wrap themselves in a generic function like the above. 
The above does give this structure to clock_gettime() calls. The top level converts the clock id to a function and the above makes the function essentially convert back to another clock id (the offset of the relevant field in timehands), especially for the get*time functions where the call just copies the relevant field to userland. Unfortunately, the individual time functions are called directly in the kernel. I prefer this to generic APIs based on ids, so that callers can use simple efficient APIs like nanouptime() instead of complicated inefficiencies like kern_clock_gettime_generic(int clock_id = CLOCK_MONOTONIC, int format_id = CLOCK_TYPE_TIMESPEC, int precision = CLOCK_PRECISION_NSEC, void *dstp = &ts); Bruce
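For readers following the arithmetic in the quoted patch: its overflow-avoiding branch splits the 64-bit scale into 32-bit halves so that neither partial product with a 32-bit delta can exceed 64 bits. What follows is only a minimal userland sketch of that arithmetic. The struct bintime here is a stand-in for the kernel type, bintime_addx() is re-implemented locally, and the name bintime_add_scaled() is hypothetical (it appears nowhere in the thread).

```c
#include <stdint.h>

/* Userland stand-in for the kernel's struct bintime (assumption). */
struct bintime {
	int64_t  sec;
	uint64_t frac;		/* fraction of a second, in units of 2^-64 s */
};

/* Add x (in 2^-64 s units) to bt, carrying into sec when frac wraps. */
static void
bintime_addx(struct bintime *bt, uint64_t x)
{
	uint64_t old = bt->frac;

	bt->frac += x;
	if (old > bt->frac)
		bt->sec++;
}

/*
 * Hypothetical helper: add scale * delta to bt without dropping the
 * bits above 2^64.  With scale = hi * 2^32 + lo:
 *   scale * delta = (hi * delta) * 2^32 + lo * delta
 * Each partial product is at most (2^32 - 1)^2, so it fits in 64 bits.
 */
static void
bintime_add_scaled(struct bintime *bt, uint64_t scale, uint32_t delta)
{
	uint64_t x;

	x = (scale >> 32) * delta;	/* hi * delta */
	bt->sec += x >> 32;		/* the part worth whole seconds */
	bintime_addx(bt, x << 32);	/* the rest of the high product */
	bintime_addx(bt, (scale & 0xffffffff) * delta);	/* lo * delta */
}
```

With a scale of roughly 2^64 / 33,333,333 (the tbr rate mentioned earlier in the thread) and a delta worth three seconds of ticks, the plain 64-bit product scale * delta wraps and keeps only the fractional part, while the split form also accumulates the whole seconds.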