From owner-freebsd-ppc@freebsd.org Wed May 15 00:42:26 2019
From: Mark Millard
Subject: Re: An experiment in PowerMac G5 multi-socket/multi-core having better matching mftb() values
Date: Tue, 14 May 2019 17:42:15 -0700
To: Justin Hibbits, FreeBSD PowerPC ML
Message-Id: <48525401-7C75-4AE7-98D9-AD7CC7F53DE8@yahoo.com>
List-Id: Porting FreeBSD to the PowerPC

[Switching to code not using 64-bit atomics.]

On 2019-May-13, at 03:23, Mark Millard wrote:

> I've been experimenting with an alternate
> technique of dealing with boot-time 970 family
> PowerMac G5 tbr value synchronization across
> sockets/cores. So far it has narrowed the
> range significantly. I've reverted my hack for
> tolerating the mismatches in order to see how
> it goes.
>
> . . .
>
>
> # svnlite diff /usr/src/sys/powerpc/powermac/platform_powermac.c /usr/src/sys/powerpc/powerpc/mp_machdep.c | more
> . . .

So far the experiment has gone well. But the original
code used 64-bit atomic types and so was inappropriate
for multi-socket PowerMac G4s. So I've been
experimenting with an alternate coding that is not
powerpc64 specific.

I present the updated code below. It still enables
the new technique only for PowerMacs.

The example code does show my use of volatile for the
ap_pcpu pointer value and so would not match the svn
code in that respect:

/usr/src/sys/powerpc/aim/aim_machdep.c:extern void * volatile ap_pcpu;
/usr/src/sys/powerpc/aim/mp_cpudep.c:void * volatile ap_pcpu;
/usr/src/sys/powerpc/pseries/platform_chrp.c:extern void *ap_pcpu;
/usr/src/sys/powerpc/powermac/platform_powermac.c:extern void * volatile ap_pcpu;

(booke has an example of volatile applied to what is
pointed to. But I've not touched platforms that I do
not have available to test and so, thus far, I have
only changed the above for this issue.)

(Whitespace details may not survive such e-mail based
handling.)

Index: /usr/src/sys/powerpc/powermac/platform_powermac.c
===================================================================
--- /usr/src/sys/powerpc/powermac/platform_powermac.c	(revision 347549)
+++ /usr/src/sys/powerpc/powermac/platform_powermac.c	(working copy)
@@ -55,7 +55,7 @@
 
 #include "platform_if.h"
 
-extern void *ap_pcpu;
+extern void * volatile ap_pcpu;
 
 static int powermac_probe(platform_t);
 static int powermac_attach(platform_t);
@@ -333,6 +333,9 @@
 	return (powermac_smp_fill_cpuref(cpuref, bsp));
 }
 
+// platform_powermac.c is implicitly an AIM context: no explicit AIM test.
+extern volatile int alternate_timebase_sync_style; // 0 indicates old style; 1 indicates new style
+
 static int
 powermac_smp_start_cpu(platform_t plat, struct pcpu *pc)
 {
@@ -367,6 +370,13 @@
 
 	ap_pcpu = pc;
 
+	// platform_powermac.c is implicitly an AIM context: no explicit AIM test.
+	// Part of: Attempt a better-than-historical approximately
+	//          equal timebase value for ap vs. bsp
+	alternate_timebase_sync_style= 1; // So: new style for PowerMacs
+
+	powerpc_sync(); // for ap_pcpu and alternate_timebase_sync_style
+
 	if (rstvec_virtbase == NULL)
 		rstvec_virtbase = pmap_mapdev(0x80000000, PAGE_SIZE);
 
Index: /usr/src/sys/powerpc/powerpc/mp_machdep.c
===================================================================
--- /usr/src/sys/powerpc/powerpc/mp_machdep.c	(revision 347549)
+++ /usr/src/sys/powerpc/powerpc/mp_machdep.c	(working copy)
@@ -70,6 +70,20 @@
 static struct mtx ap_boot_mtx;
 struct pcb stoppcbs[MAXCPU];
 
+#if defined(AIM)
+// Part of: Attempt a better-than-historical approximately
+//          equal timebase value for ap vs. bsp
+
+volatile int alternate_timebase_sync_style= 0; // 0 indicates old style; 1 indicates new style.
+volatile uint64_t bsp_timebase_sample= 0u;
+
+volatile unsigned int from_bsp_status_flag= 0u;
+// stages: 0u, 1u (bsp ready to start), 2u (bsp tbr value available to ap)
+
+volatile unsigned int from_ap_status_flag= 0u;
+// stages: 0u, 1u (ap ready for bsp tbr value to be found and sent)
+#endif
+
 void
 machdep_ap_bootstrap(void)
 {
@@ -77,19 +91,71 @@
 	PCPU_SET(awake, 1);
 	__asm __volatile("msync; isync");
 
+#if defined(AIM)
+	powerpc_sync();
+	isync();
+	if (1==alternate_timebase_sync_style)
+	{
+		// Part of: Attempt a better-than-historical approximately
+		//          equal timebase value for ap vs. bsp
+
+		register_t oldmsr= intr_disable();
+
+		while (1u!=from_bsp_status_flag)
+			; // spin waiting for bsp to flag that it's ready to start.
+
+		// Start to measure a round trip: to the bsp and back.
+
+		isync(); // Be sure below mftb() result is not from earlier speculative execution.
+		register_t const start_round_trip_time_on_ap= mftb();
+		atomic_store_rel_int(&from_ap_status_flag, 1u); // bsp waits for such before its mftb().
+
+		while (2u!=from_bsp_status_flag)
+			; // spin waiting for bsp's tbr value
+
+		// Mid-point of ap round trip and the bsp timebase value should be approximately equal
+		// when the tbr's are well matched, absent interruptions on both sides.
+
+		isync(); // Be sure below mftb() result is not from earlier speculative execution.
+		register_t const end_round_trip_time_on_ap= mftb();
+
+		int64_t const approx_round_trip_tbr_detla_on_ap
+		    = end_round_trip_time_on_ap - start_round_trip_time_on_ap;
+		int64_t const ap_midpoint_tbr_value
+		    = start_round_trip_time_on_ap + (approx_round_trip_tbr_detla_on_ap+1)/2;
+
+		// Establish delta_to_match_bsp_tbr_example such that:
+		//  ap_midpoint_tbr_value+delta_to_match_bsp_tbr_example==bsp_timebase_sample
+		int64_t const delta_to_match_bsp_tbr_example= bsp_timebase_sample-ap_midpoint_tbr_value;
+
+		isync(); // Be sure below mftb() result is not from earlier speculative execution.
+		mttb((int64_t)mftb()+delta_to_match_bsp_tbr_example); // Make the ap tbr adjustment.
+
+		atomic_store_rel_int(&from_bsp_status_flag, 0u); // Get ready for next ap in bsp loop
+		atomic_store_rel_int(&from_ap_status_flag, 0u); // also flagging bsp that this ap is done
+
+		mtmsr(oldmsr);
+	}
+#endif
+
 	while (ap_letgo == 0)
 		nop_prio_vlow();
 	nop_prio_medium();
 
-	/*
-	 * Set timebase as soon as possible to meet an implicit rendezvous
-	 * from cpu_mp_unleash(), which sets ap_letgo and then immediately
-	 * sets timebase.
-	 *
-	 * Note that this is instrinsically racy and is only relevant on
-	 * platforms that do not support better mechanisms.
-	 */
-	platform_smp_timebase_sync(ap_timebase, 1);
+#if defined(AIM)
+	if (0==alternate_timebase_sync_style)
+#endif
+	{
+		/*
+		 * Set timebase as soon as possible to meet an implicit rendezvous
+		 * from cpu_mp_unleash(), which sets ap_letgo and then immediately
+		 * sets timebase.
+		 *
+		 * Note that this is instrinsically racy and is only relevant on
+		 * platforms that do not support better mechanisms.
+		 */
+		platform_smp_timebase_sync(ap_timebase, 1);
+	}
 
 	/* Give platform code a chance to do anything else necessary */
 	platform_smp_ap_init();
@@ -260,6 +326,34 @@
 				    pc->pc_cpuid, (uintmax_t)pc->pc_hwref,
 				    pc->pc_awake);
 			smp_cpus++;
+
+#if defined(AIM)
+			// Part of: Attempt a better-than-historical approximately
+			//          equal timebase value for ap vs. bsp
+			powerpc_sync();
+			isync();
+			if (1==alternate_timebase_sync_style)
+			{
+				register_t oldmsr= intr_disable();
+
+				atomic_store_rel_int(&from_bsp_status_flag, 1u); // bsp ready to start.
+
+				while (1u!=from_ap_status_flag)
+					; // spin waiting for ap to flag: time to send a tbr.
+
+				isync(); // Be sure below mftb() result is not from earlier.
+				bsp_timebase_sample= mftb();
+				atomic_store_rel_int(&from_bsp_status_flag, 2u); // bsp tbr available.
+
+				// Most of the rest of the usage is in machdep_ap_bootstrap,
+				// other than controlling alternate_timebase_sync_style value.
+
+				while (0u!=from_ap_status_flag)
+					; // spin waiting for ap to be done with the sample.
+
+				mtmsr(oldmsr);
+			}
+#endif
 		} else
 			CPU_SET(pc->pc_cpuid, &stopped_cpus);
 	}
@@ -266,14 +360,22 @@
 
 	ap_awake = 1;
 
-	/* Provide our current DEC and TB values for APs */
-	ap_timebase = mftb() + 10;
-	__asm __volatile("msync; isync");
+#if defined(AIM)
+	if (0==alternate_timebase_sync_style)
+#endif
+	{
+		/* Provide our current DEC and TB values for APs */
+		ap_timebase = mftb() + 10;
+		__asm __volatile("msync; isync");
+	}
 
 	/* Let APs continue */
 	atomic_store_rel_int(&ap_letgo, 1);
 
-	platform_smp_timebase_sync(ap_timebase, 0);
+#if defined(AIM)
+	if (0==alternate_timebase_sync_style)
+#endif
+		platform_smp_timebase_sync(ap_timebase, 0);
 
 	while (ap_awake < smp_cpus)
 		;

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in
early 2018-Mar)
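
For illustration only, the midpoint arithmetic that the ap side performs can be sketched in isolation as plain user-space C. This is a minimal sketch and not part of the patch: the function name tb_delta_to_match_bsp and its parameter names are hypothetical, and it assumes that every timebase reading is carried as a full 64-bit value. It shows only how the ap's two readings and the bsp's sample combine into the adjustment; the flag handshake, isync() and mttb() steps are left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): compute the signed adjustment
 * that would make the ap timebase agree with the bsp timebase.
 *
 *   ap_start   - ap timebase read just before signalling the bsp
 *   ap_end     - ap timebase read just after the bsp's sample is visible
 *   bsp_sample - bsp timebase read somewhere between those two events
 *
 * Assumption: the bsp sample was taken near the middle of the ap's
 * round trip, so the ap midpoint and the bsp sample should be equal
 * when the two timebases already match.
 */
static int64_t
tb_delta_to_match_bsp(uint64_t ap_start, uint64_t ap_end, uint64_t bsp_sample)
{
	/* The timebase only counts upward during the measurement. */
	assert(ap_end >= ap_start);

	int64_t const round_trip = (int64_t)(ap_end - ap_start);

	/* Round the half-way point up, matching the (delta+1)/2 in the patch. */
	int64_t const ap_midpoint = (int64_t)ap_start + (round_trip + 1) / 2;

	/* ap_midpoint + result == bsp_sample */
	return ((int64_t)bsp_sample - ap_midpoint);
}
```

As a worked case, ap readings of 1000 and 1010 around a bsp sample of 5005 give a round trip of 10, a midpoint of 1005, and a returned delta of 4000. The ap would then add that delta to its current timebase value, which is what the mttb() call in the patch does.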