From: Mark Millard
Date: Thu, 18 Apr 2019 22:17:46 -0700
Subject: Re: powerpc64 or 32-bit power context: FreeBSD lwsync use vs. th->th_generation handling (and related th-> fields) [Correction]
To: FreeBSD PowerPC ML, freebsd-hackers Hackers
Cc: Bruce Evans, Konstantin Belousov
In-Reply-To: <50CFD7F1-6892-4375-967B-4713517C2520@yahoo.com>
Message-Id: <0FD9ED28-EF4B-4A1C-9FCE-81C4D5BAEBF1@yahoo.com>

[I caught my mental mistake.]

On 2019-Apr-18, at 21:36, Mark Millard wrote:

> First I review below lwsync behavior. It is based on a
> comparison/contrast paper for the powerpc vs. arm memory models.
> It sets context for later material specific to powerpc64 or
> 32-bit powerpc FreeBSD.
>
> "For a write before a read, separated by a lwsync, the barrier
> will ensure that the write is committed before the read is
> satisfied but lets the read be satisfied before the write has
> been propagated to any other thread."
>
> (By contrast, sync guarantees that the write has propagated to
> all threads before the read in question is satisfied, the read
> having been separated from the write by the sync.)
>
> Another wording in case it helps (from the same paper):
>
> "The POWER lwsync does *not* ensure that writes before the
> barrier have propagated to any other thread before sequent
> actions, though it does keep writes before and after an lwsync
> in order as far as [each thread is] concerned". (Original used
> plural form: "all threads are". I tried to avoid any potential
> implication of cross (hardware) "thread" ordering constraints
> for seeing the updates when lwsync is used.)
>
> Next I note FreeBSD powerpc64 and 32-bit powerpc details
> that happen to involve lwsync, though lwsync is not the
> only issue:
>
> atomic_store_rel_int(&th->th_generation, ogen);
>
> and:
>
> gen = atomic_load_acq_int(&th->th_generation);
>
> with:
>
> static __inline void                                        \
> atomic_store_rel_##TYPE(volatile u_##TYPE *p, u_##TYPE v)   \
> {                                                           \
>                                                             \
>         powerpc_lwsync();                                   \
>         *p = v;                                             \
> }
>
> and:
>
> static __inline u_##TYPE                                    \
> atomic_load_acq_##TYPE(volatile u_##TYPE *p)                \
> {                                                           \
>         u_##TYPE v;                                         \
>                                                             \
>         v = *p;                                             \
>         powerpc_lwsync();                                   \
>         return (v);                                         \
> }                                                           \
>
> also:
>
> static __inline void
> atomic_thread_fence_acq(void)
> {
>
>         powerpc_lwsync();
> }
>
> First I list a simpler-than-full-context example to
> try to make things clearer . . .
>
> Here is a sequence, listing in an overall time
> order, omitting other activity, despite the distinct
> cpus, (N!=M):
>
> (Presume th->th_generation==ogen-1 initially, then:)
>
> cpu N: atomic_store_rel_int(&th->th_generation, ogen);
>        (same th value as for cpu M below)
>
> cpu M: gen = atomic_load_acq_int(&th->th_generation);
>
> For the above sequence:
>
> There is no barrier between the store and the later
> load at all. This is important below.
>
> So, if I have that much right . . .
>
> Now for more actual "load side" context:
> (Presume, for simplicity, that there is only one
> timehands instance instead of 2 or more timehands. So
> th does not vary below and is the same on both cpu's
> in the later example sequence of activity.)
>
>         do {
>                 th = timehands;
>                 gen = atomic_load_acq_int(&th->th_generation);
>                 *bt = th->th_offset;
>                 bintime_addx(bt, th->th_scale * tc_delta(th));
>                 atomic_thread_fence_acq();
>         } while (gen == 0 || gen != th->th_generation);
>
> For simplicity of referring to things: I again show
> a specific sequence in time.
> I only show the
> &th->th_generation activity from cpu N, again for
> simplicity.
>
> (Presume timehands->th_generation==ogen-1 initially
> and that M!=N:)
>
> cpu M: th = timehands;
>        (Could be after the "cpu N" lines.)
>
> cpu N: atomic_store_rel_int(&th->th_generation, ogen);
>        (same th value as for cpu M)
>
> cpu M: gen = atomic_load_acq_int(&th->th_generation);
> cpu M: *bt = th->th_offset;
> cpu M: bintime_addx(bt, th->th_scale * tc_delta(th));
> cpu M: atomic_thread_fence_acq();
> cpu M: gen != th->th_generation
>        (evaluated to false or to true)
>
> So here:
>
> A) gen ends up with: gen==ogen-1 || gen==ogen
>    (either is allowed because of the lack of
>    any barrier between the store and the
>    involved load).
>
> B) When gen==ogen: there was no barrier
>    before the assignment to gen to guarantee
>    other th-> field-value staging relationships.

(B) is just wrong: seeing the new value (ogen)
does guarantee something about the other th->
field-value staging relationships seen, given the
lwsync before the store and after the load.

> C) When gen==ogen: gen!=th->th_generation false
>    does not guarantee the *bt=. . . and
>    bintime_addx(. . .) activities were based
>    on a coherent set of th-> field-values.

Without (B), (C) does not follow.

> If I'm correct about (C) then the likes of the
> binuptime and sbinuptime implementations appear
> to be broken on powerpc64 and 32-bit powerpc
> unless there are extra guarantees always present.
>
> So have I found at least a powerpc64/32-bit-powerpc
> FreeBSD implementation problem?

No: I did not find a problem.

> Note: While I'm still testing, I've seen problems
> on the two 970MP based 2-socket/2-cores-each G5
> PowerMac11,2's that I've so far not seen on three
> 2-socket/1-core-each PowerMacs, two such 7455 G4
> PowerMac3,6's and one such 970 G5 PowerMac7,2.
> The two PowerMac11,2's are far more tested at
> this point.
> But proving that any test-failure is
> specifically because of (C) is problematical.
>
> Note: arm apparently has no equivalent of lwsync,
> just of sync (aka. hwsync and sync 0). If I
> understand correctly, PowerPC/Power has the weakest
> memory model of the modern tier-1/tier-2
> architectures and, so, they might be broken for
> memory model handling when everything else is
> working.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
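
For reference, the corrected reasoning (a reader that sees the new
generation value also sees the field values written before it) can be
sketched in portable C11. This is only an illustration under stated
assumptions, not the FreeBSD code: the struct th_sketch type and the
th_sketch_* functions are hypothetical names, <stdatomic.h> release
and acquire operations stand in for atomic_store_rel_int(),
atomic_load_acq_int() and atomic_thread_fence_acq(), and the two data
fields are made relaxed atomics only so that the sketch is free of
data races in C11 terms.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, much-simplified stand-in for struct timehands. */
struct th_sketch {
	_Atomic uint64_t offset;     /* stands in for th_offset */
	_Atomic uint64_t scale;      /* stands in for th_scale */
	atomic_uint      generation; /* stands in for th_generation; 0 = update in progress */
};

static void
th_sketch_init(struct th_sketch *th)
{
	atomic_init(&th->offset, 0);
	atomic_init(&th->scale, 0);
	atomic_init(&th->generation, 1);
}

/* Writer: mark the update in progress, write the fields, then publish. */
static void
th_sketch_update(struct th_sketch *th, uint64_t offset, uint64_t scale)
{
	unsigned int ogen;

	ogen = atomic_load_explicit(&th->generation, memory_order_relaxed);
	atomic_store_explicit(&th->generation, 0, memory_order_relaxed);
	/* Keep the "generation = 0" store ahead of the field stores. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->offset, offset, memory_order_relaxed);
	atomic_store_explicit(&th->scale, scale, memory_order_relaxed);
	if (++ogen == 0)
		ogen = 1;	/* skip 0, which is reserved for "in progress" */
	/* Release store: the field stores above are ordered before it. */
	atomic_store_explicit(&th->generation, ogen, memory_order_release);
}

/* Reader: retry until one nonzero generation brackets the field reads. */
static uint64_t
th_sketch_read(struct th_sketch *th)
{
	unsigned int gen;
	uint64_t result;

	do {
		/* Acquire load: pairs with the writer's release store. */
		gen = atomic_load_explicit(&th->generation,
		    memory_order_acquire);
		result = atomic_load_explicit(&th->offset,
		    memory_order_relaxed) +
		    atomic_load_explicit(&th->scale, memory_order_relaxed);
		/* Keep the field loads ahead of the generation re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 || gen != atomic_load_explicit(&th->generation,
	    memory_order_relaxed));
	return (result);
}

/* Single-threaded walk through one update followed by one read. */
static uint64_t
th_sketch_demo(void)
{
	struct th_sketch th;

	th_sketch_init(&th);
	th_sketch_update(&th, 40, 2);
	assert(atomic_load_explicit(&th.generation,
	    memory_order_relaxed) != 0);
	return (th_sketch_read(&th));
}
```

Calling th_sketch_demo() from a single thread returns 42 (40 + 2). A
single-threaded run only exercises the control flow; it does not
demonstrate the cross-cpu ordering, which rests on the release store
pairing with the acquire load as described in the correction to (B).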