From: Mark Millard
Date: Thu, 18 Apr 2019 21:36:19 -0700
Subject: powerpc64 or 32-bit power context: FreeBSD lwsync use vs. th->th_generation handling (and related th-> fields) looks broken to me
To: FreeBSD PowerPC ML, freebsd-hackers Hackers
Cc: Bruce Evans, Konstantin Belousov
Message-Id: <50CFD7F1-6892-4375-967B-4713517C2520@yahoo.com>

First I review below lwsync behavior. It is based on a comparison/contrast
paper for the powerpc vs. arm memory models. It sets context for later
material specific to powerpc64 or 32-bit powerpc FreeBSD.

"For a write before a read, separated by a lwsync, the barrier will ensure
that the write is committed before the read is satisfied but lets the read
be satisfied before the write has been propagated to any other thread."

(By contrast, sync guarantees that the write has propagated to all threads
before the read in question is satisfied, the read having been separated
from the write by the sync.)

Another wording in case it helps (from the same paper):

"The POWER lwsync does *not* ensure that writes before the barrier have
propagated to any other thread before subsequent actions, though it does
keep writes before and after an lwsync in order as far as [each thread is]
concerned". (Original used plural form: "all threads are". I tried to avoid
any potential implication of cross (hardware) "thread" ordering constraints
for seeing the updates when lwsync is used.)

Next I note FreeBSD powerpc64 and 32-bit powerpc details
that happen to involve lwsync, though lwsync is not the
only issue:

atomic_store_rel_int(&th->th_generation, ogen);

and:

gen = atomic_load_acq_int(&th->th_generation);

with:

static __inline void                                            \
atomic_store_rel_##TYPE(volatile u_##TYPE *p, u_##TYPE v)       \
{                                                               \
                                                                \
        powerpc_lwsync();                                       \
        *p = v;                                                 \
}

and:

static __inline u_##TYPE                                        \
atomic_load_acq_##TYPE(volatile u_##TYPE *p)                    \
{                                                               \
        u_##TYPE v;                                             \
                                                                \
        v = *p;                                                 \
        powerpc_lwsync();                                       \
        return (v);                                             \
}                                                               \

also:

static __inline void
atomic_thread_fence_acq(void)
{

        powerpc_lwsync();
}

First I list a simpler-than-full-context example to
try to make things clearer . . .
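
As a point of comparison for the definitions quoted above, here is a minimal
sketch of the same two shapes (barrier, then store; load, then barrier) in
portable C11. The names store_rel_u32, load_acq_u32 and demo_roundtrip are
hypothetical and are not FreeBSD's. The assumption is that a compiler
targeting POWER typically emits lwsync for C11 release and acquire fences.
The demo is single-threaded, so it only exercises the wrappers and shows
nothing about cross-cpu ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical analogue of atomic_store_rel_int(): fence first, then store. */
void
store_rel_u32(_Atomic uint32_t *p, uint32_t v)
{
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(p, v, memory_order_relaxed);
}

/* Hypothetical analogue of atomic_load_acq_int(): load first, then fence. */
uint32_t
load_acq_u32(_Atomic uint32_t *p)
{
        uint32_t v = atomic_load_explicit(p, memory_order_relaxed);

        atomic_thread_fence(memory_order_acquire);
        return (v);
}

/*
 * Single-threaded round trip: start at ogen-1, store ogen, load it back.
 * With only one thread the load must observe the store; the two-cpu case
 * discussed in this message is exactly where that stops being automatic.
 */
uint32_t
demo_roundtrip(uint32_t ogen)
{
        _Atomic uint32_t generation;
        uint32_t gen;

        atomic_init(&generation, ogen - 1);
        store_rel_u32(&generation, ogen);
        gen = load_acq_u32(&generation);
        assert(gen == ogen);
        return (gen);
}
```

Calling demo_roundtrip(7) from a test program returns 7. Neither wrapper
places a barrier between one cpu's store and another cpu's later load.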

Here is a sequence, listed in overall time order, omitting
other activity, despite the distinct cpus (N!=M):

(Presume th->th_generation==ogen-1 initially, then:)

cpu N: atomic_store_rel_int(&th->th_generation, ogen);
       (same th value as for cpu M below)

cpu M: gen = atomic_load_acq_int(&th->th_generation);

For the above sequence:

There is no barrier between the store and the later
load at all. This is important below.

So, if I have that much right . . .

Now for more actual "load side" context:
(Presume, for simplicity, that there is only one
timehands instance instead of 2 or more timehands. So
th does not vary below and is the same on both cpu's
in the later example sequence of activity.)

        do {
                th = timehands;
                gen = atomic_load_acq_int(&th->th_generation);
                *bt = th->th_offset;
                bintime_addx(bt, th->th_scale * tc_delta(th));
                atomic_thread_fence_acq();
        } while (gen == 0 || gen != th->th_generation);

For simplicity of referring to things: I again show
a specific sequence in time. I only show the
&th->th_generation activity from cpu N, again for
simplicity.

(Presume timehands->th_generation==ogen-1 initially
and that M!=N:)

cpu M: th = timehands;
       (Could be after the "cpu N" lines.)

cpu N: atomic_store_rel_int(&th->th_generation, ogen);
       (same th value as for cpu M)

cpu M: gen = atomic_load_acq_int(&th->th_generation);
cpu M: *bt = th->th_offset;
cpu M: bintime_addx(bt, th->th_scale * tc_delta(th));
cpu M: atomic_thread_fence_acq();
cpu M: gen != th->th_generation
       (evaluated to false or to true)

So here:

A) gen ends up with: gen==ogen-1 || gen==ogen
   (either is allowed because of the lack of
   any barrier between the store and the
   involved load).

B) When gen==ogen: there was no barrier
   before the assignment to gen to guarantee
   other th-> field-value staging relationships.

C) When gen==ogen: gen!=th->th_generation false
   does not guarantee the *bt=. . . and
   bintime_addx(. . .)
   activities were based on a coherent set of
   th-> field-values.

If I'm correct about (C) then the likes of the
binuptime and sbinuptime implementations appear
to be broken on powerpc64 and 32-bit powerpc
unless there are extra guarantees always present.

So have I found at least a powerpc64/32-bit-powerpc
FreeBSD implementation problem?

Note: While I'm still testing, I've seen problems
on the two 970MP based 2-socket/2-cores-each G5
PowerMac11,2's that I've so far not seen on three
2-socket/1-core-each PowerMacs, two such 7455 G4
PowerMac3,6's and one such 970 G5 PowerMac7,2.
The two PowerMac11,2's are far more tested at
this point. But proving that any test failure is
specifically because of (C) is problematic.

Note: arm apparently has no equivalent of lwsync,
just of sync (a.k.a. hwsync and sync 0). If I
understand correctly, PowerPC/Power has the weakest
memory model of the modern tier-1/tier-2
architectures and, so, the powerpc ports might be
broken for memory model handling even when every
other architecture is working.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
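
For anyone who wants to experiment with the load-side loop from the message
above outside the kernel, here is a minimal sketch of a generation-checked
snapshot in portable C11. Everything named here (struct snap_src, snap_publish,
snap_read, snap_demo) is hypothetical. The writer side in particular is an
assumed shape, since the message only shows its final
atomic_store_rel_int(&th->th_generation, ogen). The sketch restates the
reader's structure; it does not settle question (C), and the single-threaded
demo cannot show any cross-cpu reordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the th-> fields discussed above. */
struct snap_src {
        _Atomic uint32_t generation;    /* 0 means "update in progress" */
        _Atomic uint64_t offset;        /* stand-in for th_offset */
        _Atomic uint64_t scale;         /* stand-in for th_scale */
};

struct snap {
        uint64_t offset;
        uint64_t scale;
};

/* Writer (assumed shape): mark busy, update the fields, publish a new gen. */
void
snap_publish(struct snap_src *s, uint64_t offset, uint64_t scale)
{
        uint32_t ogen;

        ogen = atomic_load_explicit(&s->generation, memory_order_relaxed);
        atomic_store_explicit(&s->generation, 0, memory_order_relaxed);
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&s->offset, offset, memory_order_relaxed);
        atomic_store_explicit(&s->scale, scale, memory_order_relaxed);
        if (++ogen == 0)        /* 0 is reserved for "busy", so skip it */
                ogen = 1;
        /* Same shape as atomic_store_rel_int(): fence, then store. */
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&s->generation, ogen, memory_order_relaxed);
}

/* Reader: same shape as the do/while loop quoted in the message. */
struct snap
snap_read(struct snap_src *s)
{
        struct snap out;
        uint32_t gen;

        do {
                /* Same shape as atomic_load_acq_int(): load, then fence. */
                gen = atomic_load_explicit(&s->generation,
                    memory_order_relaxed);
                atomic_thread_fence(memory_order_acquire);
                out.offset = atomic_load_explicit(&s->offset,
                    memory_order_relaxed);
                out.scale = atomic_load_explicit(&s->scale,
                    memory_order_relaxed);
                /* Counterpart of atomic_thread_fence_acq(). */
                atomic_thread_fence(memory_order_acquire);
        } while (gen == 0 || gen != atomic_load_explicit(&s->generation,
            memory_order_relaxed));
        return (out);
}

/* Single-threaded exercise: publish once, read back, report success as 1. */
int
snap_demo(void)
{
        struct snap_src s;
        struct snap r;

        atomic_init(&s.generation, 1);
        atomic_init(&s.offset, 0);
        atomic_init(&s.scale, 0);
        snap_publish(&s, 100, 3);
        r = snap_read(&s);
        return (r.offset == 100 && r.scale == 3);
}
```

A real test of (C) would run snap_publish and snap_read on different cpus
of an affected machine and look for mismatched field pairs.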