From: John Baldwin
To: Andrew Turner
Cc: Adrian Chadd, Mateusz Guzik, Ian Lepore, Alan Cox, attilio@freebsd.org, Konstantin Belousov, freebsd-arch@freebsd.org
Subject: Re: atomic ops
Date: Thu, 30 Oct 2014 15:03:13 -0400

On Thursday, October 30, 2014 2:10:48 pm Andrew Turner wrote:
> On Wed, 29 Oct 2014 13:35:57 -0400
> John Baldwin wrote:
> > On Wednesday, October 29, 2014 12:58:15 pm
Ian Lepore wrote:
> > > Next, when we consider 'Access A' I'm not sure it's true that the
> > > access will replay if the store-exclusive fails and the operation
> > > loops. The access to A may have been a prefetch, even a prefetch
> > > for data on a predicted upcoming execution branch which may or may
> > > not end up being taken.
> > >
> > > I think the only thing that makes an ldrex/strex sequence safe for
> > > use in implementing synchronization primitives is to insert a 'dmb'
> > > after the acquire loop (after the strex succeeds), and 'dsb' before
> > > the release loop (dsb is required for SMP, dmb might be good enough
> > > on UP).
> > >
> > > Looking into this has made me realize our current armv6/7 atomics
> > > are incorrect in this regard. Guess I'll see about fixing them up
> > > Real Soon Now. :)
> >
> > I'm not actually sure either, but it would be surprising to me
> > otherwise. Presumably there is nothing magic about a branch. Either
> > the load-acquire is an acquire barrier or it isn't. Namely, suppose
> > you had this sequence:
> >
> > load-acquire P
> > access A (prefetch)
> > load-acquire Q
> > load A
> >
> > Would you expect the prefetch to satisfy the load or should the
> > load-acquire on Q discard that? Having a branch after a failing
> > conditional store back to the load acquire should work similarly. It
> > has to discard anything that was prefetched or it isn't an actual
> > load-acquire.
>
> I have checked with someone in ARM. The prefetch should not be
> considered an access with regard to the barrier and it could be moved
> before it as it will only load data into the cache. The barrier only
> deals with loading data into the core, i.e. if it was part of the
> prefetch it will be loaded from the cache no earlier than the
> load-acquire. The cache coherency protocol ensures the data will be up
> to date while the barrier will ensure the ordering of the load of A.
>
> In the above example the prefetch of A will not be thrown away but the
> data in the cache may change between the prefetch and load A if another
> core has written to A. If this is the case the load will be of the new
> data.

That is sufficient for what atomic(9)'s _acq wants, yes.

--
John Baldwin
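
For readers following the thread, here is a minimal sketch of the acquire/release pairing under discussion. It is written against C11 <stdatomic.h> rather than atomic(9) itself, and treating memory_order_acquire and memory_order_release as stand-ins for the _acq and _rel variants is an assumption made for illustration only. All names (lock_word, guarded_value, sketch_lock and friends) are hypothetical. The weak compare-exchange is allowed to fail spuriously, which is why it sits in a retry loop, much like the store-exclusive retry discussed above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names for illustration; this is not the atomic(9) API. */
static atomic_int lock_word;	/* 0 = free, 1 = held */
static int guarded_value;	/* plain data protected by lock_word */

/* Acquire: retry until the 0 -> 1 swap succeeds. */
static void
sketch_lock(void)
{
	int expected = 0;

	/* The weak form may fail spuriously, so it must be looped. */
	while (!atomic_compare_exchange_weak_explicit(&lock_word, &expected,
	    1, memory_order_acquire, memory_order_relaxed))
		expected = 0;	/* a failed attempt overwrote 'expected' */
}

/* Single attempt: the strong form only fails if the lock is held. */
static bool
sketch_trylock(void)
{
	int expected = 0;

	return (atomic_compare_exchange_strong_explicit(&lock_word,
	    &expected, 1, memory_order_acquire, memory_order_relaxed));
}

/* Release: earlier accesses to guarded_value are ordered before this. */
static void
sketch_unlock(void)
{
	assert(atomic_load_explicit(&lock_word, memory_order_relaxed) == 1);
	atomic_store_explicit(&lock_word, 0, memory_order_release);
}

/* The load of guarded_value is ordered after the acquire in sketch_lock(). */
static int
sketch_increment(void)
{
	int v;

	sketch_lock();
	v = ++guarded_value;
	sketch_unlock();
	return (v);
}
```

The access to guarded_value inside sketch_increment() plays the role of "load A" in the quoted sequence: whatever was brought into the cache earlier, the value the core observes is read no earlier than the successful acquire.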