Date: Tue, 28 Oct 2014 17:53:18 +0000
From: Andrew Turner
To: Attilio Rao
Cc: freebsd-arch@freebsd.org, Adrian Chadd, Mateusz Guzik, Konstantin Belousov, Alan Cox
Subject: Re: atomic ops
Message-ID: <20141028175318.709d2ef6@bender.lan>
References: <20141028025222.GA19223@dft-labs.eu> <20141028142510.10a9d3cb@bender.lan>

On Tue, 28 Oct 2014 15:33:06 +0100
Attilio Rao wrote:

> On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner wrote:
> > On Tue, 28 Oct 2014 14:18:41 +0100
> > Attilio Rao wrote:
> >
> >> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik wrote:
> >> > As was mentioned sometime ago, our situation related to atomic
> >> > ops is not ideal.
> >> >
> >> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64)
> >> > provide full memory barriers, which is stronger than needed.
> >> >
> >> > Moreover, load is implemented as lock cmpxchg on the variable's
> >> > address, so it is additionally slower, especially when CPUs compete.
> >>
> >> I already explained this once privately: full memory barriers are
> >> not stronger than needed.
> >> FreeBSD has different semantics from Linux. We historically
> >> enforce a full barrier on _acq() and _rel() rather than just a
> >> read and write barrier, hence we need a different implementation
> >> than Linux. There is code that relies on this property, like the
> >> locking primitives (releasing a mutex, for instance).
> >
> > On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support)
> > there are only full barriers. In both 32- and 64-bit ARMv8, ARM has
> > added support for load-acquire and store-release atomic
> > instructions. For their use in atomic operations we can assume
> > they only operate on the address passed to them.
> >
> > It is unlikely we will use them in the 32-bit port; however, I would
> > like to know the expected semantics of these atomic functions to
> > make sure we get them correct in the arm64 port. I have been
> > advised by one of the ARM Linux kernel maintainers on the problems
> > they have found using these instructions, but have yet to determine
> > what our atomic functions guarantee.
>
> For FreeBSD the "reference doc" is atomic(9).
> It clearly states:

There may also be a difference between what it states, how they are
implemented, and what developers assume they do. I'm trying to make
sure I get them correct.

> The second variant of each operation includes a read memory barrier.
> This barrier ensures that the effects of this operation are completed
> before the effects of any later data accesses. As a result, the
> operation is said to have acquire semantics as it acquires a
> pseudo-lock requiring further operations to wait until it has
> completed. To denote this, the suffix ``_acq'' is inserted into the
> function name immediately prior to the ``_<type>'' suffix.
> For example, to subtract two integers ensuring that any later writes
> will happen after the subtraction is performed, use
> atomic_subtract_acq_int().

It depends on where we guarantee the acquire barrier to be. On ARMv8
the function will be a load/modify/write sequence. If we use a
load-acquire operation for atomic_subtract_acq_int, for example, for a
pointer P and a value to subtract X:

loop:
  load-acquire-exclusive *P to N
  perform N = N - X
  store-exclusive N to *P
  if the store failed goto loop

where N and X are both registers.

This means no access after this loop will happen before it, but later
accesses may happen within it; e.g. if there was a later access A, the
following order may be possible:

Load P
Access A
Store P

We know the store only succeeds if nothing else has written *P since
the load: if another processor writes *P in between, the store fails
and we iterate over the loop again.

The other point is that we can guarantee any store-release, and
therefore any prior access, has happened before a later load-acquire,
even if it is on another processor.

...

> The bottom line of all this is that read memory barriers ensure that
> the effects of the operations you are making (a load in the case of
> atomic_load_acq_int(), for example) are completed before any later
> data accesses. "Data accesses" covers *all* operations, including
> reads, writes, etc. This is very different from what Linux assumes
> for its rmb() barrier, for example, which just orders loads. So for
> FreeBSD there is no _acq -> rmb() analogy and there is no _rel ->
> wmb() analogy.

On ARMv8, using the above pseudo-code, later operations will not be
moved before the load-acquire, but they may happen before its store.
Having discussed this with John Baldwin, I don't think this is a
problem, because the store operation is allowed to fail if another
processor has written to its memory.

>
> This must be kept well in mind when trying to optimize the atomic_*()
> operations.
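For readers more used to portable C, the load / subtract / store-exclusive / retry loop above can be approximated in user space with C11 <stdatomic.h>. This is a minimal sketch under that assumption, not FreeBSD's atomic(9) code: sketch_subtract_acq_int() is a hypothetical name, and the compare-exchange loop stands in for the exclusive load/store pair in the pseudo-code.

```c
#include <stdatomic.h>

/*
 * Hypothetical C11 stand-in for atomic_subtract_acq_int(); this is NOT
 * the FreeBSD kernel implementation. The retry loop mirrors the
 * load / subtract / store-exclusive / "goto loop" pseudo-code.
 */
static void
sketch_subtract_acq_int(atomic_uint *p, unsigned int x)
{
	/* Initial snapshot; the ordering comes from the successful CAS. */
	unsigned int old = atomic_load_explicit(p, memory_order_relaxed);

	/*
	 * If another thread changed *p since we read it, the exchange
	 * fails, 'old' is refreshed with the current value of *p, and we
	 * try again, much like a failed store-exclusive. A weak CAS may
	 * also fail spuriously, which the loop tolerates.
	 */
	while (!atomic_compare_exchange_weak_explicit(p, &old, old - x,
	    memory_order_acquire, memory_order_relaxed))
		;
}
```

With a counter holding 10, sketch_subtract_acq_int(&counter, 3) leaves 7. The one-line equivalent is atomic_fetch_sub_explicit(p, x, memory_order_acquire). The acquire ordering is attached to the successful exchange, mirroring the load-acquire in the pseudo-code; whether a given compiler emits exactly an exclusive load-acquire/store loop depends on the target and compiler options.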
At this point I'm more interested in getting them correct, as they will
be important when I start on SMP support.

Andrew
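The guarantee mentioned in the thread, that a store-release orders every earlier access before a load-acquire on another processor that observes it, is the classic message-passing pattern. A minimal sketch, assuming C11 <stdatomic.h> and POSIX threads in user space rather than the kernel's atomic(9) API; writer(), reader_wait() and run_message_passing() are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical message-passing sketch (not FreeBSD's atomic(9)): the
 * writer fills 'payload' with a plain store, then publishes 'ready'
 * with a store-release. A reader that sees ready == 1 via a
 * load-acquire is guaranteed to also see payload == 42.
 */
static int payload;
static atomic_int ready;

static void *
writer(void *arg)
{
	(void)arg;
	payload = 42;	/* plain store, ordered before the release */
	atomic_store_explicit(&ready, 1, memory_order_release);
	return (NULL);
}

static int
reader_wait(void)
{
	/* Spin until the flag is seen; this acquire pairs with the release. */
	while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
		;
	return (payload);	/* cannot observe the stale value 0 */
}

static int
run_message_passing(void)
{
	pthread_t t;
	int seen;

	if (pthread_create(&t, NULL, writer, NULL) != 0)
		return (-1);
	seen = reader_wait();
	pthread_join(t, NULL);
	return (seen);
}
```

run_message_passing() returns 42. In atomic(9) terms the two sides correspond roughly to atomic_store_rel_int() and atomic_load_acq_int(); per Attilio's description, FreeBSD's versions have historically been implemented with full barriers, which is at least as strong as this pattern requires.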