Date: Tue, 28 Oct 2014 15:33:06 +0100
From: Attilio Rao <attilio@freebsd.org>
To: Andrew Turner <andrew@fubar.geek.nz>
Cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>, Adrian Chadd <adrian@freebsd.org>, Mateusz Guzik <mjguzik@gmail.com>, Konstantin Belousov <kib@freebsd.org>, Alan Cox <alc@rice.edu>
Subject: Re: atomic ops
Message-ID: <CAJ-FndD=9MgK608ra8%2BeMy=cAdq%2BA0xRp9u3xFrwtPEk8eH4CA@mail.gmail.com>
In-Reply-To: <20141028142510.10a9d3cb@bender.lan>
References: <20141028025222.GA19223@dft-labs.eu> <CAJ-FndCWZt7YwFswt70QvbXA5c8Q_cYME2m3OwHTjCv8Nu3s=Q@mail.gmail.com> <20141028142510.10a9d3cb@bender.lan>
On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner <andrew@fubar.geek.nz> wrote:
> On Tue, 28 Oct 2014 14:18:41 +0100
> Attilio Rao <attilio@freebsd.org> wrote:
>
>> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik <mjguzik@gmail.com>
>> wrote:
>> > As was mentioned sometime ago, our situation related to atomic ops
>> > is not ideal.
>> >
>> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide
>> > full memory barriers, which is stronger than needed.
>> >
>> > Moreover, load is implemented as lock cmpxchg on var address, so it
>> > is additionally slower especially when cpus compete.
>>
>> I already explained this once privately: full memory barriers are not
>> stronger than needed.
>> FreeBSD has a different semantic than Linux. We historically enforce a
>> full barrier on _acq() and _rel() rather than just a read and write
>> barrier, hence we need a different implementation than Linux.
>> There is code that relies on this property, like the locking
>> primitives (release a mutex, for instance).
>
> On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support)
> there are only full barriers. On both 32 and 64-bit ARMv8 ARM has added
> support for load-acquire and store-release atomic instructions. For the
> use in atomic instructions we can assume these only operate on the
> address passed to them.
>
> It is unlikely we will use them in the 32-bit port however I would like
> to know the expected semantics of these atomic functions to make sure
> we get them correct in the arm64 port. I have been advised by one of
> the ARM Linux kernel maintainers on the problems they have found using
> these instructions but have yet to determine what our atomic functions
> guarantee.

For FreeBSD the "reference doc" is atomic(9). It clearly states:

     The second variant of each operation includes a read memory barrier.
     This barrier ensures that the effects of this operation are completed
     before the effects of any later data accesses.  As a result, the
     operation is said to have acquire semantics as it acquires a
     pseudo-lock requiring further operations to wait until it has
     completed.  To denote this, the suffix ``_acq'' is inserted into the
     function name immediately prior to the ``_<type>'' suffix.  For
     example, to subtract two integers ensuring that any later writes will
     happen after the subtraction is performed, use
     atomic_subtract_acq_int().

     The third variant of each operation includes a write memory barrier.
     This ensures that all effects of all previous data accesses are
     completed before this operation takes place.  As a result, the
     operation is said to have release semantics as it releases any
     pending data accesses to be completed before its operation is
     performed.  To denote this, the suffix ``_rel'' is inserted into the
     function name immediately prior to the ``_<type>'' suffix.  For
     example, to add two long integers ensuring that all previous writes
     will happen first, use atomic_add_rel_long().

The bottom line is that a read memory barrier ensures that the effects
of the operation you are performing (the load, in the case of
atomic_load_acq_int(), for example) are completed before any later data
accesses. "Data accesses" covers *all* operations, including reads,
writes, etc. This is very different from what Linux assumes for its
rmb() barrier, for example, which orders only loads.

So for FreeBSD there is no _acq -> rmb() analogy and there is no
_rel -> wmb() analogy.

This must be kept well in mind when trying to optimize the atomic_*()
operations.

Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
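
For readers who want something concrete, here is a minimal sketch of the
acquire/release pairing discussed in this message. It uses portable C11
<stdatomic.h> as a stand-in, not FreeBSD's own atomic(9) functions, and
the names payload, ready, publish() and try_consume() are hypothetical.
In kernel code the two marked calls would roughly correspond to
atomic_store_rel_int() and atomic_load_acq_int(). The C11 orderings are
an assumption made for illustration and may be weaker than the
full-barrier behaviour described above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not taken from the FreeBSD tree. */
static int payload;		/* plain data guarded by the flag */
static atomic_int ready;	/* 0 = not published; static storage starts at 0 */

/*
 * Producer: write the payload first, then publish it with a release
 * store, so the earlier write cannot be reordered after the flag update.
 */
static void
publish(int value)
{
	payload = value;
	atomic_store_explicit(&ready, 1, memory_order_release);
}

/*
 * Consumer: acquire load of the flag.  If the flag is set, the later
 * read of payload is ordered after the load and sees the published value.
 */
static bool
try_consume(int *out)
{
	if (atomic_load_explicit(&ready, memory_order_acquire) == 0)
		return (false);
	*out = payload;
	return (true);
}
```

One thread would call publish() and another would poll try_consume()
until it returns true. The release store is what keeps the payload write
ahead of the flag; the acquire load is what keeps the payload read
behind it.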