From owner-freebsd-arch@FreeBSD.ORG  Tue Oct 28 17:53:29 2014
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 5A02A638;
 Tue, 28 Oct 2014 17:53:29 +0000 (UTC)
Received: from nibbler.fubar.geek.nz (nibbler.fubar.geek.nz [199.48.134.198])
 by mx1.freebsd.org (Postfix) with ESMTP id 2B9A8E55;
 Tue, 28 Oct 2014 17:53:28 +0000 (UTC)
Received: from bender.lan (97e078e7.skybroadband.com [151.224.120.231])
 by nibbler.fubar.geek.nz (Postfix) with ESMTPSA id 6EA175CC08;
 Tue, 28 Oct 2014 17:53:26 +0000 (UTC)
Date: Tue, 28 Oct 2014 17:53:18 +0000
From: Andrew Turner <andrew@fubar.geek.nz>
To: Attilio Rao <attilio@freebsd.org>
Subject: Re: atomic ops
Message-ID: <20141028175318.709d2ef6@bender.lan>
In-Reply-To: <CAJ-FndD=9MgK608ra8+eMy=cAdq+A0xRp9u3xFrwtPEk8eH4CA@mail.gmail.com>
References: <20141028025222.GA19223@dft-labs.eu>
 <CAJ-FndCWZt7YwFswt70QvbXA5c8Q_cYME2m3OwHTjCv8Nu3s=Q@mail.gmail.com>
 <20141028142510.10a9d3cb@bender.lan>
 <CAJ-FndD=9MgK608ra8+eMy=cAdq+A0xRp9u3xFrwtPEk8eH4CA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>,
 Adrian Chadd <adrian@freebsd.org>, Mateusz Guzik <mjguzik@gmail.com>,
 Konstantin Belousov <kib@freebsd.org>, Alan Cox <alc@rice.edu>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 28 Oct 2014 17:53:29 -0000

On Tue, 28 Oct 2014 15:33:06 +0100
Attilio Rao <attilio@freebsd.org> wrote:
> On Tue, Oct 28, 2014 at 3:25 PM, Andrew Turner <andrew@fubar.geek.nz>
> wrote:
> > On Tue, 28 Oct 2014 14:18:41 +0100
> > Attilio Rao <attilio@freebsd.org> wrote:
> >
> >> On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik <mjguzik@gmail.com>
> >> wrote:
> >> > As was mentioned some time ago, our situation with atomic
> >> > ops is not ideal.
> >> >
> >> > atomic_load_acq_* and atomic_store_rel_* (at least on amd64)
> >> > provide full memory barriers, which is stronger than needed.
> >> >
> >> > Moreover, load is implemented as lock cmpxchg on the variable's
> >> > address, so it is additionally slower, especially when CPUs compete.
> >>
> >> I already explained this once privately: full memory barriers are
> >> not stronger than needed.
> >> FreeBSD has different semantics from Linux. We historically
> >> enforce a full barrier on _acq() and _rel() rather than just a
> >> read and write barrier, hence we need a different implementation
> >> than Linux. There is code that relies on this property, like the
> >> locking primitives (releasing a mutex, for instance).
> >
> > On 32-bit ARM prior to ARMv8 (i.e. all chips we currently support)
> > there are only full barriers. On ARMv8, in both 32- and 64-bit
> > modes, ARM has added support for load-acquire and store-release
> > instructions. For their use in atomic operations we can assume
> > these only operate on the address passed to them.
> >
> > It is unlikely we will use them in the 32-bit port; however, I would
> > like to know the expected semantics of these atomic functions to
> > make sure we get them correct in the arm64 port. I have been
> > advised by one of the ARM Linux kernel maintainers about the problems
> > they found using these instructions, but I have yet to determine
> > what our atomic functions guarantee.
> 
> For FreeBSD the "reference doc" is atomic(9).
> It clearly states:

There may also be a difference between what it states, how they are
implemented, and what developers assume they do. I'm trying to make
sure I get them correct.

> The second variant of each operation includes a read memory barrier.
> This barrier ensures that the effects of this operation are completed
> before the effects of any later data accesses.  As a result, the
> operation is said to have acquire semantics as it acquires a
> pseudo-lock requiring further operations to wait until it has
> completed.  To denote this, the suffix ``_acq'' is inserted into the
> function name immediately prior to the ``_<type>'' suffix.  For
> example, to subtract two integers ensuring that any later writes will
> happen after the subtraction is performed, use
> atomic_subtract_acq_int().

It depends on where we guarantee the acquire barrier to be. On ARMv8
the function will be a load/modify/write sequence. If we use a
load-acquire operation for atomic_subtract_acq_int, for example, with a
pointer P and a value X to subtract:

loop:
 load-acquire-exclusive *P to N
 perform N = N - X
 store-exclusive N to *P
 if the store failed goto loop

where N and X are both registers.

This means no access after this loop can happen before it, but later
accesses may happen within it; e.g. if there was a later access A, the
following order is possible:

Load P
Access A
Store P

We know the store will eventually happen: if it fails, e.g. because
another processor wrote to *P, the loop iterates and tries again.
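For comparison, the same load/modify/store-exclusive loop can be
approximated in portable C11. This is a minimal sketch, not the FreeBSD
implementation: it assumes `<stdatomic.h>`, `sketch_subtract_acq_int()`
is a hypothetical name, and since C11 has no load-exclusive or
store-exclusive primitives it substitutes a weak compare-and-swap, which
may fail spuriously much like a store-exclusive and which a compiler
targeting ARMv8 may lower to such a loop.

```c
#include <stdatomic.h>

/*
 * Hypothetical approximation of atomic_subtract_acq_int() using C11
 * atomics (not the FreeBSD implementation).
 *
 * Acquire ordering on a successful compare-exchange means no later
 * memory access in this thread may be reordered before the read of *p.
 * The weak compare-exchange may fail spuriously, like a store-exclusive,
 * in which case the loop simply retries.
 */
static void
sketch_subtract_acq_int(atomic_uint *p, unsigned int x)
{
	unsigned int n = atomic_load_explicit(p, memory_order_relaxed);

	/* On failure, n is refreshed with the current value of *p. */
	while (!atomic_compare_exchange_weak_explicit(p, &n, n - x,
	    memory_order_acquire, memory_order_relaxed))
		;
}
```

In practice `atomic_fetch_sub_explicit(p, x, memory_order_acquire)`
expresses the same operation in one call; the explicit loop is only
there to mirror the pseudo-code.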

The other point is that we can guarantee any store-release, and
therefore any access before it, has happened before a later
load-acquire, even if that load-acquire is on another processor.
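The store-release/load-acquire pairing is the classic message-passing
pattern. A minimal sketch in C11 with POSIX threads, assuming
`<stdatomic.h>` and `<pthread.h>` are available; C11 release/acquire
ordering stands in for the ARMv8 instructions here, and `payload`,
`ready`, `producer`, `consumer` and `run_message_passing` are
hypothetical names used only for illustration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state for the illustration. */
static int payload;		/* ordinary, non-atomic data */
static atomic_int ready;	/* flag published with release semantics */

/* Producer: write the data, then publish it with a store-release. */
static void *
producer(void *arg)
{
	(void)arg;
	payload = 42;
	/* Every access before this store is visible to whoever acquires it. */
	atomic_store_explicit(&ready, 1, memory_order_release);
	return NULL;
}

/* Consumer: spin on a load-acquire, then read the data. */
static void *
consumer(void *arg)
{
	int *out = arg;

	while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
		;	/* wait until the producer's store-release is observed */
	/* The load-acquire that saw 1 guarantees payload == 42 here. */
	*out = payload;
	return NULL;
}

/* Run one producer and one consumer; return what the consumer read. */
static int
run_message_passing(void)
{
	pthread_t p, c;
	int seen = 0;

	payload = 0;
	atomic_store_explicit(&ready, 0, memory_order_relaxed);
	pthread_create(&c, NULL, consumer, &seen);
	pthread_create(&p, NULL, producer, NULL);
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return seen;
}
```

Without the release/acquire pair (e.g. with relaxed ordering on both
sides) the consumer could observe `ready == 1` and still read a stale
`payload`.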

...

> The bottom line of all this is that read memory barriers ensure that
> the effects of the operations you are making (a load in the case of
> atomic_load_acq_int(), for example) are completed before any later
> data accesses. "Data accesses" covers *all* operations, including
> reads, writes, etc. This is very different from what Linux assumes
> for its rmb() barrier, for example, which only orders loads. So
> for FreeBSD there is no _acq -> rmb() analogy and there is no _rel ->
> wmb() analogy.
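The requirement that an acquire order later writes, not just later
reads, is what makes it usable for taking a lock: stores inside the
critical section must not move above the acquire. A minimal spinlock
sketch in C11 with POSIX threads, assuming `<stdatomic.h>` and
`<pthread.h>`; `spin_lock()`, `spin_unlock()`, `run_counter()` and the
counter are hypothetical and far simpler than FreeBSD's mtx(9). Note
that C11 `memory_order_acquire` is one-directional and so weaker than
the full barrier described above, but like the atomic(9) wording it
orders later loads *and* stores.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical test-and-set spinlock; not FreeBSD's mtx(9). */
static atomic_flag spinlock_flag = ATOMIC_FLAG_INIT;
static long counter;	/* protected by spinlock_flag */

static void
spin_lock(void)
{
	/*
	 * Acquire: no later load *or store* (e.g. the counter update in
	 * the critical section) may be reordered before taking the lock.
	 */
	while (atomic_flag_test_and_set_explicit(&spinlock_flag,
	    memory_order_acquire))
		;
}

static void
spin_unlock(void)
{
	/* Release: all earlier accesses complete before the lock is dropped. */
	atomic_flag_clear_explicit(&spinlock_flag, memory_order_release);
}

static void *
worker(void *arg)
{
	long iterations = *(const long *)arg;

	for (long i = 0; i < iterations; i++) {
		spin_lock();
		counter++;	/* plain read-modify-write, safe under the lock */
		spin_unlock();
	}
	return NULL;
}

/* Run nthreads workers (capped at 8); return the final counter value. */
static long
run_counter(int nthreads, long iterations)
{
	pthread_t tids[8];

	if (nthreads > 8)
		nthreads = 8;
	counter = 0;
	for (int i = 0; i < nthreads; i++)
		pthread_create(&tids[i], NULL, worker, &iterations);
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	return counter;
}
```

If the acquire ordered only later loads (an rmb()-style barrier), the
store half of `counter++` could in principle become visible before the
lock is held, and increments could be lost.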

On ARMv8, using the above pseudo-code, later operations will not be
moved before the load-acquire, but they may happen before its store.
Having discussed this with John Baldwin, I don't think this is a
problem, because the store-exclusive is allowed to fail if another
processor has written to *P in the meantime.

> 
> This must be kept well in mind when trying to optimize the atomic_*()
> operations.

At this point I'm more interested in getting them correct as they will
be important when I start on SMP support.

Andrew