Date: Tue, 28 Oct 2014 14:18:41 +0100
From: Attilio Rao
Reply-To: attilio@FreeBSD.org
To: Mateusz Guzik
Cc: Adrian Chadd, Alan Cox, Konstantin Belousov, freebsd-arch@freebsd.org
Subject: Re: atomic ops
List-Id: Discussion related to FreeBSD architecture

On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik wrote:
> As was mentioned some time ago, our situation related to atomic ops is
> not ideal.
>
> atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide
> full memory barriers, which is stronger than needed.
>
> Moreover, load is implemented as lock cmpxchg on the var's address, so
> it is additionally slower, especially when CPUs compete.

I already explained this once privately: full memory barriers are not
stronger than needed.
FreeBSD has different semantics than Linux. We historically enforce a
full barrier on _acq() and _rel() rather than just a read or a write
barrier, hence we need a different implementation than Linux.
There is code that relies on this property, such as the locking
primitives (releasing a mutex, for instance).

In short: optimizing the implementation for performance is fine and
due. Changing the semantics is not fine, unless you have reviewed and
fixed all the uses of _rel() and _acq().
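To make the distinction concrete, here is a minimal sketch in C11 terms.
It is only an illustration, not FreeBSD's actual atomic(9) code, and
lock_word, waiters and the function names are made up: a plain release
store still allows a later load to be reordered before it, while a full
barrier does not, which is the sort of property a mutex release that
then checks for waiters depends on.

#include <stdatomic.h>

static _Atomic int lock_word;
static _Atomic int waiters;

/*
 * C11 release only: the later load of "waiters" may be hoisted above
 * the store, which code written for full-barrier semantics may not
 * expect.
 */
void
release_weak(void)
{
	atomic_store_explicit(&lock_word, 0, memory_order_release);
	if (atomic_load_explicit(&waiters, memory_order_relaxed) != 0) {
		/* wake up a waiter (placeholder) */
	}
}

/*
 * Full barrier after the store: the load of "waiters" cannot pass it.
 */
void
release_full(void)
{
	atomic_store_explicit(&lock_word, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&waiters, memory_order_relaxed) != 0) {
		/* wake up a waiter (placeholder) */
	}
}

On amd64 the release-only store would typically compile to a plain mov,
while the full-barrier variant needs an mfence or a locked instruction,
which is where the cost difference being discussed comes from.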
> On amd64 it is sufficient to place a compiler barrier in such cases.
>
> Next, we lack some atomic ops in the first place.
>
> Let's define some useful terms:
> smp_wmb - no writes can be reordered past this point
> smp_rmb - no reads can be reordered past this point
>
> With this in mind, we lack ops which would guarantee only the following:
>
> 1. var = tmp; smp_wmb();
> 2. tmp = var; smp_rmb();
> 3. smp_rmb(); tmp = var;
>
> This matters since what we can use already to emulate this is way
> heavier than needed on the aforementioned amd64 and most likely other
> archs.

I can see the value of such barriers in cases where you want to
synchronize only reads or only writes.
I also believe that on the newest Intel processors (for which we should
optimize) rmb() and wmb() got significantly faster than mb().
However, the most interesting case would be arm and mips, I assume.
That's where you would see a bigger performance difference if you
optimize the membar paths.

Last time I looked into it, in the FreeBSD kernel the Linux-ish
rmb()/wmb()/etc. were used primarily in 3 places: Linux-derived code,
handling of 16-bit operands, and the implementation of "faster" bus
barriers.
Initially I had thought about just confining the smp_*() ops to a Linux
compat layer and fixing the other 2 this way: for 16-bit operands, just
pad to 32 bits, as the C11 standard also does; for the bus barriers,
just grow more versions that actually include the rmb()/wmb() scheme.

At this point, I understand we may want to instead support the concept
of a write-only or read-only barrier. This means that if we want to
keep the concept tied to the current _acq()/_rel() scheme we will end
up with a KPI explosion.
I'm not the one making the call here, but for a faster and more
granular approach, possibly we can end up using smp_rmb() and
smp_wmb() directly. As I said, I'm not the one making the call.

Attilio

--
Peace can only be achieved by understanding - A. Einstein
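As a closing illustration of the amd64 shortcut discussed in the quoted
mail (patterns 1 and 3 above), here is a minimal sketch; the macro and
function names are made up and are not any real FreeBSD interface.
Under the amd64 (TSO) memory model, stores are not reordered with older
stores and loads are not reordered with older loads, so a write-only or
read-only barrier only has to stop the compiler from reordering.

#define	sketch_smp_wmb()	__asm __volatile("" ::: "memory")
#define	sketch_smp_rmb()	__asm __volatile("" ::: "memory")

static volatile int data;
static volatile int flag;

/* Pattern 1: publish the data, then set the flag. */
void
producer(int tmp)
{
	data = tmp;
	sketch_smp_wmb();	/* no writes reordered past this point */
	flag = 1;
}

/* Pattern 3: observe the flag, then read the data. */
int
consumer(void)
{
	while (flag == 0)
		;
	sketch_smp_rmb();	/* no reads reordered past this point */
	return (data);
}

On architectures with weaker ordering (arm, mips) the same macros would
have to emit real barrier instructions, which is where the performance
question raised in the thread lies.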