From owner-freebsd-arch@FreeBSD.ORG Sat Oct 4 07:40:56 2014
Date: Sat, 4 Oct 2014 09:40:50 +0200
From: Mateusz Guzik
To: Konstantin Belousov
Subject: Re: [PATCH 1/2] Implement simple sequence counters with memory barriers.
Message-ID: <20141004074049.GA17491@dft-labs.eu>
In-Reply-To: <20141004071139.GL26076@kib.kiev.ua>
Cc: alc@freebsd.org, attilio@freebsd.org, Johan Schuijt, "freebsd-arch@freebsd.org"

On Sat, Oct 04, 2014 at 10:11:39AM +0300, Konstantin Belousov wrote:
> On Sat, Oct 04, 2014 at 07:28:51AM +0200, Mateusz Guzik wrote:
> > Reviving. Sorry everyone for such a big delay, $life.
> >
> > On Tue, Aug 19, 2014 at 02:24:16PM -0500, Alan Cox wrote:
> > > On Sat, Aug 16, 2014 at 8:26 PM, Mateusz Guzik wrote:
> > > > Well, my memory-barrier-and-so-on-fu is rather weak.
> > > >
> > > > I had another look at the issue. At least on amd64, it looks like only
> > > > a compiler barrier is required for both reads and writes.
> > > >
> > > > The AMD64 Architecture Programmer's Manual Volume 2: System
> > > > Programming, 7.2 Multiprocessor Memory Access Ordering, states:
> > > >
> > > > "Loads do not pass previous loads (loads are not reordered). Stores do
> > > > not pass previous stores (stores are not reordered)."
> > > >
> > > > Since the modifying code only performs a series of writes and we
> > > > expect exclusive writers, I find this applicable to the scenario.
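As a rough illustration of the pattern under discussion (hypothetical names, not the proposed seq.h API; assumes x86-style ordering, where a compiler-only barrier suffices between the counter bumps and the payload accesses, per the manual quote above):

```c
#include <assert.h>

/* Compiler-only barrier; per the quoted ordering rules, amd64 already
 * keeps loads ordered with loads and stores with stores. */
#define cc_barrier()	__asm__ __volatile__("" ::: "memory")

static volatile unsigned seq;	/* even = stable, odd = write in progress */
static int payload;

/* Exclusive writer: bump to odd, mutate, bump back to even. */
static void
write_data(int v)
{
	seq++;			/* counter now odd */
	cc_barrier();
	payload = v;
	cc_barrier();
	seq++;			/* counter even again */
}

/* Lock-free reader: retry if the counter was odd or moved underneath us. */
static int
read_data(void)
{
	unsigned s;
	int v;

	do {
		s = seq;
		cc_barrier();
		v = payload;
		cc_barrier();
	} while ((s & 1) != 0 || s != seq);
	return (v);
}
```

On architectures with weaker ordering, the compiler barriers would have to become real acquire/release fences, which is exactly the cost question the thread is about.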
> > > > I checked the Linux sources and the generated assembly; they indeed
> > > > issue only a compiler barrier on amd64 (and for Intel processors as
> > > > well).
> > > >
> > > > atomic_store_rel_int on amd64 seems fine in this regard, but the only
> > > > function for loads issues lock cmpxchg, which kills performance
> > > > (median 55693659 -> 12789232 ops in a microbenchmark) for no gain.
> > > >
> > > > Additionally, release and acquire semantics seem to be a stronger
> > > > guarantee than needed.
> > > >
> > >
> > > This statement left me puzzled and got me to look at our x86 atomic.h for
> > > the first time in years. It appears that our implementation of
> > > atomic_load_acq_int() on x86 is, umm ..., unconventional. That is, it is
> > > enforcing a constraint that simple acquire loads don't normally enforce.
> > > For example, the C11 stdatomic.h simple acquire load doesn't enforce this
> > > constraint. Moreover, our own implementation of atomic_load_acq_int() on
> > > ia64, where the mapping from atomic_load_acq_int() to machine instructions
> > > is straightforward, doesn't enforce this constraint either.
> > >
> >
> > By 'this constraint' I presume you mean a full memory barrier.
> >
> > It is unclear to me whether one can just get rid of it currently. It
> > would definitely be beneficial.
> >
> > In the meantime, if for some reason a full barrier is still needed, we can
> > speed up concurrent load_acq of the same variable considerably. There is
> > no need to lock cmpxchg on the same address. We should be able to replace
> > it with +/-:
> > lock add $0,(%rsp);
> > movl ...;
> >
> > I believe it is possible that the CPU will perform some writes before
> > doing the read listed here, but this should be fine.
> >
> > If this is considered too risky to hit 10.1, I would like to implement
> > it within seq as a temporary hack to be fixed up later.
> >
> > something along:
> > static inline int
> > atomic_load_acq_rmb(volatile u_int *p)
> > {
> > 	u_int v;
> >
> > 	v = *p;
> > 	atomic_load_acq(&v);
> > 	return (v);
> > }
> Do you need it as a designated primitive? I think you could write this
> inline for the purpose of getting the fix into 10.1.
>
> With the inline quirk, I think that the fix should go into HEAD
> now, with some reasonable MFC timer.

Well, the proposed seq.h is here: https://people.freebsd.org/~mjg/seq.h

Note it uses atomic_add_{acq,rel}_int for lack of an appropriate _store_acq.
This is temporary as well. I also realize atomic_load_acq_rmb_int sounds at
least weird, but well...

And here is the rest of the patch just in case:
https://people.freebsd.org/~mjg/plug-caps-race.diff

-- 
Mateusz Guzik
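The `lock add $0,(%rsp)` trick proposed above can be sketched as follows (a standalone, amd64-only illustration with a hypothetical name, not the actual atomic.h change): a locked read-modify-write on the caller's own stack slot acts as a full fence without bouncing the target variable's cache line the way `lock cmpxchg` on `*p` does, after which a plain load suffices.

```c
#include <assert.h>

/*
 * Hypothetical "full fence, then plain load" primitive, amd64 only.
 * The locked RMW targets (%rsp), a line private to this thread, so
 * concurrent readers of the same variable do not contend on it.
 */
static inline unsigned
load_acq_rmb(volatile unsigned *p)
{
	__asm__ __volatile__("lock; addl $0,(%%rsp)" ::: "memory", "cc");
	return (*p);
}
```

This matches the caveat in the mail: the fence does not stop the CPU from completing earlier stores before the load, which is acceptable for the sequence-counter use case.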