From: Mateusz Guzik
To: Konstantin Belousov
Date: Sat, 4 Oct 2014 10:11:08 +0200
Subject: Re: [PATCH 1/2] Implement simple sequence counters with memory barriers.
Cc: alc@freebsd.org, attilio@freebsd.org, Johan Schuijt, "freebsd-arch@freebsd.org"

On Sat, Oct 04, 2014 at 09:40:50AM +0200, Mateusz Guzik wrote:
> On Sat, Oct 04, 2014 at 10:11:39AM +0300, Konstantin Belousov wrote:
> > On Sat, Oct 04, 2014 at 07:28:51AM +0200, Mateusz Guzik wrote:
> > > Reviving. Sorry everyone for such a big delay, $life.
> > >
> > > On Tue, Aug 19, 2014 at 02:24:16PM -0500, Alan Cox wrote:
> > > > On Sat, Aug 16, 2014 at 8:26 PM, Mateusz Guzik wrote:
> > > > > Well, my memory-barrier-and-so-on-fu is rather weak.
> > > > >
> > > > > I had another look at the issue. At least on amd64, it looks like only
> > > > > a compiler barrier is required for both reads and writes.
> > > > >
> > > > > According to the AMD64 Architecture Programmer's Manual Volume 2: System
> > > > > Programming, section 7.2 Multiprocessor Memory Access Ordering states:
> > > > >
> > > > > "Loads do not pass previous loads (loads are not reordered). Stores do
> > > > > not pass previous stores (stores are not reordered)"
> > > > >
> > > > > Since the code modifying stuff only performs a series of writes and we
> > > > > expect exclusive writers, I find it applicable to this scenario.
> > > > >
> > > > > I checked the Linux sources and the generated assembly; they indeed
> > > > > issue only a compiler barrier on amd64 (and on Intel processors as well).
> > > > >
> > > > > atomic_store_rel_int on amd64 seems fine in this regard, but the only
> > > > > function for loads issues lock cmpxchg, which kills performance
> > > > > (median 55693659 -> 12789232 ops in a microbenchmark) for no gain.
> > > > >
> > > > > Additionally, release and acquire semantics seem to be a stronger
> > > > > guarantee than needed.
> > > >
> > > > This statement left me puzzled and got me to look at our x86 atomic.h for
> > > > the first time in years. It appears that our implementation of
> > > > atomic_load_acq_int() on x86 is, umm ..., unconventional. That is, it is
> > > > enforcing a constraint that simple acquire loads don't normally enforce.
> > > > For example, the C11 stdatomic.h simple acquire load doesn't enforce this
> > > > constraint. Moreover, our own implementation of atomic_load_acq_int() on
> > > > ia64, where the mapping from atomic_load_acq_int() to machine instructions
> > > > is straightforward, doesn't enforce this constraint either.
> > > >
> > >
> > > By 'this constraint' I presume you mean a full memory barrier.
> > >
> > > It is unclear to me whether one can just get rid of it currently. It
> > > definitely would be beneficial.
> > >
> > > In the meantime, if for some reason a full barrier is still needed, we can
> > > speed up concurrent load_acq of the same variable considerably. There is no
> > > need to lock cmpxchg on the same address. We should be able to replace
> > > it with, roughly:
> > > lock add $0,(%rsp);
> > > movl ...;
> > >
> > > I believe it is possible that the CPU will perform some writes before doing
> > > the read listed here, but this should be fine.
> > >
> > > If this is considered too risky to hit 10.1, I would like to implement
> > > it within seq as a temporary hack to be fixed up later.
> > >
> > > Something along these lines:
> > > static inline u_int
> > > atomic_load_acq_rmb(volatile u_int *p)
> > > {
> > > 	volatile u_int v;
> > >
> > > 	v = *p;
> > > 	atomic_load_acq_int(&v);
> > > 	return (v);
> > > }
> >
> > Do you need it as a designated primitive? I think you could write this
> > inline for the purpose of getting the fix into 10.1.
> >
> > With the inline quirk, I think that the fix should go into HEAD
> > now, with some reasonable MFC timer.
>
> Well, the proposed seq.h is here:
> https://people.freebsd.org/~mjg/seq.h
>
> Note it uses atomic_add_{acq,rel}_int for lack of an appropriate
> _store_acq. This is temporary as well.
>
> I also realize that atomic_load_acq_rmb_int sounds weird, to say the
> least, but well...
>
> And here is the rest of the patch just in case:
> https://people.freebsd.org/~mjg/plug-caps-race.diff
>

After a short off-list discussion, these were committed as
https://svnweb.freebsd.org/base?view=revision&revision=272503 and
https://svnweb.freebsd.org/base?view=revision&revision=272505
respectively.

-- 
Mateusz Guzik
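For readers following the thread, the sequence-counter (seqlock) scheme under discussion can be sketched in portable C11 <stdatomic.h>, which Alan mentions as a point of comparison. This is a minimal sketch under stated assumptions, not the committed FreeBSD seq.h: the type seqc_pair and the seqc_* / pair_* functions are hypothetical, the protected payload is just two ints, and there is a single exclusive writer, as the thread assumes. The writer makes the counter odd, stores the payload, then makes the counter even again. A reader retries if the counter was odd or changed across its reads. Payload fields are relaxed atomics so that reads racing with a write are well defined in C11, and the fence placement follows the formulation commonly used for C11 seqlocks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical seqlock-protected pair (not FreeBSD's seq.h API).
 * seq is even when the payload is stable and odd while a write is in progress.
 * Payload fields use relaxed atomics so that racing reads are well defined;
 * the reader's retry loop discards any torn snapshot.
 */
typedef struct {
	atomic_uint seq;
	atomic_int x;
	atomic_int y;
} seqc_pair;

static inline void
seqc_pair_init(seqc_pair *sp)
{
	atomic_init(&sp->seq, 0);
	atomic_init(&sp->x, 0);
	atomic_init(&sp->y, 0);
}

/* Writer side; assumes a single exclusive writer. */
static inline void
seqc_write_begin(seqc_pair *sp)
{
	unsigned s = atomic_load_explicit(&sp->seq, memory_order_relaxed);

	assert((s & 1) == 0);	/* not already inside a write section */
	atomic_store_explicit(&sp->seq, s + 1, memory_order_relaxed);
	/*
	 * Pairs with the reader's acquire fence: a reader that sees any payload
	 * store made after this point also sees at least the odd counter value.
	 */
	atomic_thread_fence(memory_order_release);
}

static inline void
seqc_write_end(seqc_pair *sp)
{
	unsigned s = atomic_load_explicit(&sp->seq, memory_order_relaxed);

	assert(s & 1);		/* must be inside a write section */
	/* Release store: a reader that acquires this even value sees the payload. */
	atomic_store_explicit(&sp->seq, s + 1, memory_order_release);
}

/* Reader side: wait for an even counter and return the value observed. */
static inline unsigned
seqc_read_begin(seqc_pair *sp)
{
	unsigned s;

	while ((s = atomic_load_explicit(&sp->seq, memory_order_acquire)) & 1)
		;	/* write in progress; spin */
	return (s);
}

/* True if a write overlapped the reads done since seqc_read_begin(). */
static inline bool
seqc_read_retry(seqc_pair *sp, unsigned s)
{
	/* Keep the payload loads ahead of the counter re-check below. */
	atomic_thread_fence(memory_order_acquire);
	return (atomic_load_explicit(&sp->seq, memory_order_relaxed) != s);
}

static inline void
pair_set(seqc_pair *sp, int x, int y)
{
	seqc_write_begin(sp);
	atomic_store_explicit(&sp->x, x, memory_order_relaxed);
	atomic_store_explicit(&sp->y, y, memory_order_relaxed);
	seqc_write_end(sp);
}

static inline void
pair_get(seqc_pair *sp, int *x, int *y)
{
	unsigned s;

	do {
		s = seqc_read_begin(sp);
		*x = atomic_load_explicit(&sp->x, memory_order_relaxed);
		*y = atomic_load_explicit(&sp->y, memory_order_relaxed);
	} while (seqc_read_retry(sp, s));
}
```

With one writer calling pair_set() and any number of readers calling pair_get(), each reader gets an x/y pair stored by a single pair_set() call. Readers never write shared memory; the price is that they may spin or retry while a write is in progress.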
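The quoted atomic_load_acq_rmb idea (do a plain load of the shared word, then get the full barrier from a locked operation on a private location instead of a lock cmpxchg on the shared address) has a rough C11 counterpart: a relaxed load followed by a sequentially consistent fence. The helper name load_then_full_fence is hypothetical, and this is a sketch, not what was committed. Because nothing performs a read-modify-write on *p, concurrent readers can all keep the cache line holding *p in shared state. On amd64, compilers commonly lower such a fence either to mfence or to a locked no-op on a stack slot, depending on compiler and version, so it is worth checking the generated assembly rather than assuming either.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical helper, a C11 analogue of the quoted atomic_load_acq_rmb
 * sketch: a plain (relaxed) load of the shared word followed by a full fence.
 * No read-modify-write is done on *p, so concurrent callers do not need
 * exclusive ownership of *p's cache line the way a lock cmpxchg on that
 * address would.
 */
static inline unsigned
load_then_full_fence(_Atomic unsigned *p)
{
	unsigned v;

	v = atomic_load_explicit(p, memory_order_relaxed);
	/* A seq_cst fence also acts as an acquire fence for the load above. */
	atomic_thread_fence(memory_order_seq_cst);
	return (v);
}
```

A caller would use it where the sketch uses atomic_load_acq_rmb, for example for the first read of a sequence counter, accepting a full fence where an acquire load would normally suffice.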