Date: Tue, 28 Oct 2014 14:18:41 +0100
From: Attilio Rao
Reply-To: attilio@FreeBSD.org
To: Mateusz Guzik
Cc: Adrian Chadd, Alan Cox, Konstantin Belousov, freebsd-arch@freebsd.org
Subject: Re: atomic ops
List-Id: Discussion related to FreeBSD architecture

On Tue, Oct 28, 2014 at 3:52 AM, Mateusz Guzik wrote:
> As was mentioned some time ago, our situation related to atomic ops is
> not ideal.
>
> atomic_load_acq_* and atomic_store_rel_* (at least on amd64) provide
> full memory barriers, which is stronger than needed.
>
> Moreover, load is implemented as lock cmpxchg on the var's address, so
> it is additionally slower, especially when CPUs compete.

I already explained this once privately: full memory barriers are not
stronger than needed.
FreeBSD has different semantics than Linux. We historically enforce a
full barrier on _acq() and _rel() rather than just a read or a write
barrier, hence we need a different implementation than Linux.
There is code that relies on this property, such as the locking
primitives (releasing a mutex, for instance).

In short: optimizing the implementation for performance is fine and
due. Changing the semantics is not fine, unless you have reviewed and
fixed all the uses of _rel() and _acq().
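To make the distinction concrete, here is a minimal sketch in C11 terms.
It is only an illustration, not FreeBSD's actual atomic(9) code, and
lock_word, waiters and the function names are made up: a plain release
store still allows a later load to be reordered before it, while a full
barrier does not, which is the sort of property a mutex release that
then checks for waiters depends on.

#include <stdatomic.h>

static _Atomic int lock_word;
static _Atomic int waiters;

/*
 * C11 release only: the later load of "waiters" may be hoisted above
 * the store, which code written for full-barrier semantics may not
 * expect.
 */
void
release_weak(void)
{
	atomic_store_explicit(&lock_word, 0, memory_order_release);
	if (atomic_load_explicit(&waiters, memory_order_relaxed) != 0) {
		/* wake up a waiter (placeholder) */
	}
}

/*
 * Full barrier after the store: the load of "waiters" cannot pass it.
 */
void
release_full(void)
{
	atomic_store_explicit(&lock_word, 0, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&waiters, memory_order_relaxed) != 0) {
		/* wake up a waiter (placeholder) */
	}
}

On amd64 the release-only store would typically compile to a plain mov,
while the full-barrier variant needs an mfence or a locked instruction,
which is where the cost difference being discussed comes from.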
> On amd64 it is sufficient to place a compiler barrier in such cases.
>
> Next, we lack some atomic ops in the first place.
>
> Let's define some useful terms:
> smp_wmb - no writes can be reordered past this point
> smp_rmb - no reads can be reordered past this point
>
> With this in mind, we lack ops which would guarantee only the following:
>
> 1. var = tmp; smp_wmb();
> 2. tmp = var; smp_rmb();
> 3. smp_rmb(); tmp = var;
>
> This matters since what we can use already to emulate this is way
> heavier than needed on the aforementioned amd64 and most likely other
> archs.

I can see the value of such barriers in cases where you want to
synchronize only reads or only writes.
I also believe that on the newest Intel processors (for which we should
optimize) rmb() and wmb() got significantly faster than mb().
However, the most interesting case would be arm and mips, I assume.
That's where you would see a bigger performance difference if you
optimize the membar paths.

Last time I looked into it, in the FreeBSD kernel the Linux-ish
rmb()/wmb()/etc. were used primarily in 3 places: Linux-derived code,
handling of 16-bit operands, and the implementation of "faster" bus
barriers.
Initially I had thought about just confining the smp_*() ops to a Linux
compat layer and fixing the other 2 this way: for 16-bit operands, just
pad to 32 bits, as the C11 standard also does; for the bus barriers,
just grow more versions that actually include the rmb()/wmb() scheme.

At this point, I understand we may want to instead support the concept
of a write-only or read-only barrier. This means that if we want to
keep the concept tied to the current _acq()/_rel() scheme we will end
up with a KPI explosion.
I'm not the one making the call here, but for a faster and more
granular approach, possibly we can end up using smp_rmb() and
smp_wmb() directly. As I said, I'm not the one making the call.

Attilio

--
Peace can only be achieved by understanding - A. Einstein
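As a closing illustration of the amd64 shortcut discussed in the quoted
mail (patterns 1 and 3 above), here is a minimal sketch; the macro and
function names are made up and are not any real FreeBSD interface.
Under the amd64 (TSO) memory model, stores are not reordered with older
stores and loads are not reordered with older loads, so a write-only or
read-only barrier only has to stop the compiler from reordering.

#define	sketch_smp_wmb()	__asm __volatile("" ::: "memory")
#define	sketch_smp_rmb()	__asm __volatile("" ::: "memory")

static volatile int data;
static volatile int flag;

/* Pattern 1: publish the data, then set the flag. */
void
producer(int tmp)
{
	data = tmp;
	sketch_smp_wmb();	/* no writes reordered past this point */
	flag = 1;
}

/* Pattern 3: observe the flag, then read the data. */
int
consumer(void)
{
	while (flag == 0)
		;
	sketch_smp_rmb();	/* no reads reordered past this point */
	return (data);
}

On architectures with weaker ordering (arm, mips) the same macros would
have to emit real barrier instructions, which is where the performance
question raised in the thread lies.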