Date: Sun, 16 Jun 2013 17:06:19 +0200
From: Marius Strobl
To: Ed Schouten
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r251782 - head/sys/sparc64/sparc64
Message-ID: <20130616150619.GI91573@alchemy.franken.de>
References: <201306150821.r5F8Lst5089231@svn.freebsd.org> <20130615125651.GH91573@alchemy.franken.de>

On Sun, Jun 16, 2013 at 01:49:09PM +0200, Ed Schouten wrote:
> Hi Marius,
>
> 2013/6/15 Marius Strobl:
> > Semantically, this change is wrong; what we really need here is an
> > acquire variant.
> > Using the release variant instead happens to also
> > work - albeit additionally wasting CPU cycles for the write memory
> > barrier - because in total store order, atomic operations implicitly
> > include the read memory barrier necessary for acquire semantics. In
> > other words, atomic(9) is in dire need of atomic_store_acq_().
>
> I personally dislike the idea of extending the existing atomic(9) API.
> My long-term goal would be that we could just use C11 atomics instead
> of using our own home-grown API. If we can't express this using the
> atomic(9) API, I'd just like us to use instead.
>
> Reading up on the C11 standard (section 5.1.2.4), it seems that the
> abstract model of threads described does not allow stores to be
> acquire operations. This does make sense, though. A load can of course
> not be a release operation. Because releases synchronize with
> acquires, a store being an acquire operation would have nothing to
> synchronize with.
>
> So I guess in this case we should solve it by using a relaxed store,
> followed by an acquire fence:
>
> http://80386.nl/pub/sparc64-atomic.txt
>
> Would that work for you?
>

Generally, I dislike the concept of passing control of how atomics are
implemented over to the compiler, at least as far as the kernel is
concerned and especially since the actual code generated for them
depends on the flavour and version of the compiler used. It's also
unclear to me how the C11 memory orders relate to SPARC v9 memory
models. We run the kernel and all of userland in total store order
regardless of the memory model denoted in the ELF header (for which at
least GCC 4.2.1 uses relaxed memory ordering). Partially that is
because - contrary to what one might expect - it turned out that
employing total store order and its implicit memory barriers performs
considerably better than running things with relaxed memory order and
inserting explicit memory barriers as needed.
With GCC 4.2.1 and your change to kern_event.c to use C11 atomics,
however, the compiler inserts memory barriers in a sledgehammer
fashion.

In the particular case of the pmap stuff, parts of the atomic accesses
to it are implemented in swtch.S. Currently, these are in sync with the
code generated when using atomic(9) in C. Also, modulo r251782, the
model employed as a whole, i.e. within C and assembler code, for
accessing these pmap bits matches as far as acquire and release
semantics are concerned. I really want to preserve this determinism and
not make the actual code and semantics generated for the C half of it
subject to the compiler of the day.

So, in order to go forward and if atomic_{load,store}() conflicts
with , what I'd like to do is to introduce an MD atomic_store_acq_ptr()
and just live with that.

Marius
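
For reference, the construct proposed in the quoted text (a relaxed
store followed by an acquire fence) could look roughly like the sketch
below in C11. This is only an illustration of the shape of the idea,
not the content of the linked sparc64-atomic.txt and not the code in
the tree. The names example_slot and store_relaxed_then_acquire_fence
are hypothetical, and the sketch says nothing about which barriers a
given compiler actually emits for sparc64, which is the concern raised
above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a shared slot such as a per-CPU pmap pointer. */
static _Atomic(uintptr_t) example_slot;

/*
 * Sketch of "relaxed store, followed by an acquire fence" as described
 * in the quoted text. The store itself carries no ordering; the fence
 * is a separate C11 operation placed after it. Which instructions this
 * turns into is up to the compiler and target.
 */
static inline void
store_relaxed_then_acquire_fence(_Atomic(uintptr_t) *p, uintptr_t v)
{
	atomic_store_explicit(p, v, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
}
```

A caller would use it as store_relaxed_then_acquire_fence(&example_slot,
value). An MD atomic_store_acq_ptr() as suggested above would instead
keep the choice of barrier in the hands of the kernel's own atomic(9)
implementation.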