From owner-freebsd-hackers@FreeBSD.ORG Mon Oct  5 13:53:42 2009
Message-ID: <4AC9FA63.9020606@FreeBSD.org>
Date: Mon, 05 Oct 2009 09:53:39 -0400
From: John Baldwin <jhb@FreeBSD.org>
To: Attilio Rao
Cc: Max Laier, Fabio Checconi, freebsd-hackers@freebsd.org
In-Reply-To: <3bbf2fe10910031345n5ea54f40i1ca4dd6a1d42ff13@mail.gmail.com>
Subject: Re: sx locks and memory barriers
Attilio Rao wrote:
> 2009/9/29 John Baldwin :
>> On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote:
>>> 2009/9/29 Max Laier :
>>>> On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote:
>>>>> 2009/9/25 Fabio Checconi :
>>>>>> Hi all,
>>>>>>   looking at sys/sx.h I have some troubles understanding this comment:
>>>>>>
>>>>>>  * A note about memory barriers.  Exclusive locks need to use the same
>>>>>>  * memory barriers as mutexes: _acq when acquiring an exclusive lock
>>>>>>  * and _rel when releasing an exclusive lock.  On the other side,
>>>>>>  * shared lock needs to use an _acq barrier when acquiring the lock
>>>>>>  * but, since they don't update any locked data, no memory barrier is
>>>>>>  * needed when releasing a shared lock.
>>>>>>
>>>>>> In particular, I'm not understanding what prevents the following
>>>>>> sequence from happening:
>>>>>>
>>>>>>     CPU A                             CPU B
>>>>>>
>>>>>>     sx_slock(&data->lock);
>>>>>>
>>>>>>     sx_sunlock(&data->lock);
>>>>>>
>>>>>>     /* reordered after the unlock
>>>>>>        by the cpu */
>>>>>>     if (data->buffer)
>>>>>>                                       sx_xlock(&data->lock);
>>>>>>                                       free(data->buffer);
>>>>>>                                       data->buffer = NULL;
>>>>>>                                       sx_xunlock(&data->lock);
>>>>>>
>>>>>>         a = *data->buffer;
>>>>>>
>>>>>> IOW, even if readers do not modify the data protected by the lock,
>>>>>> without a release barrier a memory access may leak past the unlock (as
>>>>>> the cpu won't notice any dependency between the unlock and the fetch,
>>>>>> feeling free to reorder them), thus potentially racing with an
>>>>>> exclusive writer accessing the data.
>>>>>>
>>>>>> On architectures where atomic ops serialize memory accesses this would
>>>>>> never happen, otherwise the sequence above seems possible; am I
>>>>>> missing something?
>>>>> I think your concerns are right, possibly we need this patch:
>>>>> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
>>>>>
>>>>> However, speaking with John we agreed there is possibly a more serious
>>>>> breakage.  Memory barriers would also need to ensure that the compiler
>>>>> does not reorder the operations; right now, in FreeBSD, they only take
>>>>> care of reordering from the architecture's perspective.  The only way
>>>>> I'm aware of that GCC offers to do this is to clobber memory.
>>>>> I will provide a patch that addresses this soon, hoping that GCC is
>>>>> smart enough not to add too much overhead from the memory clobbering,
>>>>> but instead understands our purpose and serves it (I will at least
>>>>> compare the code generated before and after the patch for tier-1
>>>>> architectures).
>>>> Does GCC really reorder accesses to volatile objects?  The C Standard
>>>> seems to object:
>>>>
>>>> 5.1.2.3 - 2
>>>> Accessing a volatile object, modifying an object, modifying a file, or
>>>> calling a function that does any of those operations are all side
>>>> effects,11) which are changes in the state of the execution environment.
>>>> Evaluation of an expression may produce side effects.  At certain
>>>> specified points in the execution sequence called sequence points, all
>>>> side effects of previous evaluations shall be complete and no side
>>>> effects of subsequent evaluations shall have taken place.  (A summary of
>>>> the sequence points is given in annex C.)
>>> Very interesting.
>>> I was thinking about the other operating systems, which basically do
>>> 'memory clobbering' to ensure a compiler barrier, but they often
>>> provide such a barrier without it being combined with a memory operand.
>>>
>>> I think I will need to speak a bit with a GCC engineer to see what
>>> they implement with regard to volatile operands.
>> GCC can be quite aggressive with reordering even in the face of volatile.
>> I was recently doing a hack to export some data from the kernel to
>> userland that used a spin loop to grab a snapshot of the contents of a
>> structure, similar to the method used in the kernel with the timehands
>> structures.  It used a volatile structure exposed from the kernel that
>> looked something like:
>>
>> struct foo {
>>         volatile int gen;
>>         /* other stuff */
>> };
>>
>> volatile struct foo *p;
>>
>> do {
>>         x = p->gen;
>>         /* read other stuff */
>>         y = p->gen;
>> } while (x != y && x != 0);
>>
>> GCC moved the 'y = ' up into the middle of the '/* read other stuff */'.
>> I eventually had to add explicit "memory" clobbers to force GCC not to
>> move the reads of 'gen' around, but to do them "around" all the other
>> operations, so that the working code is:
>>
>> do {
>>         x = p->gen;
>>         asm volatile("" ::: "memory");
>>         /* read other stuff */
>>         asm volatile("" ::: "memory");
>>         y = p->gen;
>> } while (x != y && x != 0);
>
> The situation was not as desperate as I first thought.
> Actually, only ia32 and amd64 seem affected by the missing memory
> clobbering, because it is already done on all the other platforms when
> using any of the memory barriers.
> On ia32 and amd64 too, the impact is more limited than I expected:
> atomic_cmpset_* already clobbers memory, and among the atomics used in
> our locking primitives only atomic_store_rel_* is not adequately
> protected, so I would really expect a limited performance penalty, if
> any.
> What was not really protected were the functions defined through
> ATOMIC_ASM(), and that was the larger part to fix.
>
> I spoke briefly about the compiler support with Chris Lattner (@Apple,
> LLVM lead) and he mentioned he was aware of the aggressive behaviour of
> GCC, pointing me in the direction that what the C Standard really
> mandates is that read/write operations on non-volatile operands may be
> reordered across a volatile operand, but that reads/writes of volatile
> operands are strongly ordered with regard to each other.  This however
> means that we have to fix the 'memory clobber' for GCC-like compilers
> and also offer support for the others that lets them specify a memory
> barrier.
> I then wrote this patch:
> http://www.freebsd.org/~attilio/atomic.h.diff
>
> That should address all the concerns raised and also foresees the
> possibility for other compilers to specify memory barrier semantics
> differently from normal atomics.
>
> rdivacky@ kindly already tested the patch on LLVM/CLANG, reporting no
> problems.
>
> I still haven't compared the produced binaries closely, but I see a
> little increase in the binary sizes, probably coming from .text growth.

This looks good to me.  One thing that is missing is that the UP atomic
load/store ops still need "memory" clobbers to prevent the compiler from
reordering things (you could still have bad juju if you preempted in
between the atomic op and the compiler-reordered operation on UP).

-- 
John Baldwin