From owner-freebsd-hackers@FreeBSD.ORG Mon Oct  5 13:53:42 2009
Message-ID: <4AC9FA63.9020606@FreeBSD.org>
Date: Mon, 05 Oct 2009 09:53:39 -0400
From: John Baldwin <jhb@FreeBSD.org>
To: Attilio Rao
Cc: Max Laier, Fabio Checconi, freebsd-hackers@freebsd.org
In-Reply-To: <3bbf2fe10910031345n5ea54f40i1ca4dd6a1d42ff13@mail.gmail.com>
Subject: Re: sx locks and memory barriers
Attilio Rao wrote:
> 2009/9/29 John Baldwin :
>> On Tuesday 29 September 2009 4:42:13 pm Attilio Rao wrote:
>>> 2009/9/29 Max Laier :
>>>> On Tuesday 29 September 2009 17:39:37 Attilio Rao wrote:
>>>>> 2009/9/25 Fabio Checconi :
>>>>>> Hi all,
>>>>>>   looking at sys/sx.h I have some troubles understanding this comment:
>>>>>>
>>>>>>  * A note about memory barriers.  Exclusive locks need to use the same
>>>>>>  * memory barriers as mutexes: _acq when acquiring an exclusive lock
>>>>>>  * and _rel when releasing an exclusive lock.  On the other side,
>>>>>>  * shared lock needs to use an _acq barrier when acquiring the lock
>>>>>>  * but, since they don't update any locked data, no memory barrier is
>>>>>>  * needed when releasing a shared lock.
>>>>>>
>>>>>> In particular, I'm not understanding what prevents the following
>>>>>> sequence from happening:
>>>>>>
>>>>>>     CPU A                             CPU B
>>>>>>
>>>>>>     sx_slock(&data->lock);
>>>>>>
>>>>>>     sx_sunlock(&data->lock);
>>>>>>
>>>>>>     /* reordered after the unlock
>>>>>>        by the cpu */
>>>>>>     if (data->buffer)
>>>>>>                                       sx_xlock(&data->lock);
>>>>>>                                       free(data->buffer);
>>>>>>                                       data->buffer = NULL;
>>>>>>                                       sx_xunlock(&data->lock);
>>>>>>
>>>>>>         a = *data->buffer;
>>>>>>
>>>>>> IOW, even if readers do not modify the data protected by the lock,
>>>>>> without a release barrier a memory access may leak past the unlock (as
>>>>>> the cpu won't notice any dependency between the unlock and the fetch,
>>>>>> feeling free to reorder them), thus potentially racing with an
>>>>>> exclusive writer accessing the data.
>>>>>>
>>>>>> On architectures where atomic ops serialize memory accesses this would
>>>>>> never happen, otherwise the sequence above seems possible; am I
>>>>>> missing something?
>>>>> I think your concerns are right, possibly we need this patch:
>>>>> http://www.freebsd.org/~attilio/sxrw_unlockb.diff
>>>>>
>>>>> However, speaking with John we agreed there is possibly a more serious
>>>>> breakage.  Memory barriers would also need to ensure that the compiler
>>>>> does not reorder the operations; right now, in FreeBSD, they only take
>>>>> care of reordering from the architecture's perspective.  The only way
>>>>> I'm aware of that GCC offers to do this is to clobber memory.
>>>>> I will provide a patch that addresses this soon, hoping that GCC is
>>>>> smart enough not to add too much overhead from the memory clobbering,
>>>>> but instead understands our purpose and serves it (I will at least
>>>>> compare the code generated before and after the patch for tier-1
>>>>> architectures).
>>>> Does GCC really reorder accesses to volatile objects?  The C Standard
>>>> seems to object:
>>>>
>>>> 5.1.2.3 - 2
>>>> Accessing a volatile object, modifying an object, modifying a file, or
>>>> calling a function that does any of those operations are all side
>>>> effects,11) which are changes in the state of the execution environment.
>>>> Evaluation of an expression may produce side effects.  At certain
>>>> specified points in the execution sequence called sequence points, all
>>>> side effects of previous evaluations shall be complete and no side
>>>> effects of subsequent evaluations shall have taken place.  (A summary of
>>>> the sequence points is given in annex C.)
>>> Very interesting.
>>> I was thinking about the other operating systems, which basically do
>>> 'memory clobbering' to ensure a compiler barrier, but they often
>>> provide such a barrier without it being combined with a memory operand.
>>>
>>> I think I will need to speak a bit with a GCC engineer to see what
>>> they implement with regard to volatile operands.
>> GCC can be quite aggressive with reordering even in the face of volatile.
>> I was recently doing a hack to export some data from the kernel to
>> userland that used a spin loop to grab a snapshot of the contents of a
>> structure, similar to the method used in the kernel with the timehands
>> structures.  It used a volatile structure exposed from the kernel that
>> looked something like:
>>
>> struct foo {
>>         volatile int gen;
>>         /* other stuff */
>> };
>>
>> volatile struct foo *p;
>>
>> do {
>>         x = p->gen;
>>         /* read other stuff */
>>         y = p->gen;
>> } while (x != y && x != 0);
>>
>> GCC moved the 'y = ' up into the middle of the '/* read other stuff */'.
>> I eventually had to add explicit "memory" clobbers to force GCC not to
>> move the reads of 'gen' around, but to do them "around" all the other
>> operations, so that the working code is:
>>
>> do {
>>         x = p->gen;
>>         asm volatile("" ::: "memory");
>>         /* read other stuff */
>>         asm volatile("" ::: "memory");
>>         y = p->gen;
>> } while (x != y && x != 0);
>
> The situation was not as desperate as I first thought.
> Actually, only ia32 and amd64 seem affected by the missing memory
> clobbering, because it is already done on all the other platforms when
> using any of the memory barriers.
> On ia32 and amd64 too, the impact is more limited than I expected:
> atomic_cmpset_* already clobbers memory, and among the atomics used in
> our locking primitives only atomic_store_rel_* is not adequately
> protected, so I would really expect a limited performance penalty, if
> any.
> What was not really protected were the functions defined through
> ATOMIC_ASM(), and that was the larger part to fix.
>
> I spoke briefly about the compiler support with Chris Lattner (@Apple,
> LLVM lead) and he mentioned he was aware of the aggressive behaviour of
> GCC, pointing me in the direction that what the C Standard really
> mandates is that read/write operations on non-volatile operands may be
> reordered across a volatile operand, but that reads/writes of volatile
> operands are strongly ordered with regard to each other.  This however
> means that we have to fix the 'memory clobber' for GCC-like compilers
> and also offer support for the others that lets them specify a memory
> barrier.
> I then wrote this patch:
> http://www.freebsd.org/~attilio/atomic.h.diff
>
> That should address all the concerns raised and also foresees the
> possibility for other compilers to specify memory barrier semantics
> differently from normal atomics.
>
> rdivacky@ kindly already tested the patch on LLVM/CLANG, reporting no
> problems.
>
> I still haven't compared the produced binaries closely, but I see a
> little increase in the binary sizes, probably coming from .text growth.

This looks good to me.  One thing that is missing is that the UP atomic
load/store ops still need "memory" clobbers to prevent the compiler from
reordering things (you could still have bad juju if you preempted in
between the atomic op and the compiler-reordered operation on UP).

-- 
John Baldwin