Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 18 Sep 2012 16:39:55 +0100
From:      Attilio Rao <attilio@freebsd.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Davide Italiano <davide@freebsd.org>, src-committers@freebsd.org, David Chisnall <theraven@freebsd.org>, Jeff Roberson <jeff@freebsd.org>, Dimitry Andric <dim@freebsd.org>, svn-src-projects@freebsd.org, Konstantin Belousov <kostikbel@gmail.com>
Subject:   Re: svn commit: r238907 - projects/calloutng/sys/kern
Message-ID:  <CAJ-FndAAi4q86GSa7UctT1koRWHpDxkZbEL-kxop7KkMxFxFiw@mail.gmail.com>
In-Reply-To: <201209180832.50694.jhb@freebsd.org>
References:  <201207301350.q6UDobCI099069@svn.freebsd.org> <505849DB.3090704@FreeBSD.org> <EAB6CB55-C1A4-4099-AD86-6FDE3593A867@FreeBSD.org> <201209180832.50694.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On 9/18/12, John Baldwin <jhb@freebsd.org> wrote:
> On Tuesday, September 18, 2012 6:29:31 am David Chisnall wrote:
>> On 18 Sep 2012, at 11:15, Dimitry Andric wrote:
>>
>> > Please use gcc's __sync_synchronize() builtin[1] instead, which is
>> > specifically for this purpose.  Clang also supports it.
>> >
>> > The builtin will emit actual memory barrier instructions, if the target
>> > architecture supports it, otherwise it will emit the same asm statement
>> > you show above.  See contrib/gcc/builtins.c, around line 5584, function
>> > expand_builtin_synchronize().
>>
>> From Attilio's description of the problem in IRC, I believe that
> atomic_signal_fence() is the correct thing to use here.  He stated that he
> cares about reordering of memory access with regard to the current CPU, but
>
> not with regard to other CPUs / threads.  He also said that he only cares
> about the compiler performing the reordering, not about the CPU, but I
> suspect
> that is incorrect as there are numerous subtle bugs that can creep in on
> weakly-ordered architectures (e.g. Alpha, ARMv8) if you only have a compiler
>
> barrier.
>
> Not true.  Barriers only affect the order that writes are posted to
> external
> viewers (e.g. other processors and devices that perform DMA).  On a single
> CPU barriers are completely meaningless.  The types of barriers Attilio is
> worried about are things like RAW and WAR hazards.  CPUs generally handle
> these things internally (except for ia64 and it's stop bits on bundles, but
> the compiler is required to insert those to ensure that things still
> complete
> in "program order").
>
> Specifically, CPUs will only reorder instructions in a way that does not
> violate a RAW or WAR hazard.  Compilers have similar constraints (they can
> move constants out of a loop because it is not a WAR).  Given that I think
> your last comment is largely bollocks. :)
>
>> That said, this is likely to be incorrect, because it's very unusual for
> that to actually be the requirement, especially in multithreaded code (where
>
> the atomic.h stuff is actually important).  In most of the cases where
> __compiler_membar() is being used, you actually want at least a partial
> barrier.
>
> The specific cases where Attilio wants to use a pure compiler barrier
> without a full atomic op (that will include appropriate barriers already)
> are attempting to force the compiler to safely order actions that are
> sensitive to preemption (e.g. ensuring that td_pinned and td_critnest are
> properly set before a critical section is entered so that any interrupt
> that occurs is guaranteed to "see" the result of sched_pin() or
> critical_enter() before any protected accesses are performed, and similarly
> to ensure that any protected accesses are completed before the weak "lock"
> is released via sched_unpin() or critical_exit()).  You can think of these
> as WAR or RAW hazards that the compiler simply has no way of knowing about
> (and can't).  However, assuming the compiler is correct, there are no
> WAR or RAW hazards that are not visible to the CPU.
>
> This is actually very similar to signal handling in userland (signals are
> basically interrupts for userland), so it may be that atomic_signal_fence()
> is
> in fact be correct.

Per standard description, atomic_signal_fence() seems being very
similar to hw memory barrier semantic rather than compiler memory
barrier. Also, the standard specifically mention signal handler,
besides our interpretation of it being applicable to interrupts in
kernel context too.

In general one may argue we use stdatomic.h rather than atomic.h but I
completely agree with kib@ on his later e-mail on the topic.

In definitive, I think that the patch I proposed is still fully valid,
modulo the fallback.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndAAi4q86GSa7UctT1koRWHpDxkZbEL-kxop7KkMxFxFiw>