Date: Tue, 18 Sep 2012 08:32:50 -0400 From: John Baldwin <jhb@freebsd.org> To: David Chisnall <theraven@freebsd.org> Cc: Davide Italiano <davide@freebsd.org>, src-committers@freebsd.org, Jeff Roberson <jeff@freebsd.org>, Dimitry Andric <dim@freebsd.org>, attilio@freebsd.org, svn-src-projects@freebsd.org, Konstantin Belousov <kostikbel@gmail.com> Subject: Re: svn commit: r238907 - projects/calloutng/sys/kern Message-ID: <201209180832.50694.jhb@freebsd.org> In-Reply-To: <EAB6CB55-C1A4-4099-AD86-6FDE3593A867@FreeBSD.org> References: <201207301350.q6UDobCI099069@svn.freebsd.org> <505849DB.3090704@FreeBSD.org> <EAB6CB55-C1A4-4099-AD86-6FDE3593A867@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tuesday, September 18, 2012 6:29:31 am David Chisnall wrote: > On 18 Sep 2012, at 11:15, Dimitry Andric wrote: > > > Please use gcc's __sync_synchronize() builtin[1] instead, which is > > specifically for this purpose. Clang also supports it. > > > > The builtin will emit actual memory barrier instructions, if the target > > architecture supports it, otherwise it will emit the same asm statement > > you show above. See contrib/gcc/builtins.c, around line 5584, function > > expand_builtin_synchronize(). > > From Attilio's description of the problem in IRC, I believe that atomic_signal_fence() is the correct thing to use here. He stated that he cares about reordering of memory access with regard to the current CPU, but not with regard to other CPUs / threads. He also said that he only cares about the compiler performing the reordering, not about the CPU, but I suspect that is incorrect as there are numerous subtle bugs that can creep in on weakly-ordered architectures (e.g. Alpha, ARMv8) if you only have a compiler barrier. Not true. Barriers only affect the order that writes are posted to external viewers (e.g. other processors and devices that perform DMA). On a single CPU barriers are completely meaningless. The types of barriers Attilio is worried about are things like RAW and WAR hazards. CPUs generally handle these things internally (except for ia64 and it's stop bits on bundles, but the compiler is required to insert those to ensure that things still complete in "program order"). Specifically, CPUs will only reorder instructions in a way that does not violate a RAW or WAR hazard. Compilers have similar constraints (they can move constants out of a loop because it is not a WAR). Given that I think your last comment is largely bollocks. :) > That said, this is likely to be incorrect, because it's very unusual for that to actually be the requirement, especially in multithreaded code (where the atomic.h stuff is actually important). In most of the cases where __compiler_membar() is being used, you actually want at least a partial barrier. The specific cases where Attilio wants to use a pure compiler barrier without a full atomic op (that will include appropriate barriers already) are attempting to force the compiler to safely order actions that are sensitive to preemption (e.g. ensuring that td_pinned and td_critnest are properly set before a critical section is entered so that any interrupt that occurs is guaranteed to "see" the result of sched_pin() or critical_enter() before any protected accesses are performed, and similarly to ensure that any protected accesses are completed before the weak "lock" is released via sched_unpin() or critical_exit()). You can think of these as WAR or RAW hazards that the compiler simply has no way of knowing about (and can't). However, assuming the compiler is correct, there are no WAR or RAW hazards that are not visible to the CPU. This is actually very similar to signal handling in userland (signals are basically interrupts for userland), so it may be that atomic_signal_fence() is in fact be correct. -- John Baldwin
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201209180832.50694.jhb>