Date:        Wed, 13 Oct 2010 20:21:44 +0400
From:        Gennady Proskurin <gprspb@mail.ru>
To:          John Baldwin <jhb@freebsd.org>
Cc:          Gennady Proskurin <gprspb@mail.ru>, src-committers@freebsd.org,
             svn-src-all@freebsd.org, Andriy Gapon <avg@freebsd.org>,
             Bruce Evans <brde@optusnet.com.au>, svn-src-head@freebsd.org
Subject:     Re: svn commit: r213648 - head/sys/kern
Message-ID:  <20101013162144.GA5254@gpr.nnz-home.ru>
In-Reply-To: <201010130905.04784.jhb@freebsd.org>
References:  <201010090807.o9987nG0030939@svn.freebsd.org>
             <4CB03A82.6040701@freebsd.org>
             <20101012210804.GA4286@gpr.nnz-home.ru>
             <201010130905.04784.jhb@freebsd.org>
Thank you and Bruce for the explanation.
The key point here (which I did not understand) is that "something == PCPU_GET(cpuid)"
may be true only if "something" was set by the current cpu, in which case its
value is not stale.
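To make the construct concrete, here is a minimal, standalone sketch of the
pattern discussed in the quoted thread below.  It is an illustration only, not
the actual sys/kern/kern_shutdown.c code: the function name panic_enter, the
NOCPU value, the mycpu parameter standing in for PCPU_GET(cpuid), and the
GCC/Clang __sync builtin standing in for atomic_cmpset_int() are all
assumptions made so the snippet compiles on its own.

#define NOCPU   255u                    /* sentinel: no CPU owns the "lock" */

static volatile unsigned int panic_cpu = NOCPU;

void
panic_enter(unsigned int mycpu)         /* mycpu stands in for PCPU_GET(cpuid) */
{
        /*
         * Plain (volatile, non-atomic) read.  It can compare equal only if
         * this CPU itself previously stored mycpu into panic_cpu (a recursive
         * panic on the same CPU); no other CPU ever writes our id, so a stale
         * value can never produce a false "equal" here.
         */
        if (panic_cpu != mycpu) {
                /* Try to claim the lock: NOCPU -> mycpu, with a full barrier. */
                while (!__sync_bool_compare_and_swap(&panic_cpu, NOCPU, mycpu)) {
                        /*
                         * Lost the race: spin until the winning CPU stores
                         * NOCPU again.  The volatile qualifier is what forces
                         * the compiler to reload panic_cpu on each iteration.
                         */
                        while (panic_cpu != NOCPU)
                                ;       /* nothing */
                }
        }
}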
On Wed, Oct 13, 2010 at 09:05:04AM -0400, John Baldwin wrote:
> On Tuesday, October 12, 2010 5:08:04 pm Gennady Proskurin wrote:
> > On Sat, Oct 09, 2010 at 12:48:50PM +0300, Andriy Gapon wrote:
> > > on 09/10/2010 12:33 Bruce Evans said the following:
> > > > On Sat, 9 Oct 2010, Andriy Gapon wrote:
> > > >
> > > >> Log:
> > > >>   panic_cpu variable should be volatile
> > > >>
> > > >>   This is to prevent caching of its value in a register when it is
> > > >>   checked and modified by multiple CPUs in parallel.
> > > >>   Also, move the variable into the scope of the only function that
> > > >>   uses it.
> > > >>
> > > >>   Reviewed by:   jhb
> > > >>   Hint from:     mdf
> > > >>   MFC after:     1 week
> > > >
> > > > I doubt that this is either necessary or sufficient.  Most variables
> > > > aren't volatile while they are locked by a mutex.  But panic() cannot
> > > > use normal mutex locking, so it should access this variable using
> > > > atomic ops with memory barriers.  The compiler part of the memory
> > > > barriers effectively makes _all_ variables temporarily volatile.
> > > > However, 2 of the 4 accesses to this variable don't use an atomic op.
> > > >
> > > > % #ifdef SMP
> > > > %         /*
> > > > %          * We don't want multiple CPU's to panic at the same time, so we
> > > > %          * use panic_cpu as a simple spinlock.  We have to keep checking
> > > > %          * panic_cpu if we are spinning in case the panic on the first
> > > > %          * CPU is canceled.
> > > > %          */
> > > > %         if (panic_cpu != PCPU_GET(cpuid))
> > > >
> > > > This access doesn't use an atomic op.
> > > >
> > > > %                 while (atomic_cmpset_int(&panic_cpu, NOCPU,
> > > > %                     PCPU_GET(cpuid)) == 0)
> > > >
> > > > This access uses an atomic op.  Not all atomic ops use memory barriers,
> > > > at least on i386.  However, this one does, at least on i386.
> > > >
> > > > (I'm always confused about what atomic_any() without acq or release
> > > > means.  Do they mean that you don't want a memory barrier (this is
> > > > what they mostly give, at least on i386), and if so, what use are
> > > > they?  There are far too many atomic ops, for far too many never-used
> > > > widths, with alternative spellings to encourage using a wrong one.
> > > > cmpset is especially confusing since you can spell it as cmpset,
> > > > cmpset_acq or cmpset_rel and always get the barrier, at least on
> > > > i386.  At least cmpset only supports 1 width, at least on i386.)
> > > >
> > > > %                         while (panic_cpu != NOCPU)
> > > >
> > > > This access doesn't use an atomic op.
> > > >
> > > > %                                 ; /* nothing */
> > > > % #endif
> > > > %         ...
> > > > % #ifdef RESTARTABLE_PANICS
> > > > %         /* See if the user aborted the panic, in which case we continue. */
> > > > %         if (panicstr == NULL) {
> > > > % #ifdef SMP
> > > > %                 atomic_store_rel_int(&panic_cpu, NOCPU);
> > > >
> > > > This access uses an atomic op with a memory barrier, at least on i386.
> > > > Now its rel semantics are clear.
> > > >
> > > > panicstr is non-volatile and never accessed by an atomic op, so it
> > > > probably strictly needs to be declared volatile even more than
> > > > panic_cpu.  I think
> > >
> > > I agree about panicstr.
> > > But I am not sure if we have places in the code (beyond the panic
> > > function itself) where volatile vs. non-volatile would make any actual
> > > difference.
> > > But still.
> > >
> > > > the only thing that makes it work now is that it is bogusly public,
> > > > and compilers can't analyze the whole program yet -- if they could,
> > > > then they would see that it is only set in panic().
> > > >
> > > > % #endif
> > > > %                 return;
> > > > %         }
> > > > % #endif
> > > > % #endif
> > > >
> > > > Now, why don't the partial memory barriers prevent caching the
> > > > variable?
> > > >
> > > > %         if (panic_cpu != PCPU_GET(cpuid))
> > > > %                 while (atomic_cmpset_int(&panic_cpu, NOCPU,
> > > > %                     PCPU_GET(cpuid)) == 0)
> > > > %                         while (panic_cpu != NOCPU)
> > > > %                                 ; /* nothing */
> > > >
> > > > The very first access can't reasonably use a cached value.
> > > > atomic_cmpset() can change panic_cpu, so even without the memory
> > > > barrier, panic_cpu must be reloaded for the third access the first
> > > > time.  But then when the third access is repeated in the second while
> > > > loop, the missing atomic op with barrier makes things very broken.
> > > > panic_cpu isn't changed by the loop, and the compiler thinks that it
> > > > isn't changed by anything else either, so the compiler may reduce the
> > > > loop to:
> > > >
> > > > %         if (panic_cpu != NOCPU)
> > > > %                 for (;;)
> > > > %                         ; /* nothing */
> > >
> > > Yes, it's exactly the last loop that had the problem.
> > > On amd64 with -O2:
> > >         .loc 1 544 0
> > >         movl    panic_cpu(%rip), %eax
> > > .LVL134:
> > >         .p2align 4,,7
> > > .L210:
> > >         cmpl    $255, %eax
> > >         jne     .L210
> > >         jmp     .L225
> > >
> > > > except I've seen claims that even an endless for loop can be optimized
> > > > to nothing.  Declaring panic_cpu as volatile prevents the compiler
> > > > from doing this, but I think it is insufficient.  We really do want to
> > > > see panic_cpu changed by other CPUs, and what is the point of
> > > > atomic_load_acq*() if not to use for this -- if declaring things
> > > > volatile were sufficient, then we could just use *(volatile *)&var.
> > > > atomic_load/store_acq/rel*() on i386
> > >
> > > I discussed this with kib and the idea is that an atomic operation is
> > > not needed in that place and volatile is sufficient, because we don't
> > > need barrier semantics there and only want to ensure that the variable
> > > is reloaded from memory.
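A minimal illustration of that "reloaded from memory" point (a hypothetical
standalone example, not FreeBSD source; the names flag_plain, flag_volatile,
wait_plain and wait_volatile are made up):

static unsigned int flag_plain;                 /* ordinary object */
static volatile unsigned int flag_volatile;     /* must be re-read from memory */

void
wait_plain(void)
{
        /*
         * Nothing in this loop writes flag_plain, so the compiler may load it
         * once and spin on the cached register value -- effectively the same
         * reduction as the "if (panic_cpu != NOCPU) for (;;) ;" loop and the
         * amd64 disassembly quoted above.
         */
        while (flag_plain != 0)
                ;       /* nothing */
}

void
wait_volatile(void)
{
        /* The volatile qualifier forces a fresh load on every iteration. */
        while (flag_volatile != 0)
                ;       /* nothing */
}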
> > If I understand correctly, without an acquiring barrier the variable is
> > not guaranteed to be loaded from memory; it can end up with some stale
> > value from the cpu's cache or pipeline.  If an incorrect value is checked
> > in the first "if" statement, it can lead to skipping all the atomic
> > synchronization.
> > The same goes for writing to the variable: without a barrier the new
> > value may be written to some local cpu cache and not be seen by readers
> > until it is flushed from the cache.
>
> No, a barrier does _not_ force any cache flushes.  The point of the
> volatile is to serve as a compiler barrier.  A memory barrier only enforces
> an ordering of memory operations in the CPU; it does not flush any caches
> or force a CPU to post writes to memory.  It is weaker than that. :)  For
> example, the sparcv9 manuals explicitly state something along the lines
> that any write may be held in the CPU's store buffer for an unspecified
> amount of time.  Barriers do not alter that, nor would it really be useful
> for them to do so.  What barriers allow you to do is order the operations
> on a lock cookie (such as mtx_lock in mutexes) with respect to operations
> on other memory locations (e.g. the data a lock protects).  By using a
> specific set of barriers and protocols for accessing the lock cookie, you
> can ensure that the protected data is not accessed without holding the
> lock.  However, for a standalone word, memory barriers do not buy you
> anything.  And in fact, if one CPU has written to a variable and that write
> is still sitting in the store buffer, 'atomic_load_acq' on another CPU may
> still return a stale value.  All that FreeBSD requires is that
> 'atomic_cmpset' will not report success using stale data.  It can either
> fail or block waiting for the write in a store buffer in another CPU to
> drain.  We typically handle the 'fail' case by continuing to loop until we
> observe a new state or successfully perform 'atomic_cmpset' (for an
> example, see the setting of MTX_CONTESTED in mtx_lock).
>
> Currently we do not have 'atomic_load()' and 'atomic_store()' variants
> without memory barriers, since 'x = y' is atomic (in that it is performed
> as a single operation).  However, there are cases where 'x = y' needs a
> compiler memory barrier (but not a CPU one), e.g. if 'y' can be updated by
> a signal handler in userland, or by an interrupt in the kernel.  That is
> what 'volatile' is for.
>
> --
> John Baldwin
> _______________________________________________
> svn-src-head@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/svn-src-head
> To unsubscribe, send any mail to "svn-src-head-unsubscribe@freebsd.org"
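To close with a concrete instance of John's last point above, 'volatile' acting
as a compiler barrier for a variable updated asynchronously by a signal handler,
here is a hypothetical userland sketch (got_sigint and handler are made-up
names; only the standard signal() and sleep() calls are used):

#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigint;

static void
handler(int sig)
{
        (void)sig;              /* unused */
        got_sigint = 1;         /* plain store; no CPU barrier is involved */
}

int
main(void)
{
        signal(SIGINT, handler);
        while (!got_sigint)     /* volatile forces a reload on every check */
                sleep(1);       /* re-check at least once per second */
        return (0);
}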
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20101013162144.GA5254>