Date: Tue, 17 May 2011 16:46:34 +0300
From: Andriy Gapon <avg@FreeBSD.org>
To: John Baldwin <jhb@FreeBSD.org>
Cc: Max Laier <max@love2party.net>, FreeBSD current <freebsd-current@FreeBSD.org>, neel@FreeBSD.org, Peter Grehan <grehan@FreeBSD.org>
Subject: Re: proposed smp_rendezvous change
Message-ID: <4DD27C3A.3040509@FreeBSD.org>
In-Reply-To: <4DD26256.2070008@FreeBSD.org>
References: <4DCD357D.6000109@FreeBSD.org> <201105161421.27665.jhb@freebsd.org> <4DD17AB3.1070606@FreeBSD.org> <201105161609.21898.jhb@freebsd.org> <4DD22BD9.6070504@FreeBSD.org> <4DD26256.2070008@FreeBSD.org>

on 17/05/2011 14:56 John Baldwin said the following:
> On 5/17/11 4:03 AM, Andriy Gapon wrote:
>> Couldn't [Shouldn't] the whole:
>>
>>>>> /* Ensure we have up-to-date values. */
>>>>> atomic_add_acq_int(&smp_rv_waiters[0], 1);
>>>>> while (smp_rv_waiters[0]< smp_rv_ncpus)
>>>>> cpu_spinwait();
>>
>> be just replaced with:
>>
>> rmb();
>>
>> Or a proper MI function that does just a read memory barrier, if rmb() is not that.
>
> No, you could replace it with:
>
> atomic_add_acq_int(&smp_rv_waiters[0], 1);

What about
(void)atomic_load_acq(&smp_rv_waiters[0]);

In my opinion that should ensure that the hardware must post the latest value
from a master CPU to memory of smp_rv_waiters[0] and a slave CPU gets it from
there. And also, because of memory barriers inserted by store_rel on the master
CPU and load_acq on the slave CPU, the latest values of all other smp_rv_*
fields should become visible to the slave CPU.

> The key being that atomic_add_acq_int() will block (either in hardware or
> software) until it can safely perform the atomic operation. That means waiting
> until the write to set smp_rv_waiters[0] to 0 by the rendezvous initiator is
> visible to the current CPU.
>
> On some platforms a write by one CPU may not post instantly to other CPUs (e.g. it
> may sit in a store buffer). That is fine so long as an attempt to update that
> value atomically (using cas or a conditional-store, etc.) fails. For those
> platforms, the atomic(9) API is required to spin until it succeeds.
>
> This is why the mtx code spins if it can't set MTX_CONTESTED for example.
>

Thank you for the great explanation!
Taking sparc64 as an example, I think that atomic_load_acq uses a degenerate cas
call, which should take care of hardware synchronization.

-- 
Andriy Gapon
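
The store_rel / load_acq pairing discussed above can be illustrated with a small userland sketch. It uses C11 <stdatomic.h> and pthreads rather than the kernel's atomic(9) API, and the names rv_func_arg, rv_flag and rendezvous_sketch() are hypothetical stand-ins, not the real smp_rv_* fields. In the C11 model an acquire load only orders later reads after the release store it actually observes, so the sketch spins until the flag is seen; whether a single atomic_load_acq is enough in the kernel is exactly the open question in this thread, and the sketch does not settle it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the smp_rv_* fields (not the kernel's own). */
static int rv_func_arg;        /* plain data published by the master */
static atomic_int rv_flag;     /* written with release, read with acquire */

/* Master: write the plain field, then publish it with a release store. */
static void *master(void *unused)
{
    (void)unused;
    rv_func_arg = 42;
    atomic_store_explicit(&rv_flag, 1, memory_order_release);
    return NULL;
}

/* Slave: spin on an acquire load, then read the published field. */
static void *slave(void *result)
{
    while (atomic_load_explicit(&rv_flag, memory_order_acquire) == 0)
        ;   /* the kernel code would call cpu_spinwait() here */
    /* The acquire load that saw 1 orders this read after the master's write. */
    *(int *)result = rv_func_arg;
    return NULL;
}

/* Runs one master and one slave thread; returns the value the slave saw. */
int rendezvous_sketch(void)
{
    pthread_t m, s;
    int seen = -1;
    int rc;

    rv_func_arg = 0;
    atomic_store(&rv_flag, 0);

    rc = pthread_create(&s, NULL, slave, &seen);
    assert(rc == 0);
    rc = pthread_create(&m, NULL, master, NULL);
    assert(rc == 0);

    pthread_join(m, NULL);
    pthread_join(s, NULL);
    return seen;
}
```

Calling rendezvous_sketch() from a small main() and building with -pthread is enough to try it. Once the slave's acquire load has observed the flag, the plain read of rv_func_arg is guaranteed to see the master's value; that corresponds to the claim about the other smp_rv_* fields becoming visible to the slave CPU.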