Date: Tue, 17 May 2011 17:34:41 +0300
From: Andriy Gapon <avg@FreeBSD.org>
To: John Baldwin <jhb@FreeBSD.org>
Cc: Max Laier <max@love2party.net>,
    FreeBSD current <freebsd-current@FreeBSD.org>,
    neel@FreeBSD.org, Peter Grehan <grehan@FreeBSD.org>
Subject: Re: proposed smp_rendezvous change
Message-ID: <4DD28781.6050002@FreeBSD.org>
In-Reply-To: <201105170958.16847.jhb@freebsd.org>
References: <4DCD357D.6000109@FreeBSD.org> <4DD26256.2070008@FreeBSD.org> <4DD27C3A.3040509@FreeBSD.org> <201105170958.16847.jhb@freebsd.org>
on 17/05/2011 16:58 John Baldwin said the following:
> No, it doesn't quite work that way.  It wouldn't work on Alpha, for
> example.
>
> All load_acq is is a load with a memory barrier to order other loads
> after it.  It is still free to load stale data.  Only a
> read-modify-write operation would actually block until it could access
> an up-to-date value.

Hmm, ok.  How about atomic_add_acq_int(&smp_rv_waiters[0], 0)? :-)
Or an equivalent MI action that doesn't actually change the value of
smp_rv_waiters[0], if there could be any.  Maybe an explicit
atomic_cmpset_acq_int(&smp_rv_waiters[0], 0, 0)?
Do you see what I am getting at?

>>> The key being that atomic_add_acq_int() will block (either in
>>> hardware or software) until it can safely perform the atomic
>>> operation.  That means waiting until the write setting
>>> smp_rv_waiters[0] to 0 by the rendezvous initiator is visible to the
>>> current CPU.
>>>
>>> On some platforms a write by one CPU may not post instantly to other
>>> CPUs (e.g. it may sit in a store buffer).  That is fine so long as an
>>> attempt to update that value atomically (using cas or a
>>> conditional-store, etc.) fails.  For those platforms, the atomic(9)
>>> API is required to spin until it succeeds.
>>>
>>> This is why the mtx code spins if it can't set MTX_CONTESTED, for
>>> example.
>>
>> Thank you for the great explanation!
>> Taking sparc64 as an example, I think that atomic_load_acq uses a
>> degenerate cas call, which should take care of hardware
>> synchronization.
>
> sparc64's load_acq() is stronger than the MI effect of load_acq().

Oh well, my expectation was that the MI effect of atomic_load (emphasis
on atomic_) was to get a non-stale value.

> On ia64, which uses ld.acq, or on Alpha (originally), which used a
> membar and a simple load, the guarantees are only what I stated above
> (and would not be sufficient).
>
> Note that Alpha borrowed heavily from MIPS, and the MIPS atomic
> implementation is mostly identical to the old Alpha one (using
> conditional stores, etc.).
>
> The MIPS atomic_load_acq():
>
> #define ATOMIC_STORE_LOAD(WIDTH)				\
> static __inline uint##WIDTH##_t					\
> atomic_load_acq_##WIDTH(__volatile uint##WIDTH##_t *p)		\
> {								\
> 	uint##WIDTH##_t v;					\
> 								\
> 	v = *p;							\
> 	mips_sync();						\
> 	return (v);						\
> }								\

I should have checked this myself.
Thank you for patiently explaining these things to me.

-- 
Andriy Gapon