Date:      Tue, 17 May 2011 17:34:41 +0300
From:      Andriy Gapon <avg@FreeBSD.org>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        Max Laier <max@love2party.net>, FreeBSD current <freebsd-current@FreeBSD.org>, neel@FreeBSD.org, Peter Grehan <grehan@FreeBSD.org>
Subject:   Re: proposed smp_rendezvous change
Message-ID:  <4DD28781.6050002@FreeBSD.org>
In-Reply-To: <201105170958.16847.jhb@freebsd.org>
References:  <4DCD357D.6000109@FreeBSD.org> <4DD26256.2070008@FreeBSD.org> <4DD27C3A.3040509@FreeBSD.org> <201105170958.16847.jhb@freebsd.org>

on 17/05/2011 16:58 John Baldwin said the following:
> No, it doesn't quite work that way.  It wouldn't work on Alpha, for example.
> 
> All load_acq does is a load plus a memory barrier that orders subsequent
> loads after it.  It is still free to load stale data.  Only a
> read-modify-write operation would actually block until it could access an
> up-to-date value.

Hmm, ok.
How about atomic_add_acq_int(&smp_rv_waiters[0], 0)? :-)
Or an equivalent MI operation that doesn't actually change the
smp_rv_waiters[0] value, if there is one.
Maybe an explicit atomic_cmpset_acq_int(&smp_rv_waiters[0], 0, 0)?

Do you see what I am getting at?

>>> The key being that atomic_add_acq_int() will block (either in hardware or
>>> software) until it can safely perform the atomic operation.  That means waiting
>>> until the write to set smp_rv_waiters[0] to 0 by the rendezvous initiator is
>>> visible to the current CPU.
>>>
>>> On some platforms a write by one CPU may not post instantly to other CPUs (e.g. it
>>> may sit in a store buffer).  That is fine so long as an attempt to update that
>>> value atomically (using cas or a conditional-store, etc.) fails.  For those
>>> platforms, the atomic(9) API is required to spin until it succeeds.
>>>
>>> This is why the mtx code spins if it can't set MTX_CONTESTED for example.
>>>
>>
>> Thank you for the great explanation!
>> Taking sparc64 as an example, I think that atomic_load_acq uses a degenerate cas
>> call, which should take care of hardware synchronization.
> 
> sparc64's load_acq() is stronger than the MI effect of load_acq().  On ia64

Oh well, my expectation was that the MI effect of atomic_load (emphasis on
atomic_) was to return a non-stale value.

> which uses ld.acq or Alpha (originally) which uses a membar and simple load,
> the guarantees are only what I stated above (and would not be sufficient).
> 
> Note that Alpha borrowed heavily from MIPS, and the MIPS atomic implementation
> is mostly identical to the old Alpha one (using conditional stores, etc.).
> 
> The MIPS atomic_load_acq():
> 
> #define ATOMIC_STORE_LOAD(WIDTH)                        \
> static __inline  uint##WIDTH##_t                        \
> atomic_load_acq_##WIDTH(__volatile uint##WIDTH##_t *p)  \
> {                                                       \
>         uint##WIDTH##_t v;                              \
>                                                         \
>         v = *p;                                         \
>         mips_sync();                                    \
>         return (v);                                     \
> }                                                       \

I should have checked this myself.
Thank you for patiently explaining these things to me.

-- 
Andriy Gapon


