Date: Sat, 4 Oct 2014 10:11:08 +0200
From: Mateusz Guzik <mjguzik@gmail.com>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: alc@freebsd.org, attilio@freebsd.org, Johan Schuijt <johan@transip.nl>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: [PATCH 1/2] Implement simple sequence counters with memory barriers.
Message-ID: <20141004081108.GB17491@dft-labs.eu>
In-Reply-To: <20141004074049.GA17491@dft-labs.eu>
References: <1408064112-573-1-git-send-email-mjguzik@gmail.com> <1408064112-573-2-git-send-email-mjguzik@gmail.com> <20140816093811.GX2737@kib.kiev.ua> <20140816185406.GD2737@kib.kiev.ua> <20140817012646.GA21025@dft-labs.eu> <CAJUyCcPA7ZDNbwyfx3fT7mq3SE7M-mL5he=eXZ8bY3z-xUCJ-g@mail.gmail.com> <20141004052851.GA27891@dft-labs.eu> <20141004071139.GL26076@kib.kiev.ua> <20141004074049.GA17491@dft-labs.eu>
On Sat, Oct 04, 2014 at 09:40:50AM +0200, Mateusz Guzik wrote:
> On Sat, Oct 04, 2014 at 10:11:39AM +0300, Konstantin Belousov wrote:
> > On Sat, Oct 04, 2014 at 07:28:51AM +0200, Mateusz Guzik wrote:
> > > Reviving. Sorry everyone for such big delay, $life.
> > >
> > > On Tue, Aug 19, 2014 at 02:24:16PM -0500, Alan Cox wrote:
> > > > On Sat, Aug 16, 2014 at 8:26 PM, Mateusz Guzik <mjguzik@gmail.com> wrote:
> > > > > Well, my memory-barrier-and-so-on-fu is rather weak.
> > > > >
> > > > > I had another look at the issue. At least on amd64, it looks like only a
> > > > > compiler barrier is required for both reads and writes.
> > > > >
> > > > > The AMD64 Architecture Programmer's Manual Volume 2: System
> > > > > Programming, section 7.2 "Multiprocessor Memory Access Ordering", states:
> > > > >
> > > > > "Loads do not pass previous loads (loads are not reordered). Stores do
> > > > > not pass previous stores (stores are not reordered)"
> > > > >
> > > > > Since the modifying code only performs a series of writes and we
> > > > > expect exclusive writers, I find this applicable to our scenario.
> > > > >
> > > > > I checked the Linux sources and the generated assembly; they indeed issue
> > > > > only a compiler barrier on amd64 (and for Intel processors as well).
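For illustration, a minimal sketch of a compiler-barrier-only sequence counter
on amd64, using FreeBSD's __compiler_membar() and cpu_spinwait(); the seq
counter, struct obj and the function names are made up for this sketch, and a
single exclusive writer is assumed:

struct obj { int a, b; };		/* stand-in for the protected payload */
static volatile u_int seq;		/* hypothetical sequence counter */
static struct obj shared;		/* hypothetical protected data */

static void
writer_update(const struct obj *new)
{
	seq++;				/* odd: modification in progress */
	__compiler_membar();		/* data stores must not move before the counter store */
	shared = *new;
	__compiler_membar();		/* counter store must not move before the data stores */
	seq++;				/* even: modification complete */
}

static void
reader_snapshot(struct obj *snap)
{
	u_int s;

	do {
		while ((s = seq) & 1)	/* writer in flight, spin */
			cpu_spinwait();
		__compiler_membar();	/* data loads must not move before the counter load */
		*snap = shared;
		__compiler_membar();	/* re-check load must not move before the data loads */
	} while (seq != s);
}

The hardware side of the ordering here relies solely on the amd64 guarantees
quoted above (stores not reordered with stores, loads not reordered with loads).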
> > > > >
> > > > > atomic_store_rel_int on amd64 seems fine in this regard, but the only
> > > > > function for loads issues lock cmpxchg, which kills performance
> > > > > (median 55693659 -> 12789232 ops in a microbenchmark) for no gain.
> > > > >
> > > > > Additionally, release and acquire semantics seem to be a stronger
> > > > > guarantee than needed.
> > > > >
> > > > >
> > > >
> > > > This statement left me puzzled and got me to look at our x86 atomic.h for
> > > > the first time in years. It appears that our implementation of
> > > > atomic_load_acq_int() on x86 is, umm ..., unconventional. That is, it is
> > > > enforcing a constraint that simple acquire loads don't normally enforce.
> > > > For example, the C11 stdatomic.h simple acquire load doesn't enforce this
> > > > constraint. Moreover, our own implementation of atomic_load_acq_int() on
> > > > ia64, where the mapping from atomic_load_acq_int() to machine instructions
> > > > is straightforward, doesn't enforce this constraint either.
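For comparison, a plain C11 acquire load does not imply a full fence; a minimal
userland sketch, purely illustrative:

#include <stdatomic.h>

static inline unsigned int
load_acq_c11(_Atomic unsigned int *p)
{
	/* Orders later loads and stores after this load; no full fence implied. */
	return (atomic_load_explicit(p, memory_order_acquire));
}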
> > > >
> > >
> > > By 'this constraint' I presume you mean a full memory barrier.
> > >
> > > It is unclear to me if one can just get rid of it currently. It
> > > definitely would be beneficial.
> > >
> > > In the meantime, if for some reason a full barrier is still needed, we can
> > > speed up concurrent load_acq of the same var considerably. There is no
> > > need to lock cmpxchg on the same address. We should be able to replace
> > > it with, more or less:
> > > lock add $0,(%rsp);
> > > movl ...;
> > >
> > > I believe it is possible that the CPU will perform some writes before doing
> > > the read listed here, but this should be fine.
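A sketch of such a fence as an inline-asm helper (the name is made up; the
lock'ed no-op add on the caller's stack slot acts as a full barrier without
touching the cache line of the variable being read):

static __inline void
full_fence_stack(void)
{
	/* Locked read-modify-write of a local stack slot: full barrier. */
	__asm __volatile("lock; addl $0,(%%rsp)" : : : "memory", "cc");
}

The load_acq replacement would then be this fence followed by a plain movl of
the target variable, as in the snippet quoted above.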
> > >
> > > If this is considered too risky to hit 10.1, I would like to implement
> > > it within seq as a temporary hack to be fixed up later.
> > >
> > > something along the lines of:
> > > static inline u_int
> > > atomic_load_acq_rmb(volatile u_int *p)
> > > {
> > > 	u_int v;
> > >
> > > 	v = *p;				/* plain load of the target */
> > > 	atomic_load_acq_int(&v);	/* load_acq on a local, purely for the fence */
> > > 	return (v);
> > > }
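Reader-side usage would be roughly as follows (the seq counter, obj/snap
variables and the retry policy are made up for this sketch, not part of the
patch):

	volatile u_int seq;		/* hypothetical sequence counter */
	u_int s;

	for (;;) {
		s = atomic_load_acq_rmb(&seq);
		if ((s & 1) == 0) {		/* even: no write in progress */
			*snap = *o;		/* copy out the protected data */
			if (atomic_load_acq_rmb(&seq) == s)
				break;		/* unchanged: snapshot is consistent */
		}
		cpu_spinwait();
	}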
> > Do you need it as a designated primitive? I think you could write this
> > inline for the purpose of getting the fix into 10.1.
> >
> > With the inline quirk, I think that the fix should go into the HEAD
> > now, with some reasonable MFC timer.
>
> Well, proposed seq.h is here:
> https://people.freebsd.org/~mjg/seq.h
>
> Note it uses atomic_add_{acq,rel}_int for lack of appropriate
> _store_acq. This is temporary as well.
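In other words, the write-side bracketing ends up being roughly of this shape
(an approximation of the proposed seq.h linked above, not a verbatim copy; the
type and function names may differ in the actual file):

typedef uint32_t seq_t;

static __inline void
seq_write_begin(seq_t *seqp)
{
	/* _add_acq stands in for the missing _store_acq; counter becomes odd. */
	atomic_add_acq_int(seqp, 1);
}

static __inline void
seq_write_end(seq_t *seqp)
{
	/* Release: data stores cannot move past this; counter becomes even. */
	atomic_add_rel_int(seqp, 1);
}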
>
> I also do realize atomic_load_acq_rmb_int sounds at least weird, but
> well...
>
> And here is the rest of the patch just in case:
> https://people.freebsd.org/~mjg/plug-caps-race.diff
>
After a short off-list discussion these were committed as
https://svnweb.freebsd.org/base?view=revision&revision=272503
and
https://svnweb.freebsd.org/base?view=revision&revision=272505
respectively.
--
Mateusz Guzik <mjguzik gmail.com>
