Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 4 Jun 2012 21:57:46 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Attilio Rao <attilio@freebsd.org>
Cc:        svn-src-head@freebsd.org, Tijl Coosemans <tijl@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org
Subject:   Re: svn commit: r236456 - in head/sys: amd64/include i386/include
Message-ID:  <20120604185746.GE85127@deviant.kiev.zoral.com.ua>
In-Reply-To: <CAJ-FndBb2tReaYqtpheSbiFc6bTEupSpyOTgsOb4pu6kmawjrA@mail.gmail.com>
References:  <201206021810.q52IAGZA004238@svn.freebsd.org> <4FCC873B.90104@freebsd.org> <20120604125050.GA85127@deviant.kiev.zoral.com.ua> <CAJ-FndDENiKT5iG0gCEbxdC45yOXEgH7P_=72SXPRdFRxB_DKw@mail.gmail.com> <20120604142749.GB85127@deviant.kiev.zoral.com.ua> <CAJ-FndBb2tReaYqtpheSbiFc6bTEupSpyOTgsOb4pu6kmawjrA@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help

--VdOwlNaOFKGAtAAV
Content-Type: text/plain; charset=koi8-r
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Jun 04, 2012 at 04:59:22PM +0100, Attilio Rao wrote:
> 2012/6/4 Konstantin Belousov <kostikbel@gmail.com>:
> > On Mon, Jun 04, 2012 at 02:58:57PM +0100, Attilio Rao wrote:
> >> 2012/6/4 Konstantin Belousov <kostikbel@gmail.com>:
> >> > On Mon, Jun 04, 2012 at 12:00:27PM +0200, Tijl Coosemans wrote:
> >> >> On 02-06-2012 20:10, Konstantin Belousov wrote:
> >> >> > Author: kib
> >> >> > Date: Sat Jun =9A2 18:10:16 2012
> >> >> > New Revision: 236456
> >> >> > URL: http://svn.freebsd.org/changeset/base/236456
> >> >> >
> >> >> > Log:
> >> >> > =9A Use plain store for atomic_store_rel on x86, instead of impli=
citly
> >> >> > =9A locked xchg instruction. =9AIA32 memory model guarantees that=
 store has
> >> >> > =9A release semantic, since stores cannot pass loads or stores.
> >> >>
> >> >> They can pass non-temporal stores can't they?
> >> > Sure. But (our) barriers only work for WB memory accesses, in respec=
t to other
> >> > WB memory accesses.
> >> >
> >> > The atomic(9) contains not quite explicit mention of the requirement,
> >> > for ia32 and more direct notion for ia64. It could probably be rewor=
ded to
> >> > mention memory access type explicitely for ia32 too.
> >>
> >> I don't think this is right.
> >> What if I want to use NTI in a block of code locked? What if I want to
> >> use CLFLUSH? I simply cannot do that now because of the reordering
> >> requirement.
> > Assuming that NTI means "Non Temporal Instruction", Intel explicit
> > requirement is to use fence barrier if order shall be ensured. This,
> > as well as CLFLUSH use, is somewhat commonly documented. More, CLFLUSH
> > is documented by Intel to _not_ serialize with any other fencing or
> > serialization instruction, except MFENCE. So xchg-based _store_rel is
> > not different from mov-based _store_rel for CLFLUSH and non-temporal
> > ops.
> >
> > I do not see how you note is relevant.
> >
> >> Also, there is the more worrisome case of the string operations. If
> >> gcc/clang optimize the code in order to do string operations between
> >> locked path, this is not valid anymore as they can be reordered
> >> against the _rel() barrier.
> > They cannot. Fast string operation volatile store order only among
> > string operation itself, the operation cannot pass sequential store.
> > The store used in _store_rel thus cannot be passed by fast string
> > optimizations.
> >
> > I do not see how you note is relevant there, again.
>=20
> I'm not sure why but I thought that the string writes could be
> re-ordered against "external" writes too, but  I re-read the manual
> and this is not the case.
>=20
> >
> >>
> >> However, we should consider atomic(9) as a script for MI requirement
> >> of our locking primitives among the architectures. Right now too many
> >> things live on assumptions of people doing patches (like this case)
> >> rather than actually working on a common policy of what we can easilly
> >> support and what we can't.
> >>
> >> I also wondered often if we should use *fence on architectures
> >> supporting them, by default, because of the possibility to use FPU now
> >> (which wasn't present back in the day) and thus we cannot really
> >> guarantee memory ordering over stores of memory areas bigger than a
> >> quad-word. If we don't want to add the burden, we should explicitely
> >> mention that in atomic(9) or any other place.
> > The proposal to use fence explicitely contradicts recommendations from
> > the AMD Optimization Guide, which, JFYI, I cited in the updated comment
> > in the patch.
> >
> > How is FPU relevant to the memory model discussion, I left out of the
> > answer.
>=20
> I'm not saying to use *fence, I'm saying that we should document MD
> cases where this is required (we can use agnostic terms like "simple
> barrier") in atomic(9).
> Also, about FPU, I'm saying that before we had no need to consider
> FPU/XMM into consideration from a model perspective but now we should
> do that because we can use FPU within the kernel.
FPU and XMM, unless explicitely doing something weird, as in movnti case,
are following normal mem model rules.

The fact that we can use FPU in kernel does not makes it reasonable to use
FPU in kernel, except taking advantage of some non-regular CPU features
like AESNI. And in fact you still cannot (easily) use floating-point
due to exceptions.

>=20
> >> Definitively: I think this patch violates some edge cases. Please back=
 it out.
> > No. I explicitely inform you that I consider the backout request
> > as frivolous, technically unfounded, and that I will not back it out.
>=20
> My main motivation for asking the backout was for the ordering issue
> with the string operations, but it doesn't seem to be the case, than I
> withdraw my request, sorry.
> I'm fine with using explicit memory barriers in MD code that use NTI
> (I just assume gcc/clang won't over-optimize things on their own as
> thinking they want to reduce cache traffic by using NTI).

Intel simply cannot go that far while their own recommendation is to
use normal stores for unlock. For me, the real unexpected architectural
behaviour was of CLFLUSH. I was quite surprised when Alan pointed
that to me during the work on pmap_invalidate_cache_range().

--VdOwlNaOFKGAtAAV
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAk/NBSoACgkQC3+MBN1Mb4jDhwCg4ixpZoE8TBwg9g4CHa6jUblY
tvMAn2ulp2iphhm+SBIXGCFIzLtLFf4c
=AZtw
-----END PGP SIGNATURE-----

--VdOwlNaOFKGAtAAV--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120604185746.GE85127>