Date:      Wed, 11 Aug 2004 17:29:23 -0400
From:      John Baldwin <jhb@FreeBSD.org>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        Tim Robbins <tjr@FreeBSD.org>
Subject:   Re: Atomic operations on i386/amd64
Message-ID:  <200408111729.23451.jhb@FreeBSD.org>
In-Reply-To: <20040811170302.T1037@epsplex.bde.org>
References:  <20040805050422.GA41201@cat.robbins.dropbear.id.au> <200408051759.53079.jhb@FreeBSD.org> <20040811170302.T1037@epsplex.bde.org>

On Wednesday 11 August 2004 04:56 am, Bruce Evans wrote:
> On Thu, 5 Aug 2004, John Baldwin wrote:
> > Actually, using mov instead of lock xchg for store_rel reduced
> > performance in some benchmarks Scott ran on an SMP machine, I'm guessing
> > due to the higher latency of locks becoming available to other CPUs.  I'm
> > still waiting for benchmark results on UP to see if the change should be
> > made under #ifndef SMP or some such.
>
> I don't believe unlocked instructions could be slower, and using
> unlocked && unfenced instructions is just broken in the SMP case.
> Perhaps there is enough synchronization provided by the lock in load_acq
> (which in theory needs less locking than store_rel) for missing
> synchronization in store_rel to sort of work.

x86 processors preserve program order for stores, so an sfence would be
redundant here.  The problem with using a simple mov is that, even though
it is faster, the store can sit in a cache for a while before it is visible
to other CPUs, hence the higher latency problem that I alluded to above.
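
To make the two store_rel variants under discussion concrete, here is a
minimal sketch written with C11 <stdatomic.h> rather than the kernel's own
inline assembly.  The function names are hypothetical and this is not the
FreeBSD atomic(9) implementation; it only assumes that, on x86, compilers
typically emit a plain MOV for a release store and an (implicitly locked)
XCHG for an atomic exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical illustration only, not the FreeBSD atomic(9) code.
 *
 * Variant 1: release store.  On x86 this typically compiles to a plain
 * MOV, relying on the processor keeping stores in program order.
 */
static inline void
store_rel_mov(atomic_uint *p, unsigned int v)
{
	assert(p != NULL);
	atomic_store_explicit(p, v, memory_order_release);
}

/*
 * Variant 2: store via an atomic exchange, discarding the old value.
 * On x86 this typically compiles to XCHG with a memory operand, which
 * is implicitly locked.
 */
static inline void
store_rel_xchg(atomic_uint *p, unsigned int v)
{
	assert(p != NULL);
	(void)atomic_exchange_explicit(p, v, memory_order_seq_cst);
}
```

Both functions leave the same value in memory; the difference that matters
for the benchmarks mentioned above is only in which instruction is issued
and how soon other CPUs observe the store.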

> > > Also, could we use MFENCE/LFENCE/SFENCE in combination with MOV on
> > > SMP systems instead of LOCK CMPXCHG / (implied LOCK) XCHG?
>
> It isn't clear to me (from amd64 manuals) that *FENCE affects caches
> other than ones seen by the current CPU.  I think they do, and can be
> used (MFENCE might be needed for both).  They should work for the same
> reasons that "LOCK MOV" is an invalid  instruction (MOV is inherently
> atomic (?)).  Apparently we are using fake "LOCK MOV"s just for the
> side effects of the lock instruction (at least on amd64's, the lock
> instruction does *FENCE implicitly).

*FENCE doesn't do cache flushing; it controls the order in which writes
become visible to agents outside of the CPU, e.g. other CPUs and devices
that access memory via DMA.
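
As an illustration of fences ordering writes rather than flushing anything,
the sketch below publishes a value and then a flag using C11 fences.  The
struct and function names are hypothetical, and C11 fences are not the same
thing as the SFENCE/LFENCE/MFENCE instructions: on x86 a release or acquire
fence usually needs no instruction at all beyond stopping compiler
reordering, precisely because stores are already kept in program order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical one-slot mailbox, for illustration only. */
struct mailbox {
	int		payload;	/* ordinary, non-atomic data */
	atomic_int	ready;		/* set to 1 once payload is valid */
};

/* Writer: the payload store must become visible before the flag store. */
static inline void
mailbox_publish(struct mailbox *m, int value)
{
	assert(m != NULL);
	m->payload = value;
	/* Order the payload store ahead of the flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}

/* Reader: returns 1 and fills *out if a payload has been published. */
static inline int
mailbox_try_consume(struct mailbox *m, int *out)
{
	assert(m != NULL && out != NULL);
	if (atomic_load_explicit(&m->ready, memory_order_relaxed) == 0)
		return (0);
	/* Order the flag load ahead of the payload load. */
	atomic_thread_fence(memory_order_acquire);
	*out = m->payload;
	return (1);
}
```

Neither fence forces the data out to memory any sooner; the pair only
guarantees that a reader who sees the flag set also sees the payload that
was written before it.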

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


