Date: Thu, 6 May 2004 14:17:51 -0400
From: John Baldwin <jhb@FreeBSD.org>
To: freebsd-current@FreeBSD.org
Cc: Andrew Gallatin <gallatin@cs.duke.edu>
Subject: Re: 4.7 vs 5.2.1 SMP/UP bridging performance
Message-ID: <200405061417.51886.jhb@FreeBSD.org>
In-Reply-To: <20040507031253.Y21938@gamplex.bde.org>
References: <FE045D4D9F7AED4CBFF1B3B813C85337045D8CB5@mail.sandvine.com> <20040506150754.GC27139@empiric.dek.spc.org> <20040507031253.Y21938@gamplex.bde.org>
On Thursday 06 May 2004 01:18 pm, Bruce Evans wrote:
> On Thu, 6 May 2004, Bruce M Simpson wrote:
> > On Thu, May 06, 2004 at 10:15:44AM -0400, Andrew Gallatin wrote:
> > > For what it's worth, using those operations yields these results
> > > on my 2.53GHz P4 (for UP):
> > >
> > > Mutex (atomic_store_rel_int) cycles per iteration: 208
> > > Mutex (sfence) cycles per iteration: 85
> > > Mutex (lfence) cycles per iteration: 63
> > > Mutex (mfence) cycles per iteration: 169
> > > Mutex (none) cycles per iteration: 18
> > >
> > > lfence looks like a winner..
> >
> > Please be aware, though, that the different FENCE instructions are acting
> > as fences against different things.  The NASM documentation has a good
> > quick reference for what each of the instructions does, but the definitive
> > reference is Intel's IA-32 programmer's reference manuals.
>
> They are also documented in amd64 manuals.
>
> Don't they all act as fences only on the same CPU, so they are no help
> for SMP?  They are still almost twice as slow as full locks on Athlons,
> so hopefully they do more.

They are a traditional membar, like membar on SPARC or acq/rel on ia64.
membars only have to apply to the current CPU, but you have to use them in
conjunction with a memory address used to implement a lock.  Thus, when you
acquire a lock, you want to use an lfence to ensure that the CPU won't move
later loads back past the lfence (assuming lfence is like ia64 acq and
sfence is like ia64 rel).  This ensures that you don't read any of the
locked values until you have the lock.  On release you would use an sfence
to ensure that all earlier stores complete before the store that releases
the actual lock.  The fence doesn't push out the pending writes to the
other CPUs, however; it just means that another CPU won't see that the lock
is released unless it can also see all the other stores before the sfence.
Thus, you can actually have a CPU spin waiting for a lock that is already
unlocked.
I've seen this on my test Alpha (DS20) where CPU0 unlocked sched_lock, CPU1
logged a KTR trace saying it was starting to spin on sched_lock, and a short
time later CPU1 then logged saying it had gotten sched_lock.  I'm not sure
if *fence is quite that weak.  They might be, though.  Note that each
generation of ia32 processors seems to have a weaker memory model than the
previous generation.

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org