Date: Fri, 20 Jan 2012 08:29:12 -0500
From: John Baldwin <jhb@freebsd.org>
To: davidxu@freebsd.org
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r230201 - head/lib/libc/gen
Message-ID: <201201200829.12616.jhb@freebsd.org>
In-Reply-To: <4F18B711.9000406@gmail.com>
References: <201201160615.q0G6FE9r019542@svn.freebsd.org> <201201191023.28426.jhb@freebsd.org> <4F18B711.9000406@gmail.com>
On Thursday, January 19, 2012 7:36:33 pm David Xu wrote:
> On 2012/1/19 23:23, John Baldwin wrote:
> > On Thursday, January 19, 2012 12:57:50 am David Xu wrote:
> >> rdtsc() may not work on SMP, so I have updated it to use
> >> clock_gettime to get total time.
> >> http://people.freebsd.org/~davidxu/bench/semaphore2/
> >> <http://people.freebsd.org/%7Edavidxu/bench/semaphore2/>
> >>
> >> Still, lfence is a lot faster than an atomic lock.
> >
> > http://www.freebsd.org/~jhb/patches/amd64_fence.patch
> >
> > This is the patch I've had for quite a while.  Can you retest with
> > this?  You'll probably have to install the updated header in
> > /usr/include as well.
> >
> The lines in atomic_load_acq() seem not what I want:
>
> +       v = *p;                                         \
> +       __asm __volatile("lfence" ::: "memory");        \
>
> I think they should be swapped?

No, the point is that any subsequent loads cannot pass the '*p'.  If you
swap the order, then the compiler (and the CPU) are free to reorder '*p'
to be later than some other load that follows it in program order.

> +       __asm __volatile("lfence" ::: "memory");        \
> +       v = *p;                                         \
>
> What I need in the semaphore code is that a read cannot pass a write in
> such a special case.

Hmm, it seems you need the equivalent of an 'mfence'.  Note that your
first change in your diff should not have made a difference on x86:
atomic_add_rel_int() already has the equivalent of an 'mfence' (on x86
the non-load/store atomic ops all end up with full fences, since that is
what 'lock' provides; the architecture doesn't let us do more
fine-grained barriers).

It may be that you still have a race and that the barrier just changed
the timing enough to fix your test case.  Specifically, note that an
'rmb' (or 'lfence') does not force other CPUs to flush any pending
writes, nor does it wait for other CPUs to flush pending writes.  Even
with the lfence, you can still read a "stale" value of _has_waiters.
This is why in-kernel locking primitives encode this state in the lock
cookie via contested flags and use cmpsets to set them (and retry the
loop if the cmpset fails).

-- 
John Baldwin
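A rough illustrative sketch of the contested-flag idea described above
(not code from this thread or from the actual libc patch): the count word
carries its own waiters bit and is only ever updated with a cmpset retry
loop, so the post path decides "bump the count" and "were there waiters?"
in one atomic step.  The SEM_HAS_WAITERS layout and the sem_post_sketch()
and wake_one_waiter() names are hypothetical placeholders;
atomic_cmpset_rel_int() is the compare-and-set from FreeBSD's
<machine/atomic.h>.

    /*
     * Hypothetical sketch only: encode "has waiters" in the semaphore
     * count word itself and update it with a cmpset retry loop, in the
     * spirit of the in-kernel locking primitives mentioned above.
     */
    #include <machine/atomic.h>

    #define SEM_HAS_WAITERS 0x80000000u     /* assumed contested bit */

    /*
     * Hypothetical helper: wake one thread sleeping on the semaphore
     * (e.g. via a umtx wake); left as a stub in this sketch.
     */
    static void
    wake_one_waiter(volatile u_int *countp)
    {
            (void)countp;
    }

    static void
    sem_post_sketch(volatile u_int *countp)
    {
            u_int old;

            for (;;) {
                    old = *countp;
                    /*
                     * The release cmpset bumps the count and samples the
                     * waiters bit in one atomic step; if another thread
                     * raced in and changed the word, the cmpset fails and
                     * we retry with the fresh value.
                     */
                    if (atomic_cmpset_rel_int(countp, old, old + 1)) {
                            if (old & SEM_HAS_WAITERS)
                                    wake_one_waiter(countp);
                            return;
                    }
            }
    }

In this sketch the waiting side would set SEM_HAS_WAITERS with its own
cmpset before sleeping, so the flag lives in the same word the poster
updates and cannot be observed stale the way a separately-read
_has_waiters field can.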