Date: Thu, 19 Jan 2012 13:57:50 +0800
From: David Xu <listlog2011@gmail.com>
To: davidxu@freebsd.org
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, John Baldwin <jhb@freebsd.org>
Subject: Re: svn commit: r230201 - head/lib/libc/gen
Message-ID: <4F17B0DE.3060008@gmail.com>
In-Reply-To: <4F178CDC.3030807@gmail.com>
References: <201201160615.q0G6FE9r019542@svn.freebsd.org> <201201170957.47718.jhb@freebsd.org> <4F1629D5.4020605@gmail.com> <201201181009.23221.jhb@freebsd.org> <4F178CDC.3030807@gmail.com>

On 2012/1/19 11:24, David Xu wrote:
> On 2012/1/18 23:09, John Baldwin wrote:
>> On Tuesday, January 17, 2012 9:09:25 pm David Xu wrote:
>>> On 2012/1/17 22:57, John Baldwin wrote:
>>>> On Monday, January 16, 2012 1:15:14 am David Xu wrote:
>>>>> Author: davidxu
>>>>> Date: Mon Jan 16 06:15:14 2012
>>>>> New Revision: 230201
>>>>> URL: http://svn.freebsd.org/changeset/base/230201
>>>>>
>>>>> Log:
>>>>>   Insert read memory barriers.
>>>> I think using atomic_load_acq() on sem->nwaiters would be clearer, as
>>>> it would indicate which variable you need to ensure is read after
>>>> other operations. In general I think raw rmb/wmb usage should be
>>>> avoided when possible, as it does not describe the programmer's
>>>> intent as well.
>>>>
>>> Yes, I had considered using atomic_load_acq(), but at that time I
>>> thought it emits a bus-locked instruction, right? So I just picked
>>> rmb(), which only affects the current CPU. Maybe atomic_load_acq()
>>> does the same thing as rmb()? It is still unclear to me.
>> atomic_load_acq() is the same as rmb(). Right now it uses a locked
>> instruction on amd64, but it could easily switch to lfence/sfence
>> instead. I had patches to do that, but I think bde@ had done some
>> benchmarks that showed that change made no difference.
>>
> I wish there were a version of atomic_load_acq() that uses lfence. I
> have always thought bus locking is expensive on a multi-core machine.
> Working on large machines here, we found that even the current rwlock
> in libthr does not scale well if most threads are readers; we have to
> implement a CSNZI-like rwlock to avoid CPU conflicts.
> http://people.csail.mit.edu/mareko/spaa09-scalablerwlocks.pdf
>
> I have just done a benchmark on my notebook, which has a Sandy Bridge
> i3 2310M CPU with 4 SMT threads.
> http://people.freebsd.org/~davidxu/bench/semaphore/
>
> load_acq using an atomic locked instruction is much slower than lfence:
> http://people.freebsd.org/~davidxu/bench/semaphore/ministat.txt
>
> Benchmark program:
> http://people.freebsd.org/~davidxu/bench/semaphore/sem_test.c
>

rdtsc() may not work on SMP, so I have updated the benchmark to use
clock_gettime() to get the total time.
http://people.freebsd.org/~davidxu/bench/semaphore2/

Still, lfence is a lot faster than the locked atomic instruction.
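
For readers without access to the benchmark files, here is a rough sketch
of the kind of comparison discussed above. It is not the sem_test.c
program linked in this message. The names nwaiters, load_acq_locked(),
load_acq_lfence() and elapsed_ns() are made up for illustration, the
locked variant is only an approximation of a lock-prefixed acquire load,
and GCC/Clang inline assembly is assumed. On non-amd64 targets it falls
back to compiler builtins, so the two variants are only meaningfully
different on amd64.

```c
#define _POSIX_C_SOURCE 199309L	/* for clock_gettime() under strict -std */
#include <time.h>

/* Hypothetical stand-in for a semaphore's waiter count. */
static volatile unsigned int nwaiters;

/*
 * Acquire load built on a locked instruction (illustrative only).
 * "lock cmpxchg" with expected value 0 and new value 0 never changes
 * *p, and leaves the current value of *p in the accumulator.
 */
static inline unsigned int
load_acq_locked(volatile unsigned int *p)
{
#if defined(__x86_64__)
	unsigned int v = 0;

	__asm__ __volatile__("lock; cmpxchgl %2, %1"
	    : "+a" (v), "+m" (*p)
	    : "r" (0U)
	    : "memory", "cc");
	return (v);
#else
	/* Assumption: a sequentially consistent builtin load elsewhere. */
	return (__atomic_load_n(p, __ATOMIC_SEQ_CST));
#endif
}

/* Plain load followed by a read barrier (lfence on amd64). */
static inline unsigned int
load_acq_lfence(volatile unsigned int *p)
{
	unsigned int v = *p;

#if defined(__x86_64__)
	__asm__ __volatile__("lfence" ::: "memory");
#else
	__atomic_thread_fence(__ATOMIC_ACQUIRE);
#endif
	return (v);
}

/*
 * Time `iters` calls of `load` using clock_gettime() rather than
 * rdtsc(), and return the elapsed wall time in nanoseconds.
 */
static double
elapsed_ns(unsigned int (*load)(volatile unsigned int *), long iters)
{
	struct timespec t0, t1;
	unsigned int sink = 0;
	long i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++)
		sink += load(&nwaiters);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	(void)sink;	/* the value itself is not interesting */
	return ((t1.tv_sec - t0.tv_sec) * 1e9 +
	    (double)(t1.tv_nsec - t0.tv_nsec));
}
```

Calling elapsed_ns(load_acq_locked, N) and elapsed_ns(load_acq_lfence, N)
for the same N from a small main() gives two totals to compare. This is
single-threaded and uncontended, so it only shows the raw instruction
cost and says nothing about behaviour when several CPUs share the cache
line.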