Date:      Thu, 19 Jan 2012 11:24:12 +0800
From:      David Xu <listlog2011@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, davidxu@freebsd.org
Subject:   Re: svn commit: r230201 - head/lib/libc/gen
Message-ID:  <4F178CDC.3030807@gmail.com>
In-Reply-To: <201201181009.23221.jhb@freebsd.org>
References:  <201201160615.q0G6FE9r019542@svn.freebsd.org> <201201170957.47718.jhb@freebsd.org> <4F1629D5.4020605@gmail.com> <201201181009.23221.jhb@freebsd.org>

On 2012/1/18 23:09, John Baldwin wrote:
> On Tuesday, January 17, 2012 9:09:25 pm David Xu wrote:
>> On 2012/1/17 22:57, John Baldwin wrote:
>>> On Monday, January 16, 2012 1:15:14 am David Xu wrote:
>>>> Author: davidxu
>>>> Date: Mon Jan 16 06:15:14 2012
>>>> New Revision: 230201
>>>> URL: http://svn.freebsd.org/changeset/base/230201
>>>>
>>>> Log:
>>>>     Insert read memory barriers.
>>> I think using atomic_load_acq() on sem->nwaiters would be clearer as it would
>>> indicate which variable you need to ensure is read after other operations.  In
>>> general I think raw rmb/wmb usage should be avoided when possible as it
>>> does not describe the programmer's intent as well.
>>>
>> Yes, I had considered using atomic_load_acq(), but at that time
>> I thought it emits a bus-locked instruction, right? So I just picked rmb(),
>> which only affects the current CPU. Maybe atomic_load_acq() does the same
>> thing as rmb()?
>> It is still unclear to me.
> atomic_load_acq() is the same as rmb().  Right now it uses a locked
> instruction on amd64, but it could easily switch to lfence/sfence instead.  I
> had patches to do that but I think bde@ had done some benchmarks that showed
> that change made no difference.
>
I wish there were a version of atomic_load_acq() that uses lfence. I have
always thought that bus locking is expensive on a multi-core machine. Working
on large machines here, we found that even the current rwlock in libthr does
not scale well when most threads are readers, so we had to implement a
CSNZI-like rwlock to avoid contention between CPUs.
http://people.csail.mit.edu/mareko/spaa09-scalablerwlocks.pdf
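
To make the atomic_load_acq() versus rmb() distinction concrete, here is a
minimal sketch in portable C11 atomics rather than FreeBSD's
<machine/atomic.h>. The struct and function names are hypothetical, and this
is not the libc semaphore code from r230201. It only shows that an acquire
load names the variable being ordered, while a standalone fence does not.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state; not the layout of the libc semaphore. */
struct mailbox {
	int payload;		/* plain data written before publishing */
	atomic_int ready;	/* set to 1 once payload is valid */
};

/* Writer: store the payload, then publish it with release ordering. */
static void
mailbox_publish(struct mailbox *m, int value)
{
	m->payload = value;
	atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Reader, variant 1: the ordering is attached to the load of 'ready'. */
static bool
mailbox_read_acquire(struct mailbox *m, int *out)
{
	if (atomic_load_explicit(&m->ready, memory_order_acquire) == 0)
		return (false);
	*out = m->payload;
	return (true);
}

/* Reader, variant 2: relaxed load followed by a standalone acquire fence. */
static bool
mailbox_read_fence(struct mailbox *m, int *out)
{
	if (atomic_load_explicit(&m->ready, memory_order_relaxed) == 0)
		return (false);
	atomic_thread_fence(memory_order_acquire);
	*out = m->payload;
	return (true);
}
```

Both readers give the same guarantee for the payload read. The first states
in the code which variable the ordering applies to; the second leaves that to
a comment.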

I have just run a benchmark on my notebook, which has a Sandy Bridge
Core i3-2310M CPU with 4 SMT threads.
http://people.freebsd.org/~davidxu/bench/semaphore/

The load_acq version, which uses a locked atomic instruction, is much slower
than the lfence version:
http://people.freebsd.org/~davidxu/bench/semaphore/ministat.txt
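
For readers who want to try the comparison themselves, the following is a
rough single-threaded sketch of the two read sequences under discussion. It
is not the linked sem_test.c and is not copied from FreeBSD's atomic.h: the
function names are hypothetical, the inline assembly assumes GCC or Clang on
amd64, and a loop like this says nothing about behaviour under contention.

```c
#include <time.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
/*
 * Variant A: read through a lock-prefixed cmpxchg. Either the comparison
 * fails and *p is loaded into %eax, or it succeeds and the same value is
 * written back, so *p is never changed.
 */
static inline int
load_locked(volatile int *p)
{
	int res = 0;

	__asm__ __volatile__("lock; cmpxchgl %0,%1"
	    : "+a" (res), "+m" (*p)
	    :
	    : "memory", "cc");
	return (res);
}

/* Variant B: plain load followed by lfence. */
static inline int
load_lfence(volatile int *p)
{
	int res = *p;

	__asm__ __volatile__("lfence" : : : "memory");
	return (res);
}
#else
/* Fallback so the sketch builds elsewhere; the comparison is then moot. */
static inline int
load_locked(volatile int *p)
{
	return (*p);
}

static inline int
load_lfence(volatile int *p)
{
	return (*p);
}
#endif

/* Time 'iters' calls of one variant; returns CPU seconds from clock(). */
static double
time_loads(int (*load)(volatile int *), volatile int *p, long iters,
    long *sum_out)
{
	long sum = 0;
	clock_t start = clock();

	for (long i = 0; i < iters; i++)
		sum += load(p);
	clock_t end = clock();
	*sum_out = sum;		/* keep the loads observable */
	return ((double)(end - start) / CLOCKS_PER_SEC);
}
```

Calling time_loads(load_locked, ...) and time_loads(load_lfence, ...) with the
same iteration count gives two numbers to compare on a given machine.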

Benchmark program:
http://people.freebsd.org/~davidxu/bench/semaphore/sem_test.c


Regards,
David Xu



