Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 20 Jan 2012 03:55:42 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, davidxu@FreeBSD.org
Subject:   Re: svn commit: r230201 - head/lib/libc/gen
Message-ID:  <20120120030456.O1411@besplex.bde.org>
In-Reply-To: <201201191023.28426.jhb@freebsd.org>
References:  <201201160615.q0G6FE9r019542@svn.freebsd.org> <4F178CDC.3030807@gmail.com> <4F17B0DE.3060008@gmail.com> <201201191023.28426.jhb@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, 19 Jan 2012, John Baldwin wrote:

> On Thursday, January 19, 2012 12:57:50 am David Xu wrote:
>> rdtsc() may not work on SMP, so I have updated it to use clock_gettime
>> to get total time.
>> http://people.freebsd.org/~davidxu/bench/semaphore2/
>> <http://people.freebsd.org/%7Edavidxu/bench/semaphore2/>;
>>
>> Still, lfence is a lot faster than atomic lock.

I hope it does non-microbenchmarks.  IIRC, jhb found that it was
actually slower in some cases.  I only did micro-benchmarks on Athlon64.

> http://www.freebsd.org/~jhb/patches/amd64_fence.patch
>
> This the patch I've had for quite a while.  Can you retest with this?  You'll
> probably have to install the updated header in /usr/include as well.

% --- //depot/projects/smpng/sys/amd64/include/atomic.h	2011-01-05 17:06:25.000000000 0000
% +++ //depot/user/jhb/ktrace/amd64/include/atomic.h	2011-01-05 22:08:56.000000000 0000
% ...
% @@ -213,13 +213,12 @@
%  #if defined(_KERNEL) && !defined(SMP)
% 
%  /*
% - * We assume that a = b will do atomic loads and stores.  However, on a
% - * PentiumPro or higher, reads may pass writes, so for that case we have
% - * to use a serializing instruction (i.e. with LOCK) to do the load in
% - * SMP kernels.  For UP kernels, however, the cache of the single processor
% + * We assume that a = b will do atomic loads and stores.  However, reads
% + * may pass writes, so we have to use fences in SMP kernels to preserve
% + * ordering.  For UP kernels, however, the cache of the single processor
%   * is always consistent, so we only need to take care of compiler.
%   */
% -#define	ATOMIC_STORE_LOAD(TYPE, LOP, SOP)		\
% +#define	ATOMIC_STORE_LOAD(TYPE)				\
%  static __inline u_##TYPE				\
%  atomic_load_acq_##TYPE(volatile u_##TYPE *p)		\
%  {							\

It also has some simplifications from removing the use of different
operators for load and store.  These simplifications seem to be not
quite generic, since "lock; cmpxchg*" seems to be 2 cycles faster
than "xchg*" in Athlon64.  The ATOMIC_STORE_LOAD() macro is obfuscatory.
Better to have separate macros for load/store, like we do for
set/clear/add/subtract.  (The latter can be obfuscated even better
using 4 parameters for the ops.  Then better yet by making the type
a parameter.)

% @@ -240,32 +239,22 @@
% 
%  #else /* !(_KERNEL && !SMP) */
% 
% -#define	ATOMIC_STORE_LOAD(TYPE, LOP, SOP)		\
% +#define	ATOMIC_STORE_LOAD(TYPE)				\
%  static __inline u_##TYPE				\
%  atomic_load_acq_##TYPE(volatile u_##TYPE *p)		\
%  {							\
% -	u_##TYPE res;					\
% +	u_##TYPE v;					\
%  							\
% -	__asm __volatile(MPLOCKED LOP			\
% -	: "=a" (res),			/* 0 */		\
% -	  "=m" (*p)			/* 1 */		\
% -	: "m" (*p)			/* 2 */		\
% -	: "memory", "cc");				\
% -							\
% -	return (res);					\
% +	v = *p;						\
% +	__asm __volatile("lfence" ::: "memory");	\

Style bug (missing spaces around binary operator `:') which becomes
a syntax error for C++.  Other places in this file use ` : : : '.

lfence() should be in cpufunc.h if it can be done separately
like this.  However, I think it can't be done separately --
it needs to be done in the same asm as the load/store, since
separate C statements may be reordered.  This is the same
problem that forces us to write __asm volatile("sti; hlt");
instead of sti(); hlt(); in too many places for idling in
machdep.c.  BTW, sti() and hlt() are bogusly not in cpufunc.h
either:
- I misnamed sti() as disable_intr() since disable_intr() was
   supposed to be MI and I didn't understand that any MI
   interface should not be direct in cpufunc.h.
- someone misnamed hlt() as halt().

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120120030456.O1411>