From: Bruce Evans <brde@optusnet.com.au>
To: John Baldwin
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org,
    src-committers@FreeBSD.org, davidxu@FreeBSD.org
Date: Fri, 20 Jan 2012 03:55:42 +1100 (EST)
Subject: Re: svn commit: r230201 - head/lib/libc/gen

On Thu, 19 Jan 2012, John Baldwin wrote:

> On Thursday, January 19, 2012 12:57:50 am David Xu wrote:
>> rdtsc() may not work on SMP, so I have updated it to use clock_gettime
>> to get total time.
>> http://people.freebsd.org/~davidxu/bench/semaphore2/
>>
>> Still, lfence is a lot faster than atomic lock.

I hope it also covers non-micro-benchmarks.  IIRC, jhb found that it was
actually slower in some cases.  I only did micro-benchmarks on Athlon64.

> http://www.freebsd.org/~jhb/patches/amd64_fence.patch
>
> This is the patch I've had for quite a while.  Can you retest with this?
> You'll probably have to install the updated header in /usr/include as
> well.

% --- //depot/projects/smpng/sys/amd64/include/atomic.h	2011-01-05 17:06:25.000000000 0000
% +++ //depot/user/jhb/ktrace/amd64/include/atomic.h	2011-01-05 22:08:56.000000000 0000
% ...
% @@ -213,13 +213,12 @@
%  #if defined(_KERNEL) && !defined(SMP)
%
%  /*
% - * We assume that a = b will do atomic loads and stores.  However, on a
% - * PentiumPro or higher, reads may pass writes, so for that case we have
% - * to use a serializing instruction (i.e. with LOCK) to do the load in
% - * SMP kernels.  For UP kernels, however, the cache of the single processor
% + * We assume that a = b will do atomic loads and stores.  However, reads
% + * may pass writes, so we have to use fences in SMP kernels to preserve
% + * ordering.  For UP kernels, however, the cache of the single processor
%  * is always consistent, so we only need to take care of compiler.
%  */
% -#define ATOMIC_STORE_LOAD(TYPE, LOP, SOP)		\
% +#define ATOMIC_STORE_LOAD(TYPE)			\
%  static __inline u_##TYPE				\
%  atomic_load_acq_##TYPE(volatile u_##TYPE *p)		\
%  {							\

It also has some simplifications from removing the use of different
operators for load and store.  These simplifications seem to be not quite
generic, since "lock; cmpxchg*" seems to be 2 cycles faster than "xchg*"
on Athlon64.

The ATOMIC_STORE_LOAD() macro is obfuscatory.  Better to have separate
macros for load and store, like we do for set/clear/add/subtract.  (The
latter can be obfuscated even better using 4 parameters for the ops, and
better yet by making the type a parameter.)
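The split Bruce suggests might look like the following minimal sketch.
The ATOMIC_LOAD/ATOMIC_STORE macro names and the plain-C types are
illustrative, not the actual atomic.h interface, and it assumes amd64
with GCC-style inline asm:

```c
/*
 * Sketch: separate per-direction macros with the type as a parameter,
 * instead of one ATOMIC_STORE_LOAD(TYPE, LOP, SOP) generating both.
 * Hypothetical names; not the real sys/amd64/include/atomic.h.
 */
#define	ATOMIC_LOAD(TYPE)					\
static inline unsigned TYPE					\
atomic_load_acq_##TYPE(volatile unsigned TYPE *p)		\
{								\
	unsigned TYPE v;					\
								\
	v = *p;							\
	/* lfence keeps later loads from passing this one. */	\
	__asm __volatile("lfence" : : : "memory");		\
	return (v);						\
}

#define	ATOMIC_STORE(TYPE)					\
static inline void						\
atomic_store_rel_##TYPE(volatile unsigned TYPE *p,		\
    unsigned TYPE v)						\
{								\
	/* xchg with memory has an implicit lock prefix, */	\
	/* so it acts as a full fence. */			\
	__asm __volatile("xchg %1,%0"				\
	    : "+m" (*p), "+r" (v)				\
	    : : "memory");					\
}

ATOMIC_LOAD(int)
ATOMIC_STORE(int)
```

Each macro expands independently, so a future change to one direction
(e.g. swapping xchg for lock; cmpxchg) does not touch the other.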
% @@ -240,32 +239,22 @@
%
%  #else /* !(_KERNEL && !SMP) */
%
% -#define ATOMIC_STORE_LOAD(TYPE, LOP, SOP)		\
% +#define ATOMIC_STORE_LOAD(TYPE)			\
%  static __inline u_##TYPE				\
%  atomic_load_acq_##TYPE(volatile u_##TYPE *p)		\
%  {							\
% -	u_##TYPE res;					\
% +	u_##TYPE v;					\
%							\
% -	__asm __volatile(MPLOCKED LOP			\
% -	: "=a" (res),			/* 0 */		\
% -	  "=m" (*p)			/* 1 */		\
% -	: "m" (*p)			/* 2 */		\
% -	: "memory", "cc");				\
% -							\
% -	return (res);					\
% +	v = *p;					\
% +	__asm __volatile("lfence" ::: "memory");	\

Style bug (missing spaces around the binary operator `:'), which becomes
a syntax error for C++.  Other places in this file use ` : : : '.

lfence() should be in cpufunc.h if it can be done separately like this.
However, I think it can't be done separately -- the fence needs to be in
the same asm as the load/store, since separate C statements may be
reordered.  This is the same problem that forces us to write
__asm __volatile("sti; hlt"); instead of sti(); hlt(); in too many places
for idling in machdep.c.

BTW, sti() and hlt() are bogusly not in cpufunc.h either:
- I misnamed sti() as enable_intr(), since enable_intr() was supposed to
  be MI and I didn't understand that any MI interface should not be
  direct in cpufunc.h.
- someone misnamed hlt() as halt().

Bruce
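The "same asm" form Bruce argues for might look like this sketch, with
the load and the fence emitted by a single asm statement so the compiler
cannot schedule other code between them (illustrative only; assumes
amd64 and GCC-style inline asm):

```c
/*
 * Sketch: combined load + fence in one asm statement.  Because both
 * instructions live in the same asm, no other statement can be
 * reordered between the load and the lfence -- the same reason the
 * idle loop must use __asm __volatile("sti; hlt") rather than two
 * separate sti(); hlt(); calls.
 */
static inline unsigned int
atomic_load_acq_int(volatile unsigned int *p)
{
	unsigned int v;

	__asm __volatile("movl %1,%0; lfence"
	    : "=r" (v)		/* output: loaded value */
	    : "m" (*p)		/* input: memory operand */
	    : "memory");	/* compiler barrier */
	return (v);
}
```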