From: Bruce Evans <brde@optusnet.com.au>
To: John Baldwin
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org,
    src-committers@FreeBSD.org, davidxu@FreeBSD.org
Date: Fri, 20 Jan 2012 03:55:42 +1100 (EST)
Subject: Re: svn commit: r230201 - head/lib/libc/gen

On Thu, 19 Jan 2012, John Baldwin wrote:

> On Thursday, January 19, 2012 12:57:50 am David Xu wrote:
>> rdtsc() may not work on SMP, so I have updated it to use clock_gettime
>> to get total time.
>> http://people.freebsd.org/~davidxu/bench/semaphore2/
>>
>> Still, lfence is a lot faster than atomic lock.

I hope it also covers non-micro-benchmarks.  IIRC, jhb found that it was
actually slower in some cases.  I only did micro-benchmarks on Athlon64.

> http://www.freebsd.org/~jhb/patches/amd64_fence.patch
>
> This is the patch I've had for quite a while.  Can you retest with this?
> You'll probably have to install the updated header in /usr/include as
> well.

% --- //depot/projects/smpng/sys/amd64/include/atomic.h	2011-01-05 17:06:25.000000000 0000
% +++ //depot/user/jhb/ktrace/amd64/include/atomic.h	2011-01-05 22:08:56.000000000 0000
% ...
% @@ -213,13 +213,12 @@
%  #if defined(_KERNEL) && !defined(SMP)
%
%  /*
% - * We assume that a = b will do atomic loads and stores.  However, on a
% - * PentiumPro or higher, reads may pass writes, so for that case we have
% - * to use a serializing instruction (i.e. with LOCK) to do the load in
% - * SMP kernels.  For UP kernels, however, the cache of the single processor
% + * We assume that a = b will do atomic loads and stores.  However, reads
% + * may pass writes, so we have to use fences in SMP kernels to preserve
% + * ordering.  For UP kernels, however, the cache of the single processor
%  * is always consistent, so we only need to take care of compiler.
%  */
% -#define ATOMIC_STORE_LOAD(TYPE, LOP, SOP)		\
% +#define ATOMIC_STORE_LOAD(TYPE)			\
%  static __inline u_##TYPE				\
%  atomic_load_acq_##TYPE(volatile u_##TYPE *p)		\
%  {							\

It also has some simplifications from removing the use of different
operators for load and store.  These simplifications seem to be not quite
generic, since "lock; cmpxchg*" seems to be 2 cycles faster than "xchg*"
on Athlon64.

The ATOMIC_STORE_LOAD() macro is obfuscatory.  Better to have separate
macros for load and store, like we do for set/clear/add/subtract.  (The
latter can be obfuscated even better using 4 parameters for the ops, and
better yet by making the type a parameter.)
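The split Bruce suggests might look like the following minimal sketch.
The ATOMIC_LOAD/ATOMIC_STORE macro names and the plain-C types are
illustrative, not the actual atomic.h interface, and it assumes amd64
with GCC-style inline asm:

```c
/*
 * Sketch: separate per-direction macros with the type as a parameter,
 * instead of one ATOMIC_STORE_LOAD(TYPE, LOP, SOP) generating both.
 * Hypothetical names; not the real sys/amd64/include/atomic.h.
 */
#define	ATOMIC_LOAD(TYPE)					\
static inline unsigned TYPE					\
atomic_load_acq_##TYPE(volatile unsigned TYPE *p)		\
{								\
	unsigned TYPE v;					\
								\
	v = *p;							\
	/* lfence keeps later loads from passing this one. */	\
	__asm __volatile("lfence" : : : "memory");		\
	return (v);						\
}

#define	ATOMIC_STORE(TYPE)					\
static inline void						\
atomic_store_rel_##TYPE(volatile unsigned TYPE *p,		\
    unsigned TYPE v)						\
{								\
	/* xchg with memory has an implicit lock prefix, */	\
	/* so it acts as a full fence. */			\
	__asm __volatile("xchg %1,%0"				\
	    : "+m" (*p), "+r" (v)				\
	    : : "memory");					\
}

ATOMIC_LOAD(int)
ATOMIC_STORE(int)
```

Each macro expands independently, so a future change to one direction
(e.g. swapping xchg for lock; cmpxchg) does not touch the other.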
% @@ -240,32 +239,22 @@
%
%  #else /* !(_KERNEL && !SMP) */
%
% -#define ATOMIC_STORE_LOAD(TYPE, LOP, SOP)		\
% +#define ATOMIC_STORE_LOAD(TYPE)			\
%  static __inline u_##TYPE				\
%  atomic_load_acq_##TYPE(volatile u_##TYPE *p)		\
%  {							\
% -	u_##TYPE res;					\
% +	u_##TYPE v;					\
%							\
% -	__asm __volatile(MPLOCKED LOP			\
% -	: "=a" (res),			/* 0 */		\
% -	  "=m" (*p)			/* 1 */		\
% -	: "m" (*p)			/* 2 */		\
% -	: "memory", "cc");				\
% -							\
% -	return (res);					\
% +	v = *p;					\
% +	__asm __volatile("lfence" ::: "memory");	\

Style bug (missing spaces around the binary operator `:'), which becomes
a syntax error for C++.  Other places in this file use ` : : : '.

lfence() should be in cpufunc.h if it can be done separately like this.
However, I think it can't be done separately -- the fence needs to be in
the same asm as the load/store, since separate C statements may be
reordered.  This is the same problem that forces us to write
__asm __volatile("sti; hlt"); instead of sti(); hlt(); in too many places
for idling in machdep.c.

BTW, sti() and hlt() are bogusly not in cpufunc.h either:
- I misnamed sti() as enable_intr(), since enable_intr() was supposed to
  be MI and I didn't understand that any MI interface should not be
  direct in cpufunc.h.
- someone misnamed hlt() as halt().

Bruce
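The "same asm" form Bruce argues for might look like this sketch, with
the load and the fence emitted by a single asm statement so the compiler
cannot schedule other code between them (illustrative only; assumes
amd64 and GCC-style inline asm):

```c
/*
 * Sketch: combined load + fence in one asm statement.  Because both
 * instructions live in the same asm, no other statement can be
 * reordered between the load and the lfence -- the same reason the
 * idle loop must use __asm __volatile("sti; hlt") rather than two
 * separate sti(); hlt(); calls.
 */
static inline unsigned int
atomic_load_acq_int(volatile unsigned int *p)
{
	unsigned int v;

	__asm __volatile("movl %1,%0; lfence"
	    : "=r" (v)		/* output: loaded value */
	    : "m" (*p)		/* input: memory operand */
	    : "memory");	/* compiler barrier */
	return (v);
}
```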