From owner-freebsd-arch Wed Jan  2 14:57:50 2002
Delivered-To: freebsd-arch@freebsd.org
Received: from netau1.alcanet.com.au (ntp.alcanet.com.au [203.62.196.27])
	by hub.freebsd.org (Postfix) with ESMTP id 2A77537B41B;
	Wed, 2 Jan 2002 14:57:32 -0800 (PST)
Received: from mfg1.cim.alcatel.com.au (mfg1.cim.alcatel.com.au [139.188.23.1])
	by netau1.alcanet.com.au (8.9.3 (PHNE_22672)/8.9.3) with ESMTP id JAA23485;
	Thu, 3 Jan 2002 09:57:05 +1100 (EDT)
Received: from gsmx07.alcatel.com.au by cim.alcatel.com.au
	(PMDF V5.2-32 #37641) with ESMTP id <01KCMPBDRB9CVFM2DO@cim.alcatel.com.au>;
	Thu, 3 Jan 2002 09:56:12 +1100
Received: (from jeremyp@localhost) by gsmx07.alcatel.com.au (8.11.6/8.11.6)
	id g02Mv2c00921; Thu, 03 Jan 2002 09:57:02 +1100
Content-return: prohibited
Date: Thu, 03 Jan 2002 09:57:02 +1100
From: Peter Jeremy
Subject: Re: When to use atomic_ functions? (was: 64 bit counters)
In-reply-to: ; from mime@traveller.cz on Wed, Jan 02, 2002 at 03:53:55PM +0100
To: Michal Mertl
Cc: Matthew Dillon, Bruce Evans, Mike Smith, Bernd Walter, arch@FreeBSD.ORG
Mail-Followup-To: Michal Mertl, Matthew Dillon, Bruce Evans, Mike Smith,
	Bernd Walter, arch@FreeBSD.ORG
Message-id: <20020103095701.B561@gsmx07.alcatel.com.au>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-disposition: inline
User-Agent: Mutt/1.2.5i
References: <200201012349.g01NnKA40071@apollo.backplane.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID: 
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe: 
List-Unsubscribe: 
X-Loop: FreeBSD.ORG

On 2002-Jan-02 15:53:55 +0100, Michal Mertl wrote:
>I don't know how much time will be wasted - my measurements on pII show
>the atomic_ operations aren't that expensive.

As Matt has pointed out, this is only true if you have a single
processor.  Atomic operations always translate into bus cycles - and
the bus is roughly an order of magnitude slower than the CPU core on
current CPUs.
The worst situation is where a common counter is updated by a random
CPU - the counter will virtually always be in another CPU's cache,
requiring multiple bus cycles to transfer the data.

Also, many RISC processors (eg Alpha) don't have locked
read-modify-write primitives.  On the Alpha, you need an instruction
sequence:

loop:	load_locked		memory -> register
	update			register
	store_conditional	register -> memory
	if not success goto loop

with a few memory barriers added to ensure that the load/store are
visible to other CPUs.  The store_conditional will fail if your CPU
was interrupted or if another CPU updated an implementation-defined
region including the specified memory address.  (64-bit atomic
operations on IA32 use the same approach, with CMPXCHG8B as the
store_conditional instruction.)  This approach is quite expensive when
you have multiple CPUs contending for the same resource.

>There is a lot of counters and to have all of them for each processor
>would waste a bit of memory

I would be surprised if it was more than a page or two per CPU - which
is trivial in the overall scheme of things.

>but more importantly it would require some structural changes -
>which may end up causing counters update being even more expensive
>than atomic_.

This depends on how it is implemented.  Obviously

	int counter[NCPUS];

will be just as expensive as performing atomic operations, but no-one
in their right mind would do that.  One approach is to aggregate all
the per-CPU counters into a single region of KVM and arrange for that
KVM to be mapped to different physical memory for each CPU.  (Solaris
does or did this.)  This means that the code to update a counter
doesn't need to know whether the counter is per-CPU or not.  The code
to read the counters _does_ need to know that they are per-CPU and has
to sum all the individual counters - which is more expensive than a
straight read, but is normally far less frequent.
Peter