Date: Thu, 03 Jan 2002 09:57:02 +1100
From: Peter Jeremy <peter.jeremy@alcatel.com.au>
To: Michal Mertl <mime@traveller.cz>
Cc: Matthew Dillon <dillon@apollo.backplane.com>, Bruce Evans <bde@zeta.org.au>, Mike Smith <msmith@FreeBSD.ORG>, Bernd Walter <ticso@cicely8.cicely.de>, arch@FreeBSD.ORG
Subject: Re: When to use atomic_ functions? (was: 64 bit counters)
Message-ID: <20020103095701.B561@gsmx07.alcatel.com.au>
In-Reply-To: <Pine.BSF.4.41.0201021003580.18429-100000@prg.traveller.cz>; from mime@traveller.cz on Wed, Jan 02, 2002 at 03:53:55PM +0100
References: <200201012349.g01NnKA40071@apollo.backplane.com> <Pine.BSF.4.41.0201021003580.18429-100000@prg.traveller.cz>

On 2002-Jan-02 15:53:55 +0100, Michal Mertl <mime@traveller.cz> wrote:
>I don't know how much time will be wasted - my measurements on pII show
>the atomic_ operations aren't that expensive.

As Matt has pointed out, this is only true if you have a single
processor. Atomic operations always translate into bus cycles - and
the bus is roughly an order of magnitude slower than the CPU core for
current CPUs. The worst situation is where a common counter is
updated by a random CPU - the counter will virtually always be in
another CPU's cache, requiring multiple bus cycles to transfer the
data.

Also, many RISC processors (eg Alpha) don't have locked read-modify-write
primitives. On the Alpha, you need an instruction sequence:

	loop:	load_locked memory->register
		update register
		store_conditional register->memory
		if not success goto loop

with a few memory barriers added to ensure that the load/store are
visible to other CPUs. The store_conditional will fail if your CPU
was interrupted or if another CPU updated an implementation-defined
region including the specified memory address. (64-bit atomic
operations on the IA32 use the same approach - using CMPXCHG8B as
the store_conditional instruction). This approach is quite expensive
when you have multiple CPUs contending for the same resource.

>There is a lot of counters and to have all of them for each processor
>would waste a bit of memory

I would be surprised if it was more than a page or two per CPU - which
is trivial in the overall scheme of things.

> but more importantly it would require some structural changes -
>which may end up causing counters update being even more expensive
>that atomic_.

This depends on how it is implemented. Obviously

	int counter[NCPUS];

will be just as expensive as performing atomic operations, because
adjacent array elements share a cache line that bounces between CPUs
on every update, but no-one in their right mind would do that.
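
The load_locked/store_conditional retry loop described above can be approximated in portable C11 with a compare-and-swap loop. This is only a minimal sketch: `counter_add_cas` is a hypothetical helper name, not a FreeBSD `atomic_` function, and what instructions the compiler emits (an LL/SC pair, CMPXCHG8B, or something else) depends on the target.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a FreeBSD API): add `delta` to a shared 64-bit
 * counter with a compare-and-swap retry loop and return the new value.
 */
static uint64_t
counter_add_cas(_Atomic uint64_t *ctr, uint64_t delta)
{
	/* Corresponds to "load_locked memory->register". */
	uint64_t old = atomic_load_explicit(ctr, memory_order_relaxed);
	uint64_t next;

	do {
		/* Corresponds to "update register". */
		next = old + delta;
		/*
		 * Corresponds to "store_conditional": the store happens only
		 * if *ctr still equals `old`. On failure `old` is refreshed
		 * with the current value and the loop retries.
		 */
	} while (!atomic_compare_exchange_weak_explicit(ctr, &old, next,
	    memory_order_seq_cst, memory_order_relaxed));

	return next;
}
```

Under contention every failed iteration is another round trip for the cache line, which is where the cost discussed above comes from.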

One approach is to aggregate all the per-CPU counters into a single
region of KVM and arrange for that KVM to be mapped to different
physical memory for each CPU. (Solaris does or did this). This means
that the code to update the counter doesn't need to know whether a
counter is per-CPU or not. The code to read the counters _does_ need
to know that the counters are per-CPU and has to sum all the
individual counters - which is more expensive than a straight read,
but is normally far less frequent.

Peter
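
The cheap-update, summing-read trade-off of per-CPU counters can be sketched in plain C. Note that this is not the per-CPU KVM mapping described above: it is a simpler stand-in that indexes an array by CPU number and pads each slot to its own cache line. `NCPUS`, `CACHE_LINE`, and the `pcpu_counter_*` names are assumptions made for the illustration, not kernel interfaces.

```c
#include <stdint.h>

#define NCPUS      4	/* assumed CPU count for this sketch */
#define CACHE_LINE 64	/* assumed cache-line size in bytes */

/* One slot per CPU, aligned so no two CPUs' counters share a cache line. */
struct pcpu_counter {
	_Alignas(CACHE_LINE) uint64_t value;
};

/*
 * Update path: touches only the slot of the given CPU, so no atomic
 * operation is used. A real kernel would also have to keep the thread
 * from migrating or being interrupted in the middle of the update.
 */
static void
pcpu_counter_add(struct pcpu_counter *c, int cpu, uint64_t delta)
{
	c[cpu].value += delta;
}

/* Read path: sums every CPU's slot; costlier, but expected to be rare. */
static uint64_t
pcpu_counter_fetch(const struct pcpu_counter *c)
{
	uint64_t sum = 0;

	for (int i = 0; i < NCPUS; i++)
		sum += c[i].value;
	return sum;
}
```

With the per-CPU mapping approach the `cpu` argument would disappear from the update path, since every CPU would see its own copy at the same virtual address.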