Date:      Thu, 03 Jan 2002 09:57:02 +1100
From:      Peter Jeremy <peter.jeremy@alcatel.com.au>
To:        Michal Mertl <mime@traveller.cz>
Cc:        Matthew Dillon <dillon@apollo.backplane.com>, Bruce Evans <bde@zeta.org.au>, Mike Smith <msmith@FreeBSD.ORG>, Bernd Walter <ticso@cicely8.cicely.de>, arch@FreeBSD.ORG
Subject:   Re: When to use atomic_ functions? (was: 64 bit counters)
Message-ID:  <20020103095701.B561@gsmx07.alcatel.com.au>
In-Reply-To: <Pine.BSF.4.41.0201021003580.18429-100000@prg.traveller.cz>; from mime@traveller.cz on Wed, Jan 02, 2002 at 03:53:55PM +0100
References:  <200201012349.g01NnKA40071@apollo.backplane.com> <Pine.BSF.4.41.0201021003580.18429-100000@prg.traveller.cz>

On 2002-Jan-02 15:53:55 +0100, Michal Mertl <mime@traveller.cz> wrote:
>I don't know how much time will be wasted - my measurements on pII show
>the atomic_ operations aren't that expensive.

As Matt has pointed out, this is only true if you have a single
processor.  Atomic operations always translate into bus cycles - and
the bus is roughly an order of magnitude slower than the CPU core for
current CPUs.  The worst situation is where a common counter is
updated by a random CPU - the counter will virtually always be in
another CPU's cache, requiring multiple bus cycles to transfer the
data.

Also, many RISC processors (e.g. the Alpha) don't have locked
read-modify-write primitives.  On the Alpha, you need an instruction sequence:
  loop:	load_locked memory->register
	update register
	store_conditional register->memory
	if not success goto loop
with a few memory barriers added to ensure that the load and store are
visible to other CPUs.  The store_conditional will fail if your CPU
was interrupted or if another CPU updated an implementation-defined
region including the specified memory address.  (64-bit atomic
operations on the IA32 use a similar retry loop, with CMPXCHG8B, a
compare-and-exchange instruction, standing in for store_conditional.)

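This approach is quite expensive when you have multiple CPUs
contending for the same resource: every failed store_conditional
forces another pass around the loop, and each pass costs more bus
traffic.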
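
For comparison, here is a minimal C11 sketch of the same retry loop, written with <stdatomic.h>.  The function name counter_add is hypothetical.  It assumes a compiler that typically lowers atomic_compare_exchange_weak to a load_locked/store_conditional pair on LL/SC machines such as the Alpha, and to LOCK CMPXCHG8B for 64-bit operands on IA32; check the generated code for your target rather than relying on this.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper: add `delta` to a shared 64-bit counter with a
 * compare-and-swap retry loop and return the new value.
 */
static uint64_t
counter_add(_Atomic uint64_t *counter, uint64_t delta)
{
	uint64_t old = atomic_load(counter);

	/*
	 * If another CPU changed the counter between our load and our
	 * store, the exchange fails, `old` is refreshed with the current
	 * value, and we go around again.  Under contention every extra
	 * iteration is more bus traffic.
	 */
	while (!atomic_compare_exchange_weak(counter, &old, old + delta))
		;
	return old + delta;
}
```

C11's atomic_fetch_add would do the same job in a single call; the loop is spelled out here only to mirror the load_locked/store_conditional sequence.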

>There is a lot of counters and to have all of them for each processor
>would waste a bit of memory

I would be surprised if it was more than a page or two per CPU -
which is trivial in the overall scheme of things.

> but more importantly it would require some structural changes -
>which may end up causing counters update being even more expensive
>that atomic_.

This depends on how it is implemented.  Obviously
	int counter[NCPUS];
will be just as expensive as performing atomic operations (adjacent
elements share a cache line, so every update still bounces that line
between CPUs), but no-one in their right mind would do that.  One
approach is to aggregate all
the per-CPU counters into a single region of KVM and arrange for that
KVM to be mapped to different physical memory for each CPU.  (Solaris
does or did this).  This means that the code to update the counter
doesn't need to know whether a counter is per-CPU or not.
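
A per-CPU KVM mapping can't be shown in a few lines of portable code, but C11's _Thread_local gives a rough user-space analogue: each thread (standing in for a CPU) sees its own copy behind the same name, so the update site is an ordinary increment.  This is an analogy only, not how the Solaris mapping works; packets_received and count_packet are hypothetical names.

```c
#include <stdint.h>

/*
 * Hypothetical statistics counter.  Every thread gets its own copy
 * behind the same name, much as a per-CPU mapping gives every CPU its
 * own physical copy behind the same virtual address.
 */
static _Thread_local uint64_t packets_received;

/*
 * The update site looks like any global increment.  No lock or atomic
 * operation is used because no other thread updates this copy.
 */
static void
count_packet(void)
{
	packets_received++;
}

/* Returns only the calling thread's copy, not a system-wide total. */
static uint64_t
packets_received_this_thread(void)
{
	return packets_received;
}
```

One place the analogy breaks down: a thread-local copy follows its thread, whereas a per-CPU copy stays with its CPU, so kernel code updating a per-CPU counter must not migrate to another CPU in the middle of the read-modify-write.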

The code to read the counters _does_ need to know that the counters
are per-CPU and has to sum all the individual counters - which is
more expensive than a straight read, but reads are normally far less
frequent than updates.
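
A simpler layout than the KVM trick, sketched below under stated assumptions, is a per-CPU array padded so that each slot occupies its own cache line.  That padding is what the bare `int counter[NCPUS]` lacks.  NCPUS = 4 and a 64-byte cache line are assumptions, pcpu_inc and pcpu_sum are hypothetical names, and the caller is assumed to supply its current CPU number.  Plain (non-atomic) accesses are used for brevity; portable multithreaded C would need relaxed atomics on each slot.

```c
#include <stdint.h>

#define NCPUS		4	/* hypothetical number of CPUs */
#define CACHE_LINE	64	/* assumed cache-line size in bytes */

/*
 * _Alignas pads each slot out to a full cache line, so CPUs updating
 * their own slots never write to the same line.
 */
struct pcpu_counter {
	_Alignas(CACHE_LINE) uint64_t value;
};

static struct pcpu_counter packets[NCPUS];

/* Fast path: each CPU touches only its own slot. */
static void
pcpu_inc(int cpu)
{
	packets[cpu].value++;
}

/* Slow path: the reader must know about every slot and add them up. */
static uint64_t
pcpu_sum(void)
{
	uint64_t total = 0;

	for (int i = 0; i < NCPUS; i++)
		total += packets[i].value;
	return total;
}
```

The total is only a snapshot, since slots may change while the loop runs, and on a 32-bit machine a plain 64-bit read of a slot can also be torn.  Both are usually acceptable for statistics.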

Peter
