Date:      Fri, 4 Jan 2002 12:15:01 +0100
From:      Bernd Walter <ticso@cicely8.cicely.de>
To:        Terry Lambert <tlambert2@mindspring.com>
Cc:        Bernd Walter <ticso@cicely9.cicely.de>, Matthew Dillon <dillon@apollo.backplane.com>, Alfred Perlstein <bright@mu.org>, John Baldwin <jhb@FreeBSD.ORG>, arch@FreeBSD.ORG, Mike Smith <msmith@FreeBSD.ORG>, Bruce Evans <bde@zeta.org.au>, Michal Mertl <mime@traveller.cz>, Peter Jeremy <peter.jeremy@alcatel.com.au>
Subject:   Re: When to use atomic_ functions? (was: 64 bit counters)
Message-ID:  <20020104121500.E5294@cicely8.cicely.de>
In-Reply-To: <3C34C27C.C7FE6713@mindspring.com>
References:  <XFMail.020102152920.jhb@FreeBSD.org> <200201030002.g0302Eo60575@apollo.backplane.com> <20020102180734.A82406@elvis.mu.org> <200201030012.g030Cgp60752@apollo.backplane.com> <3C33D3E3.38E758CA@mindspring.com> <20020103094239.GH53199@cicely9.cicely.de> <3C34C27C.C7FE6713@mindspring.com>

On Thu, Jan 03, 2002 at 12:43:40PM -0800, Terry Lambert wrote:
> Bernd Walter wrote:
> > > In any case, multiple reader, single writer doesn't need locks
> > > unless it has to be done atomically.
> > 
> > A single writer means always writing from the same CPU or thread.
> 
> Yes, of course.
> 
> > If you always write from the same CPU, the other may keep an
> > old value in its cache for an unknown length of time.
> > You simply trust that the caches get synchronised automatically by
> > regular events like context switches.
> > I'm not sure this is valid.
> 
> Actually, I'd suggest putting it in a non-cacheable page if
> multiple CPUs need to get at it and it must be 100% accurate
> at all times; if it were in a cacheable page, there would be an
> Invalidate (MESI, remember?), and the cache line in the other
> CPUs that had it cached would be flushed.

Only if you have a coherent cache design.
On alphas you have to request that explicitly, either per cache-line -
which is what ldx_l/stx_c give you - or globally.
Using non-cacheable pages is worse than using synchronisation primitives.

> Personally, I'd suggest making the algorithms insensitive to
> the data, in the neighborhood of 2 times the update latency.
> 
> This is what I suggested to Alfred, when I suggested per CPU
> run queues (and three years ago, when we had the threads
> design meeting at the Whistle facility FreeBSD user group).

I'm not commenting on your example, because I'm too far from knowing
the current alternatives.

> ---
> 
> The point of this example is that the issue isn't about how much
> or how little locks cost, but about designs that don't require
> locks in the common case.
> 
> So it really doesn't *matter* what the costs are up front, as
> long as the costs are incurred so infrequently that their
> amortized value is lower than the lowest cost method being hit
> much more frequently.

Right - as long as the design is still correct.

> In other words: Matt was right, when he said "Maybe we are looking
> at this all wrong ... maybe we don't need locks ..." (paraphrasing).

We have address 0x0 with initial value 1
and address 0x58 with initial value 2.

Remember that this scenario is possible:

CPU-A:
  write value 3 to 0x0
  write value 4 to 0x58
CPU-B:
  read value 4 from 0x58
  read value 1 from 0x0
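
To close that window you need rel/acq semantics.  A rough sketch using
the rel/acq variants of the atomic_ functions (the variable names are
of course made up, and I assume both values fit into a u_int):

#include <sys/types.h>
#include <machine/atomic.h>

static volatile u_int val0 = 1;         /* the "0x0" location */
static volatile u_int val58 = 2;        /* the "0x58" location */

/* CPU-A: the single writer */
static void
writer(void)
{
        val0 = 3;
        /* release: the store to val0 is visible before val58 changes */
        atomic_store_rel_int(&val58, 4);
}

/* CPU-B: any reader */
static u_int
reader(void)
{
        /* acquire: later loads cannot be satisfied before this one */
        if (atomic_load_acq_int(&val58) == 4)
                return (val0);          /* guaranteed to see 3 */
        return (val0);                  /* may still see 1 or 3 */
}

Without the release on the writer and the acquire on the reader the
alpha is free to let CPU-B see the new 0x58 and the stale 0x0.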

> > > An easy way to do this atomically, even for fairly large items,
> > > is to toggle between an active and inactive structure of data,
> > > where the pointer to the structure is assigned atomically.
> > 
> > Only if you use rel/acq behaviour - atomic alone isn't strong enough.
> 
> It depends on your cycle time on the objects.  So long as the
> update/read cost is very small relative to the update/read
> frequency, you're OK.
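
To make it concrete, this is roughly the toggle I have in mind when I
say rel/acq (a sketch only - single writer assumed, all names invented):

#include <sys/types.h>
#include <machine/atomic.h>

struct stats {
        u_int   calls;
        u_int   errors;
};

static struct stats statbuf[2];
static volatile u_int curidx = 0;

/* the single writer: fill the inactive copy, then publish it */
static void
stats_publish(const struct stats *src)
{
        u_int inactive;

        inactive = curidx ^ 1;
        statbuf[inactive] = *src;
        /* release: the copy is globally visible before the index flips */
        atomic_store_rel_int(&curidx, inactive);
}

/* any reader */
static const struct stats *
stats_get(void)
{
        /* acquire: later loads see the copy published above */
        return (&statbuf[atomic_load_acq_int(&curidx)]);
}

Your cycle-time argument is then about the window in which a reader
still looks at the copy that is about to be overwritten again.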

On alpha you only pay a special cost for the first access to a
given cache-line.
You can safely access it as often as you want, as long as it stays
locked in the cache, without any performance loss other than the
strict ordering of the locked/conditional instructions.
What you really want to avoid is the cache-line bouncing between CPUs.
But unless you 100% exclude another CPU from accessing your values
in any way, you still need to care about synchronisation.
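
For illustration, the usual ldx_l/stx_c retry loop written with
atomic_cmpset_int() (the counter_add() wrapper is invented):

#include <sys/types.h>
#include <machine/atomic.h>

/*
 * Add v to *p without a lock.  atomic_cmpset_int() is an
 * ldl_l/stl_c pair on the alpha; the loop only repeats when the
 * store-conditional fails because another CPU touched the
 * cache-line in between.
 */
static __inline void
counter_add(volatile u_int *p, u_int v)
{
        u_int old;

        do {
                old = *p;
        } while (atomic_cmpset_int(p, old, old + v) == 0);
}

As long as the line stays in the local cache the loop succeeds on the
first pass, which is exactly why the access pattern matters so much
more than the cost of the instructions themselves.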

> Basically, you need N copies, where the update/read frequency
> divided by the update/read cost is greater than N*2.  For my
> example of a zero system call gettimeofday() and time(), using
> a page with PG_U set on it to contain a reflection of the
> timecounter data, N == 2 (the minimum N permissible).  For
> something else, N might be a larger value, and then instead of
> two pointers for "current" and "previous", you are talking two
> pointers for a two-handed clock through a circularly linked
> list of structures containing the reflected data (the hands
> are both moved on update-forward, so (1) the data is never
> incorrect, and (2) the data is always one-behind during the
> update process itself).

Sorry - I can't follow you here well enough to even see whether it
would at least work correctly.

-- 
B.Walter              COSMO-Project         http://www.cosmo-project.de
ticso@cicely.de         Usergroup           info@cosmo-project.de





