Date: Fri, 4 Jan 2002 12:15:01 +0100
From: Bernd Walter <ticso@cicely8.cicely.de>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: Bernd Walter <ticso@cicely9.cicely.de>, Matthew Dillon <dillon@apollo.backplane.com>,
    Alfred Perlstein <bright@mu.org>, John Baldwin <jhb@FreeBSD.ORG>, arch@FreeBSD.ORG,
    Mike Smith <msmith@FreeBSD.ORG>, Bruce Evans <bde@zeta.org.au>,
    Michal Mertl <mime@traveller.cz>, Peter Jeremy <peter.jeremy@alcatel.com.au>
Subject: Re: When to use atomic_ functions? (was: 64 bit counters)
Message-ID: <20020104121500.E5294@cicely8.cicely.de>
In-Reply-To: <3C34C27C.C7FE6713@mindspring.com>
References: <XFMail.020102152920.jhb@FreeBSD.org> <200201030002.g0302Eo60575@apollo.backplane.com>
    <20020102180734.A82406@elvis.mu.org> <200201030012.g030Cgp60752@apollo.backplane.com>
    <3C33D3E3.38E758CA@mindspring.com> <20020103094239.GH53199@cicely9.cicely.de>
    <3C34C27C.C7FE6713@mindspring.com>
On Thu, Jan 03, 2002 at 12:43:40PM -0800, Terry Lambert wrote:
> Bernd Walter wrote:
> > > In any case, multiple reader, single write doesn't need locks
> > > unless it has to be done atomically.
> >
> > Single write means always with the same CPU or thread.
>
> Yes, of course.
>
> > If you always write with the same CPU the other might keep an
> > old value in his cache for an unknown long time.
> > You simply trust that caches get synchronised automatically by
> > regular events like context switching.
> > I'm not sure this is valid.
>
> Actually, I'd suggest putting it in a non-cacheable page, if
> multiple CPUs needed to get at it, and it must be 100% accurate
> at all times; if it were in a cacheable page, there would be an
> Invalidate (MESI, remember?), and the cache line in the other
> CPUs that had it cached would be flushed.

Only if you have a coherent cache design.
On Alphas you explicitly have to request that, either per cache line,
which is what happens with ldx_l/stx_c, or globally.
Using non-cacheable pages is worse than using synchronisation
primitives.

> Personally, I'd suggest making the algorithms insensitive to
> the data, in the neighborhood of 2 times the update latency.
>
> This is what I suggested to Alfred, when I suggested per CPU
> run queues (and three years ago, when we had the threads
> design meeting at the Whistle facility FreeBSD user group).

I'm not commenting on your example, because I'm too far away from
knowing the current alternative.

> ---
>
> The point of this example is that the issue isn't about how much
> or how little locks cost, but about designs that don't require
> locks in the common case.
>
> So it really doesn't *matter* what the costs are up front, as
> long as the costs are incurred so infrequently that their
> amortized value is lower than the lowest cost method being hit
> much more frequently.

Right - as long as the design is still correct.

> In other words: Matt was right, when he said "Maybe we are looking
> at this all wrong ... maybe we don't need locks ..." (paraphrasing).

We have address 0x0 with initial value 1
and address 0x58 with initial value 2.
Remember that this scenario is possible:

CPU-A   write value 3 to 0x0
        write value 4 to 0x58

CPU-B   read value 4 from 0x58
        read value 1 from 0x0

Without a memory barrier between the two writes and between the two
reads nothing stops CPU-B from seeing the new value at 0x58 but still
the old value at 0x0.

> > > An easy way to do this atomically, even for fairly large items,
> > > is to toggle between an active and inactive structure of data,
> > > where the pointer to the structure is assigned atomically.
> >
> > Only if you use rel/acq behaviour - atomic alone isn't strong enough.
>
> It depends on your cycle time on the objects.  So long as the
> update/read cost is very small relative to the update/read
> frequency, you're OK.

On Alpha you only pay a special cost for the first access to a given
cache line.
You can safely access it as often as you want, as long as it is still
held in the cache, without any performance loss other than the strict
ordering of the locked/conditional instructions.
What you really want to avoid is the cache line moving to another CPU.
But unless you 100% exclude another CPU accessing your values in any
way, you still need to care about synchronisation.
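To make clearer what I mean with rel/acq for that active/inactive
toggle, here is a rough sketch (only a sketch - the structure, the
names and the use of an index instead of a pointer are made up, and
I'm assuming the atomic(9) acq/rel operations here):

#include <sys/types.h>
#include <machine/atomic.h>

/*
 * Two copies of the data; the single writer fills the inactive copy
 * and then publishes it by storing the new index with release
 * semantics.  Readers load the index with acquire semantics before
 * they touch the copy it names, so they can never see the new index
 * together with stale copy contents.
 */
struct snap {
        u_int64_t       val_a;
        u_int64_t       val_b;
};

static struct snap      copies[2];
static volatile u_int   cur;            /* index of the active copy */

/* Writer side - there is only ever one writer. */
static void
publish(u_int64_t a, u_int64_t b)
{
        u_int next = 1 - cur;           /* the currently inactive copy */

        copies[next].val_a = a;
        copies[next].val_b = b;
        /* rel: the stores above become visible before the new index. */
        atomic_store_rel_int(&cur, next);
}

/* Reader side - any CPU, no lock. */
static struct snap
snapshot(void)
{
        /* acq: the loads below cannot pass the index load. */
        u_int i = atomic_load_acq_int(&cur);

        return (copies[i]);
}

The rel/acq pair is exactly what forbids the 0x0/0x58 scenario above
for the copy and its index; with a real pointer the _ptr variants
would serve the same purpose.  It still only works as long as no
reader is still looking at a copy by the time the writer comes around
to reuse it - which is your update/read frequency versus cost
condition.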
> Basically, you need N copies, where the update/read frequency
> divided by the update/read cost is greater than N*2.  For my
> example of a zero system call gettimeofday() and time(), using
> a page with PG_U set on it to contain a reflection of the
> timecounter data, N == 2 (the minimum N permissible).  For
> something else, N might be a larger value, and then instead of
> two pointers for "current" and "previous", you are talking two
> pointers for a two-handed clock through a circularly linked
> list of structures containing the reflected data (the hands
> are both moved on update-forward, so (1) the data is never
> incorrect, and (2) the data is always one-behind during the
> update process itself).

Sorry - I can't follow you here well enough to even see whether it
would at least work correctly.
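What I could imagine it looks like is roughly the sketch below, but I
may well be missing your point (again only a sketch with made-up
names; the ring is an array instead of a linked list, N == 4 is
arbitrary, and I'm again assuming the atomic(9) acq/rel operations):

#include <sys/types.h>
#include <machine/atomic.h>

#define NCOPY   4                       /* N copies; 2 is the minimum */

struct reflect {
        u_int64_t       field_a;        /* the reflected data */
        u_int64_t       field_b;
};

static struct reflect   ring[NCOPY];
static volatile u_int   hand;           /* index of the newest valid copy */

/*
 * Single writer: fill the slot in front of the hand, then move the
 * hand forward with release semantics.  A reader that already loaded
 * the old hand keeps reading the old, still valid copy, so the data
 * is never wrong, at worst one update behind.
 */
static void
reflect_update(u_int64_t a, u_int64_t b)
{
        u_int next = (hand + 1) % NCOPY;

        ring[next].field_a = a;
        ring[next].field_b = b;
        atomic_store_rel_int(&hand, next);
}

/* Reader: any CPU, no lock. */
static struct reflect
reflect_read(void)
{
        u_int i = atomic_load_acq_int(&hand);

        return (ring[i]);
}

If that is about what you mean, then the N*2 condition is just that
no reader may still be inside reflect_read() by the time the hand has
gone all the way around the ring - but on alpha the rel/acq pair (the
memory barriers) is still needed, exactly as in the two-copy case.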
-- 
B.Walter              COSMO-Project         http://www.cosmo-project.de
ticso@cicely.de         Usergroup           info@cosmo-project.de