Date: Thu, 7 Aug 2003 15:02:14 -0400
From: "Portante, Peter" <peter.portante@hp.com>
To: <deischen@freebsd.org>
Cc: alpha@freebsd.org
Subject: RE: Atomic swap
Message-ID: <B24FABB430F7C94D942D6386447C93DC0512B55F@tayexc17.americas.cpqcorp.net>
Dan,

> ----------
> From: Daniel Eischen
> Reply To: deischen@freebsd.org
> Sent: Thursday, August 7, 2003 1:44 PM
> To: Portante, Peter
> Cc: alpha@freebsd.org; deischen@freebsd.org
> Subject: RE: Atomic swap
>
> On Thu, 7 Aug 2003, Portante, Peter wrote:
>
> > Dan,
> >
> > I don't think you want to do the stq_c if the location already holds the
> > same value. Instead, check the loaded value to see if it is the same as the
>
> The purpose of the atomic swap is to make a FIFO queueing
> list. The values should never be the same. It's not meant
> to be used as test_and_set.
>

Reasonable. We had a major performance bug in our code when we assumed a
routine performed a certain way based on its name. You might want to change
the name, because an atomic swap long could be used to implement a mutex if
one didn't know better, and then this code will tube an MP system under
contention.

> > value to be stored, and branch out of the loop returning the result if
> > they are the same. And starting with EV56, the need to do the branch
> > forward/branch back logic has been removed. And EV6 and later CPUs do such
> > a good job predicting the branching that it is not worth the instruction
> > stream space when that space can be used to avoid a stq_c.
> >
> > Additionally, the stq_c destroys the contents of %2, so you need to move the
> > value in %2 into another register for use in the stq_c. I don't know how to
> > do that in the ASM, so I just used raw register names below, highlighted in
> > red.
>
> How about this?
>

Not too bad, except every time you loop you make another memory reference to
get the value. If you load it into a register once, you can just move it into
place each time before the store without referencing memory. For performance,
don't reference memory unless you absolutely have to.
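The two points above (read the value to be stored once, outside the retry loop, and optionally skip the store when the location already holds that value) can be sketched in portable C11. This is only an illustration under stated assumptions: a compare-exchange loop stands in for the Alpha ldq_l/stq_c pair, and swap_long_sketch is a hypothetical name, not the routine posted in this thread.

```c
#include <stdatomic.h>

/*
 * Hypothetical sketch, not the code under review: a C11 compare-exchange
 * loop stands in for the Alpha ldq_l/stq_c sequence.
 */
long
swap_long_sketch(_Atomic long *dst, const long *src)
{
	/* Read the value to store once, so the retry loop never re-reads *src. */
	long newval = *src;
	long old = atomic_load_explicit(dst, memory_order_relaxed);

	for (;;) {
		/* Optional early exit: the location already holds the value. */
		if (old == newval)
			return (old);
		/*
		 * On success the previous contents are still in 'old'.
		 * On failure 'old' is refreshed with the current contents.
		 */
		if (atomic_compare_exchange_weak(dst, &old, newval))
			return (old);
	}
}
```

For the FIFO use described in the quoted reply, where the values never match, the early exit never fires and could be dropped; it only matters if the routine ends up being used like a test-and-set.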
Also, you might want to issue a ldq, once, before the actual loop of ldq_l so
that the processor gets the cache line using the normal load instruction,
avoiding the heavier load-locked logic.

I just read Marcel's note, and his code looks pretty good. Just add a ldq
before the "1: ldq_l" and that code will perform quite well. If you don't
want to add the ldq to the asm, just read the destination value before
calling atomic_swap_long(); it will really help this perform well.

-Peter
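The "read the destination first" suggestion can be sketched from the caller's side in C11. This is an assumption-laden illustration: swap_with_preread is a hypothetical wrapper, atomic_exchange stands in for atomic_swap_long(), and whether the extra load helps depends on the CPU and on what the compiler emits for it.

```c
#include <stdatomic.h>

/*
 * Hypothetical wrapper: touch the destination with an ordinary load
 * before the atomic swap, mirroring the single ldq ahead of the
 * ldq_l/stq_c loop. The loaded value is deliberately discarded.
 */
long
swap_with_preread(_Atomic long *dst, long newval)
{
	/* Plain relaxed load, intended only to pull in the cache line. */
	(void)atomic_load_explicit(dst, memory_order_relaxed);

	/* The swap itself; returns the previous contents of *dst. */
	return (atomic_exchange(dst, newval));
}
```

The result is the same as calling the swap directly; the only intended difference is that the cache line has already been requested with a normal load by the time the locked sequence starts.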