Date:      Thu, 7 Aug 2003 15:02:14 -0400
From:      "Portante, Peter" <peter.portante@hp.com>
To:        <deischen@freebsd.org>
Cc:        alpha@freebsd.org
Subject:   RE: Atomic swap
Message-ID:  <B24FABB430F7C94D942D6386447C93DC0512B55F@tayexc17.americas.cpqcorp.net>

Dan,

> ----------
> From: 	Daniel Eischen
> Reply To: 	deischen@freebsd.org
> Sent: 	Thursday, August 7, 2003 1:44 PM
> To: 	Portante, Peter
> Cc: 	alpha@freebsd.org; deischen@freebsd.org
> Subject: 	RE: Atomic swap
>
> On Thu, 7 Aug 2003, Portante, Peter wrote:
>
> > Dan,
> >
> > I don't think you want to do the stq_c if the location already holds the
> > same value.  Instead, check the loaded value to see if it is the same as the
>
> The purpose of the atomic swap is to make a FIFO queueing
> list.  The values should never be the same.  It's not meant
> to be used as test_and_set.
>
Reasonable.  We had a major performance bug in our code when we assumed
a routine performed a certain way based on its name.  You might want to
change the name, because an atomic swap long could be used to implement
a mutex if one didn't know better, and then this code will tube an MP
system under contention.
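
To illustrate the concern, here is a minimal sketch of what a swap-based lock might look like, with and without a check before the store.  It uses C11 atomics rather than the Alpha asm under discussion, and every name in it (sketch_lock_t, sketch_lock_naive, sketch_lock_checked, sketch_unlock) is hypothetical, not something from the FreeBSD tree.

```c
#include <stdatomic.h>

/* Hypothetical lock word: 0 = free, 1 = held. */
typedef struct {
    atomic_long word;
} sketch_lock_t;

/*
 * Naive version: every spin iteration performs a swap, so each waiter
 * keeps storing to the lock word even while the lock is held.
 */
static void sketch_lock_naive(sketch_lock_t *l)
{
    while (atomic_exchange_explicit(&l->word, 1L, memory_order_acquire) != 0)
        ;
}

/*
 * Checked version: spin on plain loads and attempt the swap only when
 * the lock looks free, so waiters do not store while it is held.
 */
static void sketch_lock_checked(sketch_lock_t *l)
{
    for (;;) {
        while (atomic_load_explicit(&l->word, memory_order_relaxed) != 0)
            ;
        if (atomic_exchange_explicit(&l->word, 1L, memory_order_acquire) == 0)
            return;
    }
}

static void sketch_unlock(sketch_lock_t *l)
{
    atomic_store_explicit(&l->word, 0L, memory_order_release);
}
```

Both versions hand out the lock correctly; the difference only shows up with several CPUs waiting, where the naive loop has each waiter repeatedly taking the cache line for writing.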

> > value to be stored, and branch out of the loop returning the result if
> > they are the same.  And starting with EV56, the need to do the branch
> > forward/branch back logic has been removed.  And EV6 and later CPUs do such
> > a good job predicting the branching that it is not worth the instruction
> > stream space when that space can be used to avoid a stq_c.
> >
> > Additionally, the stq_c destroys the contents of %2, so you need to move the
> > value in %2 into another register for use in the stq_c.  I don't know how to
> > do that in the ASM, so I just used raw register names below, highlighted in
> > red.
>
> How about this?
>
Not too bad, except every time you loop you make another memory
reference to get the value.  If you load it into a register once, you
can just move it into place each time before the store without
referencing memory.  For performance, don't reference memory unless you
absolutely have to.  Also, you might want to issue a ldq, once, before
the actual loop of ldq_l so that the processor gets the cache line using
the normal load instruction, avoiding the heavier load-locked logic.

I just read Marcel's note, and his code looks pretty good.  Just add a
ldq before the "1: ldq_l" and that code will perform quite well.  If you
don't want to add the ldq to the asm, just read the destination value
before calling atomic_swap_long(); it will really help this perform
well.
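
As a rough sketch of the caller-side variant, assuming a swap routine that takes a destination pointer and a new value and returns the old one (the real atomic_swap_long() signature is not shown in this message, so sketch_atomic_swap_long() below is a hypothetical stand-in built on C11 atomics):

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in for the swap routine: stores val into *dst
 * and returns the previous contents.
 */
static long sketch_atomic_swap_long(atomic_long *dst, long val)
{
    return atomic_exchange_explicit(dst, val, memory_order_acq_rel);
}

/*
 * Read the destination with an ordinary load first, then do the swap.
 * The loaded value is discarded; the only intent is to touch the cache
 * line with a normal load before the locked sequence starts.  Whether
 * the compiler keeps an unused load is not guaranteed by the language.
 */
static long sketch_swap_with_preread(atomic_long *dst, long val)
{
    (void)atomic_load_explicit(dst, memory_order_relaxed);
    return sketch_atomic_swap_long(dst, val);
}
```

The pre-read does not change the result of the swap; any benefit is purely in how the cache line is acquired, and would have to be measured on the hardware in question.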

-Peter


