Date: Tue, 23 Nov 1999 09:51:29 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Peter Wemm <peter@netplex.com.au>, Tommy Hallgren <thallgren@yahoo.com>, freebsd-smp@FreeBSD.ORG
Subject: more on... Re: Matt's new unlock optimiazation
Message-ID: <199911231751.JAA10135@apollo.backplane.com>
References: <19991123140128.3A7D41C6D@overcee.netplex.com.au> <199911231703.JAA09896@apollo.backplane.com>
::>
::> The subject is: spin_unlock optimization(i386)
::
::A bit worrying, to say the least, especially coming from Linus (even more so
::in light of his work at Transmeta and what they're doing with/to Intel CPUs).
::
::Cheers,
::-Peter
:
: hmm. I was under the impression that the Pentium serialized writes
: by reserving locations through their caches. But knowing Intel, Linus
: is probably right.
:
: Sometimes I wish I could just take a gun to the Pentium.
:
: But this isn't a big deal, we should simply be able to do a locked
: write into the per-cpu area to synchronize just before we release
: the lock. This is still going to be a whole lot more efficient than
: trying to lock a write to the shared lock, because we will almost certainly
: already own that memory location.
:
: I'll run some tests and commit a solution. Nobody commit anything. No
: matter what, we still get the benefit of the recursion lock optimization
: which is actually the more important one.

OK, there's a problem, but I don't believe you have to use a locked
instruction to get around it. All you should need to do is synchronize
the instruction stream. I remember from somewhere that 'NOP' (which is
really just an xchg instruction) does this. But I am not sure; I am
going to have to do some more research.

I did test, and I am correct about the cache line ownership change overhead.
On an SMP box with two competing processors, using a locked instruction
on the *same* physical memory location results in roughly 3x the overhead
of the same locked instruction on different memory locations.

So if I can't find a definitive way to do instruction synchronization,
we will simply do a dummy locked instruction into the per-cpu area.
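
The fallback described above (a dummy locked instruction on memory the CPU already owns, followed by a plain store to release the lock) might look roughly like the sketch below. This is not the code that was committed to FreeBSD. The names spinlock_sketch, percpu_scratch, private_locked_barrier, sketch_lock and sketch_unlock are hypothetical, the 64-byte padding is an assumed cache line size, and on non-x86 compilers the sketch falls back to a C11 fence instead of a locked instruction.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a per-CPU scratch word. The padding assumes a
 * 64-byte cache line so the word does not share a line with other data. */
struct percpu_scratch {
    int  dummy;
    char pad[60];
};

/* Hypothetical minimal spinlock: 0 = free, 1 = held. */
typedef struct {
    atomic_int held;
} spinlock_sketch;

/* Dummy locked read-modify-write on a location this CPU already owns.
 * On x86 a locked instruction orders earlier loads and stores before
 * later ones; adding 0 leaves the value unchanged. */
static inline void private_locked_barrier(int *private_word)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__("lock; addl $0,%0"
                         : "+m"(*private_word)
                         :
                         : "memory", "cc");
#else
    /* Assumption: elsewhere, a full fence plays the same role. */
    (void)private_word;
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

static inline void sketch_lock(spinlock_sketch *l)
{
    int expected = 0;
    /* Spin until we change held from 0 to 1. */
    while (!atomic_compare_exchange_weak_explicit(&l->held, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = 0;
    }
}

static inline void sketch_unlock(spinlock_sketch *l, struct percpu_scratch *mine)
{
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);
    /* Synchronize on private memory first ... */
    private_locked_barrier(&mine->dummy);
    /* ... then release the shared lock with an ordinary (unlocked) store. */
    atomic_store_explicit(&l->held, 0, memory_order_relaxed);
}
```

The point of the design is that the only locked operation on the unlock path touches a cache line no other CPU is competing for, while the shared lock word is written with a normal store.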

With cmpxchgl:

    test3:/home/dillon# ./lock shared
    165 nS/loop
    test3:/home/dillon# ./lock private
    53 nS/loop

With just xchgl:

    test3:/home/dillon# ./lock shared
    160 nS/loop
    test3:/home/dillon# ./lock private
    47 nS/loop

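
The source of the ./lock test program is not included in this message. A comparable measurement could be sketched as follows, assuming POSIX threads and C11 atomics: two threads each perform an atomic exchange in a loop, either on the same word ("shared") or on words placed on separate cache lines ("private"). The names padded_word, worker and bench_ns_per_loop are hypothetical, the 64-byte line size is an assumption, and the timing here includes thread start-up, so absolute numbers will not match the figures above.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define CACHE_LINE 64   /* assumed cache line size */

/* One atomic word per cache line, so "private" targets never share a line. */
struct padded_word {
    _Alignas(CACHE_LINE) atomic_int word;
};

static struct padded_word words[2];

struct worker_arg {
    atomic_int *target;   /* word this thread hammers on */
    long        loops;    /* number of exchanges to perform */
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (long i = 0; i < a->loops; i++)
        atomic_exchange(a->target, 1);   /* an xchg on x86 */
    return NULL;
}

/* Returns elapsed wall-clock nanoseconds divided by loops, or -1.0 if a
 * thread could not be started. shared != 0 makes both threads use words[0]. */
static double bench_ns_per_loop(int shared, long loops)
{
    struct worker_arg args[2] = {
        { &words[0].word, loops },
        { shared ? &words[0].word : &words[1].word, loops },
    };
    pthread_t tid[2];
    struct timespec t0, t1;
    int started = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (; started < 2; started++)
        if (pthread_create(&tid[started], NULL, worker, &args[started]) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (started != 2)
        return -1.0;

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)loops;
}
```

A small main() that calls bench_ns_per_loop(1, n) and bench_ns_per_loop(0, n) and prints both values would give a shared/private comparison; build with -pthread where the platform requires it.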
-Matt
Matthew Dillon
<dillon@backplane.com>
