From owner-freebsd-smp  Tue Nov 23  9:52:25 1999
Delivered-To: freebsd-smp@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 329BA14C05
	for <freebsd-smp@FreeBSD.ORG>; Tue, 23 Nov 1999 09:52:15 -0800 (PST)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id JAA10135;
	Tue, 23 Nov 1999 09:51:29 -0800 (PST)
	(envelope-from dillon)
Date: Tue, 23 Nov 1999 09:51:29 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <199911231751.JAA10135@apollo.backplane.com>
To: Peter Wemm <peter@netplex.com.au>,
	Tommy Hallgren <thallgren@yahoo.com>, freebsd-smp@FreeBSD.ORG
Subject: more on... Re: Matt's new unlock optimization
References: <19991123140128.3A7D41C6D@overcee.netplex.com.au> <199911231703.JAA09896@apollo.backplane.com>
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

::> 
::> The subject is: spin_unlock optimization(i386)
::
::A bit worrying, to say the least, especially coming from Linus (even moreso
::in light of his work at transmeta and what they're doing with/to Intel cpu's).
::
::Cheers,
::-Peter
:
:    hmm.  I was under the impression that the Pentium serialized writes
:    by reserving locations through their caches.  But knowing Intel, Linus 
:    is probably right.
:
:    Sometimes I wish I could just take a gun to the Pentium.
:
:    But this isn't a big deal, we should simply be able to do a locked 
:    write into the per-cpu area to synchronize just before we release
:    the lock.  This is still going to be a whole lot more efficient than
:    trying to lock a write to the shared lock, because we will almost certainly
:    already own that memory location.
:
:    I'll run some tests and commit a solution.  Nobody commit anything.  No
:    matter what, we still get the benefit of the recursion lock optimization
:    which is actually the more important one.

    Ok, there's a problem but I don't believe you have to use a locked
    instruction to get around it.  All you should need to do is synchronize
    the instruction stream.  I remember from somewhere that 'NOP' (which is
    really just an xchg instruction) does this.  But I am not sure; I am
    going to have to do some more research.

    I did test, and I am correct about the cache line ownership change
    overhead.  On an SMP box with two competing processors, a locked
    instruction on the *same* physical memory location has roughly 3x the
    overhead of the same locked instruction on different memory locations.

    So if I can't find a definitive way to do instruction synchronization,
    we will simply do a dummy locked instruction into the per-cpu area.
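
    A minimal sketch of that fallback, written with C11 atomics rather
    than the kernel's own assembly.  Everything named here (struct
    spinlock, percpu_scratch, spin_lock, spin_unlock) is hypothetical and
    is not FreeBSD's actual code; a thread-local word stands in for the
    per-cpu area.  On x86 the exchange should compile to a locked xchg
    and the release store to an ordinary mov, but that depends on the
    compiler.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock layout; not the FreeBSD kernel's real structure. */
struct spinlock {
    atomic_int held;    /* 0 = free, 1 = held; shared between CPUs */
};

/* Assumed stand-in for a word in the per-cpu private area. */
static _Thread_local atomic_int percpu_scratch;

static void spin_lock(struct spinlock *l)
{
    /* Locked exchange on the shared word; spin until it was free. */
    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire) != 0)
        ;
}

static void spin_unlock(struct spinlock *l)
{
    /* Only the holder may unlock. */
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);

    /* Dummy locked operation on memory this CPU already owns, so no
     * cache line has to change hands to synchronize. */
    (void)atomic_exchange_explicit(&percpu_scratch, 0, memory_order_seq_cst);

    /* Release the lock with an unlocked write to the shared word. */
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

    The release ordering on the final store keeps the sketch correct as
    portable C; the dummy exchange is the extra step discussed above.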

    With cmpxchgl

	test3:/home/dillon# ./lock shared
	165 nS/loop
	test3:/home/dillon# ./lock private
	53 nS/loop

    With just xchgl

	test3:/home/dillon# ./lock shared
	160 nS/loop
	test3:/home/dillon# ./lock private
	47 nS/loop
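
    The source of ./lock is not included in this message.  The following
    is only a guess at the shape of such a test, using POSIX threads and
    C11 atomics: two threads hammer a locked exchange either on the same
    word ("shared") or on words in separate cache lines ("private").  All
    names, and the 64-byte line size, are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* One word per cache line; the 64-byte line size is an assumption. */
struct padded {
    _Alignas(64) atomic_int word;
};

static struct padded slots[2];

struct worker_arg {
    atomic_int *target;   /* word this thread hammers */
    long iters;           /* number of exchanges to perform */
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (long i = 0; i < a->iters; i++)
        (void)atomic_exchange(a->target, (int)i);  /* locked xchg on x86 */
    return NULL;
}

/* Runs two competing threads and returns average wall-clock nanoseconds
 * per loop iteration, or -1.0 if a thread could not be created.  The
 * timing includes thread start-up, so use a large iteration count. */
static double bench(int shared, long iters)
{
    assert(iters > 0);

    struct worker_arg a0 = { &slots[0].word, iters };
    struct worker_arg a1 = { shared ? &slots[0].word : &slots[1].word, iters };
    pthread_t t0, t1;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    if (pthread_create(&t0, NULL, worker, &a0) != 0)
        return -1.0;
    if (pthread_create(&t1, NULL, worker, &a1) != 0) {
        pthread_join(t0, NULL);
        return -1.0;
    }
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (double)(end.tv_sec - start.tv_sec) * 1e9
              + (double)(end.tv_nsec - start.tv_nsec);
    return ns / (double)iters;
}
```

    A small main() that maps argv[1] to the shared flag and prints the
    result of bench() would mimic the invocations shown above.  Absolute
    numbers will differ from machine to machine.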

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
