From owner-freebsd-arch  Wed May 24 20:22:50 2000
Delivered-To: freebsd-arch@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 6BEF037B9DE
	for <arch@freebsd.org>; Wed, 24 May 2000 20:22:47 -0700 (PDT)
	(envelope-from dillon@apollo.backplane.com)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.9.3/8.9.1) id UAA78579;
	Wed, 24 May 2000 20:22:43 -0700 (PDT)
	(envelope-from dillon)
Date: Wed, 24 May 2000 20:22:43 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200005250322.UAA78579@apollo.backplane.com>
To: Chuck Paterson <cp@bsdi.com>
Cc: arch@freebsd.org
Subject: Re: Short summary 
References:  <200005250218.UAA16278@berserker.bsdi.com>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


:}    The gist of the optimization is that if you use a lock prefix when
:}    locking, you do *not* need a lock prefix when unlocking.  Write
:}    ordering is guaranteed on Intel (586 or above).
:
:	This won't work with the BSD/OS locks. The reason is that
:	we use the same word to detect that someone is waiting
:	for the lock to be released. This kind of works with spin
:	locks (more just ahead) because you don't
:	need to do anything if someone else wants the lock;
:	you just go ahead and release it. With non-spin
:	locks, when you release a contested lock you need
:	to go put another process on the run queue.

    Ouch.  Having the contending cpu actually do a locked write
    to the lock (i.e. the cache line) held by another cpu is really,
    really slow.  Both processors will eat the full overhead of
    the hardware cache coherency protocol.  It's about 3 times
    as expensive as a contended lock without the ping-pong writing
    and about twice as expensive as a non-contending lock,
    and recursive locks using this model will be about 5x as expensive
    even in the best case.

    If there is any way to avoid this, I would.
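
    A minimal sketch of the alternative, in C11 atomics: the waiter
    only reads the lock word while it spins and issues the locked
    cmpxchg only when the word looks free, and the release is an
    ordinary store (a plain mov on x86, no lock prefix).  The names
    spin_t, spin_try, spin_enter and spin_exit are made up for
    illustration and are not the BSD/OS or FreeBSD API.  This covers
    spin locks only; it does not solve the sleeping-waiter wakeup
    problem described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int word;
} spin_t;

/*
 * One acquisition attempt.  Read the word first and only issue the
 * locked compare-and-exchange when it looks free, so a waiter does
 * not write to the cache line owned by the holder.
 */
static bool spin_try(spin_t *s)
{
    int expected = 0;

    if (atomic_load_explicit(&s->word, memory_order_relaxed) != 0)
        return false;               /* held: no write issued */
    return atomic_compare_exchange_strong_explicit(
        &s->word, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Spin until the lock is acquired; the spinning itself is read-only. */
static void spin_enter(spin_t *s)
{
    while (!spin_try(s))
        ;
}

/* Release with a release-ordered store; no locked instruction needed. */
static void spin_exit(spin_t *s)
{
    atomic_store_explicit(&s->word, 0, memory_order_release);
}
```

    The holder's cache line is only written by another cpu at the
    moment that cpu can actually take the lock.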


:	The "more just ahead" is actually addressed by you just ahead.
:}
:}    Also, for recursive locks for the case where you ALREADY hold the lock,
:}    you do not need a lock prefix when incrementing or decrementing the
:}    count.
:}
:
:	The BSD/OS mutexes generally use the locked operation and
:	take a miss on the mutex if it is already held, even
:	by the same process. There is a flag on
:	mtx_enter/mtx_exit to say that recursion is likely and
:	that the code should check for this before doing the
:	locked operation.
:
:	By default BSD/OS mutexes are always optimized for the
:	non-contested, non-recursed case. This means that
:	everything is just a cmpxchg and if that wins you're
:	done.

    If you can get rid of the contending-cpu-writes-to-the-lock
    case, your best-case recursion code will be about 5 times
    faster and your best-case non-contending, non-recursive
    lock will be about twice as fast.
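
    A rough sketch of the recursion fast path, assuming the lock word
    holds only the owner and nobody else writes to it while it is held.
    If we already own the lock, a plain read of the owner field and an
    unlocked increment are enough; the locked cmpxchg is used only for
    the first acquisition.  rmtx_t, rmtx_try, rmtx_exit and the integer
    owner ids are hypothetical, not the BSD/OS mtx_enter/mtx_exit
    interface.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_OWNER 0      /* hypothetical: real owner ids are nonzero */

typedef struct {
    atomic_int owner;   /* id of the holder, or NO_OWNER */
    int count;          /* recursion depth, touched only by the holder */
} rmtx_t;

/* Try to take the lock for 'self'; returns false if someone else holds it. */
static bool rmtx_try(rmtx_t *m, int self)
{
    int cur = atomic_load_explicit(&m->owner, memory_order_relaxed);
    int expected = NO_OWNER;

    if (cur == self) {
        /* Already ours: no other cpu can change owner, no lock prefix. */
        m->count++;
        return true;
    }
    if (cur != NO_OWNER)
        return false;   /* held by someone else: do not write to the word */

    /* First acquisition: the only locked operation in the sketch. */
    if (!atomic_compare_exchange_strong_explicit(
            &m->owner, &expected, self,
            memory_order_acquire, memory_order_relaxed))
        return false;
    m->count = 1;
    return true;
}

/* Drop one level of recursion; the final release is a plain store. */
static void rmtx_exit(rmtx_t *m, int self)
{
    assert(atomic_load_explicit(&m->owner, memory_order_relaxed) == self);
    if (--m->count == 0)
        atomic_store_explicit(&m->owner, NO_OWNER, memory_order_release);
}
```

    Only the 0 -> 1 transition pays for a locked instruction; nested
    enters and all exits are ordinary loads and stores.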

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

