Date: Wed, 24 May 2000 20:22:43 -0700 (PDT)
From: Matthew Dillon
Message-Id: <200005250322.UAA78579@apollo.backplane.com>
To: Chuck Paterson
Cc: arch@freebsd.org
Subject: Re: Short summary
References: <200005250218.UAA16278@berserker.bsdi.com>

:} The gist of the optimization is that if you use a lock prefix when
:} locking, you do *not* need a lock prefix when unlocking. Write
:} ordering is guaranteed on Intel (586 or above).
:
: This won't work with the BSD/OS locks. The reason is that
: we use the same word to detect that someone is waiting
: for the lock to be released. This works with spin
: locks, kind of (more just ahead), because you don't
: need to do anything if someone else wants the lock;
: you just go ahead and release it. With non-spin
: locks, when you release a contested lock you need
: to go put another process on the run queue.

    Ouch, having the contending cpu actually do a locked write to the
    lock (i.e. cache line) held by another cpu is really, really slow.
    Both processors will eat the full overhead of the hardware cache
    coherency protocol. It's about 3 times as expensive as a contended
    lock without the ping-pong writing and about twice as expensive as
    a non-contending lock, and recursive locks using this model will be
    about 5x as expensive even in the best case. If there is any way to
    avoid this, I would avoid it.

: The "more just ahead" is addressed by you ahead, actually.
:}
:} Also, for recursive locks for the case where you ALREADY hold the lock,
:} you do not need a lock prefix when incrementing or decrementing the
:} count.
:}
:
: The BSD/OS mutexes generally use the locked operation and
: take a miss on the mutex if it is already held, even
: by the same process. There is a flag on
: mtx_enter/mtx_exit to say that recursion is likely and
: that the code should check for it before doing the
: locked operation.
:
: By default BSD/OS mutexes are always optimized for the
: non-contested, non-recursed case. This means that
: everything is just a cmpxchg, and if that wins you're
: done.

    If you can get rid of the contending-cpu-writes-to-the-lock case,
    your best-case recursion code will be about 5 times faster and
    your best-case non-contending, non-recursive lock will be about
    twice as fast.

					-Matt
					Matthew Dillon
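
    For reference, the two optimizations discussed in this message (a
    locked operation only when acquiring, and no locked operation at all
    when recursing or releasing) can be sketched with C11 atomics. This is
    a minimal illustration under stated assumptions, not the BSD/OS or
    FreeBSD implementation: the names rmtx, rmtx_init, rmtx_try_enter and
    rmtx_exit are hypothetical, the caller supplies a nonzero id for
    itself, and the lock is spin-style only. It keeps no waiter flag in
    the lock word, so it does not solve the sleeping-waiter case raised in
    the quoted text.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RMTX_UNOWNED ((uintptr_t)0)   /* hypothetical "nobody holds it" value */

/* Hypothetical recursive spin mutex, for illustration only. */
struct rmtx {
    _Atomic uintptr_t owner;  /* RMTX_UNOWNED or the holder's nonzero id */
    unsigned depth;           /* recursion count, touched only by the holder */
};

static void rmtx_init(struct rmtx *m)
{
    atomic_init(&m->owner, RMTX_UNOWNED);
    m->depth = 0;
}

/* Try once to take the lock; returns false if another owner holds it. */
static bool rmtx_try_enter(struct rmtx *m, uintptr_t self)
{
    /*
     * Recursion case: only we can have stored our own id, so a plain
     * load and a plain increment are enough. No locked instruction.
     */
    if (atomic_load_explicit(&m->owner, memory_order_relaxed) == self) {
        m->depth++;
        return true;
    }

    /*
     * Acquire case: the one locked operation. On x86 compilers
     * typically emit "lock cmpxchg" for this.
     */
    uintptr_t expected = RMTX_UNOWNED;
    if (atomic_compare_exchange_strong_explicit(&m->owner, &expected, self,
            memory_order_acquire, memory_order_relaxed)) {
        m->depth = 1;
        return true;
    }

    /* Contended: the loser has not modified the lock word. */
    return false;
}

/* Drop one level of recursion; release the lock at depth zero. */
static void rmtx_exit(struct rmtx *m, uintptr_t self)
{
    assert(atomic_load_explicit(&m->owner, memory_order_relaxed) == self);

    if (--m->depth == 0) {
        /*
         * Release store. On x86 this is typically an ordinary mov with
         * no lock prefix, relying on the hardware's store ordering.
         */
        atomic_store_explicit(&m->owner, RMTX_UNOWNED, memory_order_release);
    }
}
```

    A caller that wants to spin would loop on rmtx_try_enter(). To stay
    off the holder's cache line as much as possible, such a loop would
    read the owner word and retry the compare-and-exchange only when it
    sees RMTX_UNOWNED, since a failed locked cmpxchg still pulls the
    line over exclusively.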