Date: Wed, 24 May 2000 18:52:45 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Chuck Paterson <cp@bsdi.com>
Cc: arch@FreeBSD.ORG
Subject: Re: Short summary
Message-ID: <200005250152.SAA78130@apollo.backplane.com>
:virtually the same speed as the Giant lock BSD/OS kernel in a uniprocessor
:environment. It occurred to me today that in a uniprocessor environment
:the lock prefix to the cmpxchg can be removed. I ran some
:experiments. The following data is from a very limited sample size. On
:a couple of different systems with different clock rates removing
:the lock prefix reduced execution time of mutex operations to one
:third of their original value. Running the same job with two kernels
:whose only difference was the lock prefix there was a reduction in
:system time of 2.5 percent. This suggested that the total system
:time used for locking with the SMP locks in place is 3.6 percent
:and with the locks trimmed for uniprocessor only operation is
:1.2 percent. (Please excuse rounding errors).
Chuck, there was extensive debate and testing on both Linux and
FreeBSD with regard to locked instructions in an SMP environment.
It was determined that there is an optimization one can make which
improves lock performance on SMP systems.
The gist of the optimization is that if you use a lock prefix when
locking, you do *not* need a lock prefix when unlocking. Write
ordering is guaranteed on Intel (586 or above).
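
To illustrate the lock/unlock asymmetry described above, here is a
minimal sketch in C11 atomics. It is not the FreeBSD mplock.s code; the
names spin_t, spin_trylock and spin_unlock are hypothetical. On x86 the
compare-exchange compiles to a lock-prefixed cmpxchg, while the release
store compiles to an ordinary mov with no lock prefix.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical spinlock type, for illustration only. */
typedef struct {
    atomic_int held;    /* 0 = free, 1 = held */
} spin_t;

/* Acquire: an atomic read-modify-write. On x86 this becomes a
 * lock cmpxchg, so this is where the lock prefix is paid for.
 * Returns nonzero if the lock was obtained. */
static int spin_trylock(spin_t *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->held, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Release: a store with release ordering. On x86 this is a plain
 * mov, because the hardware already keeps stores in program order. */
static void spin_unlock(spin_t *l)
{
    /* Only the holder may unlock. */
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

A caller would loop on spin_trylock() until it returns nonzero, do its
work, then call spin_unlock().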
Also, with recursive locks, in the case where you ALREADY hold the lock,
you do not need a lock prefix when incrementing or decrementing the
count.
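
The recursion case can be sketched the same way. This is an assumed
layout, not the actual MP lock: rlock_t, rlock_try, rlock_release and
the RLOCK_FREE value are hypothetical names. The point is that the
count is only ever touched by the CPU that owns the lock, so it can be
an ordinary int updated with ordinary instructions.

```c
#include <assert.h>
#include <stdatomic.h>

#define RLOCK_FREE (-1)     /* hypothetical "no owner" value */

/* Hypothetical recursive lock, for illustration only. */
typedef struct {
    atomic_int owner;       /* RLOCK_FREE, or the owning CPU id */
    int count;              /* recursion depth; owner-only access */
} rlock_t;

/* Returns nonzero if the lock is now held by 'cpu'. */
static int rlock_try(rlock_t *l, int cpu)
{
    /* Only 'cpu' itself ever stores its own id, so if we read it
     * back we already hold the lock: plain increment, no lock prefix. */
    if (atomic_load_explicit(&l->owner, memory_order_relaxed) == cpu) {
        l->count++;
        return 1;
    }
    /* First acquisition: the one locked read-modify-write. */
    int expected = RLOCK_FREE;
    if (atomic_compare_exchange_strong_explicit(
            &l->owner, &expected, cpu,
            memory_order_acquire, memory_order_relaxed)) {
        l->count = 1;
        return 1;
    }
    return 0;
}

static void rlock_release(rlock_t *l, int cpu)
{
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == cpu);
    /* Plain decrement; only the final release publishes a store. */
    if (--l->count == 0)
        atomic_store_explicit(&l->owner, RLOCK_FREE, memory_order_release);
}
```

After the initial rlock_try() succeeds, nested acquire/release pairs on
the same CPU cost only a load, a compare, and an increment or decrement.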
Take a look at the FreeBSD mp_unlock code in 4.x or 5.x (with a reasonably
recent cvs update) for an example: the MPrellock_edx subroutine in
/usr/src/sys/i386/i386/mplock.s. These changes saved over a microsecond
in syscall overhead for FreeBSD SMP.
This optimization radically improves the performance of an unlock at
the cost of adding a slight delay before contending CPUs see the
change. Since there is no lock contention 99.999% of the time, the
delay is completely absorbed and you realize an increase in performance
across the board.
The recursion optimization makes recursive locks practical in an SMP
setting. There is virtually *NO* overhead after you've obtained the
initial lock.
-Matt
