Date:      Tue, 9 Nov 2004 21:33:17 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Julian Elischer <julian@elischer.org>
Cc:        Stephan Uphoff <ups@tree.com>
Subject:   Re: cvs commit: src/sys/i386/i386 pmap.c
Message-ID:  <Pine.NEB.3.96L.1041109212934.60848G-100000@fledge.watson.org>
In-Reply-To: <4191062A.6090009@elischer.org>


On Tue, 9 Nov 2004, Julian Elischer wrote:

> >Assuming that these changes are correct, and pass whatever tests people
> >have in mind, this would be a very strong merge candidate for performance
> >reasons.  The difference is visible in packet send tests from user space
> >as a percentage or two improvement on UP on my P4, although it's a little
> >hard to tell due to the noise. 
> >  
> Can you explain why a spin mutex is more expensive than a sleep mutex (I 
> assume this is uncontested)?

A sleep mutex involves only an atomic acquire.  A spin mutex also disables
interrupts.  The reasons for this are documented extensively in various
books on the topic, but my personal favorite is the UNIX Systems for
Modern Architectures version.  Basically, spin locks are the preferred way
to synchronize with interrupt handlers, and you don't want an interrupt
handler to spin when it attempts to preempt code holding the spin lock it
wants to acquire.  The easiest way to avoid this is to disable interrupts
while holding the spin lock.  Various critical section optimizations are
presumably applicable here, as with other uses of critical sections.  The
interrupt disable/enable instructions are especially expensive on the P4,
which shows up clearly in the spin lock micro-benchmarks.  Ideally, you
should see the cost of a spin lock being the cost of a sleep mutex plus
the cost of a critical section, minus some fudge due to pipelining.  The
odd case we've been discussing was one where this was clearly not the
case: an extra one hundred cycles appeared as a result of a cmpxchg which
is implemented via a locked instruction on the P4.  So it turns out,
elementary school arithmetic was actually useful :-).
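
To make the cost comparison above concrete, here is a minimal user-space
sketch in C, not the actual FreeBSD mutex code.  Every name in it
(sketch_mtx, sim_intr_disable, and so on) is hypothetical, and the
"interrupt flag" is just a variable standing in for the real, privileged
interrupt disable/enable instructions.  It only shows the shape of the two
paths: the sleep-mutex fast path is one atomic acquire, while the spin
mutex pays for the same atomic acquire plus an interrupt disable/restore
pair around it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simulated CPU interrupt-enable flag (assumption: a real kernel would
 * use privileged instructions or a critical-section primitive instead). */
static bool intr_enabled = true;

/* Disable "interrupts" and return the previous state. */
static bool sim_intr_disable(void)
{
    bool was = intr_enabled;
    intr_enabled = false;
    return was;
}

/* Restore the "interrupt" state saved by sim_intr_disable(). */
static void sim_intr_restore(bool was)
{
    intr_enabled = was;
}

/* Hypothetical mutex: a lock word plus the saved interrupt state. */
struct sketch_mtx {
    atomic_bool held;
    bool saved_intr;
};

/* Sleep-mutex fast path: a single atomic acquire.  Returns false when
 * the lock is contested, where a real sleep mutex would block. */
static bool sleep_mtx_try_lock(struct sketch_mtx *m)
{
    return !atomic_exchange_explicit(&m->held, true, memory_order_acquire);
}

static void sleep_mtx_unlock(struct sketch_mtx *m)
{
    assert(atomic_load(&m->held));
    atomic_store_explicit(&m->held, false, memory_order_release);
}

/* Spin mutex: disable interrupts first so that an interrupt handler on
 * this CPU can never preempt the holder and spin on the same lock, then
 * do the same atomic acquire, spinning until it succeeds. */
static void spin_mtx_lock(struct sketch_mtx *m)
{
    bool was = sim_intr_disable();
    while (atomic_exchange_explicit(&m->held, true, memory_order_acquire))
        ;   /* spin: the holder must be running on another CPU */
    m->saved_intr = was;
}

/* Release the lock, then restore the interrupt state saved at lock time. */
static void spin_mtx_unlock(struct sketch_mtx *m)
{
    bool was = m->saved_intr;
    assert(atomic_load(&m->held));
    atomic_store_explicit(&m->held, false, memory_order_release);
    sim_intr_restore(was);
}
```

In this sketch the only difference between the two uncontested lock paths
is the sim_intr_disable()/sim_intr_restore() pair, which is the "plus the
cost of a critical section" term in the arithmetic above.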

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert@fledge.watson.org      Principal Research Scientist, McAfee Research



