Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 25 Sep 2000 03:02:13 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        grog@wantadilla.lemis.com (Greg Lehey)
Cc:        cp@bsdi.com (Chuck Paterson), archie@whistle.com (Archie Cobbs), brian@awfulhak.org (Brian Somers), joerg@cs.waikato.ac.nz (Joerg Micheel), mjacob@feral.com (Matthew Jacob), frank@exit.com (Frank Mayhar), jhb@pike.osd.bsdi.com (John Baldwin), markm@FreeBSD.ORG (Mark Murray), FreeBSD-arch@FreeBSD.ORG
Subject:   Re: Mutexes and semaphores (was: cvs commit: src/sys/conf files src/sys/sys random.h src/sys/dev/randomdev hash.c hash.h harvest
Message-ID:  <200009250302.UAA04620@usr05.primenet.com>
In-Reply-To: <20000924154216.D512@wantadilla.lemis.com> from "Greg Lehey" at Sep 24, 2000 03:42:16 PM

next in thread | previous in thread | raw e-mail | index | archive | help
> >> A MUTEX is just a sepaphore whose initial count is 1.
> >>
> >> ??
> >
> > In general this might be true, but in specific it isn't. 
> 
> As you know, I used to say exactly the same thing as Archie, but I've
> realized that this implied count of 1 causes a couple of important
> differences.  I'm still working on a clearer definition, but what I've
> seen so far is:
> 
> 1.  Because "mutexes" (I really hate this term; I wish I could find a
>     better one) only have an implied count of one, they can also have
>     the concept of an owner, which we use.
> 
> 2.  Because the mutex has an owner, only the owner can release it.
> 
> 3.  The mutex can also be "recursive" (it's really iterative, I
>     suppose): the owner can take it several times.  The only reason
>     for this appears to be sloppy coding, but in the short term I
>     think we're agreed that we can't dispose of that.
> 
> One thing that I don't think is important is the duration of
> ownership.  We currently use mutexes for short periods of time, which
> is why we have the spin version.

Actually, that's crucial, since it defines the conflict domain;
you can acquire a heavy lock after contending with a spin lock
for the right to acquire the heavy lock.  In most cases, where
the heavy lock is held a short time, you won't have any contention,
and thus can quickly grant the resource.

In the case of a long held resource, the contention domain is such
that the resource is probably contended, and has waiters outstanding;
this means that the release case (and thus the acquisition case)
must be much heavier weight.

I recommend:

<http://www.cl.cam.ac.uk/Research/SRG/netos/pegasus/reports/TR361.txt>;
<http://www.wam.umd.edu/whats_new/workshop3.0/common-tools/numerical_comp_guide/ncg_glossary.doc.html>;

This second is a rather good glossary, which is duplicated in many
places on the net.


> At Tandem, we only used semaphores, but they always had a count of 1,
> so they were effectively very close to our mutexes.  They didn't allow
> recursion, which is the Right Thing in a system designed from the
> ground up, but they also didn't have owners.  One of the most frequent
> complicated problems we had were system hangs (deadlocks), and we
> frequently couldn't figure out who had done what and why.  Having
> owners is a great debug aid.

I think that we need to be very clear on one thing: you can recurse
on a semaphore, but a true mutex will not permit recursion; it is a
light weight object, and has very little content.  It lacks a recurse
count, and many other attributes of semaphores.

Microsoft actually got this right in Windows, surprisingly.

When you attempt to get a mutex you already hold, you are shooting
yourself in the foot; it means that you didn't track the resource
sufficiently.  Usually this occurs when a mutex is acquired at one
level, and released at another, or worse, when it is acquired in the
wrong place (e.g. a subroutine called several times from a higher
level routine, which should be acquiring the mutex instead).

Disallowing recursion, mutex ownership is therefore implicit by
virtue of the holder of the mutex holding it.

In the case of a starvation or deadly embrace deadlock, one need
only get a stack trace of processes currently in the kernel to
determine where the problem lives; however, an owner would make
this rather automatic, and could aid debugging, as you say.  I do
have a problem with this approach, however, since it makes it much
more likely that people will be sloppy, and then wait for deadlocks
to be reported, rather than thinking through their code and ensuring
that deadlocks are not possible in the first place.  The idea that
fixing deadlocks in released code, rather than releasing only code
without deadlocks, is an acceptable approach needs to be discouraged.


> If we can expect that the mutex will, on average, be freed in less
> time than it would take to schedule a new process, spin locks can be a
> better alternative.  Otherwise we wouldn't need them at all.
> 
> Anyway, this doesn't directly relate to semaphores.  We have the basic
> issue of atomicity, which in general can be handled without spin
> locks, and that would apply to semaphores just as much as to mutexes.

The advantage to a test-and-set spin prior to acquisition of a
mutex is that the mutex can be acquired without taking a cache
synchronization hit between processors, which would otherwise be
necessary.  Some cache synchronization events will inevitably
occur, but they will be much less frequent.  The mutexes themselves
can be in non-cached pages, to accomplish this.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200009250302.UAA04620>