From owner-freebsd-threads@FreeBSD.ORG  Fri May 16 21:18:15 2008
Return-Path: <owner-freebsd-threads@FreeBSD.ORG>
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 86BAD1065671;
	Fri, 16 May 2008 21:18:15 +0000 (UTC)
	(envelope-from b.j.casavant@ieee.org)
Received: from yeppers.tdkt.org (skyline.tdkt.org [209.98.211.67])
	by mx1.freebsd.org (Postfix) with ESMTP id C47CE8FC15;
	Fri, 16 May 2008 21:18:14 +0000 (UTC)
	(envelope-from b.j.casavant@ieee.org)
Received: from pkunk.americas.sgi.com (cfcafwp.sgi.com [192.48.179.6])
	(authenticated bits=0)
	by yeppers.tdkt.org (8.12.11/8.12.11/erikj-OpenBSD) with ESMTP id
	m4GLI3vl003370; Fri, 16 May 2008 16:18:07 -0500 (CDT)
Date: Fri, 16 May 2008 16:17:52 -0500 (CDT)
From: Brent Casavant <b.j.casavant@ieee.org>
X-X-Sender: bcasavan@pkunk.americas.sgi.com
To: Alfred Perlstein <alfred@freebsd.org>
In-Reply-To: <20080516201555.GL32532@elvis.mu.org>
Message-ID: <alpine.BSF.1.10.0805161522070.80796@pkunk.americas.sgi.com>
References: <482B0297.2050300@icyb.net.ua> <482BBA77.8000704@freebsd.org>
	<482BF5EA.5010806@icyb.net.ua>
	<20080516201555.GL32532@elvis.mu.org>
User-Agent: Alpine 1.10 (BSF 962 2008-03-14)
Organization: "Angeltread Software Organization"
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Cc: Daniel Eischen <deischen@freebsd.org>, freebsd-threads@freebsd.org,
	Andriy Gapon <avg@icyb.net.ua>, David Xu <davidxu@freebsd.org>
Subject: Re: thread scheduling at mutex unlock
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Brent Casavant <b.j.casavant@ieee.org>
List-Id: Threading on FreeBSD <freebsd-threads.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>, 
	<mailto:freebsd-threads-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-threads>
List-Post: <mailto:freebsd-threads@freebsd.org>
List-Help: <mailto:freebsd-threads-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>,
	<mailto:freebsd-threads-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 16 May 2008 21:18:15 -0000

On Fri, 16 May 2008, Alfred Perlstein wrote:

> If I am reading the test case, any heavily contented mutex will
> exhibit this problem.

[snip]

> Let's say that that "thread A" spends N (99%) of it's time doing
> "heavy processing" then that means that only M (1%) of the time the
> scheduler will preempt it at the correct time so that it is not
> holding the lock.
> 
> Clearly this will lead to starvation where "thread A" has a
> 99% chance per-quantum of being left holding the lock when it
> is preempted.

You're analysis is completely correct.  However, I believe the
conclusion that a change is necessary to address the problem is
not.

Any code which holds a lock over a long period of processing is
inherently going to cause "unfairness" in scheduling, just as you
observed in your comment.  However, this is an artifact of poor
application design, not the design of the locking primitive.

The correct solution is to re-architecht the code such that the
lock is held for much less time.  This could mean, for example,
holding the lock for just long enough to remove work items from
a queue, then performing the time-intensive portion of the
operation outside of the lock.  The nature of the solution depends
on the specific nature of the code, but in the end I would be
very surprised if the original complaint could not be solved by
rethinking the application's locking mechanisms and usage.

Let's pretend, for a moment, that the mutexes in question were
changed to utilize a queuing behavior.  Now imagine a scenario
where Thread A drops and reacquires a particular lock fairly
rapidly in a loop, and another Thread B does much the same.
A typical example might be two worker threads which are processing
requests from a common queue -- something we'll probably see more
of as multicore CPUs become common.

What happens in this case is Thread A may have been able to process
several queue items during its timeslice, but each time it goes to
re-acquire the lock it ends blocking until Thread B has a chance
to run and cycle through the lock.  On a dual-processor system
with no other processes of note running (the best scenario) you
are, with almost every loop iteration, seeing one thread or the
other sit idle until the other has a chance to go through it's
loop.  In other words, you'll approach a 50% performance reduction
in the worst case.

It's sort of like traffic control.  It's the difference between
a stop sign and a traffic light.  A stop sign (turning vehicles
not considered) is like a queuing mutex -- a single vehicle
goes through in one direction, then a single vehicle goes through
in the other direction.  A traffic light is like a non-queueing
mutex -- a large number of vehicles go through in one direction,
then a large number of vehicles go through in the other direction.
That is, with a traffic light, each direction is allocated a
timeslice, and the whole bunch of "work" gets done, instead of
unnecessarily wasting time switching back and forth.

A traffic light results in more efficient utilization of the
contended resource, just as a non-queued mutex does.  There's
a reason that stop signs get replaced with traffic lights at
busy intersections, despire their much higher cost.

So, let's get back to the original situation.  What we have is a
traffic light that's allowing traffic in one direction 99% of
the time, and the other direction 1% of the time.  _Of_course_
the traffic on the 1% direction is going to encounter delays
and starvation; the traffic control mechanism is being used
incorrectly!  The solution is not to replace it with a stop sign,
which admittedly makes everything more "fair", because it comes
at the incredibly high price of never allowing the free flow of
traffic in the other direction.

Other "solutions" are just as bad, for slightly different use
cases.  You could force a thread yield whenever a contended
mutex is released along with a wakeup and immediate schedule
of the other thread -- which will have exactly the same problems
as the queued mutex, the lock-dropping thread will just context
switch at a different point in its execution.  I'm sure there
are other clever ideas as well, but in the end, no matter what
you come up with for a solution, it's going to trade off one
set of advantages and disadvantages for another.  At least
the current design allows you the flexibility (with some
additional code, like I posted yesterday) of working however
you'd like it to work.  A queued or forced-switch design does
not, and thus would only serve to hobble the mutex.

Fix how the light is used (the application holding the lock too
long), don't mess with the light itself (the mutex).

Brent

-- 
Brent Casavant			Dance like everybody should be watching.
www.angeltread.org
KD5EMB, EN34lv