From owner-freebsd-arch@FreeBSD.ORG  Tue Jan 27 20:09:22 2009
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 46634106564A
	for <arch@freebsd.org>; Tue, 27 Jan 2009 20:09:22 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 022AB8FC08
	for <arch@freebsd.org>; Tue, 27 Jan 2009 20:09:22 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [65.122.17.41])
	by cyrus.watson.org (Postfix) with ESMTPS id 7861A46B03;
	Tue, 27 Jan 2009 15:09:21 -0500 (EST)
Date: Tue, 27 Jan 2009 20:09:21 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Julian Elischer <julian@elischer.org>
In-Reply-To: <497F51D6.1000903@elischer.org>
Message-ID: <alpine.BSF.2.00.0901272001310.51605@fledge.watson.org>
References: <23211.1232871523@critter.freebsd.dk>
	<497C249C.5060507@elischer.org>
	<alpine.BSF.2.00.0901271509290.5592@fledge.watson.org>
	<497F51D6.1000903@elischer.org>
User-Agent: Alpine 2.00 (BSF 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: arch@freebsd.org, Poul-Henning Kamp <phk@phk.freebsd.dk>
Subject: Re: need for another mutex type/flag?
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 27 Jan 2009 20:09:22 -0000


On Tue, 27 Jan 2009, Julian Elischer wrote:

> Robert Watson wrote:
>> On Sun, 25 Jan 2009, Julian Elischer wrote:
>> 
>>> Even purely documentary would be good but given the option, I'd like it to 
>>> scream when Witness is enabled and you try get another mutex....
>>> 
>>> there are certain contexts (e.g. in most netgraph nodes) where we really 
>>> want the authors to only take such locks and taking more unpredictable 
>>> mutexes plays havoc with netgraph response times as a system as a whole, 
>>> since if one node blocks, teh thread doesn't get to go off and service 
>>> other nodes.
>> 
>> I thought lots of Netgraph nodes liked to do things like call m_pullup() 
>> and m_prepend()?  Disallowing acquiring mutexes will prevent them from 
>> doing that.
>
> I think I mentioned that in another mail. The problem I see is that some 
> module writers are tempted to just use a mutex in their node without 
> thinking about what the result will be.  My understanding is that the mbuf 
> allocation code has been tuned to within an inch of its life to try keep 
> it's waits to a minimum.  I am worried about it, but I am more worried about 
> a netgraph node waiting on something that is depending on some piece of 
> hardware such as what I think HPS had in his suggested patch for the 
> bluetooth code..

Right, but what I'm saying is: if we have a MTX_LEAFNODE flag for mtx_init(9), 
it won't work for any code that holds the lock over a call to the mbuf 
routines.  I am happy with us adding a MTX_LEAFNODE flag and would use it 
myself, I just not sure it will work for Netgraph node mutexes.

The case you're talking about is generally problematic for mutexes anyway -- 
in principle we divide the world of sleeping into two categories: sleeps of 
strictly "bounded" length that generall correspond to the sleeps associated 
with mutex or rwlock acquire, and sleeps of potentially "unbounded" length, 
such as those associated with msleep(9), cv_wait(9), sx(9), lockmgr(9), etc. 
If you try to perform an unbounded sleep while holding a mutex, then WITNESS 
will already complain about sleeping while holding a lock.

The tricky case is the "may sleep" case, in which case we already have a 
WITNESS call you can do to say this code may sleep, even though it doesn't in 
the common case, so warn if this is called with a mutex held so that we can 
catch that happening".  For example, mb_reclaim does this so that any attempt 
to call it with a mutex held will throw up a red flag:

/*
  * This is the protocol drain routine.
  *
  * No locks should be held when this is called.  The drain routines have to
  * presently acquire some locks which raises the possibility of lock order
  * reversal.
  */
static void
mb_reclaim(void *junk)
{
         struct domain *dp;
         struct protosw *pr;

         WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK | WARN_PANIC, NULL,
             "mb_reclaim()");

         for (dp = domains; dp != NULL; dp = dp->dom_next)
                 for (pr = dp->dom_protosw; pr < dp->dom_protoswNPROTOSW; pr++)
                         if (pr->pr_drain != NULL)
                                 (*pr->pr_drain)();
}

If there is a common or unconditional call to msleep(9) in code called by 
Netgraph with a mutex held, however, then it should already be displaying a 
warning when that happens.  If you want to simulate this effect without a lock 
necessarily be held, you can do what the softclock() code does:

                                 THREAD_NO_SLEEPING();
                                 c_func(c_arg);
                                 THREAD_SLEEPING_OK();

This causes WITNESS to complain loudly if the callout handler attempts to 
perform an unbounded sleep (i.e., msleep(), cv_wait(), etc).  Splitting the 
world into bounded an unbounded sleeps is quite useful, and is entirely about 
the sort of deadlock/cascading delay scenario you have in mind: code inside a 
mutex-protected block is essentially a critical section, and should never wait 
a long time, especially given that ithreads may acquire mutexes leading to 
potential deadlocks if msleep() is effectively waiting on an interrupt that 
will never be delivered -- if we allow ithreads to wait on msleep(), which we 
don't because of the above mechanisms.

Robert N M Watson
Computer Laboratory
University of Cambridge