From owner-freebsd-hackers@FreeBSD.ORG  Fri Apr 27 17:49:42 2007
Date: Fri, 27 Apr 2007 10:39:16 -0700 (PDT)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200704271739.l3RHdG7a009988@apollo.backplane.com>
To: Hans Petter Selasky <hselasky@c2i.net>
References: <200704262136.33196.hselasky@c2i.net>
	<46311708.5030002@elischer.org> <200704270753.05438.hselasky@c2i.net>
Cc: freebsd-hackers@freebsd.org, Julian Elischer <julian@elischer.org>
Subject: Re: msleep() on recursivly locked mutexes

    The real culprit here is passing held mutexes to unrelated procedures
    in the first place, just because those procedures might have to block
    and need to release and reacquire the mutex around the blocking point.
    That's just bad coding in my view.  The unrelated procedure has no
    clue as to what the mutex is or why it is being held and really has no
    business messing with it.

    What I did was implement spinlocks with VERY restricted capabilities,
    far more restricted than the capabilities of your mutexes.  Our
    spinlocks are meant only to be used to lock up tiny pieces of code
    (like for ref counting or structural or flag-changing operations).
    Plus the kernel automatically acts as if it were in a critical section
    if it takes an interrupt while the current thread is holding a spinlock.
    That way mainline code can just use a spinlock to deal with small bits
    of interlocked information without it costing much in the way of
    overhead.
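
    A rough userland sketch of that kind of spinlock use, assuming C11
    atomics rather than any real kernel API.  The names spin_lock(),
    obj_hold() and obj_drop() are made up for illustration, and the
    interrupt/critical-section behavior described above is not modeled:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a restricted spinlock (not a kernel API). */
struct spin {
    atomic_flag locked;
};

static void spin_init(struct spin *s)
{
    atomic_flag_clear(&s->locked);
}

static void spin_lock(struct spin *s)
{
    /* Busy-wait until the flag was previously clear; never sleeps. */
    while (atomic_flag_test_and_set_explicit(&s->locked,
                                             memory_order_acquire))
        ;
}

static void spin_unlock(struct spin *s)
{
    atomic_flag_clear_explicit(&s->locked, memory_order_release);
}

/* Example object whose ref count is guarded by the spinlock. */
struct obj {
    struct spin spin;
    int refs;
};

static void obj_init(struct obj *o)
{
    spin_init(&o->spin);
    o->refs = 1;            /* the creator holds the first reference */
}

static void obj_hold(struct obj *o)
{
    spin_lock(&o->spin);
    ++o->refs;              /* tiny critical section, no calls that block */
    spin_unlock(&o->spin);
}

/* Drops one reference; returns true if it was the last one. */
static bool obj_drop(struct obj *o)
{
    bool last;

    spin_lock(&o->spin);
    assert(o->refs > 0);
    last = (--o->refs == 0);
    spin_unlock(&o->spin);
    return last;
}
```

    Nothing is called while the spinlock is held, so there is never a
    reason to hand the lock to another procedure.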

    I made the decision that ANYTHING more complex than that would have to
    use a real lock, like a lockmgr lock or a token, depending on the
    characteristics desired.  To make it even more desirable I also
    stripped down the lockmgr() lock implementation, removing numerous
    bits that were inherited from very old code methodologies that have no
    business being in a modern operating system, like LK_DRAIN.  And I
    removed the passing of an interlocking spinlock to the lockmgr code,
    because that methodology was being massively abused in existing code
    (and I do mean massively).

    I'm not quite sure what the best way to go is for FreeBSD, because
    you guys have made your mutexes just as sophisticated as your normal
    locks, or even more so, in many respects, and you have like 50
    different types of locks now (I can't keep track of them all).

    If I were to offer advice it would be: Just stop trying to mix water
    and hot wax.  Stop holding mutexes across potentially blocking procedure
    calls.  Stop passing mutexes into unrelated bits of code in order for
    them to be released and reacquired somewhere deep in that code.  Just
    doing that will probably solve all of the problems being reported.
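
    One way to read that advice, sketched in userland C with POSIX
    mutexes standing in for kernel mutexes.  The struct softc layout and
    the slow_transfer() and softc_flush() functions are hypothetical;
    the point is only that the caller drops its own lock before the
    potentially blocking call, and the callee never sees the mutex:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct softc {
    pthread_mutex_t mtx;
    int pending;        /* protected by mtx */
    int completed;      /* protected by mtx */
};

/*
 * Hypothetical unrelated procedure that might block.  It takes no lock
 * argument and knows nothing about the caller's locking.
 */
static int slow_transfer(int nitems)
{
    return nitems;      /* placeholder for work that could sleep */
}

static int softc_flush(struct softc *sc)
{
    int n, done;

    /* Snapshot and claim the work under the lock. */
    pthread_mutex_lock(&sc->mtx);
    n = sc->pending;
    sc->pending = 0;
    pthread_mutex_unlock(&sc->mtx);

    /* No mutex is held across the potentially blocking call. */
    done = slow_transfer(n);

    /* Reacquire in the code that owns the lock and publish the result. */
    pthread_mutex_lock(&sc->mtx);
    sc->completed += done;
    pthread_mutex_unlock(&sc->mtx);

    return done;
}
```

    State may change while the lock is dropped, so a real caller would
    have to revalidate whatever it depends on after reacquiring.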

						-Matt