Date: Fri, 27 Apr 2007 10:39:16 -0700 (PDT)
From: Matthew Dillon
Message-Id: <200704271739.l3RHdG7a009988@apollo.backplane.com>
To: Hans Petter Selasky
References: <200704262136.33196.hselasky@c2i.net> <46311708.5030002@elischer.org> <200704270753.05438.hselasky@c2i.net>
Cc: freebsd-hackers@freebsd.org, Julian Elischer
Subject: Re: msleep() on recursivly locked mutexes

The real culprit here is passing held mutexes to unrelated procedures in the first place, just so that those procedures can release and reacquire the mutex if they have to block. That's just bad coding in my view. The unrelated procedure has no clue as to what the mutex is or why it is being held, and really has no business messing with it.
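To make the point concrete, here is a minimal sketch of the alternative: the caller drops its own mutex before calling something that may block, and the callee never sees the mutex at all. This is user-space C with POSIX threads, not kernel code, and every name in it (struct dev, slow_transfer, dev_flush, the generation counter) is hypothetical and chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical object protected by its own mutex. */
struct dev {
    pthread_mutex_t lock;
    int generation;   /* bumped whenever the state changes */
    int pending;      /* work items still queued */
};

/* Unrelated helper that may block. It takes no lock argument and
 * knows nothing about dev->lock. */
static int slow_transfer(int items)
{
    struct timespec ts = { 0, 1000000 };  /* 1 ms stand-in for blocking */

    nanosleep(&ts, NULL);
    return items;     /* pretend every item was transferred */
}

/* The caller owns the locking: snapshot under the lock, drop it,
 * block, retake it, then revalidate before committing. */
static int dev_flush(struct dev *d)
{
    int gen, items, done;

    pthread_mutex_lock(&d->lock);
    gen = d->generation;
    items = d->pending;
    pthread_mutex_unlock(&d->lock);

    done = slow_transfer(items);          /* no mutex held here */
    assert(done <= items);

    pthread_mutex_lock(&d->lock);
    if (d->generation != gen) {
        /* State changed while we were blocked; let the caller retry. */
        pthread_mutex_unlock(&d->lock);
        return -1;
    }
    d->pending -= done;
    d->generation++;
    pthread_mutex_unlock(&d->lock);
    return done;
}
```

The generation check is one possible way to revalidate after the lock is retaken; the point is only that the code that took the mutex is also the code that decides when to release it.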

What I did was implement spinlocks with VERY restricted capabilities, far more restricted than the capabilities of your mutexes. Our spinlocks are meant only to be used to lock up tiny pieces of code (like for ref counting or structural or flag-changing operations). Plus the kernel automatically acts as if it were in a critical section if it takes an interrupt while the current thread is holding a spinlock. That way mainline code can just use a spinlock to deal with small bits of interlocked information without it costing much in the way of overhead.

I made the decision that ANYTHING more complex than that would have to use a real lock, like a lockmgr lock or a token, depending on the characteristics desired. To make it even more desirable I also stripped down the lockmgr() lock implementation, removing numerous bits that were inherited from very old code methodologies that have no business being in a modern operating system, like LK_DRAIN. And I removed the passing of an interlocking spinlock to the lockmgr code, because that methodology was being massively abused in existing code (and I do mean massively).

I'm not quite sure what the best way to go is for FreeBSD, because you guys have made your mutexes just as sophisticated as, or even more sophisticated than, your normal locks in many respects, and you have like 50 different types of locks now (I can't keep track of them all).

If I were to offer advice it would be: Just stop trying to mix water and hot wax. Stop holding mutexes across potentially blocking procedure calls. Stop passing mutexes into unrelated bits of code in order for them to be released and reacquired somewhere deep in that code. Just doing that will probably solve all of the problems being reported.

-Matt