Date: Wed, 12 Apr 2017 00:55:34 -0700 (PDT)
From: Chris Torek <torek@elf.torek.net>
Message-Id: <201704120755.v3C7tYUH016699@elf.torek.net>
To: ablacktshirt@gmail.com, imp@bsdimp.com
Subject: Re: Understanding the FreeBSD locking mechanism
Cc: ed@nuxi.nl, freebsd-hackers@freebsd.org, rysto32@gmail.com
In-Reply-To: <99e3673e-d490-faef-359d-c6ec8a36ee0c@gmail.com>

>clear. How can I distinguish these two conditions? I mean, whether I
>am using my own state/stack or borrowing others' state.

You choose it when you establish your interrupt handler.  If you
register it as a filter interrupt, then it *is* one, and the rest
of your code must be written to the filter rules.  Unless you know
what you are doing, don't register a filter; then your handler is
*not* one, and the rest of your code can be written using the much
more relaxed model.

>What do you mean by "must take MTX_SPIN locks 'inside' any outer
>MTX_DEF locks?

This means that any code path that is going to hold a spin-type
lock must obtain it while already holding any applicable non-spin
locks.  For instance, if we look at <sys/proc.h> we find these:

	#define	PROC_STATLOCK(p)	mtx_lock_spin(&(p)->p_statmtx)
	#define	PROC_ITIMLOCK(p)	mtx_lock_spin(&(p)->p_itimmtx)
	#define	PROC_PROFLOCK(p)	mtx_lock_spin(&(p)->p_profmtx)

Let's find a bit of code that uses one, such as in kern_time.c:

https://github.com/freebsd/freebsd/blob/master/sys/kern/kern_time.c#L338

(kern_clock_gettime()).  This code reads:

	case CLOCK_PROF:
		PROC_LOCK(p);
		PROC_STATLOCK(p);
		calcru(p, &user, &sys);
		PROC_STATUNLOCK(p);
		PROC_UNLOCK(p);
		timevaladd(&user, &sys);
		TIMEVAL_TO_TIMESPEC(&user, ats);
		break;

Note that the call to PROC_LOCK comes first, then the call to
PROC_STATLOCK.  This is because PROC_LOCK

https://github.com/freebsd/freebsd/blob/master/sys/sys/proc.h#L825

is defined as:

	#define	PROC_LOCK(p)	mtx_lock(&(p)->p_mtx)

If you obtain the locks in the other order -- i.e., if you grab
the PROC_STATLOCK first, then try to lock PROC_LOCK -- you are
trying to take a default mutex while holding a spin-type mutex,
and this is not allowed: the default mutex may have to block, and
you must not block while holding a spin lock (this can cause
deadlock).  But taking the PROC_LOCK first (which may block), then
taking the PROC_STATLOCK (a spin lock) "inside" the outer PROC_LOCK
default mutex, is OK.
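
To make the nesting concrete outside the kernel, here is a rough
userland sketch.  It is only an analogy: a pthread mutex stands in
for the MTX_DEF lock and a C11 atomic_flag busy-wait loop stands in
for the MTX_SPIN lock.  Every demo_* name is made up for this
example, and nothing here reproduces what the kernel's spin mutexes
do with interrupts.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical userland stand-in for struct proc. */
struct demo_proc {
	pthread_mutex_t	p_mtx;		/* blocking lock, plays the MTX_DEF role */
	atomic_flag	p_statmtx;	/* busy-wait lock, plays the MTX_SPIN role */
	long		p_ticks;	/* data guarded by p_statmtx */
};

static void
demo_spin_lock(atomic_flag *f)
{
	/* Busy-wait until the flag was previously clear; never sleeps. */
	while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
		;
}

static void
demo_spin_unlock(atomic_flag *f)
{
	atomic_flag_clear_explicit(f, memory_order_release);
}

static void
demo_proc_init(struct demo_proc *p)
{
	int error;

	error = pthread_mutex_init(&p->p_mtx, NULL);
	assert(error == 0);
	atomic_flag_clear(&p->p_statmtx);
	p->p_ticks = 0;
}

/* Correct nesting: the blocking lock is outermost, the spin lock inside. */
static long
demo_read_ticks(struct demo_proc *p)
{
	long t;

	pthread_mutex_lock(&p->p_mtx);		/* outer: may block */
	demo_spin_lock(&p->p_statmtx);		/* inner: only spins */
	t = p->p_ticks;
	demo_spin_unlock(&p->p_statmtx);
	pthread_mutex_unlock(&p->p_mtx);
	return (t);
}
```

Swapping the two lock calls in demo_read_ticks() would give the
forbidden shape: a thread that may go to sleep on p_mtx while it
still holds the spin lock, leaving everyone else spinning.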

(This is one of my mild objections to macros like PROC_LOCK and
PROC_STATLOCK: they hide whether the mutex in question is a spin
lock.)

Incidentally, any time you take *any* lock while holding any
other lock (e.g., lock A, then lock B while holding A), you have
created a "lock order" in which A precedes B.  If some other
code path locks B first and then, while holding B, attempts to
lock A, you can get a deadlock if both code paths run at the same
time.  The WITNESS code dynamically discovers these various orders
and warns you at run time if you have a "lock order reversal"
(a case where one code path does A-then-B while another does
B-then-A).
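
The idea of recording orders and flagging reversals can be sketched
in a few lines.  This is a toy, not how WITNESS is actually
implemented: locks are small integer IDs, there is a single "held"
set (so it models code paths run one after another rather than real
threads), and all demo_witness names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define	DEMO_NLOCKS	8	/* hypothetical fixed number of lock IDs */

struct demo_witness {
	/* before[a][b] is true if a was held at some point when b was taken */
	bool	before[DEMO_NLOCKS][DEMO_NLOCKS];
	bool	held[DEMO_NLOCKS];
};

static void
demo_witness_init(struct demo_witness *w)
{
	memset(w, 0, sizeof(*w));
}

/*
 * Record that lock `id` is being acquired.  Returns false if this
 * acquisition reverses an order that was recorded earlier.
 */
static bool
demo_witness_lock(struct demo_witness *w, int id)
{
	bool ok = true;

	assert(id >= 0 && id < DEMO_NLOCKS);
	for (int h = 0; h < DEMO_NLOCKS; h++) {
		if (!w->held[h] || h == id)
			continue;
		if (w->before[id][h])
			ok = false;	/* seen id-then-h before; now h-then-id */
		w->before[h][id] = true;
	}
	w->held[id] = true;
	return (ok);
}

static void
demo_witness_unlock(struct demo_witness *w, int id)
{
	assert(id >= 0 && id < DEMO_NLOCKS);
	w->held[id] = false;
}
```

Locking 0 and then 1 records the order 0-before-1.  After both are
released, locking 1 and then 0 makes the second demo_witness_lock()
call return false, which is the A-then-B versus B-then-A case
described above.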

(This is, in a sense, the same problem as discovering whether
there is a loop in a directed graph, or whether this directed
graph is acyclic.  If you can force the graph to take the shape of
a tree, rather than the more general graph, there will never be
any loops in it, and you will never have lock order reversals.
And of course if you have only *one* lock for some data, there is
nothing to be reversed.  Not all lock order reversals are
guaranteed to lead to deadlock, but sorting out which ones are
really OK, and which are not, is ... challenging.)
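
For the graph view, a plain depth-first search is enough to tell
whether a set of recorded lock orders contains a loop, including
longer ones such as A-then-B, B-then-C, C-then-A that a pairwise
check would miss.  The sketch below assumes a small adjacency
matrix; the lockgraph type and lg_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define	LG_MAX	8	/* hypothetical cap on the number of locks */

struct lockgraph {
	int	n;			/* number of locks in use */
	bool	edge[LG_MAX][LG_MAX];	/* edge[a][b]: a is taken before b */
};

/*
 * Depth-first visit.  state[] values: 0 = unvisited, 1 = on the
 * current search path, 2 = fully explored.
 */
static bool
lg_visit(const struct lockgraph *g, int v, int *state)
{
	state[v] = 1;
	for (int w = 0; w < g->n; w++) {
		if (!g->edge[v][w])
			continue;
		if (state[w] == 1)
			return (true);	/* edge back into the path: a loop */
		if (state[w] == 0 && lg_visit(g, w, state))
			return (true);
	}
	state[v] = 2;
	return (false);
}

/* Returns true if the lock-order graph contains a cycle. */
static bool
lg_has_cycle(const struct lockgraph *g)
{
	int state[LG_MAX] = { 0 };

	assert(g->n >= 0 && g->n <= LG_MAX);
	for (int v = 0; v < g->n; v++)
		if (state[v] == 0 && lg_visit(g, v, state))
			return (true);
	return (false);
}
```

With edges 0->1 and 1->2 the graph is acyclic; adding 2->0 closes
the loop and lg_has_cycle() reports it.  As noted above, a reported
loop is a potential deadlock, not a guaranteed one.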

Chris