From: Chris Torek <torek@elf.torek.net>
Date: Thu, 13 Apr 2017 05:18:11 -0700 (PDT)
To: ablacktshirt@gmail.com, imp@bsdimp.com
Cc: ed@nuxi.nl, freebsd-hackers@freebsd.org, kostikbel@gmail.com, rysto32@gmail.com
Subject: Re: Understanding the FreeBSD locking mechanism
Message-Id: <201704131218.v3DCIBJg093207@elf.torek.net>
In-Reply-To: <06a30d21-acff-efb2-ff58-9aa66793e929@gmail.com>

>I discover that in the current implementation in FreeBSD, spinlock
>does not disable interrupt entirely:

[extra-snipped here]

> 610         /* Give interrupts a chance while we spin. */
> 611         spinlock_exit();
> 612         while (m->mtx_lock != MTX_UNOWNED) {

[more snip]

>This is `_mtx_lock_spin_cookie(...)` in kern/kern_mutex.c, which
>implements the core logic of spinning. However, as you can see, while
>spinning, it would enable interrupt "occasionally" and disable it
>again... What is the rationale for that?

This code snippet is slightly misleading.  The full code path runs
from mtx_lock_spin() through __mtx_lock_spin(), which first invokes
spinlock_enter() and then, in the *contested* case (only), calls
_mtx_lock_spin_cookie().

spinlock_enter() is:

	td = curthread;
	if (td->td_md.md_spinlock_count == 0) {
		flags = intr_disable();
		td->td_md.md_spinlock_count = 1;
		td->td_md.md_saved_flags = flags;
	} else
		td->td_md.md_spinlock_count++;
	critical_enter();

so it actually disables interrupts *only* on the transition from
td->td_md.md_spinlock_count = 0 to td->td_md.md_spinlock_count = 1,
i.e., the first time we take a spin lock in this thread, whether this
is a borrowed thread or not.

It's possible that interrupts are actually disabled at this point.
If so, td->td_md.md_saved_flags has interrupts disabled as well.
This is all just an optimization to use a thread-local variable so
as to avoid touching hardware.  The details vary widely, but
typically, touching the actual hardware controls requires flushing
the CPU's instruction pipeline.

If the compare-and-swap (the initial attempt in __mtx_lock_spin() to
grab the lock in one step) fails, we enter _mtx_lock_spin_cookie()
and loop waiting to see if we can obtain the spin lock in time.
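Before going on to what happens on that contested path: the counting
trick above is easy to see in isolation.  Here is a stand-alone
userland analogy (this is *not* the kernel code -- the fake_intr_*
functions and the _Thread_local variables merely stand in for
intr_disable()/intr_restore() and the td->td_md fields, and
critical_enter()/critical_exit() are left out entirely):

	#include <stdio.h>

	/* Stand-ins for the expensive hardware interrupt controls. */
	static int
	fake_intr_disable(void)
	{
		puts("  [hardware: interrupts disabled]");
		return (1);		/* pretend they were enabled before */
	}

	static void
	fake_intr_restore(int flags)
	{
		if (flags)
			puts("  [hardware: interrupts restored]");
	}

	/* Per-thread state, like md_spinlock_count / md_saved_flags. */
	static _Thread_local int spinlock_count;
	static _Thread_local int saved_flags;

	static void
	sketch_spinlock_enter(void)
	{
		if (spinlock_count == 0) {
			saved_flags = fake_intr_disable(); /* only on 0 -> 1 */
			spinlock_count = 1;
		} else
			spinlock_count++;	/* nested: counter only */
	}

	static void
	sketch_spinlock_exit(void)
	{
		spinlock_count--;
		if (spinlock_count == 0)
			fake_intr_restore(saved_flags);	/* only on 1 -> 0 */
	}

	int
	main(void)
	{
		sketch_spinlock_enter();	/* touches "hardware" */
		sketch_spinlock_enter();	/* nested: cheap */
		sketch_spinlock_exit();		/* still cheap */
		sketch_spinlock_exit();		/* touches "hardware" again */
		return (0);
	}

Only the outermost enter/exit pair pays for the real hardware
operation; every nested pair is just a counter update, which is the
whole point of keeping the count in the thread.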
In that (contested) case, we don't actually *hold* this particular
spin lock itself yet, so we can call spinlock_exit() to undo the
effect of the outermost spinlock_enter() (in __mtx_lock_spin).  That
decrements the counter.  *If* it goes to zero, that also calls
intr_restore(td->td_md.md_saved_flags).  Hence, if we have failed to
obtain our first spin lock, we restore the interrupt setting to
whatever we saved.

If interrupts were already locked out (as in a filter-type interrupt
handler) this is a potentially-somewhat-expensive no-op.  If
interrupts were enabled previously, this is a somewhat expensive
re-enable of interrupts -- but that's OK, and maybe good, because we
have no spin locks of our own yet.  That means we can take hardware
interrupts now, and let them borrow our current thread if they are
that kind of interrupt, or schedule another thread to run if
appropriate.  That might even preempt us, since we do not yet hold
any spin locks.  (But it won't preempt us if we have done a
critical_enter() before this point.)

(In fact, the spinlock exit/enter calls that you see inside
_mtx_lock_spin_cookie() wrap a loop that does not use
compare-and-swap operations at all, but rather ordinary memory
reads.  These are cheaper than CAS operations on a lot of CPUs, but
they may produce wrong answers when two CPUs are racing to write the
same location; only a CAS produces a guaranteed answer, which might
still be "you lost the race".  The inner loop you are looking at
occurs after losing a CAS race.  Once we think we might *win* a
future CAS race, _mtx_lock_spin_cookie() calls spinlock_enter()
again and tries the actual CAS operation, _mtx_obtain_lock_fetch(),
with interrupts disabled.  Note also the calls to cpu_spinwait() --
the Linux equivalent macro is cpu_relax() -- which translates to a
"pause" instruction on amd64.)

Chris
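P.S.  If you want to see the "spin on plain reads, then CAS only when
the lock looks free" pattern outside the kernel, here is a small
stand-alone illustration using C11 atomics and pthreads.  It is not
FreeBSD code -- there are no interrupts to mask in userland, and the
names (spin_lock(), worker(), and so on) are made up for the example
-- but the shape of the loop is the same idea:

	/* cc -O2 -pthread spin_demo.c -o spin_demo */
	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdint.h>
	#include <stdio.h>

	#define	UNOWNED	((uintptr_t)0)

	static atomic_uintptr_t lock_word;	/* like m->mtx_lock */
	static long counter;			/* protected by the lock */

	static void
	spin_lock(uintptr_t tid)
	{
		uintptr_t old = UNOWNED;

		/* Fast path: a single CAS attempt to take the lock. */
		while (!atomic_compare_exchange_weak(&lock_word, &old, tid)) {
			/*
			 * Contested: wait with cheap ordinary reads until
			 * the lock looks free.  This is where the kernel
			 * re-enables interrupts and uses cpu_spinwait().
			 */
			while (atomic_load_explicit(&lock_word,
			    memory_order_relaxed) != UNOWNED)
				;	/* cpu_spinwait() would go here */
			old = UNOWNED;	/* now try the CAS again */
		}
	}

	static void
	spin_unlock(void)
	{
		atomic_store_explicit(&lock_word, UNOWNED,
		    memory_order_release);
	}

	static void *
	worker(void *arg)
	{
		for (int i = 0; i < 100000; i++) {
			spin_lock((uintptr_t)arg);
			counter++;
			spin_unlock();
		}
		return (NULL);
	}

	int
	main(void)
	{
		pthread_t t1, t2;

		pthread_create(&t1, NULL, worker, (void *)1);
		pthread_create(&t2, NULL, worker, (void *)2);
		pthread_join(t1, NULL);
		pthread_join(t2, NULL);
		printf("counter = %ld (expect 200000)\n", counter);
		return (0);
	}

The point is only the structure: the expensive
guaranteed-answer CAS is attempted just when the cheap read-only
loop suggests we might actually win it.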