From owner-freebsd-hackers@freebsd.org Thu Apr 13 13:46:28 2017
Subject: Re: Understanding the FreeBSD locking mechanism
To: Chris Torek , imp@bsdimp.com
References: <201704131218.v3DCIBJg093207@elf.torek.net>
Cc: ed@nuxi.nl, freebsd-hackers@freebsd.org, kostikbel@gmail.com, rysto32@gmail.com
From: Yubin Ruan
Date: Thu, 13 Apr 2017 21:46:26 +0800
In-Reply-To: <201704131218.v3DCIBJg093207@elf.torek.net>

On 2017/04/13 20:18, Chris Torek wrote:
>> I discovered that in the current FreeBSD implementation, a spinlock
>> does not disable interrupts entirely:
> [extra-snipped here]
>> 610         /* Give interrupts a chance while we spin. */
>> 611         spinlock_exit();
>> 612         while (m->mtx_lock != MTX_UNOWNED) {
> [more snip]
>
>> This is `_mtx_lock_spin_cookie(...)` in kern/kern_mutex.c, which
>> implements the core logic of spinning. However, as you can see, while
>> spinning it enables interrupts "occasionally" and then disables them
>> again. What is the rationale for that?
>
> This code snippet is slightly misleading.
> The full code path runs
> from mtx_lock_spin() through __mtx_lock_spin(), which first
> invokes spinlock_enter() and then, in the *contested* case (only),
> calls _mtx_lock_spin_cookie().
>
> spinlock_enter() is:
>
>     td = curthread;
>     if (td->td_md.md_spinlock_count == 0) {
>         flags = intr_disable();
>         td->td_md.md_spinlock_count = 1;
>         td->td_md.md_saved_flags = flags;
>     } else
>         td->td_md.md_spinlock_count++;
>     critical_enter();
>
> so it actually disables interrupts *only* on the transition from
> td->td_md.md_spinlock_count = 0 to td->td_md.md_spinlock_count = 1,
> i.e., the first time this thread takes a spin lock, whether the
> thread is a borrowed one or not. It is possible that interrupts
> are already disabled at this point; if so, td->td_md.md_saved_flags
> has interrupts disabled as well. This is all just an optimization
> that uses a thread-local counter to avoid touching the hardware.
> The details vary widely, but typically, touching the actual hardware
> controls requires flushing the CPU's instruction pipeline.
>
> If the compare-and-swap fails, we enter _mtx_lock_spin_cookie()
> and loop, waiting to see if we can obtain the spin lock in time.
> In that case, we don't actually *hold* this particular spin lock
> itself yet, so we can call spinlock_exit() to undo the effect
> of the outermost spinlock_enter() (in __mtx_lock_spin). That
> decrements the counter. *If* it goes to zero, that also calls
> intr_restore(td->td_md.md_saved_flags).
>
> Hence, if we have failed to obtain our first spin lock, we restore
> the interrupt setting to whatever we saved. If interrupts were
> already locked out (as in a filter-type interrupt handler) this is
> a potentially-somewhat-expensive no-op. If interrupts were
> enabled previously, this is a somewhat expensive re-enable of
> interrupts -- but that's OK, and maybe good, because we have no
> spin locks of our own yet.
> That means we can take hardware
> interrupts now, and let them borrow our current thread if they are
> that kind of interrupt, or schedule another thread to run if
> appropriate. That might even preempt us, since we do not yet hold
> any spin locks. (But it won't preempt us if we have done a
> critical_enter() before this point.)

Good explanation -- I had just missed that "local" interrupt point.

> (In fact, the spinlock exit/enter calls that you see inside
> _mtx_lock_spin_cookie() wrap a loop that does not use compare-and-
> swap operations at all, but rather ordinary memory reads. These
> are cheaper than CAS operations on a lot of CPUs, but they may
> produce wrong answers when two CPUs are racing to write the same

Why would that produce a wrong answer? I think all the inner loop
wants to do is spin for a while before trying again to acquire the
spinlock, so there is no race here.

> location; only a CAS produces a guaranteed answer, which might
> still be "you lost the race". The inner loop you are looking at
> occurs after losing a CAS race. Once we think we might *win* a
> future CAS race, _mtx_lock_spin_cookie() calls spinlock_enter()
> again and tries the actual CAS operation, _mtx_obtain_lock_fetch(),
> with interrupts disabled. Note also the calls to cpu_spinwait()
> -- the Linux equivalent macro is cpu_relax() -- which translates
> to a "pause" instruction on amd64.)

Regards,
Yubin Ruan
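P.S. To check my own understanding, here is a small userland sketch of the
pattern you describe: a test-and-test-and-set spin lock plus a per-thread
nesting count standing in for td_md.md_spinlock_count. All names here are
invented for illustration and the interrupt enable/disable calls are stubs;
this is a toy model, not the kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = unowned, 1 = owned; models m->mtx_lock. */
static atomic_int lock_word;

/* Per-thread state, modeling md_spinlock_count / md_saved_flags. */
static _Thread_local int  spin_count;
static _Thread_local bool saved_intr;

/* Stubs: pretend interrupts were enabled before we took the lock. */
static bool intr_disable_stub(void) { return true; }
static void intr_restore_stub(bool f) { (void)f; }

static void toy_spinlock_enter(void)
{
    if (spin_count == 0)
        saved_intr = intr_disable_stub();   /* disable only on 0 -> 1 */
    spin_count++;
}

static void toy_spinlock_exit(void)
{
    spin_count--;
    if (spin_count == 0)
        intr_restore_stub(saved_intr);      /* restore only on 1 -> 0 */
}

static void toy_lock(void)
{
    toy_spinlock_enter();
    int expected = 0;
    while (!atomic_compare_exchange_strong(&lock_word, &expected, 1)) {
        /* Contested: give interrupts a chance while we spin ... */
        toy_spinlock_exit();
        /* ... and wait with plain reads, which are cheaper than CAS;
         * this is the read-only inner loop in _mtx_lock_spin_cookie(). */
        while (atomic_load_explicit(&lock_word, memory_order_relaxed) != 0)
            ;   /* the kernel would call cpu_spinwait() here */
        /* Lock looks free: re-disable and try the CAS again. */
        toy_spinlock_enter();
        expected = 0;
    }
}

static void toy_unlock(void)
{
    atomic_store(&lock_word, 0);
    toy_spinlock_exit();
}
```

The point the inner while loop makes is the one you raise: the plain reads
only *suggest* the lock is free, and only the CAS decides who actually wins
it, possibly with the answer "you lost the race".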