From: Mateusz Guzik <mjg@FreeBSD.org>
Date: Sun, 31 Dec 2017 05:06:36 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
 svn-src-stable@freebsd.org, svn-src-stable-11@freebsd.org
Subject: svn commit: r327413 - in stable/11/sys: kern sys

Author: mjg
Date: Sun Dec 31 05:06:35 2017
New Revision: 327413
URL: https://svnweb.freebsd.org/changeset/base/327413

Log:
  MFC r320561,r323236,r324041,r324314,r324609,r324613,r324778,r324780,r324787,
      r324803,r324836,r325469,r325706,r325917,r325918,r325919,r325920,r325921,
      r325922,r325925,r325963,r326106,r326107,r326110,r326111,r326112,r326194,
      r326195,r326196,r326197,r326198,r326199,r326200,r326237:
  
      rwlock: perform the typically false td_rw_rlocks check later
  
      Check if the lock is available first instead.
  
  =============
  
      Sprinkle __read_frequently on a few obvious places.
  
      Note that some of the annotated variables should probably change their
      types to something smaller, preferably bit-sized.
  
  =============
  
      mtx: drop the tid argument from _mtx_lock_sleep
  
      tid must be equal to curthread and the target routine was already reading
      it anyway, which is not a problem. Not passing it as a parameter allows for
      a little bit shorter code in callers.
  
  =============
  
      locks: partially tidy up waiting on readers
  
      Spin first instead of instantly re-reading, and don't re-read after
      spinning is finished - the state is already known.
  
      Note the code is subject to significant changes later.
  
  =============
  
      locks: take the number of readers into account when waiting
  
      Previous code would always spin once before checking the lock. But a lock
      with e.g. 6 readers is not going to become free in the duration of one spin
      even if they start draining immediately.
  
      Conservatively perform one for each reader.
  
      Note that the total number of allowed spins is still extremely small and is
      subject to change later.
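  
      A minimal sketch of "one spin per reader", assuming a made-up lock word
      layout where the low 4 bits are flags and the rest is the reader count.
      The DEMO_* macros and function names are hypothetical and do not match
      the kernel's RW_READERS() encoding.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: low 4 bits are flags, the rest counts readers. */
#define	DEMO_FLAG_BITS		4
#define	DEMO_READERS(v)		((v) >> DEMO_FLAG_BITS)
#define	DEMO_READERS_LOCK(n)	((uintptr_t)(n) << DEMO_FLAG_BITS)

/* Hypothetical stand-in for cpu_spinwait(): just a compiler barrier. */
static inline void
demo_pause(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/*
 * Spin once per reader encoded in v before the caller looks at the
 * lock again.  Returns the number of spins performed.
 */
static unsigned
demo_spin_for_readers(uintptr_t v)
{
	unsigned i, n;

	n = (unsigned)DEMO_READERS(v);
	for (i = 0; i < n; i++)
		demo_pause();
	return (n);
}
```

      With 6 readers the waiter pauses 6 times before re-checking, instead of
      re-checking after a single pause.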
  
  =============
  
      mtx: change MTX_UNOWNED from 4 to 0
  
      The value is spread all over the kernel and zeroing a register is
      cheaper/shorter than setting it up to an arbitrary value.
  
      Reduces amd64 GENERIC-NODEBUG .text size by 0.4%.
  
  =============
  
      mtx: fix up owner_mtx after r324609
  
      Now that MTX_UNOWNED is 0 the test was always false.
  
  =============
  
      mtx: clean up locking spin mutexes
  
      1) shorten the fast path by pushing the lockstat probe to the slow path
      2) test for kernel panic only after it turns out we will have to spin,
      in particular test only after we know we are not recursing
  
  =============
  
      mtx: stop testing SCHEDULER_STOPPED in kabi funcs for spin mutexes
  
      There is nothing panic-breaking to do in the unlock case and the lock
      case will fall back to the slow path, which already does the check.
  
  =============
  
      rwlock: reduce lockstat branches in the slowpath
  
  =============
  
      mtx: fix up UP build after r324778
  
  =============
  
      mtx: implement thread lock fastpath
  
  =============
  
      rwlock: fix up compilation without KDTRACE_HOOKS after r324787
  
  =============
  
      rwlock: use fcmpset for setting RW_LOCK_WRITE_SPINNER
  
  =============
  
      sx: avoid branches in the slow path if lockstat is disabled
  
  =============
  
      rwlock: avoid branches in the slow path if lockstat is disabled
  
  =============
  
      locks: pull up PMC_SOFT_CALLs out of slow path loops
  
  =============
  
      mtx: unlock before traversing threads to wake up
  
      This shortens the lock hold time while not affecting correctness.
      All the woken up threads end up competing and can lose the race against
      a completely unrelated thread getting the lock anyway.
  
  =============
  
      rwlock: unlock before traversing threads to wake up
  
      While here perform a minor cleanup of the unlock path.
  
  =============
  
      sx: perform a minor cleanup of the unlock slowpath
  
      No functional changes.
  
  =============
  
      mtx: add missing parts of the diff in r325920
  
      Fixes build breakage.
  
  =============
  
      locks: fix compilation issues without SMP or KDTRACE_HOOKS
  
  =============
  
      locks: remove the file + line argument from internal primitives when not used
  
      The pair is of use only in debug or LOCKPROF kernels, but was passed (zeroed)
      for many locks even in production kernels.
  
      While here whack the tid argument from wlock hard and xlock hard.
  
      There is no kbi change of any sort - "external" primitives still accept the
      pair.
  
  =============
  
      locks: pass the found lock value to unlock slow path
  
      This avoids an explicit read later.
  
      While here whack the cheaply obtainable 'tid' argument.
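  
      The idea can be sketched with C11 atomics, whose compare-exchange
      stores the observed value on failure. This is an illustration under
      assumptions: demo_unlock, demo_unlock_slow, DEMO_WAITERS and
      demo_slow_seen are hypothetical names, and the slow path here only
      clears the lock word where a real one would also wake up waiters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define	DEMO_UNOWNED	((uintptr_t)0)
#define	DEMO_WAITERS	((uintptr_t)1)	/* hypothetical flag bit */

/* Records what the slow path was handed, for demonstration only. */
static uintptr_t demo_slow_seen;

static void
demo_unlock_slow(_Atomic uintptr_t *lock, uintptr_t v)
{
	/* v is what the failed fast path observed; no load is needed. */
	demo_slow_seen = v;
	assert((v & DEMO_WAITERS) != 0);
	/* A real slow path would wake up the waiters around here. */
	atomic_store_explicit(lock, DEMO_UNOWNED, memory_order_release);
}

static void
demo_unlock(_Atomic uintptr_t *lock, uintptr_t tid)
{
	uintptr_t v = tid;

	/* On failure the compare-exchange stores the current value in v. */
	if (!atomic_compare_exchange_strong_explicit(lock, &v, DEMO_UNOWNED,
	    memory_order_release, memory_order_relaxed))
		demo_unlock_slow(lock, v);
}
```

      The slow path receives the lock word as an argument, so it does not
      have to start with an explicit read of its own.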
  
  =============
  
      rwlock: don't check for curthread's read lock count in the fast path
  
  =============
  
      rwlock: unbreak WITNESS builds after r326110
  
  =============
  
      sx: unbreak debug after r326107
  
      An assertion was modified to use the found value, but it was not updated to
      handle a race where blocked threads appear after the entrance to the func.
  
      Move the assertion down to the area protected with sleepq lock where the
      lock is read anyway. This does not affect coverage of the assertion and
      is consistent with what rw locks are doing.
  
  =============
  
      rwlock: stop re-reading the owner when going to sleep
  
  =============
  
      locks: retry turnstile/sleepq loops on failed cmpset
  
      In order to go to sleep threads set waiter flags, but that can spuriously
      fail e.g. when a new reader arrives. Instead of unlocking everything and
      looping back, re-evaluate the new state while still holding the lock necessary
      to go to sleep.
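  
      A rough userspace sketch of the retry pattern, using C11
      atomic_compare_exchange_weak, which like fcmpset hands back the value
      it found on failure. The encoding (0 is unlocked, bit 0 is the waiters
      flag) and the name demo_set_waiters are assumptions for the example;
      the turnstile/sleepq lock that the kernel holds across the loop is
      omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define	DEMO_WAITERS	((uintptr_t)1)	/* hypothetical waiters flag */

/*
 * Try to mark the lock as having waiters.  Returns true if the flag is
 * set (the caller may go to sleep) or false if the lock got released in
 * the meantime (the caller should retry the acquire instead).
 */
static bool
demo_set_waiters(_Atomic uintptr_t *lock)
{
	uintptr_t v;

	v = atomic_load(lock);
	for (;;) {
		if (v == 0)
			return (false);	/* unlocked, do not sleep */
		if (v & DEMO_WAITERS)
			return (true);	/* somebody already set it */
		if (atomic_compare_exchange_weak(lock, &v,
		    v | DEMO_WAITERS))
			return (true);
		/* Failed: v now holds the new state, re-evaluate it. */
	}
}
```

      A failed compare-exchange falls through to the top of the loop with
      the fresh value, rather than dropping everything and starting over.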
  
  =============
  
      sx: change sunlock to wake waiters up if it locked sleepq
  
      sleepq is only locked if curthread is the last reader. By the time
      the lock gets acquired new ones could have arrived. The previous code
      would unlock and loop back. This results in spurious relocking of sleepq.
  
      This is a step towards xadd-based unlock routine.
  
  =============
  
      rwlock: add __rw_try_{r,w}lock_int
  
  =============
  
      rwlock: fix up compilation of the previous change
  
      committed the wrong version of the patch
  
  =============
  
      Convert in-kernel thread_lock_flags calls to thread_lock when debug is disabled
  
      The flags argument is not used in this case.
  
  =============
  
      Add the missing lockstat check for thread lock.
  
  =============
  
      rw: fix runlock_hard when new readers show up
  
      When waiters/writer spinner flags are set no new readers can show up unless
      they already have a different rw lock read locked. The change in r326195 failed
      to take that into account - in presence of new readers it would spin until
      they all drain, which would lead to trouble if e.g. they go off cpu and
      can't get scheduled because of this thread.

Modified:
  stable/11/sys/kern/kern_mutex.c
  stable/11/sys/kern/kern_rwlock.c
  stable/11/sys/kern/kern_sx.c
  stable/11/sys/sys/lock.h
  stable/11/sys/sys/mutex.h
  stable/11/sys/sys/rwlock.h
  stable/11/sys/sys/sx.h
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/sys/kern/kern_mutex.c
==============================================================================
--- stable/11/sys/kern/kern_mutex.c	Sun Dec 31 04:09:40 2017	(r327412)
+++ stable/11/sys/kern/kern_mutex.c	Sun Dec 31 05:06:35 2017	(r327413)
@@ -217,7 +217,7 @@ owner_mtx(const struct lock_object *lock, struct threa
 	m = (const struct mtx *)lock;
 	x = m->mtx_lock;
 	*owner = (struct thread *)(x & ~MTX_FLAGMASK);
-	return (x != MTX_UNOWNED);
+	return (*owner != NULL);
 }
 #endif
 
@@ -248,7 +248,7 @@ __mtx_lock_flags(volatile uintptr_t *c, int opts, cons
 	tid = (uintptr_t)curthread;
 	v = MTX_UNOWNED;
 	if (!_mtx_obtain_lock_fetch(m, &v, tid))
-		_mtx_lock_sleep(m, v, tid, opts, file, line);
+		_mtx_lock_sleep(m, v, opts, file, line);
 	else
 		LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(adaptive__acquire,
 		    m, 0, 0, file, line);
@@ -277,7 +277,7 @@ __mtx_unlock_flags(volatile uintptr_t *c, int opts, co
 	mtx_assert(m, MA_OWNED);
 
 #ifdef LOCK_PROFILING
-	__mtx_unlock_sleep(c, opts, file, line);
+	__mtx_unlock_sleep(c, (uintptr_t)curthread, opts, file, line);
 #else
 	__mtx_unlock(m, curthread, opts, file, line);
 #endif
@@ -289,10 +289,10 @@ __mtx_lock_spin_flags(volatile uintptr_t *c, int opts,
     int line)
 {
 	struct mtx *m;
+#ifdef SMP
+	uintptr_t tid, v;
+#endif
 
-	if (SCHEDULER_STOPPED())
-		return;
-
 	m = mtxlock2mtx(c);
 
 	KASSERT(m->mtx_lock != MTX_DESTROYED,
@@ -308,7 +308,18 @@ __mtx_lock_spin_flags(volatile uintptr_t *c, int opts,
 	opts &= ~MTX_RECURSE;
 	WITNESS_CHECKORDER(&m->lock_object, opts | LOP_NEWORDER | LOP_EXCLUSIVE,
 	    file, line, NULL);
+#ifdef SMP
+	spinlock_enter();
+	tid = (uintptr_t)curthread;
+	v = MTX_UNOWNED;
+	if (!_mtx_obtain_lock_fetch(m, &v, tid))
+		_mtx_lock_spin(m, v, opts, file, line);
+	else
+		LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(spin__acquire,
+		    m, 0, 0, file, line);
+#else
 	__mtx_lock_spin(m, curthread, opts, file, line);
+#endif
 	LOCK_LOG_LOCK("LOCK", &m->lock_object, opts, m->mtx_recurse, file,
 	    line);
 	WITNESS_LOCK(&m->lock_object, opts | LOP_EXCLUSIVE, file, line);
@@ -348,9 +359,6 @@ __mtx_unlock_spin_flags(volatile uintptr_t *c, int opt
 {
 	struct mtx *m;
 
-	if (SCHEDULER_STOPPED())
-		return;
-
 	m = mtxlock2mtx(c);
 
 	KASSERT(m->mtx_lock != MTX_DESTROYED,
@@ -372,9 +380,8 @@ __mtx_unlock_spin_flags(volatile uintptr_t *c, int opt
  * is already owned, it will recursively acquire the lock.
  */
 int
-_mtx_trylock_flags_(volatile uintptr_t *c, int opts, const char *file, int line)
+_mtx_trylock_flags_int(struct mtx *m, int opts LOCK_FILE_LINE_ARG_DEF)
 {
-	struct mtx *m;
 	struct thread *td;
 	uintptr_t tid, v;
 #ifdef LOCK_PROFILING
@@ -389,8 +396,6 @@ _mtx_trylock_flags_(volatile uintptr_t *c, int opts, c
 	if (SCHEDULER_STOPPED_TD(td))
 		return (1);
 
-	m = mtxlock2mtx(c);
-
 	KASSERT(kdb_active != 0 || !TD_IS_IDLETHREAD(td),
 	    ("mtx_trylock() by idle thread %p on sleep mutex %s @ %s:%d",
 	    curthread, m->lock_object.lo_name, file, line));
@@ -435,6 +440,15 @@ _mtx_trylock_flags_(volatile uintptr_t *c, int opts, c
 	return (rval);
 }
 
+int
+_mtx_trylock_flags_(volatile uintptr_t *c, int opts, const char *file, int line)
+{
+	struct mtx *m;
+
+	m = mtxlock2mtx(c);
+	return (_mtx_trylock_flags_int(m, opts LOCK_FILE_LINE_ARG));
+}
+
 /*
  * __mtx_lock_sleep: the tougher part of acquiring an MTX_DEF lock.
  *
@@ -443,18 +457,18 @@ _mtx_trylock_flags_(volatile uintptr_t *c, int opts, c
  */
 #if LOCK_DEBUG > 0
 void
-__mtx_lock_sleep(volatile uintptr_t *c, uintptr_t v, uintptr_t tid, int opts,
-    const char *file, int line)
+__mtx_lock_sleep(volatile uintptr_t *c, uintptr_t v, int opts, const char *file,
+    int line)
 #else
 void
-__mtx_lock_sleep(volatile uintptr_t *c, uintptr_t v, uintptr_t tid)
+__mtx_lock_sleep(volatile uintptr_t *c, uintptr_t v)
 #endif
 {
+	struct thread *td;
 	struct mtx *m;
 	struct turnstile *ts;
-#ifdef ADAPTIVE_MUTEXES
-	volatile struct thread *owner;
-#endif
+	uintptr_t tid;
+	struct thread *owner;
 #ifdef KTR
 	int cont_logged = 0;
 #endif
@@ -473,8 +487,9 @@ __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t v, u
 #if defined(KDTRACE_HOOKS) || defined(LOCK_PROFILING)
 	int doing_lockprof;
 #endif
-
-	if (SCHEDULER_STOPPED())
+	td = curthread;
+	tid = (uintptr_t)td;
+	if (SCHEDULER_STOPPED_TD(td))
 		return;
 
 #if defined(ADAPTIVE_MUTEXES)
@@ -486,7 +501,7 @@ __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t v, u
 	if (__predict_false(v == MTX_UNOWNED))
 		v = MTX_READ_VALUE(m);
 
-	if (__predict_false(lv_mtx_owner(v) == (struct thread *)tid)) {
+	if (__predict_false(lv_mtx_owner(v) == td)) {
 		KASSERT((m->lock_object.lo_flags & LO_RECURSABLE) != 0 ||
 		    (opts & MTX_RECURSE) != 0,
 	    ("_mtx_lock_sleep: recursed on non-recursive mutex %s @ %s:%d\n",
@@ -618,7 +633,11 @@ __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t v, u
 #ifdef KDTRACE_HOOKS
 		sleep_time -= lockstat_nsecs(&m->lock_object);
 #endif
-		turnstile_wait(ts, mtx_owner(m), TS_EXCLUSIVE_QUEUE);
+#ifndef ADAPTIVE_MUTEXES
+		owner = mtx_owner(m);
+#endif
+		MPASS(owner == mtx_owner(m));
+		turnstile_wait(ts, owner, TS_EXCLUSIVE_QUEUE);
 #ifdef KDTRACE_HOOKS
 		sleep_time += lockstat_nsecs(&m->lock_object);
 		sleep_cnt++;
@@ -679,12 +698,18 @@ _mtx_lock_spin_failed(struct mtx *m)
  * This is only called if we need to actually spin for the lock. Recursion
  * is handled inline.
  */
+#if LOCK_DEBUG > 0
 void
-_mtx_lock_spin_cookie(volatile uintptr_t *c, uintptr_t v, uintptr_t tid,
-    int opts, const char *file, int line)
+_mtx_lock_spin_cookie(volatile uintptr_t *c, uintptr_t v, int opts,
+    const char *file, int line)
+#else
+void
+_mtx_lock_spin_cookie(volatile uintptr_t *c, uintptr_t v)
+#endif
 {
 	struct mtx *m;
 	struct lock_delay_arg lda;
+	uintptr_t tid;
 #ifdef LOCK_PROFILING
 	int contested = 0;
 	uint64_t waittime = 0;
@@ -696,10 +721,7 @@ _mtx_lock_spin_cookie(volatile uintptr_t *c, uintptr_t
 	int doing_lockprof;
 #endif
 
-	if (SCHEDULER_STOPPED())
-		return;
-
-	lock_delay_arg_init(&lda, &mtx_spin_delay);
+	tid = (uintptr_t)curthread;
 	m = mtxlock2mtx(c);
 
 	if (__predict_false(v == MTX_UNOWNED))
@@ -710,6 +732,11 @@ _mtx_lock_spin_cookie(volatile uintptr_t *c, uintptr_t
 		return;
 	}
 
+	if (SCHEDULER_STOPPED())
+		return;
+
+	lock_delay_arg_init(&lda, &mtx_spin_delay);
+
 	if (LOCK_LOG_TEST(&m->lock_object, opts))
 		CTR1(KTR_LOCK, "_mtx_lock_spin: %p spinning", m);
 	KTR_STATE1(KTR_SCHED, "thread", sched_tdname((struct thread *)tid),
@@ -772,7 +799,74 @@ _mtx_lock_spin_cookie(volatile uintptr_t *c, uintptr_t
 }
 #endif /* SMP */
 
+#ifdef INVARIANTS
+static void
+thread_lock_validate(struct mtx *m, int opts, const char *file, int line)
+{
+
+	KASSERT(m->mtx_lock != MTX_DESTROYED,
+	    ("thread_lock() of destroyed mutex @ %s:%d", file, line));
+	KASSERT(LOCK_CLASS(&m->lock_object) == &lock_class_mtx_spin,
+	    ("thread_lock() of sleep mutex %s @ %s:%d",
+	    m->lock_object.lo_name, file, line));
+	if (mtx_owned(m))
+		KASSERT((m->lock_object.lo_flags & LO_RECURSABLE) != 0,
+		    ("thread_lock: recursed on non-recursive mutex %s @ %s:%d\n",
+		    m->lock_object.lo_name, file, line));
+	WITNESS_CHECKORDER(&m->lock_object,
+	    opts | LOP_NEWORDER | LOP_EXCLUSIVE, file, line, NULL);
+}
+#else
+#define thread_lock_validate(m, opts, file, line) do { } while (0)
+#endif
+
+#ifndef LOCK_PROFILING
+#if LOCK_DEBUG > 0
 void
+_thread_lock(struct thread *td, int opts, const char *file, int line)
+#else
+void
+_thread_lock(struct thread *td)
+#endif
+{
+	struct mtx *m;
+	uintptr_t tid, v;
+
+	tid = (uintptr_t)curthread;
+
+	if (__predict_false(LOCKSTAT_PROFILE_ENABLED(spin__acquire)))
+		goto slowpath_noirq;
+	spinlock_enter();
+	m = td->td_lock;
+	thread_lock_validate(m, 0, file, line);
+	v = MTX_READ_VALUE(m);
+	if (__predict_true(v == MTX_UNOWNED)) {
+		if (__predict_false(!_mtx_obtain_lock(m, tid)))
+			goto slowpath_unlocked;
+	} else if (v == tid) {
+		m->mtx_recurse++;
+	} else
+		goto slowpath_unlocked;
+	if (__predict_true(m == td->td_lock)) {
+		WITNESS_LOCK(&m->lock_object, LOP_EXCLUSIVE, file, line);
+		return;
+	}
+	if (m->mtx_recurse != 0)
+		m->mtx_recurse--;
+	else
+		_mtx_release_lock_quick(m);
+slowpath_unlocked:
+	spinlock_exit();
+slowpath_noirq:
+#if LOCK_DEBUG > 0
+	thread_lock_flags_(td, opts, file, line);
+#else
+	thread_lock_flags_(td, 0, 0, 0);
+#endif
+}
+#endif
+
+void
 thread_lock_flags_(struct thread *td, int opts, const char *file, int line)
 {
 	struct mtx *m;
@@ -815,17 +909,7 @@ retry:
 		v = MTX_UNOWNED;
 		spinlock_enter();
 		m = td->td_lock;
-		KASSERT(m->mtx_lock != MTX_DESTROYED,
-		    ("thread_lock() of destroyed mutex @ %s:%d", file, line));
-		KASSERT(LOCK_CLASS(&m->lock_object) == &lock_class_mtx_spin,
-		    ("thread_lock() of sleep mutex %s @ %s:%d",
-		    m->lock_object.lo_name, file, line));
-		if (mtx_owned(m))
-			KASSERT((m->lock_object.lo_flags & LO_RECURSABLE) != 0,
-	    ("thread_lock: recursed on non-recursive mutex %s @ %s:%d\n",
-			    m->lock_object.lo_name, file, line));
-		WITNESS_CHECKORDER(&m->lock_object,
-		    opts | LOP_NEWORDER | LOP_EXCLUSIVE, file, line, NULL);
+		thread_lock_validate(m, opts, file, line);
 		for (;;) {
 			if (_mtx_obtain_lock_fetch(m, &v, tid))
 				break;
@@ -925,24 +1009,27 @@ thread_lock_set(struct thread *td, struct mtx *new)
  */
 #if LOCK_DEBUG > 0
 void
-__mtx_unlock_sleep(volatile uintptr_t *c, int opts, const char *file, int line)
+__mtx_unlock_sleep(volatile uintptr_t *c, uintptr_t v, int opts,
+    const char *file, int line)
 #else
 void
-__mtx_unlock_sleep(volatile uintptr_t *c)
+__mtx_unlock_sleep(volatile uintptr_t *c, uintptr_t v)
 #endif
 {
 	struct mtx *m;
 	struct turnstile *ts;
-	uintptr_t tid, v;
+	uintptr_t tid;
 
 	if (SCHEDULER_STOPPED())
 		return;
 
 	tid = (uintptr_t)curthread;
 	m = mtxlock2mtx(c);
-	v = MTX_READ_VALUE(m);
 
-	if (v & MTX_RECURSED) {
+	if (__predict_false(v == tid))
+		v = MTX_READ_VALUE(m);
+
+	if (__predict_false(v & MTX_RECURSED)) {
 		if (--(m->mtx_recurse) == 0)
 			atomic_clear_ptr(&m->mtx_lock, MTX_RECURSED);
 		if (LOCK_LOG_TEST(&m->lock_object, opts))
@@ -959,12 +1046,12 @@ __mtx_unlock_sleep(volatile uintptr_t *c)
 	 * can be removed from the hash list if it is empty.
 	 */
 	turnstile_chain_lock(&m->lock_object);
+	_mtx_release_lock_quick(m);
 	ts = turnstile_lookup(&m->lock_object);
+	MPASS(ts != NULL);
 	if (LOCK_LOG_TEST(&m->lock_object, opts))
 		CTR1(KTR_LOCK, "_mtx_unlock_sleep: %p contested", m);
-	MPASS(ts != NULL);
 	turnstile_broadcast(ts, TS_EXCLUSIVE_QUEUE);
-	_mtx_release_lock_quick(m);
 
 	/*
 	 * This turnstile is now no longer associated with the mutex.  We can

Modified: stable/11/sys/kern/kern_rwlock.c
==============================================================================
--- stable/11/sys/kern/kern_rwlock.c	Sun Dec 31 04:09:40 2017	(r327412)
+++ stable/11/sys/kern/kern_rwlock.c	Sun Dec 31 05:06:35 2017	(r327413)
@@ -273,7 +273,7 @@ _rw_wlock_cookie(volatile uintptr_t *c, const char *fi
 	tid = (uintptr_t)curthread;
 	v = RW_UNLOCKED;
 	if (!_rw_write_lock_fetch(rw, &v, tid))
-		_rw_wlock_hard(rw, v, tid, file, line);
+		_rw_wlock_hard(rw, v, file, line);
 	else
 		LOCKSTAT_PROFILE_OBTAIN_RWLOCK_SUCCESS(rw__acquire, rw,
 		    0, 0, file, line, LOCKSTAT_WRITER);
@@ -284,9 +284,8 @@ _rw_wlock_cookie(volatile uintptr_t *c, const char *fi
 }
 
 int
-__rw_try_wlock(volatile uintptr_t *c, const char *file, int line)
+__rw_try_wlock_int(struct rwlock *rw LOCK_FILE_LINE_ARG_DEF)
 {
-	struct rwlock *rw;
 	struct thread *td;
 	uintptr_t tid, v;
 	int rval;
@@ -297,8 +296,6 @@ __rw_try_wlock(volatile uintptr_t *c, const char *file
 	if (SCHEDULER_STOPPED_TD(td))
 		return (1);
 
-	rw = rwlock2rw(c);
-
 	KASSERT(kdb_active != 0 || !TD_IS_IDLETHREAD(td),
 	    ("rw_try_wlock() by idle thread %p on rwlock %s @ %s:%d",
 	    curthread, rw->lock_object.lo_name, file, line));
@@ -334,6 +331,15 @@ __rw_try_wlock(volatile uintptr_t *c, const char *file
 	return (rval);
 }
 
+int
+__rw_try_wlock(volatile uintptr_t *c, const char *file, int line)
+{
+	struct rwlock *rw;
+
+	rw = rwlock2rw(c);
+	return (__rw_try_wlock_int(rw LOCK_FILE_LINE_ARG));
+}
+
 void
 _rw_wunlock_cookie(volatile uintptr_t *c, const char *file, int line)
 {
@@ -364,14 +370,21 @@ _rw_wunlock_cookie(volatile uintptr_t *c, const char *
  * is unlocked and has no writer waiters or spinners.  Failing otherwise
  * prioritizes writers before readers.
  */
-#define	RW_CAN_READ(td, _rw)						\
-    (((td)->td_rw_rlocks && (_rw) & RW_LOCK_READ) || ((_rw) &	\
-    (RW_LOCK_READ | RW_LOCK_WRITE_WAITERS | RW_LOCK_WRITE_SPINNER)) ==	\
-    RW_LOCK_READ)
+static bool __always_inline
+__rw_can_read(struct thread *td, uintptr_t v, bool fp)
+{
 
+	if ((v & (RW_LOCK_READ | RW_LOCK_WRITE_WAITERS | RW_LOCK_WRITE_SPINNER))
+	    == RW_LOCK_READ)
+		return (true);
+	if (!fp && td->td_rw_rlocks && (v & RW_LOCK_READ))
+		return (true);
+	return (false);
+}
+
 static bool __always_inline
-__rw_rlock_try(struct rwlock *rw, struct thread *td, uintptr_t *vp,
-    const char *file, int line)
+__rw_rlock_try(struct rwlock *rw, struct thread *td, uintptr_t *vp, bool fp
+    LOCK_FILE_LINE_ARG_DEF)
 {
 
 	/*
@@ -384,7 +397,7 @@ __rw_rlock_try(struct rwlock *rw, struct thread *td, u
 	 * completely unlocked rwlock since such a lock is encoded
 	 * as a read lock with no waiters.
 	 */
-	while (RW_CAN_READ(td, *vp)) {
+	while (__rw_can_read(td, *vp, fp)) {
 		if (atomic_fcmpset_acq_ptr(&rw->rw_lock, vp,
 			*vp + RW_ONE_READER)) {
 			if (LOCK_LOG_TEST(&rw->lock_object, 0))
@@ -400,13 +413,12 @@ __rw_rlock_try(struct rwlock *rw, struct thread *td, u
 }
 
 static void __noinline
-__rw_rlock_hard(volatile uintptr_t *c, struct thread *td, uintptr_t v,
-    const char *file, int line)
+__rw_rlock_hard(struct rwlock *rw, struct thread *td, uintptr_t v
+    LOCK_FILE_LINE_ARG_DEF)
 {
-	struct rwlock *rw;
 	struct turnstile *ts;
+	struct thread *owner;
 #ifdef ADAPTIVE_RWLOCKS
-	volatile struct thread *owner;
 	int spintries = 0;
 	int i;
 #endif
@@ -418,11 +430,14 @@ __rw_rlock_hard(volatile uintptr_t *c, struct thread *
 	struct lock_delay_arg lda;
 #endif
 #ifdef KDTRACE_HOOKS
-	uintptr_t state;
 	u_int sleep_cnt = 0;
 	int64_t sleep_time = 0;
 	int64_t all_time = 0;
 #endif
+#if defined(KDTRACE_HOOKS) || defined(LOCK_PROFILING)
+	uintptr_t state;
+	int doing_lockprof;
+#endif
 
 	if (SCHEDULER_STOPPED())
 		return;
@@ -432,25 +447,30 @@ __rw_rlock_hard(volatile uintptr_t *c, struct thread *
 #elif defined(KDTRACE_HOOKS)
 	lock_delay_arg_init(&lda, NULL);
 #endif
-	rw = rwlock2rw(c);
 
-#ifdef KDTRACE_HOOKS
-	all_time -= lockstat_nsecs(&rw->lock_object);
+#ifdef HWPMC_HOOKS
+	PMC_SOFT_CALL( , , lock, failed);
 #endif
-#ifdef KDTRACE_HOOKS
+	lock_profile_obtain_lock_failed(&rw->lock_object,
+	    &contested, &waittime);
+
+#ifdef LOCK_PROFILING
+	doing_lockprof = 1;
 	state = v;
+#elif defined(KDTRACE_HOOKS)
+	doing_lockprof = lockstat_enabled;
+	if (__predict_false(doing_lockprof)) {
+		all_time -= lockstat_nsecs(&rw->lock_object);
+		state = v;
+	}
 #endif
+
 	for (;;) {
-		if (__rw_rlock_try(rw, td, &v, file, line))
+		if (__rw_rlock_try(rw, td, &v, false LOCK_FILE_LINE_ARG))
 			break;
 #ifdef KDTRACE_HOOKS
 		lda.spin_cnt++;
 #endif
-#ifdef HWPMC_HOOKS
-		PMC_SOFT_CALL( , , lock, failed);
-#endif
-		lock_profile_obtain_lock_failed(&rw->lock_object,
-		    &contested, &waittime);
 
 #ifdef ADAPTIVE_RWLOCKS
 		/*
@@ -483,12 +503,11 @@ __rw_rlock_hard(volatile uintptr_t *c, struct thread *
 			    "spinning", "lockname:\"%s\"",
 			    rw->lock_object.lo_name);
 			for (i = 0; i < rowner_loops; i++) {
+				cpu_spinwait();
 				v = RW_READ_VALUE(rw);
-				if ((v & RW_LOCK_READ) == 0 || RW_CAN_READ(td, v))
+				if ((v & RW_LOCK_READ) == 0 || __rw_can_read(td, v, false))
 					break;
-				cpu_spinwait();
 			}
-			v = RW_READ_VALUE(rw);
 #ifdef KDTRACE_HOOKS
 			lda.spin_cnt += rowner_loops - i;
 #endif
@@ -512,11 +531,14 @@ __rw_rlock_hard(volatile uintptr_t *c, struct thread *
 		 * recheck its state and restart the loop if needed.
 		 */
 		v = RW_READ_VALUE(rw);
-		if (RW_CAN_READ(td, v)) {
+retry_ts:
+		if (__rw_can_read(td, v, false)) {
 			turnstile_cancel(ts);
 			continue;
 		}
 
+		owner = lv_rw_wowner(v);
+
 #ifdef ADAPTIVE_RWLOCKS
 		/*
 		 * The current lock owner might have started executing
@@ -525,8 +547,7 @@ __rw_rlock_hard(volatile uintptr_t *c, struct thread *
 		 * chain lock.  If so, drop the turnstile lock and try
 		 * again.
 		 */
-		if ((v & RW_LOCK_READ) == 0) {
-			owner = (struct thread *)RW_OWNER(v);
+		if (owner != NULL) {
 			if (TD_IS_RUNNING(owner)) {
 				turnstile_cancel(ts);
 				continue;
@@ -537,7 +558,7 @@ __rw_rlock_hard(volatile uintptr_t *c, struct thread *
 		/*
 		 * The lock is held in write mode or it already has waiters.
 		 */
-		MPASS(!RW_CAN_READ(td, v));
+		MPASS(!__rw_can_read(td, v, false));
 
 		/*
 		 * If the RW_LOCK_READ_WAITERS flag is already set, then
@@ -546,12 +567,9 @@ __rw_rlock_hard(volatile uintptr_t *c, struct thread *
 		 * lock and restart the loop.
 		 */
 		if (!(v & RW_LOCK_READ_WAITERS)) {
-			if (!atomic_cmpset_ptr(&rw->rw_lock, v,
-			    v | RW_LOCK_READ_WAITERS)) {
-				turnstile_cancel(ts);
-				v = RW_READ_VALUE(rw);
-				continue;
-			}
+			if (!atomic_fcmpset_ptr(&rw->rw_lock, &v,
+			    v | RW_LOCK_READ_WAITERS))
+				goto retry_ts;
 			if (LOCK_LOG_TEST(&rw->lock_object, 0))
 				CTR2(KTR_LOCK, "%s: %p set read waiters flag",
 				    __func__, rw);
@@ -567,7 +585,8 @@ __rw_rlock_hard(volatile uintptr_t *c, struct thread *
 #ifdef KDTRACE_HOOKS
 		sleep_time -= lockstat_nsecs(&rw->lock_object);
 #endif
-		turnstile_wait(ts, rw_owner(rw), TS_SHARED_QUEUE);
+		MPASS(owner == rw_owner(rw));
+		turnstile_wait(ts, owner, TS_SHARED_QUEUE);
 #ifdef KDTRACE_HOOKS
 		sleep_time += lockstat_nsecs(&rw->lock_object);
 		sleep_cnt++;
@@ -577,6 +596,10 @@ __rw_rlock_hard(volatile uintptr_t *c, struct thread *
 			    __func__, rw);
 		v = RW_READ_VALUE(rw);
 	}
+#if defined(KDTRACE_HOOKS) || defined(LOCK_PROFILING)
+	if (__predict_true(!doing_lockprof))
+		return;
+#endif
 #ifdef KDTRACE_HOOKS
 	all_time += lockstat_nsecs(&rw->lock_object);
 	if (sleep_time)
@@ -600,14 +623,12 @@ __rw_rlock_hard(volatile uintptr_t *c, struct thread *
 }
 
 void
-__rw_rlock(volatile uintptr_t *c, const char *file, int line)
+__rw_rlock_int(struct rwlock *rw LOCK_FILE_LINE_ARG_DEF)
 {
-	struct rwlock *rw;
 	struct thread *td;
 	uintptr_t v;
 
 	td = curthread;
-	rw = rwlock2rw(c);
 
 	KASSERT(kdb_active != 0 || SCHEDULER_STOPPED_TD(td) ||
 	    !TD_IS_IDLETHREAD(td),
@@ -622,25 +643,31 @@ __rw_rlock(volatile uintptr_t *c, const char *file, in
 
 	v = RW_READ_VALUE(rw);
 	if (__predict_false(LOCKSTAT_OOL_PROFILE_ENABLED(rw__acquire) ||
-	    !__rw_rlock_try(rw, td, &v, file, line)))
-		__rw_rlock_hard(c, td, v, file, line);
+	    !__rw_rlock_try(rw, td, &v, true LOCK_FILE_LINE_ARG)))
+		__rw_rlock_hard(rw, td, v LOCK_FILE_LINE_ARG);
 
 	LOCK_LOG_LOCK("RLOCK", &rw->lock_object, 0, 0, file, line);
 	WITNESS_LOCK(&rw->lock_object, 0, file, line);
 	TD_LOCKS_INC(curthread);
 }
 
-int
-__rw_try_rlock(volatile uintptr_t *c, const char *file, int line)
+void
+__rw_rlock(volatile uintptr_t *c, const char *file, int line)
 {
 	struct rwlock *rw;
+
+	rw = rwlock2rw(c);
+	__rw_rlock_int(rw LOCK_FILE_LINE_ARG);
+}
+
+int
+__rw_try_rlock_int(struct rwlock *rw LOCK_FILE_LINE_ARG_DEF)
+{
 	uintptr_t x;
 
 	if (SCHEDULER_STOPPED())
 		return (1);
 
-	rw = rwlock2rw(c);
-
 	KASSERT(kdb_active != 0 || !TD_IS_IDLETHREAD(curthread),
 	    ("rw_try_rlock() by idle thread %p on rwlock %s @ %s:%d",
 	    curthread, rw->lock_object.lo_name, file, line));
@@ -667,6 +694,15 @@ __rw_try_rlock(volatile uintptr_t *c, const char *file
 	return (0);
 }
 
+int
+__rw_try_rlock(volatile uintptr_t *c, const char *file, int line)
+{
+	struct rwlock *rw;
+
+	rw = rwlock2rw(c);
+	return (__rw_try_rlock_int(rw LOCK_FILE_LINE_ARG));
+}
+
 static bool __always_inline
 __rw_runlock_try(struct rwlock *rw, struct thread *td, uintptr_t *vp)
 {
@@ -712,18 +748,15 @@ __rw_runlock_try(struct rwlock *rw, struct thread *td,
 }
 
 static void __noinline
-__rw_runlock_hard(volatile uintptr_t *c, struct thread *td, uintptr_t v,
-    const char *file, int line)
+__rw_runlock_hard(struct rwlock *rw, struct thread *td, uintptr_t v
+    LOCK_FILE_LINE_ARG_DEF)
 {
-	struct rwlock *rw;
 	struct turnstile *ts;
 	uintptr_t x, queue;
 
 	if (SCHEDULER_STOPPED())
 		return;
 
-	rw = rwlock2rw(c);
-
 	for (;;) {
 		if (__rw_runlock_try(rw, td, &v))
 			break;
@@ -733,7 +766,14 @@ __rw_runlock_hard(volatile uintptr_t *c, struct thread
 		 * last reader, so grab the turnstile lock.
 		 */
 		turnstile_chain_lock(&rw->lock_object);
-		v = rw->rw_lock & (RW_LOCK_WAITERS | RW_LOCK_WRITE_SPINNER);
+		v = RW_READ_VALUE(rw);
+retry_ts:
+		if (__predict_false(RW_READERS(v) > 1)) {
+			turnstile_chain_unlock(&rw->lock_object);
+			continue;
+		}
+
+		v &= (RW_LOCK_WAITERS | RW_LOCK_WRITE_SPINNER);
 		MPASS(v & RW_LOCK_WAITERS);
 
 		/*
@@ -758,12 +798,9 @@ __rw_runlock_hard(volatile uintptr_t *c, struct thread
 			x |= (v & RW_LOCK_READ_WAITERS);
 		} else
 			queue = TS_SHARED_QUEUE;
-		if (!atomic_cmpset_rel_ptr(&rw->rw_lock, RW_READERS_LOCK(1) | v,
-		    x)) {
-			turnstile_chain_unlock(&rw->lock_object);
-			v = RW_READ_VALUE(rw);
-			continue;
-		}
+		v |= RW_READERS_LOCK(1);
+		if (!atomic_fcmpset_rel_ptr(&rw->rw_lock, &v, x))
+			goto retry_ts;
 		if (LOCK_LOG_TEST(&rw->lock_object, 0))
 			CTR2(KTR_LOCK, "%s: %p last succeeded with waiters",
 			    __func__, rw);
@@ -787,17 +824,14 @@ __rw_runlock_hard(volatile uintptr_t *c, struct thread
 }
 
 void
-_rw_runlock_cookie(volatile uintptr_t *c, const char *file, int line)
+_rw_runlock_cookie_int(struct rwlock *rw LOCK_FILE_LINE_ARG_DEF)
 {
-	struct rwlock *rw;
 	struct thread *td;
 	uintptr_t v;
 
-	rw = rwlock2rw(c);
-
 	KASSERT(rw->rw_lock != RW_DESTROYED,
 	    ("rw_runlock() of destroyed rwlock @ %s:%d", file, line));
-	__rw_assert(c, RA_RLOCKED, file, line);
+	__rw_assert(&rw->rw_lock, RA_RLOCKED, file, line);
 	WITNESS_UNLOCK(&rw->lock_object, 0, file, line);
 	LOCK_LOG_LOCK("RUNLOCK", &rw->lock_object, 0, 0, file, line);
 
@@ -806,24 +840,33 @@ _rw_runlock_cookie(volatile uintptr_t *c, const char *
 
 	if (__predict_false(LOCKSTAT_OOL_PROFILE_ENABLED(rw__release) ||
 	    !__rw_runlock_try(rw, td, &v)))
-		__rw_runlock_hard(c, td, v, file, line);
+		__rw_runlock_hard(rw, td, v LOCK_FILE_LINE_ARG);
 
 	TD_LOCKS_DEC(curthread);
 }
 
+void
+_rw_runlock_cookie(volatile uintptr_t *c, const char *file, int line)
+{
+	struct rwlock *rw;
+
+	rw = rwlock2rw(c);
+	_rw_runlock_cookie_int(rw LOCK_FILE_LINE_ARG);
+}
+
 /*
  * This function is called when we are unable to obtain a write lock on the
  * first try.  This means that at least one other thread holds either a
  * read or write lock.
  */
 void
-__rw_wlock_hard(volatile uintptr_t *c, uintptr_t v, uintptr_t tid,
-    const char *file, int line)
+__rw_wlock_hard(volatile uintptr_t *c, uintptr_t v LOCK_FILE_LINE_ARG_DEF)
 {
+	uintptr_t tid;
 	struct rwlock *rw;
 	struct turnstile *ts;
+	struct thread *owner;
 #ifdef ADAPTIVE_RWLOCKS
-	volatile struct thread *owner;
 	int spintries = 0;
 	int i;
 #endif
@@ -836,12 +879,16 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v, ui
 	struct lock_delay_arg lda;
 #endif
 #ifdef KDTRACE_HOOKS
-	uintptr_t state;
 	u_int sleep_cnt = 0;
 	int64_t sleep_time = 0;
 	int64_t all_time = 0;
 #endif
+#if defined(KDTRACE_HOOKS) || defined(LOCK_PROFILING)
+	uintptr_t state;
+	int doing_lockprof;
+#endif
 
+	tid = (uintptr_t)curthread;
 	if (SCHEDULER_STOPPED())
 		return;
 
@@ -869,10 +916,23 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v, ui
 		CTR5(KTR_LOCK, "%s: %s contested (lock=%p) at %s:%d", __func__,
 		    rw->lock_object.lo_name, (void *)rw->rw_lock, file, line);
 
-#ifdef KDTRACE_HOOKS
-	all_time -= lockstat_nsecs(&rw->lock_object);
+#ifdef HWPMC_HOOKS
+	PMC_SOFT_CALL( , , lock, failed);
+#endif
+	lock_profile_obtain_lock_failed(&rw->lock_object,
+	    &contested, &waittime);
+
+#ifdef LOCK_PROFILING
+	doing_lockprof = 1;
 	state = v;
+#elif defined(KDTRACE_HOOKS)
+	doing_lockprof = lockstat_enabled;
+	if (__predict_false(doing_lockprof)) {
+		all_time -= lockstat_nsecs(&rw->lock_object);
+		state = v;
+	}
 #endif
+
 	for (;;) {
 		if (v == RW_UNLOCKED) {
 			if (_rw_write_lock_fetch(rw, &v, tid))
@@ -882,11 +942,7 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v, ui
 #ifdef KDTRACE_HOOKS
 		lda.spin_cnt++;
 #endif
-#ifdef HWPMC_HOOKS
-		PMC_SOFT_CALL( , , lock, failed);
-#endif
-		lock_profile_obtain_lock_failed(&rw->lock_object,
-		    &contested, &waittime);
+
 #ifdef ADAPTIVE_RWLOCKS
 		/*
 		 * If the lock is write locked and the owner is
@@ -913,9 +969,8 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v, ui
 		if ((v & RW_LOCK_READ) && RW_READERS(v) &&
 		    spintries < rowner_retries) {
 			if (!(v & RW_LOCK_WRITE_SPINNER)) {
-				if (!atomic_cmpset_ptr(&rw->rw_lock, v,
+				if (!atomic_fcmpset_ptr(&rw->rw_lock, &v,
 				    v | RW_LOCK_WRITE_SPINNER)) {
-					v = RW_READ_VALUE(rw);
 					continue;
 				}
 			}
@@ -924,13 +979,13 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v, ui
 			    "spinning", "lockname:\"%s\"",
 			    rw->lock_object.lo_name);
 			for (i = 0; i < rowner_loops; i++) {
-				if ((rw->rw_lock & RW_LOCK_WRITE_SPINNER) == 0)
-					break;
 				cpu_spinwait();
+				v = RW_READ_VALUE(rw);
+				if ((v & RW_LOCK_WRITE_SPINNER) == 0)
+					break;
 			}
 			KTR_STATE0(KTR_SCHED, "thread", sched_tdname(curthread),
 			    "running");
-			v = RW_READ_VALUE(rw);
 #ifdef KDTRACE_HOOKS
 			lda.spin_cnt += rowner_loops - i;
 #endif
@@ -940,6 +995,8 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v, ui
 #endif
 		ts = turnstile_trywait(&rw->lock_object);
 		v = RW_READ_VALUE(rw);
+retry_ts:
+		owner = lv_rw_wowner(v);
 
 #ifdef ADAPTIVE_RWLOCKS
 		/*
@@ -949,8 +1006,7 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v, ui
 		 * chain lock.  If so, drop the turnstile lock and try
 		 * again.
 		 */
-		if (!(v & RW_LOCK_READ)) {
-			owner = (struct thread *)RW_OWNER(v);
+		if (owner != NULL) {
 			if (TD_IS_RUNNING(owner)) {
 				turnstile_cancel(ts);
 				continue;
@@ -967,16 +1023,14 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v, ui
 		x = v & (RW_LOCK_WAITERS | RW_LOCK_WRITE_SPINNER);
 		if ((v & ~x) == RW_UNLOCKED) {
 			x &= ~RW_LOCK_WRITE_SPINNER;
-			if (atomic_cmpset_acq_ptr(&rw->rw_lock, v, tid | x)) {
+			if (atomic_fcmpset_acq_ptr(&rw->rw_lock, &v, tid | x)) {
 				if (x)
 					turnstile_claim(ts);
 				else
 					turnstile_cancel(ts);
 				break;
 			}
-			turnstile_cancel(ts);
-			v = RW_READ_VALUE(rw);
-			continue;
+			goto retry_ts;
 		}
 		/*
 		 * If the RW_LOCK_WRITE_WAITERS flag isn't set, then try to
@@ -984,12 +1038,9 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v, ui
 		 * again.
 		 */
 		if (!(v & RW_LOCK_WRITE_WAITERS)) {
-			if (!atomic_cmpset_ptr(&rw->rw_lock, v,
-			    v | RW_LOCK_WRITE_WAITERS)) {
-				turnstile_cancel(ts);
-				v = RW_READ_VALUE(rw);
-				continue;
-			}
+			if (!atomic_fcmpset_ptr(&rw->rw_lock, &v,
+			    v | RW_LOCK_WRITE_WAITERS))
+				goto retry_ts;
 			if (LOCK_LOG_TEST(&rw->lock_object, 0))
 				CTR2(KTR_LOCK, "%s: %p set write waiters flag",
 				    __func__, rw);
@@ -1004,7 +1055,8 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v, ui
 #ifdef KDTRACE_HOOKS
 		sleep_time -= lockstat_nsecs(&rw->lock_object);
 #endif
-		turnstile_wait(ts, rw_owner(rw), TS_EXCLUSIVE_QUEUE);
+		MPASS(owner == rw_owner(rw));
+		turnstile_wait(ts, owner, TS_EXCLUSIVE_QUEUE);
 #ifdef KDTRACE_HOOKS
 		sleep_time += lockstat_nsecs(&rw->lock_object);
 		sleep_cnt++;
@@ -1017,6 +1069,10 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v, ui
 #endif
 		v = RW_READ_VALUE(rw);
 	}
+#if defined(KDTRACE_HOOKS) || defined(LOCK_PROFILING)
+	if (__predict_true(!doing_lockprof))
+		return;
+#endif
 #ifdef KDTRACE_HOOKS
 	all_time += lockstat_nsecs(&rw->lock_object);
 	if (sleep_time)
@@ -1041,19 +1097,21 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v, ui
  * on this lock.
  */
 void
-__rw_wunlock_hard(volatile uintptr_t *c, uintptr_t tid, const char *file,
-    int line)
+__rw_wunlock_hard(volatile uintptr_t *c, uintptr_t v LOCK_FILE_LINE_ARG_DEF)
 {
 	struct rwlock *rw;
 	struct turnstile *ts;
-	uintptr_t v;
+	uintptr_t tid, setv;
 	int queue;
 
+	tid = (uintptr_t)curthread;
 	if (SCHEDULER_STOPPED())
 		return;
 
 	rw = rwlock2rw(c);
-	v = RW_READ_VALUE(rw);
+	if (__predict_false(v == tid))
+		v = RW_READ_VALUE(rw);
+
 	if (v & RW_LOCK_WRITER_RECURSED) {
 		if (--(rw->rw_recurse) == 0)
 			atomic_clear_ptr(&rw->rw_lock, RW_LOCK_WRITER_RECURSED);
@@ -1073,8 +1131,6 @@ __rw_wunlock_hard(volatile uintptr_t *c, uintptr_t tid
 		CTR2(KTR_LOCK, "%s: %p contested", __func__, rw);
 
 	turnstile_chain_lock(&rw->lock_object);
-	ts = turnstile_lookup(&rw->lock_object);
-	MPASS(ts != NULL);
 
 	/*
 	 * Use the same algo as sx locks for now.  Prefer waking up shared
@@ -1092,19 +1148,23 @@ __rw_wunlock_hard(volatile uintptr_t *c, uintptr_t tid
 	 * there that could be worked around either by waking both queues
 	 * of waiters or doing some complicated lock handoff gymnastics.
 	 */
-	v = RW_UNLOCKED;
-	if (rw->rw_lock & RW_LOCK_WRITE_WAITERS) {
+	setv = RW_UNLOCKED;
+	v = RW_READ_VALUE(rw);
+	queue = TS_SHARED_QUEUE;
+	if (v & RW_LOCK_WRITE_WAITERS) {
 		queue = TS_EXCLUSIVE_QUEUE;
-		v |= (rw->rw_lock & RW_LOCK_READ_WAITERS);
-	} else
-		queue = TS_SHARED_QUEUE;
+		setv |= (v & RW_LOCK_READ_WAITERS);
+	}
+	atomic_store_rel_ptr(&rw->rw_lock, setv);
 
 	/* Wake up all waiters for the specific queue. */
 	if (LOCK_LOG_TEST(&rw->lock_object, 0))
 		CTR3(KTR_LOCK, "%s: %p waking up %s waiters", __func__, rw,
 		    queue == TS_SHARED_QUEUE ? "read" : "write");
+
+	ts = turnstile_lookup(&rw->lock_object);
+	MPASS(ts != NULL);

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***