Date:      Thu, 22 Dec 2016 11:26:01 -0800
From:      Mark Johnston <markj@freebsd.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r310423 - head/sys/kern
Message-ID:  <20161222192601.GA78778@wkstn-mjohnston.west.isilon.com>
In-Reply-To: <6562460.a4qdZuDa0s@ralph.baldwin.cx>
References:  <201612221751.uBMHpim4062786@repo.freebsd.org> <6562460.a4qdZuDa0s@ralph.baldwin.cx>

On Thu, Dec 22, 2016 at 10:39:12AM -0800, John Baldwin wrote:
> On Thursday, December 22, 2016 05:51:44 PM Mark Johnston wrote:
> > Author: markj
> > Date: Thu Dec 22 17:51:44 2016
> > New Revision: 310423
> > URL: https://svnweb.freebsd.org/changeset/base/310423
> > 
> > Log:
> >   Revert part of r300109.
> >   
> >   The removal of TAILQ_FOREACH_SAFE introduced a small race: when the last
> >   thread on a sleepqueue is awoken, it reclaims the sleepqueue and may begin
> >   executing on a different CPU before sleepq_resume_thread() returns. This
> >   leaves a window during which it may go back to sleep and incorrectly be
> >   awoken again by the caller of sleepq_broadcast().
> 
> This is very subtle.  

:(

> The issue is that the last sleepq_resume_thread transfers
> ownership of 'sq' from the wait channel that the sleepq_broadcast has locked,
> to the thread being resumed.  

Right, that's what I meant by "reclaims the sleepqueue." One other
requirement for hitting the race is that the thread goes back to sleep
on a wait channel that hashes to a different sleepchain, else the
sleepchain lock held by the sleepq_broadcast() caller is, I believe,
sufficient to prevent the reuse of the sleepqueue before the loop has
terminated.

> I thought about using a local TAILQ_HEAD and
> using TAILQ_CONCAT to move the list of threads out of the sleep queue and then
> walking that list.  However, a comment explaining this transfer of ownership
> (and that we can't safely access 'sq' after the last thread is resumed) is
> probably sufficient (but necessary I think).  Do you feel like adding one?
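The TAILQ_CONCAT alternative can be sketched in user space as follows. All sim_* names are hypothetical simplifications. sim_resume_thread() here expects the thread to be off every list already and ignores the bookkeeping the real sleepq_resume_thread() does, so the sketch shows only the iteration pattern. All waiters are moved onto a list local to the broadcaster while sq is still ours. After that, the loop reads only the local head, so a thread that immediately sleeps again on sq cannot be picked up a second time. The sketch assumes a <sys/queue.h> that provides TAILQ_CONCAT, as FreeBSD's does.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical, much-simplified stand-ins for struct thread and sleepqueue. */
struct sim_thread {
	int wakeups;			/* number of times resumed */
	int resleep;			/* sleep again on sq once resumed? */
	TAILQ_ENTRY(sim_thread) td_slpq;
};
TAILQ_HEAD(sim_threadq, sim_thread);

struct sim_sleepqueue {
	struct sim_threadq sq_blocked;
};

/* Hypothetical resume: td is already unlinked; it may re-sleep on sq. */
static void
sim_resume_thread(struct sim_sleepqueue *sq, struct sim_thread *td)
{
	td->wakeups++;
	if (td->resleep) {
		td->resleep = 0;
		TAILQ_INSERT_TAIL(&sq->sq_blocked, td, td_slpq);
	}
}

static void
sim_broadcast_concat(struct sim_sleepqueue *sq)
{
	struct sim_threadq local = TAILQ_HEAD_INITIALIZER(local);
	struct sim_thread *td;

	/* Detach every waiter while sq is still ours; sq->sq_blocked is emptied. */
	TAILQ_CONCAT(&local, &sq->sq_blocked, td_slpq);

	/* From here on only the broadcaster-private list head is read. */
	while ((td = TAILQ_FIRST(&local)) != NULL) {
		TAILQ_REMOVE(&local, td, td_slpq);
		sim_resume_thread(sq, td);
	}
}
```

Unlinking each thread from the local list before resuming it matters. Once resumed, the thread may reuse its td_slpq linkage on another queue, and the loop never dereferences it again.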

How about:

Index: subr_sleepqueue.c
===================================================================
--- subr_sleepqueue.c	(revision 310423)
+++ subr_sleepqueue.c	(working copy)
@@ -892,7 +892,12 @@
 	KASSERT(sq->sq_type == (flags & SLEEPQ_TYPE),
 	    ("%s: mismatch between sleep/wakeup and cv_*", __func__));
 
-	/* Resume all blocked threads on the sleep queue. */
+	/*
+	 * Resume all blocked threads on the sleep queue.  The last thread will
+	 * be given ownership of sq and may re-enqueue itself before
+	 * sleepq_resume_thread() returns, so we must cache the "next" queue
+	 * item at the beginning of the final iteration.
+	 */
 	wakeup_swapper = 0;
 	TAILQ_FOREACH_SAFE(td, &sq->sq_blocked[queue], td_slpq, tdn) {
 		thread_lock(td);


