From owner-freebsd-arch  Sat Jan 20  1:57:54 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from magnesium.net (toxic.magnesium.net [207.154.84.15])
	by hub.freebsd.org (Postfix) with SMTP id 1283C37B400
	for <freebsd-arch@freebsd.org>; Sat, 20 Jan 2001 01:57:35 -0800 (PST)
Received: (qmail 18166 invoked by uid 1142); 20 Jan 2001 09:57:28 -0000
Date: 20 Jan 2001 01:57:28 -0800
Date: Sat, 20 Jan 2001 01:57:20 -0800
From: Jason Evans <jasone@canonware.com>
To: Bosko Milekic <bmilekic@technokratis.com>
Cc: freebsd-arch@freebsd.org
Subject: Re: wakeup_one() return value
Message-ID: <20010120015720.K69199@canonware.com>
References: <000701c0829c$e0378d60$1f90c918@jehovah>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <000701c0829c$e0378d60$1f90c918@jehovah>; from bmilekic@technokratis.com on Fri, Jan 19, 2001 at 11:53:05PM -0500
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Fri, Jan 19, 2001 at 11:53:05PM -0500, Bosko Milekic wrote:
>     Here's a diff: http://people.freebsd.org/~bmilekic/code/wakeupone.diff
> that makes wakeup_one() return an integer specifying whether a process was
> actually taken off the sleep queue, or not. The reason for the proposed
> change as to do with SMP and potential race conditions in certain situations.
> For example, wakeup_one() is sometimes (and quite effectively) used in this
> scenario:
> 
> thread 1:
> ---------
> 
>   mtx_enter(lock_A, foo);
>   do stuff;
>   slp_cnt++;
>   if (msleep(foobar, lock_A, ..., some_time_period) == EWOULDBLOCK)
>         slp_cnt--;
>   do more stuff;
>   mtx_exit(lock_A, foo);
> 
> 
> thread 2:
> ---------
> 
>   mtx_enter(lock_A, foo);
>   do stuff;
>   if (slp_cnt >= 0) {
>         wakeup_one(foobar);
>         slp_cnt--;
>   }
>   mtx_exit(lock_A, foo);
> 
> 
> The problem is that under some conditions, there is a race and slp_cnt can
> get decremented twice. For example, consider slp_cnt == 0, and thread 1 comes
> in and increments it to 1. It goes to sleep but wakes up and grabs sched_lock
> (it hasn't yet regrabbed lock_A). At this time, thread 2 has lock_A and is
> spinning on sched_lock in wakeup_one(). But, thread 1 has timed out, and
> removes itself from the sleep queue; it returns EWOULDBLOCK, so slp_cnt goes
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> down to 0. sched_lock is dropped, the thread blocks waiting for thread 2 to
  ^^^^^^^^^^                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> let go of lock_A so it can "do more stuff." Meanwhile, thread 2 gets
  ^^^^^^^^^^^^^^^^
> sched_lock and does a wakeup_one() (effectively a no-op) and decrements
> slp_cnt again. There's an underflow.

I don't quite follow this paragraph, but I think that part of it rests on
the assumption that slp_cnt is decremented before lock_A is reacquired,
which is impossible.  The ordering that you describe in the underscored
text is not possible, since msleep() returns with lock_A held.  It is
possible for msleep() to return EWOULDBLOCK _after_ thread 2 has released
lock_A though, so the race condition you describe is still possible.  Your
example can be "fixed" as follows:

thread 1:
---------

  mtx_enter(lock_A, foo);
  do stuff;
  slp_cnt++;
  msleep(foobar, lock_A, ..., some_time_period);
  slp_cnt--;
  do more stuff;
  mtx_exit(lock_A, foo);


thread 2:
---------

  mtx_enter(lock_A, foo);
  do stuff;
  if (slp_cnt >= 0)
        wakeup_one(foobar);
  mtx_exit(lock_A, foo);

Now, slp_cnt will always be correct.  However, if the example is actually
more complex than what you present and there are multiple threads that can
potentially call wakeup_one(), it is possible to call wakeup_one() more
times than there are sleeping threads.  If the example is even more
complex, perhaps that means that too many threads were woken (i.e. it's a
bad thing to have woken two threads instead of just one).

This new problem can be "fixed" by introducing another variable that keeps
track of the number of times wakeup_one() is called.  I won't go there now
though. =)

> Any objections? If this goes in, we should probably do similar to the
> cond-var interface function that wakes up one sleeper.

I object to this change (even if you do come up with an example that would
benefit from the change =) ).  The example tries to use slp_cnt to pass
information between two threads in an unsafe way.  Perhaps you
oversimplified the example; there is no reason I can see for even using
slp_cnt.  The following works fine:

thread 1:
---------

  mtx_enter(lock_A, foo);
  do stuff;
  msleep(foobar, lock_A, ..., some_time_period);
  do more stuff;
  mtx_exit(lock_A, foo);


thread 2:
---------

  mtx_enter(lock_A, foo);
  do stuff;
  wakeup_one(foobar);
  mtx_exit(lock_A, foo);

If thread 1 isn't sleeping, the call to wakeup_one() has no effect, which
is fine.

Basically, all my rambling can be summarized thus:

1) The example code is broken (as you point out), but it may fail to
   highlight the point you were trying to make.

2) The methodology behind any similar example that does highlight a race
   condition would be demonstrating a flawed programming idiom, not a
   weakness of wakeup_one().

Thread synchronization rests on the foundation of mutexes and condition
variables (msleep() is an equivalent).  You are suggesting a fundamental
change to one of those synchronization primitives.  That should be a pretty
strong indicator by itself that the change isn't needed. =)

Jason


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message