From owner-freebsd-arch  Sat Jan 20 10:28:34 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from VL-MS-MR002.sc1.videotron.ca (relais.videotron.ca [24.201.245.36])
	by hub.freebsd.org (Postfix) with ESMTP id 50E3437B400
	for <freebsd-arch@freebsd.org>; Sat, 20 Jan 2001 10:28:13 -0800 (PST)
Received: from jehovah ([24.201.144.31]) by
          VL-MS-MR002.sc1.videotron.ca (Netscape Messaging Server 4.15)
          with SMTP id G7H4MR02.59T; Sat, 20 Jan 2001 13:28:03 -0500 
Message-ID: <007c01c0830e$f1c84a70$1f90c918@jehovah>
From: "Bosko Milekic" <bmilekic@technokratis.com>
To: "Jason Evans" <jasone@canonware.com>
Cc: <freebsd-arch@freebsd.org>
References: <000701c0829c$e0378d60$1f90c918@jehovah> <20010120015720.K69199@canonware.com>
Subject: Re: wakeup_one() return value
Date: Sat, 20 Jan 2001 13:29:37 -0500
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2919.6700
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG


Jason Evans wrote:

> On Fri, Jan 19, 2001 at 11:53:05PM -0500, Bosko Milekic wrote:
> >     Here's a diff:
http://people.freebsd.org/~bmilekic/code/wakeupone.diff
> > that makes wakeup_one() return an integer specifying whether a process
was
> > actually taken off the sleep queue, or not. The reason for the proposed
> > change as to do with SMP and potential race conditions in certain
situations.
> > For example, wakeup_one() is sometimes (and quite effectively) used in
this
> > scenario:
> >
> > thread 1:
> > ---------
> >
> >   mtx_enter(lock_A, foo);
> >   do stuff;
> >   slp_cnt++;
> >   if (msleep(foobar, lock_A, ..., some_time_period) == EWOULDBLOCK)
> >         slp_cnt--;
> >   do more stuff;
> >   mtx_exit(lock_A, foo);
> >
> >
> > thread 2:
> > ---------
> >
> >   mtx_enter(lock_A, foo);
> >   do stuff;
> >   if (slp_cnt >= 0) {
> >         wakeup_one(foobar);
> >         slp_cnt--;
> >   }
> >   mtx_exit(lock_A, foo);
> >
> >
> > The problem is that under some conditions, there is a race and slp_cnt
can
> > get decremented twice. For example, consider slp_cnt == 0, and thread 1
comes
> > in and increments it to 1. It goes to sleep but wakes up and grabs
sched_lock
> > (it hasn't yet regrabbed lock_A). At this time, thread 2 has lock_A and
is
> > spinning on sched_lock in wakeup_one(). But, thread 1 has timed out, and
> > removes itself from the sleep queue; it returns EWOULDBLOCK, so slp_cnt
goes
>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > down to 0. sched_lock is dropped, the thread blocks waiting for thread 2
to
>   ^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > let go of lock_A so it can "do more stuff." Meanwhile, thread 2 gets
>   ^^^^^^^^^^^^^^^^
> > sched_lock and does a wakeup_one() (effectively a no-op) and decrements
> > slp_cnt again. There's an underflow.
>
> I don't quite follow this paragraph, but I think that part of it rests on
> the assumption that slp_cnt is decremented before lock_A is reacquired,
> which is impossible.  The ordering that you describe in the underscored
> text is not possible, since msleep() returns with lock_A held.  It is
> possible for msleep() to return EWOULDBLOCK _after_ thread 2 has released
> lock_A though, so the race condition you describe is still possible.  Your

    That's exactly what I described. See "thread 1 blocks waiting for thread
2 to let go of lock_A so [that it can return EWOULDBLOCK finally] and do more
stuff." Perhaps it was a tad misleading.

> example can be "fixed" as follows:
>
> thread 1:
> ---------
>
>   mtx_enter(lock_A, foo);
>   do stuff;
>   slp_cnt++;
>   msleep(foobar, lock_A, ..., some_time_period);
>   slp_cnt--;
>   do more stuff;
>   mtx_exit(lock_A, foo);
>
>
> thread 2:
> ---------
>
>   mtx_enter(lock_A, foo);
>   do stuff;
>   if (slp_cnt >= 0)
>         wakeup_one(foobar);
>   mtx_exit(lock_A, foo);
>
> Now, slp_cnt will always be correct.  However, if the example is actually
> more complex than what you present and there are multiple threads that can
> potentially call wakeup_one(), it is possible to call wakeup_one() more
> times than there are sleeping threads.  If the example is even more
> complex, perhaps that means that too many threads were woken (i.e. it's a
> bad thing to have woken two threads instead of just one).
>
> This new problem can be "fixed" by introducing another variable that keeps
> track of the number of times wakeup_one() is called.  I won't go there now
> though. =)
>
> > Any objections? If this goes in, we should probably do similar to the
> > cond-var interface function that wakes up one sleeper.
>
> I object to this change (even if you do come up with an example that would
> benefit from the change =) ).  The example tries to use slp_cnt to pass
> information between two threads in an unsafe way.  Perhaps you
> oversimplified the example; there is no reason I can see for even using
> slp_cnt.  The following works fine:
>
> thread 1:
> ---------
>
>   mtx_enter(lock_A, foo);
>   do stuff;
>   msleep(foobar, lock_A, ..., some_time_period);
>   do more stuff;
>   mtx_exit(lock_A, foo);
>
>
> thread 2:
> ---------
>
>   mtx_enter(lock_A, foo);
>   do stuff;
>   wakeup_one(foobar);
>   mtx_exit(lock_A, foo);
>
> If thread 1 isn't sleeping, the call to wakeup_one() has no effect, which
> is fine.
>
> Basically, all my rambling can be summarized thus:
>
> 1) The example code is broken (as you point out), but it may fail to
>    highlight the point you were trying to make.
>
> 2) The methodology behind any similar example that does highlight a race
>    condition would be demonstrating a flawed programming idiom, not a
>    weakness of wakeup_one().
>
> Thread synchronization rests on the foundation of mutexes and condition
> variables (msleep() is an equivalent).  You are suggesting a fundamental
> change to one of those synchronization primitives.  That should be a pretty
> strong indicator by itself that the change isn't needed. =)
>
> Jason

    Okay, this is much better. Your first solution is better, because:

* msleep is called with A held.
* wakeup_one only happens when A is held.

so wakeup_one() may be called multiple times if, during the period between
thread1 dropping sched_lock and thread2 doing it's stuff and releasing A,
some thread3 jumps in with perfect timing and gets A before thread1 does
(this is probably highly unlikely). But even if it's called multiple times,
as long as slp_cnt is only decremented once, there's no problem. As for your
last solution, I would prefer calling wakeup_one() [and traversing the sleep
queue with sched_lock held for no good reason] only if need be.

Conclusion: I'll scrap the diff and use the method described in your first
solution (i.e. sleeping thread does all tampering with the slp_cnt).

Note: I think there used to be a reason why I didn't do it this way before, I
just can't remember it right now. Maybe it just slipped through. Sorry about
that and thanks for the pointer. :-)

Cheers,
Bosko.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message