From owner-freebsd-hackers  Tue Dec  4 10:32:18 2001
Message-ID: <3C0D1680.E3461FB@gdeb.com>
Date: Tue, 04 Dec 2001 13:31:28 -0500
From: Daniel Eischen <deischen@gdeb.com>
X-Mailer: Mozilla 4.78 [en] (X11; U; SunOS 5.8 sun4u)
X-Accept-Language: en
MIME-Version: 1.0
To: Alfred Perlstein <bright@mu.org>
Cc: Dan Eischen <eischen@vigrid.com>,
	Louis-Philippe Gagnon <louisphilippe@macadamian.com>,
	freebsd-current@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG
Subject: Re: Possible libc_r pthread bug
References: <094601c179ea$7cca85c0$2964a8c0@MACADAMIAN.com> <Pine.SUN.3.91.1011130170847.14642A-100000@pcnet1.pcnet.com> <20011204021815.E92148@elvis.mu.org> <3C0CC2FE.275F4C68@vigrid.com> <20011204114236.H92148@elvis.mu.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Alfred Perlstein wrote:
> 
> * Dan Eischen <eischen@vigrid.com> [011204 06:26] wrote:
> >
> > There are already cancellation tests when resuming threads
> > whose contexts are not saved as a result of a signal interrupt
> > (ctxtype != CTX_UC). You shouldn't test for cancellation when
> > ctxtype == CTX_UC because you are running on the scheduler
> > stack, not the thread's stack.
> 
> That makes sense, but why?

Because when a thread gets cancelled, pthread_exit gets called,
which then calls the scheduler again.  It is also possible to
get interrupted during this process, and the thread's context
(which is running on the scheduler stack) could get saved.
The scheduler could then be entered again, and if the thread
gets resumed, it'll longjmp to the saved context, which is on
the scheduler stack (and which was just trashed by entering the
scheduler again).

It is too confusing to try to handle conditions like this, and
the threads library doesn't need to get any more confusing ;-)
Once the scheduler is entered, no pthread routines should
be called and the scheduler should not be recursively
entered.  The only way out of the scheduler should be a
longjmp or sigreturn to a saved thread context.
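
A minimal sketch of that rule, using plain setjmp/longjmp.  All
names here (sketch_scheduler, sketch_yield, saved_thread_ctx) are
hypothetical and are not libc_r symbols, and unlike libc_r the
sketch does not switch to a separate scheduler stack.  It only
shows a scheduler that refuses recursive entry and leaves solely
by jumping to a saved context.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical saved context of the one "thread" in this sketch. */
static jmp_buf saved_thread_ctx;

/* Nonzero while the sketch scheduler is running. */
static int in_scheduler;

/* Never returns normally: the only exit is longjmp to a saved context. */
static void sketch_scheduler(void)
{
    if (in_scheduler)
        abort();            /* recursive entry is treated as a bug */
    in_scheduler = 1;

    /* A real scheduler would pick the next thread here and must not
     * call any pthread routine while doing so. */

    in_scheduler = 0;
    longjmp(saved_thread_ctx, 1);
}

/* Save the caller's context, enter the scheduler, and report 1 once
 * the scheduler has resumed the saved context. */
int sketch_yield(void)
{
    if (setjmp(saved_thread_ctx) == 0) {
        sketch_scheduler();
        return -1;          /* not reached */
    }
    return 1;
}
```

Calling sketch_yield() returns 1 after the round trip through the
scheduler; the -1 path exists only to make the "no normal return"
rule visible.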

> 
> >                                  You also have a bug in the
> > way you changed the check for cancellation flags.
> 
> What?

When a thread is at a cancellation point, you want to let the
cancellable routine handle the cancel.  The check as coded before
avoided calling pthread_testcancel() when at a cancellation
point.  I think you check for either PTHREAD_AT_CANCEL_POINT
or PTHREAD_CANCEL_ASYNCHRONOUS being set, when you really want:

	(((flags & PTHREAD_AT_CANCEL_POINT) == 0) &&
	 ((flags & PTHREAD_CANCEL_ASYNCHRONOUS) != 0))
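
To make the difference concrete, here is a small sketch of both
forms of the test.  The SK_ flag names and their bit values are
placeholders chosen for illustration, not the definitions from
the libc_r headers, and the "either flag" form is only my reading
of the changed check.

```c
/* Placeholder bit values, assumed for this sketch only. */
#define SK_CANCEL_ASYNCHRONOUS 0x02
#define SK_AT_CANCEL_POINT     0x04

/* Intended check: test for cancellation only when the thread is
 * asynchronously cancellable and NOT already at a cancellation
 * point (the cancellable routine handles that case itself). */
int should_test_cancel(int flags)
{
    return ((flags & SK_AT_CANCEL_POINT) == 0) &&
           ((flags & SK_CANCEL_ASYNCHRONOUS) != 0);
}

/* The "either flag set" form, which also fires while the thread
 * is at a cancellation point. */
int either_flag_set(int flags)
{
    return (flags & (SK_AT_CANCEL_POINT | SK_CANCEL_ASYNCHRONOUS)) != 0;
}
```

The two functions disagree whenever SK_AT_CANCEL_POINT is set,
which is exactly the case the original check was protecting.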

> 
> > The only clean way to fix this is to add a return frame
> > to the interrupted context so that it can check for cancellation
> > (and other things) before returning to the thread's interrupted
> > context.
> 
> No way to work around this?  Shouldn't the thread exit library
> know which stack exactly to clean up even in the context of a
> signal handler?

It assumes that you're running on the current thread's stack.

I don't view this particular bug as a big problem.  It is a
somewhat perverse program that has a CPU bound thread that
never gets to any sort of blocking condition and yet still
wants to be cancelled.  The submitter of the problem doesn't
even want to upgrade to get a fix.  It can be worked around
easily enough by having the thread check for cancellation
itself (pthread_testcancel), or by using pthread_kill to send
a signal to the thread and have the signal handler exit the
thread or longjmp back to the thread at a place that can exit
and clean up.
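
The first workaround might look roughly like the sketch below.
The function names and the polling interval (every 4096
iterations) are arbitrary choices for illustration; only the
pthread_* calls are the standard API.

```c
#include <pthread.h>
#include <stddef.h>

/* CPU-bound loop that never blocks, so it has to offer its own
 * cancellation point from time to time. */
static void *cpu_bound_worker(void *arg)
{
    volatile unsigned long *counter = arg;

    for (;;) {
        (*counter)++;
        if ((*counter & 0xfff) == 0)
            pthread_testcancel();   /* explicit cancellation point */
    }
    return NULL;                    /* not reached */
}

/* Start the worker, cancel it, and report whether it really exited
 * through cancellation: 1 if so, 0 if not, -1 on a pthread error. */
int run_cancel_demo(void)
{
    pthread_t tid;
    void *result = NULL;
    unsigned long counter = 0;

    if (pthread_create(&tid, NULL, cpu_bound_worker, &counter) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return result == PTHREAD_CANCELED;
}
```

Because the cancel type is left at the default (deferred), the
cancel takes effect at the next pthread_testcancel() call rather
than at an arbitrary instruction.  Build with the compiler's
thread option (for example cc -pthread).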

There is already a minor race condition in trying to resume
a thread that was interrupted by a signal.  Adding some code
to munge the stack of an interrupted context so that it calls
a wrapper function would solve both problems.  The signal
handling code already does this to install a signal handler
wrapper on a thread's stack.
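
As a rough, portable approximation of the wrapper idea, the
sketch below uses the ucontext calls instead of munging a saved
stack frame.  This is not how libc_r does it: here the wrapper
runs on its own small stack and the saved context is resumed
through uc_link when the wrapper returns.  All names are
hypothetical, and the counter merely stands in for the
cancellation (and other) checks.

```c
#include <ucontext.h>

static ucontext_t interrupted_ctx;      /* stand-in for the saved context */
static ucontext_t wrapper_ctx;          /* context that runs the wrapper */
static char wrapper_stack[64 * 1024];   /* private stack for the wrapper */
static int checks_run;                  /* counts wrapper invocations */

/* Runs before the saved context is resumed.  A real wrapper would
 * test for cancellation here; returning resumes uc_link. */
static void resume_wrapper(void)
{
    checks_run++;
}

/* Save the current context, run the wrapper, and come back.
 * Returns the number of wrapper runs (1), or -1 on error. */
int resume_via_wrapper(void)
{
    checks_run = 0;

    if (getcontext(&wrapper_ctx) == -1)
        return -1;
    wrapper_ctx.uc_stack.ss_sp = wrapper_stack;
    wrapper_ctx.uc_stack.ss_size = sizeof(wrapper_stack);
    wrapper_ctx.uc_link = &interrupted_ctx;   /* resume this afterwards */
    makecontext(&wrapper_ctx, resume_wrapper, 0);

    /* Saves the current context into interrupted_ctx and switches
     * to the wrapper; execution continues here once it returns. */
    if (swapcontext(&interrupted_ctx, &wrapper_ctx) == -1)
        return -1;

    return checks_run;
}
```

The point of the arrangement is that the checks never run on the
scheduler stack: they run in a frame that is entered only as part
of resuming the thread.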

-- 
Dan Eischen
