From owner-freebsd-hackers  Tue Dec  4 12: 7:52 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from net2.gendyn.com (nat2.gendyn.com [204.60.171.12])
	by hub.freebsd.org (Postfix) with ESMTP
	id EBCBC37B41A; Tue,  4 Dec 2001 12:07:39 -0800 (PST)
Received: from [153.11.11.3] (helo=plunger.gdeb.com)
	by net2.gendyn.com with esmtp (Exim 2.12 #1)
	id 16BLqc-0009ZJ-00; Tue, 4 Dec 2001 15:07:28 -0500
Received: from clcrtr.gdeb.com ([153.11.109.11])
	by plunger.gdeb.com  with SMTP id OAA05518;
	Tue, 4 Dec 2001 14:54:58 -0500 (EST)
Received: from vigrid.com (gpz.clc.gdeb.com [192.168.3.12])
	by clcrtr.gdeb.com (8.11.4/8.11.4) with ESMTP id fB4KCaK87164;
	Tue, 4 Dec 2001 15:14:43 -0500 (EST)
	(envelope-from eischen@vigrid.com)
Message-ID: <3C0D2C69.8521471D@vigrid.com>
Date: Tue, 04 Dec 2001 15:04:57 -0500
From: Daniel Eischen <eischen@vigrid.com>
X-Mailer: Mozilla 4.78 [en] (X11; U; SunOS 5.8 sun4u)
X-Accept-Language: en
MIME-Version: 1.0
To: Alfred Perlstein <bright@mu.org>
Cc: Louis-Philippe Gagnon <louisphilippe@macadamian.com>,
	freebsd-current@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG
Subject: Re: Possible libc_r pthread bug
References: <094601c179ea$7cca85c0$2964a8c0@MACADAMIAN.com> <Pine.SUN.3.91.1011130170847.14642A-100000@pcnet1.pcnet.com> <20011204021815.E92148@elvis.mu.org> <3C0CC2FE.275F4C68@vigrid.com> <20011204114236.H92148@elvis.mu.org> <3C0D1680.E3461FB@gdeb.com> <20011204124624.L92148@elvis.mu.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG

Alfred Perlstein wrote:
> 
> * Daniel Eischen <deischen@gdeb.com> [011204 12:32] wrote:
> > Alfred Perlstein wrote:
> > >
> > > * Dan Eischen <eischen@vigrid.com> [011204 06:26] wrote:
> > > >
> > > > There are already cancellation tests when resuming threads
> > > > whose contexts are not saved as a result of a signal interrupt
> > > > (ctxtype != CTX_UC). You shouldn't test for cancellation when
> > > > ctxtype == CTX_UC because you are running on the scheduler
> > > > stack, not the threads stack.
> > >
> > > That makes sense, but why?
> >
> > Because when a thread gets cancelled, pthread_exit gets called
> > which then calls the scheduler again.  It is also possible to
> > get interrupted during this process and the threads context
> > (which is operating on the scheduler stack) could get saved.
> > The scheduler could get entered again, and if the thread
> > gets resumed, it'll longjmp to the saved context which is the
> > scheduler stack (and which was just trashed by entering the
> > scheduler again).
> >
> > It is too confusing to try to handle conditions like this, and
> > the threads library doesn't need to get any more confusing ;-)
> > Once the scheduler is entered, no pthread routines should
> > be called and the scheduler should not be recursively
> > entered.  The only way out of the scheduler should be a
> > longjmp or sigreturn to a saved threads context.
> 
> Ok, for the sake of beating a clue into me...
> 
> in uthread_kern.c:_thread_kern_sched
> 
>                 /* Save the state of the current thread: */
>                 if (_setjmp(curthread->ctx.jb) == 0) {
>                         /* Flag the jump buffer was the last state saved: */
>                         curthread->ctxtype = CTX_JB_NOSIG;
>                         curthread->longjmp_val = 1;
>                 } else {
>                         DBG_MSG("Returned from ___longjmp, thread %p\n",
>                             curthread);
>                         /*
>                          * This point is reached when a longjmp() is called
>                          * to restore the state of a thread.
>                          *
>                          * This is the normal way out of the scheduler.
>                          */
>                         _thread_kern_in_sched = 0;
> 
>                         if (curthread->sig_defer_count == 0) {
>                                 if (((curthread->cancelflags &
>                                     PTHREAD_AT_CANCEL_POINT) == 0) &&
>                                     ((curthread->cancelflags &
>                                     PTHREAD_CANCEL_ASYNCHRONOUS) != 0))
>                                         /*
>                                          * Cancellations override signals.
>                                          *
>                                          * Stick a cancellation point at the
>                                          * start of each async-cancellable
>                                          * thread's resumption.
>                                          *
>                                          * We allow threads woken at cancel
>                                          * points to do their own checks.
>                                          */
>                                         pthread_testcancel();
>                         }
> 
> Why isn't this "working", shouldn't it be doing the right thing?
> What if curthread->sig_defer_count wasn't tested?
> Maybe this should be a test against curthread->sig_defer_count <= 1?

Because this is the "normal" way into the scheduler -- when a thread
hits a blocking condition or yields.  A signal interrupting a thread
does not go through this section.  The interrupted threads context is
argument 3 of the signal handler, and this context gets stored in
curthread->ctx.uc.  This is the crux of the problem.  When you resume
this context, you are not in the thread scheduling code and so you
can't check for cancellation.  I'm suggesting that the proper way
to handle this is to munge this interrupted context (a ucontext_t)
so that it first returns to a small wrapper function that can
check for cancellation (and clear the "in scheduler flag" which
is the other problem I mentioned) before returning to the interrupted
context.

There is another way to handle this, but it is more complicated
although probably better than the above method.  This would involve
changing the signal handling code to not use an alternate signal
stack, so an interrupted thread could do something like:

	void
	sighandler(int sig, siginfo_t *info, ucontext_t *ucp)
	{
		...
		{
			ucontext_t	uc;

			/* Save interrupted context on stack: */
			uc = *ucp;
			uc.uc_sigmask = _process_sigmask;

			/* Enter the scheduler: */
			_thread_kern_sched(NULL);

			/*
			 * After return from the scheduler, the
			 * "in scheduler flag" has been cleared
			 * and the thread has been checked for
			 * cancellation.
			 */

			/* Return to interrupted context: */
			__sys_sigreturn(&uc);
		}
	}

The problem is that the signal handling code munges with a threads
saved stack in order to install a signal handler on it.  This is
OK if the thread isn't the current thread.  But if it is the current
thread, then you end up munging with the stack on which you are
currently executing (not a good thing, and this is why we install
an alternate signal stack).  The signal handling code would have to
invoke a signal handler directly if it was for the current thread.
ThreadsNG will probably do it like this.

-- 
Dan Eischen

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message