From owner-freebsd-hackers Tue Dec 4 12: 7:52 2001 Delivered-To: freebsd-hackers@freebsd.org Received: from net2.gendyn.com (nat2.gendyn.com [204.60.171.12]) by hub.freebsd.org (Postfix) with ESMTP id EBCBC37B41A; Tue, 4 Dec 2001 12:07:39 -0800 (PST) Received: from [153.11.11.3] (helo=plunger.gdeb.com) by net2.gendyn.com with esmtp (Exim 2.12 #1) id 16BLqc-0009ZJ-00; Tue, 4 Dec 2001 15:07:28 -0500 Received: from clcrtr.gdeb.com ([153.11.109.11]) by plunger.gdeb.com with SMTP id OAA05518; Tue, 4 Dec 2001 14:54:58 -0500 (EST) Received: from vigrid.com (gpz.clc.gdeb.com [192.168.3.12]) by clcrtr.gdeb.com (8.11.4/8.11.4) with ESMTP id fB4KCaK87164; Tue, 4 Dec 2001 15:14:43 -0500 (EST) (envelope-from eischen@vigrid.com) Message-ID: <3C0D2C69.8521471D@vigrid.com> Date: Tue, 04 Dec 2001 15:04:57 -0500 From: Daniel Eischen X-Mailer: Mozilla 4.78 [en] (X11; U; SunOS 5.8 sun4u) X-Accept-Language: en MIME-Version: 1.0 To: Alfred Perlstein Cc: Louis-Philippe Gagnon , freebsd-current@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG Subject: Re: Possible libc_r pthread bug References: <094601c179ea$7cca85c0$2964a8c0@MACADAMIAN.com> <20011204021815.E92148@elvis.mu.org> <3C0CC2FE.275F4C68@vigrid.com> <20011204114236.H92148@elvis.mu.org> <3C0D1680.E3461FB@gdeb.com> <20011204124624.L92148@elvis.mu.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-hackers@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG Alfred Perlstein wrote: > > * Daniel Eischen [011204 12:32] wrote: > > Alfred Perlstein wrote: > > > > > > * Dan Eischen [011204 06:26] wrote: > > > > > > > > There are already cancellation tests when resuming threads > > > > whose contexts are not saved as a result of a signal interrupt > > > > (ctxtype != CTX_UC). You shouldn't test for cancellation when > > > > ctxtype == CTX_UC because you are running on the scheduler > > > > stack, not the threads stack. > > > > > > That makes sense, but why? > > > > Because when a thread gets cancelled, pthread_exit gets called > > which then calls the scheduler again. It is also possible to > > get interrupted during this process and the threads context > > (which is operating on the scheduler stack) could get saved. > > The scheduler could get entered again, and if the thread > > gets resumed, it'll longjmp to the saved context which is the > > scheduler stack (and which was just trashed by entering the > > scheduler again). > > > > It is too confusing to try to handle conditions like this, and > > the threads library doesn't need to get any more confusing ;-) > > Once the scheduler is entered, no pthread routines should > > be called and the scheduler should not be recursively > > entered. The only way out of the scheduler should be a > > longjmp or sigreturn to a saved threads context. > > Ok, for the sake of beating a clue into me... > > in uthread_kern.c:_thread_kern_sched > > /* Save the state of the current thread: */ > if (_setjmp(curthread->ctx.jb) == 0) { > /* Flag the jump buffer was the last state saved: */ > curthread->ctxtype = CTX_JB_NOSIG; > curthread->longjmp_val = 1; > } else { > DBG_MSG("Returned from ___longjmp, thread %p\n", > curthread); > /* > * This point is reached when a longjmp() is called > * to restore the state of a thread. > * > * This is the normal way out of the scheduler. > */ > _thread_kern_in_sched = 0; > > if (curthread->sig_defer_count == 0) { > if (((curthread->cancelflags & > PTHREAD_AT_CANCEL_POINT) == 0) && > ((curthread->cancelflags & > PTHREAD_CANCEL_ASYNCHRONOUS) != 0)) > /* > * Cancellations override signals. > * > * Stick a cancellation point at the > * start of each async-cancellable > * thread's resumption. > * > * We allow threads woken at cancel > * points to do their own checks. > */ > pthread_testcancel(); > } > > Why isn't this "working", shouldn't it be doing the right thing? > What if curthread->sig_defer_count wasn't tested? > Maybe this should be a test against curthread->sig_defer_count <= 1? Because this is the "normal" way into the scheduler -- when a thread hits a blocking condition or yields. A signal interrupting a thread does not go through this section. The interrupted threads context is argument 3 of the signal handler, and this context gets stored in curthread->ctx.uc. This is the crux of the problem. When you resume this context, you are not in the thread scheduling code and so you can't check for cancellation. I'm suggesting that the proper way to handle this is to munge this interrupted context (a ucontext_t) so that it first returns to a small wrapper function that can check for cancellation (and clear the "in scheduler flag" which is the other problem I mentioned) before returning to the interrupted context. There is another way to handle this, but it is more complicated although probably better than the above method. This would involve changing the signal handling code to not use an alternate signal stack, so an interrupted thread could do something like: void sighandler(int sig, siginfo_t *info, ucontext_t *ucp) { ... { ucontext_t uc; /* Save interrupted context on stack: */ uc = *ucp; uc.uc_sigmask = _process_sigmask; /* Enter the scheduler: */ _thread_kern_sched(NULL); /* * After return from the scheduler, the * "in scheduler flag" has been cleared * and the thread has been checked for * cancellation. */ /* Return to interrupted context: */ __sys_sigreturn(&uc); } } The problem is that the signal handling code munges with a threads saved stack in order to install a signal handler on it. This is OK if the thread isn't the current thread. But if it is the current thread, then you end up munging with the stack on which you are currently executing (not a good thing, and this is why we install an alternate signal stack). The signal handling code would have to invoke a signal handler directly if it was for the current thread. ThreadsNG will probably do it like this. -- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-hackers" in the body of the message