Message-ID: <3C0D1680.E3461FB@gdeb.com>
Date: Tue, 04 Dec 2001 13:31:28 -0500
From: Daniel Eischen
To: Alfred Perlstein
Cc: Dan Eischen, Louis-Philippe Gagnon, freebsd-current@FreeBSD.ORG,
    freebsd-hackers@FreeBSD.ORG
Subject: Re: Possible libc_r pthread bug
References: <094601c179ea$7cca85c0$2964a8c0@MACADAMIAN.com>
    <20011204021815.E92148@elvis.mu.org> <3C0CC2FE.275F4C68@vigrid.com>
    <20011204114236.H92148@elvis.mu.org>

Alfred Perlstein wrote:
>
> * Dan Eischen [011204 06:26] wrote:
> >
> > There are already cancellation tests when resuming threads
> > whose contexts are not saved as a result of a signal interrupt
> > (ctxtype != CTX_UC).  You shouldn't test for cancellation when
> > ctxtype == CTX_UC because you are running on the scheduler
> > stack, not the thread's stack.
>
> That makes sense, but why?

Because when a thread gets cancelled, pthread_exit gets called,
which then calls the scheduler again.
It is also possible to get interrupted during this process, and the
thread's context (which is operating on the scheduler stack) could
get saved.  The scheduler could get entered again, and if the thread
gets resumed, it'll longjmp to the saved context, which is on the
scheduler stack (and which was just trashed by entering the scheduler
again).  It is too confusing to try to handle conditions like this,
and the threads library doesn't need to get any more confusing ;-)

Once the scheduler is entered, no pthread routines should be called
and the scheduler should not be recursively entered.  The only way
out of the scheduler should be a longjmp or sigreturn to a saved
thread context.

>
> > You also have a bug in the
> > way you changed the check for cancellation flags.
>
> What?

When a thread is at a cancellation point, you want to let the
cancellable routine handle the cancel.  The check as coded before
avoided calling pthread_testcancel() when at a cancellation point.
I think you check for either PTHREAD_AT_CANCEL_POINT or
PTHREAD_CANCEL_ASYNCHRONOUS being set when you really want

    ((flags & PTHREAD_AT_CANCEL_POINT) == 0) &&
    ((flags & PTHREAD_CANCEL_ASYNCHRONOUS) != 0)

>
> > The only clean way to fix this is to add a return frame
> > to the interrupted context so that it can check for cancellation
> > (and other things) before returning to the thread's interrupted
> > context.
>
> No way to work around this?  Shouldn't the thread exit library
> know which stack exactly to clean up even in the context of a
> signal handler?

It assumes that you're running on the current thread's stack.

I don't view this particular bug as a big problem.  It is a somewhat
perverse program that has a CPU-bound thread that never gets to any
sort of blocking condition and yet still wants to be cancelled.  The
submitter of the problem doesn't even want to upgrade to get a fix.
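
For reference, the flag test given earlier in this message can be
written out as a small predicate.  This is only a sketch: the EX_
constants and their bit values are placeholders standing in for
PTHREAD_CANCEL_ASYNCHRONOUS and PTHREAD_AT_CANCEL_POINT, and
should_test_cancel() is a made-up name, not a libc_r function.

```c
/* Placeholder bit values, assumed for illustration only; the real
 * definitions live in the threads library's headers. */
#define EX_CANCEL_ASYNCHRONOUS 0x0002
#define EX_AT_CANCEL_POINT     0x0004

/* Returns nonzero when the resume path should test for cancellation
 * itself: asynchronous cancellation is enabled and the thread is not
 * already inside a cancellation point (where the cancellable routine
 * is expected to handle the cancel). */
int should_test_cancel(int flags)
{
    return ((flags & EX_AT_CANCEL_POINT) == 0) &&
           ((flags & EX_CANCEL_ASYNCHRONOUS) != 0);
}
```

Testing for "either flag set" instead would also fire when the thread
is at a cancellation point, which is the case the original check was
written to avoid.
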
The bug can be worked around easily enough by having the thread check
for cancellation itself, or by using pthread_kill to send a signal to
the thread and having the signal handler exit the thread or longjmp
back to the thread at a place that can exit and clean up.

There is already a minor race condition in trying to resume a thread
that was interrupted by a signal.  Adding some code to munge the
stack of an interrupted context so that it calls a wrapper function
would solve both problems.  The signal handling code already does
this to install a signal handler wrapper on a thread's stack.

-- 
Dan Eischen
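
A rough sketch of the first workaround mentioned above, assuming the
default deferred cancellation type: the CPU-bound thread calls
pthread_testcancel() every so often so that a pending cancel gets
acted on.  The names busy_worker and run_and_cancel_worker, and the
polling interval, are invented for the example.

```c
#include <pthread.h>
#include <stddef.h>

/* CPU-bound worker: it never blocks, so it polls for cancellation. */
static void *busy_worker(void *arg)
{
    volatile unsigned long counter = 0;

    (void)arg;
    for (;;) {
        counter++;                      /* stand-in for real work */
        if ((counter & 0xFFFFUL) == 0)  /* every 65536 iterations */
            pthread_testcancel();       /* explicit cancellation point */
    }
}

/* Start the worker, cancel it, and wait for it to finish.
 * Returns 1 if the thread ended by cancellation, 0 otherwise. */
int run_and_cancel_worker(void)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, busy_worker, NULL) != 0)
        return 0;
    if (pthread_cancel(tid) != 0)
        return 0;
    if (pthread_join(tid, &result) != 0)
        return 0;
    return result == PTHREAD_CANCELED;
}
```

Without the pthread_testcancel() call the join above could wait
forever, since the loop never reaches a cancellation point on its
own.
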