Date: Fri, 27 Jun 2003 19:10:25 -0400 (EDT)
From: Daniel Eischen
Reply-To: deischen@freebsd.org
To: Mike Makonnen
Cc: freebsd-threads@freebsd.org, marcel@xcllnt.net
Subject: Re: libkse / libthr bugs?

On Fri, 27 Jun 2003, Mike Makonnen wrote:
> On Fri, 27 Jun 2003 07:14:43 -0400 (EDT)
> Daniel Eischen wrote:
>
> > To answer your first question, yes, I think it's necessary.
> > Here's an example:
> >
> > void
> > sigalarmhandler(int sig)
> > {
> > 	longjmp(env, 1);
> > }
> >
> > int
> > somenslookup(...)
> > {
> > 	...
> > 	alarm(5);	/* Wait a maximum of 5 seconds. */
> > 	if (setjmp(env) == 0) {
> > 		nsdispatch(...);
> > 		return (0);
> > 	}
> > 	return (-1);
> > }
> >
> > Now perhaps this isn't the best example; imagine using
> > something that uses malloc()/free() or any one of the other
> > locked libc functions. There is also the use of application-level
> > locks, which should work similarly, but by using a libc
> > example I avoid any argument about "the application shouldn't
> > be doing that" :-)
>
> If I understand correctly, what you are saying is: if the alarm fires
> before somenslookup() returns successfully, the longjmp will make it
> return unsuccessfully, and if it happened to be put on a queue in
> nsdispatch() it will return to its caller still on the queue (which we
> clearly don't want).

Right.

> How is this any different from holding some other resource? For
> example, what if it had managed to acquire the lock before the longjmp
> from the signal handler? See below for what I mean.

Libc mutexes and CVs use the single-underscore versions so our threads
library can tell the difference between implementation locks and
application locks. We should be deferring signals while _holding_ one
of these locks. Mutexes in libc should be held only for very short
periods, so deferring signals while they are held should be OK. But
for CVs, you could be on the CV queue for some time, and you shouldn't
be holding up signal delivery because of that.

My rule of thumb is to treat all locks the same and not to keep threads
on waiting queues of any sort while running signal handlers. But once a
lock is held, that's a bit different, because the application could
longjmp() out of the locked region and never release the lock. We
should probably defer signal delivery while implementation mutexes,
spinlocks, and low-level locks are held.
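For instance, the "defer while held" rule could look roughly like the
sketch below. This is only an illustration, not the actual libpthread
code: the names impl_lock_acquire()/impl_lock_release() are made up,
and pthread_sigmask() stands in for the deferral an M:N library would
do in user space (e.g. with a per-thread defer count) rather than with
a system call per lock:

#include <pthread.h>
#include <signal.h>

/*
 * Block all maskable signals before taking an implementation lock and
 * restore the old mask after dropping it, so no handler (and thus no
 * longjmp()) can run inside the critical section.  The caller owns
 * the storage for the saved mask, so concurrent lockers can't clobber
 * each other's masks.
 */
static void
impl_lock_acquire(pthread_mutex_t *mtx, sigset_t *saved)
{
	sigset_t all;

	sigfillset(&all);
	pthread_sigmask(SIG_BLOCK, &all, saved);
	pthread_mutex_lock(mtx);
}

static void
impl_lock_release(pthread_mutex_t *mtx, const sigset_t *saved)
{
	pthread_mutex_unlock(mtx);
	/* Any deferred signals are delivered here, outside the lock. */
	pthread_sigmask(SIG_SETMASK, saved, NULL);
}

A locked libc function would then wrap its short critical section:

	static pthread_mutex_t libc_lock = PTHREAD_MUTEX_INITIALIZER;
	sigset_t saved;

	impl_lock_acquire(&libc_lock, &saved);
	/* ... short critical section; safe from longjmp() ... */
	impl_lock_release(&libc_lock, &saved);

This is also why it matters that implementation mutexes are held only
briefly: the window where signal delivery is held up stays small.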
> > It's also possible that the thread calls the same set
> > of libc functions again, and if it isn't removed from
> > the internal mutex queue, then you'd get the exact
> > error message Marcel was seeing (already on mutexq).
>
> I'm glad you brought this last point up, because that has been on my
> mind as well. Let's say an application is in one of the locked libc
> functions, receives a signal, and calls that same libc function from
> its signal handler. Furthermore, let's say that the thread had just
> acquired a lock in the libc function before handling the signal. When
> this thread called the same libc function from the signal handler and
> tried to acquire the same lock again, would it not deadlock against
> itself?

Yes, I think it would.

> So the problem is not that the mutex/cond queues are special. It's
> that locking from within libc in general is special, right?

For the most part, yes. I don't differentiate between being on an
application mutex/CV's queue or on an implementation mutex/CV's queue,
though. I always treat them the same and remove the thread from the
synchronization object's waiting queue before delivering the signal,
then reinsert the thread if the signal handler returns normally. If
you don't do that, then your queues become corrupted.

> I guess part of what has me confused is why the queues are being
> treated so specially. I think the problem is one of general
> re-entrancy for library functions designated async- or thread-safe.
> If so, then once I have a proper handle on the issues I can start
> addressing them in libthr.

I do recall that when I was working on libc_r, these issues always came
up, and the simplest way to fix them (or work around them) was to
always make sure the queues were consistent and always remove threads
from these queues before calling signal handlers.
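In code form, that dequeue-before-delivery rule is roughly the sketch
below. Every type and helper here is invented for illustration (the
real logic lives in libpthread's signal-delivery code), but it shows
why the waiting queues stay consistent no matter what the handler does:

#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical thread and waiting-queue types. */
struct thr;
TAILQ_HEAD(sync_queue, thr);
struct thr {
	TAILQ_ENTRY(thr) qe;	/* waiting-queue linkage */
	struct sync_queue *wq;	/* queue we're blocked on, or NULL */
};

/* Before running the handler: take the thread off its queue. */
static struct sync_queue *
sig_dequeue(struct thr *t)
{
	struct sync_queue *q = t->wq;

	if (q != NULL) {
		TAILQ_REMOVE(q, t, qe);
		t->wq = NULL;
	}
	return (q);	/* remembered so we can put it back */
}

/* After the handler returns *normally*: put the thread back. */
static void
sig_requeue(struct thr *t, struct sync_queue *q)
{
	if (q != NULL) {
		TAILQ_INSERT_TAIL(q, t, qe);
		t->wq = q;
	}
}

If the handler longjmp()s out instead, sig_requeue() is simply never
reached: the thread is already off the queue, so it can neither return
to its caller "still on the queue" nor trip the "already on mutexq"
check when it blocks again later.

--
Dan Eischen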