Date: Fri, 27 Jun 2003 19:10:25 -0400 (EDT)
From: Daniel Eischen
Reply-To: deischen@freebsd.org
To: Mike Makonnen
Cc: freebsd-threads@freebsd.org, marcel@xcllnt.net
Subject: Re: libkse / libthr bugs?

On Fri, 27 Jun 2003, Mike Makonnen wrote:
> On Fri, 27 Jun 2003 07:14:43 -0400 (EDT)
> Daniel Eischen wrote:
>
> > To answer your first question, yes, I think it's necessary.
> > Here's an example:
> >
> > void
> > sigalarmhandler(int sig)
> > {
> > 	longjmp(env, 1);
> > }
> >
> > int
> > somenslookup(...)
> > {
> > 	...
> > 	alarm(5);	/* Wait a maximum of 5 seconds. */
> > 	if (setjmp(env) == 0) {
> > 		nsdispatch(...);
> > 		return (0);
> > 	}
> > 	return (-1);
> > }
> >
> > Now perhaps this isn't the best example; imagine using
> > something that uses malloc()/free() or any one of the other
> > locked libc functions. There is also the use of application-level
> > locks, which should work similarly, but by using a libc
> > example I avoid any argument about "the application shouldn't
> > be doing that" :-)
>
> If I understand correctly, what you are saying is: if the alarm fires
> before somenslookup() returns successfully, the longjmp will make it
> return unsuccessfully, and if it happened to be put on a queue in
> nsdispatch() it will return to its caller still on the queue (which we
> clearly don't want).

Right.

> How is this any different from holding some other resource? For
> example, what if it had managed to acquire the lock before the longjmp
> from the signal handler? See below for what I mean.

Libc mutexes and CVs use the single-underscore versions so our threads
library can tell the difference between implementation locks and
application locks. We should be deferring signals while _holding_ one
of these locks. Mutexes in libc should be held only for very short
periods, so deferring signals while they are held should be OK. But
for CVs, you could be on the CV queue for some time, and you shouldn't
be holding up signal delivery because of that.

My rule of thumb is to treat all locks the same and not to keep threads
on waiting queues of any sort while running signal handlers. But once a
lock is held, that's a bit different, because the application could
longjmp() out of the locked region and never release the lock. We
should probably defer signal delivery while implementation mutexes,
spinlocks, and low-level locks are held.
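For instance, the "defer while held" rule could look roughly like the
sketch below. This is only an illustration, not the actual libpthread
code: the names impl_lock_acquire()/impl_lock_release() are made up,
and pthread_sigmask() stands in for the deferral an M:N library would
do in user space (e.g. with a per-thread defer count) rather than with
a system call per lock:

#include <pthread.h>
#include <signal.h>

/*
 * Block all maskable signals before taking an implementation lock and
 * restore the old mask after dropping it, so no handler (and thus no
 * longjmp()) can run inside the critical section.  The caller owns
 * the storage for the saved mask, so concurrent lockers can't clobber
 * each other's masks.
 */
static void
impl_lock_acquire(pthread_mutex_t *mtx, sigset_t *saved)
{
	sigset_t all;

	sigfillset(&all);
	pthread_sigmask(SIG_BLOCK, &all, saved);
	pthread_mutex_lock(mtx);
}

static void
impl_lock_release(pthread_mutex_t *mtx, const sigset_t *saved)
{
	pthread_mutex_unlock(mtx);
	/* Any deferred signals are delivered here, outside the lock. */
	pthread_sigmask(SIG_SETMASK, saved, NULL);
}

A locked libc function would then wrap its short critical section:

	static pthread_mutex_t libc_lock = PTHREAD_MUTEX_INITIALIZER;
	sigset_t saved;

	impl_lock_acquire(&libc_lock, &saved);
	/* ... short critical section; safe from longjmp() ... */
	impl_lock_release(&libc_lock, &saved);

This is also why it matters that implementation mutexes are held only
briefly: the window where signal delivery is held up stays small.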
> > It's also possible that the thread calls the same set
> > of libc functions again, and if it isn't removed from
> > the internal mutex queue, then you'd get the exact
> > error message Marcel was seeing (already on mutexq).
>
> I'm glad you brought this last point up, because that has been on my
> mind as well. Let's say an application is in one of the locked libc
> functions, receives a signal, and calls that same libc function from
> its signal handler. Furthermore, let's say that the thread had just
> acquired a lock in the libc function before handling the signal. When
> this thread called the same libc function from the signal handler and
> tried to acquire the same lock again, would it not deadlock against
> itself?

Yes, I think it would.

> So the problem is not that the mutex/cond queues are special. It's
> that locking from within libc in general is special, right?

For the most part, yes. I don't differentiate between being on an
application mutex/CV's queue or on an implementation mutex/CV's queue,
though. I always treat them the same and remove the thread from the
synchronization object's waiting queue before delivering the signal,
then reinsert the thread if the signal handler returns normally. If
you don't do that, then your queues become corrupted.

> I guess part of what has me confused is why the queues are being
> treated so specially. I think the problem is one of general
> re-entrancy for library functions designated async- or thread-safe.
> If so, then once I have a proper handle on the issues I can start
> addressing them in libthr.

I do recall that when I was working on libc_r, these issues always came
up, and the simplest way to fix them (or work around them) was to
always make sure the queues were consistent and always remove threads
from these queues before calling signal handlers.
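In code form, that dequeue-before-delivery rule is roughly the sketch
below. Every type and helper here is invented for illustration (the
real logic lives in libpthread's signal-delivery code), but it shows
why the waiting queues stay consistent no matter what the handler does:

#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical thread and waiting-queue types. */
struct thr;
TAILQ_HEAD(sync_queue, thr);
struct thr {
	TAILQ_ENTRY(thr) qe;	/* waiting-queue linkage */
	struct sync_queue *wq;	/* queue we're blocked on, or NULL */
};

/* Before running the handler: take the thread off its queue. */
static struct sync_queue *
sig_dequeue(struct thr *t)
{
	struct sync_queue *q = t->wq;

	if (q != NULL) {
		TAILQ_REMOVE(q, t, qe);
		t->wq = NULL;
	}
	return (q);	/* remembered so we can put it back */
}

/* After the handler returns *normally*: put the thread back. */
static void
sig_requeue(struct thr *t, struct sync_queue *q)
{
	if (q != NULL) {
		TAILQ_INSERT_TAIL(q, t, qe);
		t->wq = q;
	}
}

If the handler longjmp()s out instead, sig_requeue() is simply never
reached: the thread is already off the queue, so it can neither return
to its caller "still on the queue" nor trip the "already on mutexq"
check when it blocks again later.

--
Dan Eischen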