Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 27 Oct 2004 18:30:15 -0400 (EDT)
From:      Daniel Eischen <deischen@freebsd.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        threads@freebsd.org
Subject:   Re: Infinite loop bug in libc_r on 4.x with condition variables and signals
Message-ID:  <Pine.GSO.4.43.0410271826590.239-100000@sea.ntplx.net>
In-Reply-To: <200410271429.14164.jhb@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, 27 Oct 2004, John Baldwin wrote:

> On Thursday 21 October 2004 07:04 pm, Daniel Eischen wrote:
> > On Thu, 21 Oct 2004, John Baldwin wrote:
> > > The behavior seems more to be this:
> > >
> > > - thread does pthread_cond_wait*(c1)
> > > - thread enqueued on c1
> > > - thread interrupted by a signal while on c1 but still in PS_RUNNING
> >
> > This shouldn't happen when signals are deferred.  It should
> > only happen when the state is PS_COND_WAIT after we've
> > context switched to the scheduler.
> >
> > > - thread saves state which excludes the PTHREAD_FLAGS_IN_CONDQ flag
> > > (among others)
> >
> > Right, because it assumes that the thread will be backed out of
> > any mutex or CV queues prior to invoking the signal handler.
> >
> > > - thread calls _cond_wait_backout() if state is PS_COND_WAIT (but it's
> > > not in - this case, this is the normal case though, which is why it's ok
> > > to not save the CONDQ flag in the saved state above)
> >
> > Right.  The problem is, how is the thread getting setup for a signal
> > while signals are deferred and the state has not yet been changed
> > from PS_RUNNING to PS_COND_WAIT?
> >
> > > - thread executes signal handler
> > > - thread restores state
> > > - pthread_condwait*() see that interrupted is 0, so don't try to remove
> > > the thread from the condition variable (also, PTHREAD_FLAGS_IN_CONDQ
> > > isn't set either, so we can't detect this case that way)
> > > - thread returns from pthread_cond_wait() (maybe due to timeout, etc.)
> > > - thread calls pthread_cond_wait*(c2)
> > > - thread enqueued on c2
> > > - another thread does pthread_cond_broadcast(c2), and bewm
> > >
> > > My question is is it possible for the thread to get interrupted and
> > > chosen to run a signal while it is on c1 somehow given my patch to defer
> > > signals around the wait loops (and is that patch correct btw given the
> > > above scenario?)
> >
> > Yes (and yes I think).  Defering signals just means that the signal handler
> > won't try to install a signal frame on the current thread; instead it just
> > queues the signal and the scheduler will pick it up and send it to the
> > correct thread.
> >
> > I do think signals should be deferred for condition variables so
> > that setting the thread state (to PS_COND_WAIT) is atomic.
> >
> > It's not obvious to be where the bug is.  If you had a simple
> > test case to reproduce it that would help.
>
> FWIW, we are having (I think) the same problem on 5.3 with libpthread.  The
> panic there is in the mutex code about an assertion failing because a thread
> is on a syncq when it is not supposed to be.

David and I recently fixed some races in pthread_join() and
pthread_exit() in -current libpthread.  Don't know if those
were responsible...

Here's a test program that shows correct behavior with both
libc_r and libpthread in -current.

-- 
Dan Eischen

#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

pthread_mutex_t gm;

static void
handler(int sig)
{
	printf("Thread %p Got signal %d\n", pthread_self(), sig);
}

static void *
waiter(void *arg)
{
	sigset_t set;

	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	sigprocmask(SIG_UNBLOCK, &set, NULL);

	for (;;) {
		pthread_mutex_lock(&gm);
		printf("Waiter locked mutex.");
		sleep(1);
		pthread_mutex_unlock(&gm);
		printf("Waiter unlocked mutex.");
		sleep(1);
	}
	return (NULL);
}

int
main(int argc, char *argv[])
{
	struct sigaction act;
	sigset_t set;
	pthread_t tid;

	sigemptyset(&set);
	sigaddset(&set, SIGUSR1);
	sigprocmask(SIG_BLOCK, &set, NULL);

	/* Install a handler for SIGUSR1. */
	act.sa_handler = handler;
	act.sa_flags = SA_RESTART;
	sigfillset(&act.sa_mask);
	sigaction(SIGUSR1, &act, NULL);

	pthread_mutex_init(&gm, NULL);

	pthread_mutex_lock(&gm);
	pthread_create(&tid, NULL, waiter, NULL);
	for (;;) {
		kill(getpid(), SIGUSR1);
		pthread_yield();
	}
	return (0);
}





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.GSO.4.43.0410271826590.239-100000>