Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 27 Oct 2004 14:29:14 -0400
From:      John Baldwin <jhb@FreeBSD.org>
To:        Daniel Eischen <deischen@FreeBSD.org>
Cc:        threads@FreeBSD.org
Subject:   Re: Infinite loop bug in libc_r on 4.x with condition variables and signals
Message-ID:  <200410271429.14164.jhb@FreeBSD.org>
In-Reply-To: <Pine.GSO.4.43.0410211827060.24152-100000@sea.ntplx.net>
References:  <Pine.GSO.4.43.0410211827060.24152-100000@sea.ntplx.net>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thursday 21 October 2004 07:04 pm, Daniel Eischen wrote:
> On Thu, 21 Oct 2004, John Baldwin wrote:
> > The behavior seems more to be this:
> >
> > - thread does pthread_cond_wait*(c1)
> > - thread enqueued on c1
> > - thread interrupted by a signal while on c1 but still in PS_RUNNING
>
> This shouldn't happen when signals are deferred.  It should
> only happen when the state is PS_COND_WAIT after we've
> context switched to the scheduler.
>
> > - thread saves state which excludes the PTHREAD_FLAGS_IN_CONDQ flag
> > (among others)
>
> Right, because it assumes that the thread will be backed out of
> any mutex or CV queues prior to invoking the signal handler.
>
> > - thread calls _cond_wait_backout() if state is PS_COND_WAIT (but it's
> > not in - this case, this is the normal case though, which is why it's ok
> > to not save the CONDQ flag in the saved state above)
>
> Right.  The problem is, how is the thread getting setup for a signal
> while signals are deferred and the state has not yet been changed
> from PS_RUNNING to PS_COND_WAIT?
>
> > - thread executes signal handler
> > - thread restores state
> > - pthread_condwait*() see that interrupted is 0, so don't try to remove
> > the thread from the condition variable (also, PTHREAD_FLAGS_IN_CONDQ
> > isn't set either, so we can't detect this case that way)
> > - thread returns from pthread_cond_wait() (maybe due to timeout, etc.)
> > - thread calls pthread_cond_wait*(c2)
> > - thread enqueued on c2
> > - another thread does pthread_cond_broadcast(c2), and bewm
> >
> > My question is is it possible for the thread to get interrupted and
> > chosen to run a signal while it is on c1 somehow given my patch to defer
> > signals around the wait loops (and is that patch correct btw given the
> > above scenario?)
>
> Yes (and yes I think).  Defering signals just means that the signal handler
> won't try to install a signal frame on the current thread; instead it just
> queues the signal and the scheduler will pick it up and send it to the
> correct thread.
>
> I do think signals should be deferred for condition variables so
> that setting the thread state (to PS_COND_WAIT) is atomic.
>
> It's not obvious to be where the bug is.  If you had a simple
> test case to reproduce it that would help.

FWIW, we are having (I think) the same problem on 5.3 with libpthread.  The 
panic there is in the mutex code about an assertion failing because a thread 
is on a syncq when it is not supposed to be.

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200410271429.14164.jhb>