Date: Thu, 25 May 2006 10:49:33 -0400 From: Kurt Miller <kurt@intricatesoftware.com> To: freebsd-threads@freebsd.org, Daniel Eischen <deischen@freebsd.org> Subject: Re: pthread_cond_signal w/suspended threads Message-ID: <200605251049.34486.kurt@intricatesoftware.com> In-Reply-To: <Pine.GSO.4.64.0605241900060.29100@sea.ntplx.net> References: <200605241814.11452.lists@intricatesoftware.com> <Pine.GSO.4.64.0605241900060.29100@sea.ntplx.net>
next in thread | previous in thread | raw e-mail | index | archive | help
On Wednesday 24 May 2006 7:07 pm, Daniel Eischen wrote: > On Wed, 24 May 2006, Kurt Miller wrote: > > > I've been working on an intermittent deadlock in > > the jvm and have isolated it to the following issue. > > When multiple threads are waiting on a condition > > variable and some of those threads have been suspended > > with pthread_suspend_np(), there is an expectation that > > pthread_cond_signal() will signal an unsuspended thread > > first. However, the following program shows this is not > > the case on 6.1-release/kse (thr works as expected). > > > > Can kse behavior be changed to signal unsuspended > > threads first? > > Relying on this behavior isn't exactly portable, and what > if there are no unsuspended threads when the signal occurs. > A suspended thread would still be signaled in that case. I agree its not portable, but we are talking about a non portable function (pthread_suspend_np). :-) Since we have some latitude here, doesn't it make sense to have our pthread_suspend_np be feature compatible with Solaris? At least in the case of the jvm it would help a bit if this behavior matched Solaris. Below is a diff that makes the behavior change for your consideration. Also the test program converted for Solaris is after that. As it turns out there are two deadlocks I've found. The one described here appears to be hard to reproduce and occurs upon jvm shutdown. In fact I haven't been able to reproduce it again. The other deadlock I'll post under another thread when I make a C program to reproduce it and if it appears to be a thread issue. > The whole suspended threads thing is kind of dangerous > anyways. If one of these threads hold a lock, or is > waiting in some other queue, then deadlock can occur. > I think you need a way of waiting until they are out > of critical regions before you suspend them in order > for this not to cause problems. Sure, I agree. Right now the only alternative I see is to not use pthread_suspend/resume_np and do thread suspension with signals the way hotspot does it for linux. I'd perfer not to do that if possible. --- thr_cond.c.orig Wed May 24 21:36:18 2006 +++ thr_cond.c Thu May 25 10:44:46 2006 @@ -596,13 +596,23 @@ (*cond)->c_seqno++; /* + * Wakeup the first unsuspended thread, otherwise + * the first suspended. Matches Solaris behavior. + */ + pthread = TAILQ_FIRST(&(*cond)->c_queue); + while (pthread != NULL && + (pthread->flags & THR_FLAGS_SUSPENDED) != 0) + pthread = TAILQ_NEXT(pthread, sqe); + if (pthread == NULL) + pthread = TAILQ_FIRST(&(*cond)->c_queue); + + /* * Wakeups have to be done with the CV lock held; * otherwise there is a race condition where the * thread can timeout, run on another KSE, and enter * another blocking state (including blocking on a CV). */ - if ((pthread = TAILQ_FIRST(&(*cond)->c_queue)) - != NULL) { + if (pthread != NULL) { THR_SCHED_LOCK(curthread, pthread); cond_queue_remove(*cond, pthread); pthread->sigbackout = NULL; #include <stdio.h> #include <unistd.h> #include <sched.h> #include <thread.h> cond_t cond; mutex_t mutex; static void * start_routine(void *arg) { mutex_lock(&mutex); cond_wait(&cond, &mutex); printf("unblocked %d\n", (int)arg); mutex_unlock(&mutex); return (0); } int main(int argc, char *argv[]) { thread_t thread1; thread_t thread2; cond_init(&cond, NULL, NULL); mutex_init(&mutex, NULL, NULL); thr_create(NULL, NULL, start_routine, (void *)1, NULL, &thread1); sleep(1); thr_create(NULL, NULL, start_routine, (void *)2, NULL, &thread2); sleep(1); thr_suspend(thread1); cond_signal(&cond); sleep(1); thr_continue(thread1); sleep(1); return (0); }
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200605251049.34486.kurt>