Date: Sat, 10 Nov 2001 10:09:14 -0500 (EST) From: Daniel Eischen <eischen@pcnet1.pcnet.com> To: rittle@labs.mot.com Cc: freebsd-hackers@FreeBSD.ORG, current@FreeBSD.ORG Subject: Re: Report on FreeBSD 4.4 pthread implementation verses boehm-gc Message-ID: <Pine.SUN.3.91.1011110095953.20896A-100000@pcnet1.pcnet.com> In-Reply-To: <200111090250.fA92o7h55180@latour.rsch.comm.mot.com>
next in thread | previous in thread | raw e-mail | index | archive | help
[ Followups to -current ]
On Thu, 8 Nov 2001, Loren James Rittle wrote:
> Hello all,
>
> I have ported the most recent version of boehm-gc (6.1-alpha) to
> FreeBSD/i386 under the auspice of the gcc project (it will be in Hans'
> 6.1 release and it is on the gcc mainline). I got one notable thing
> fully configured beyond what is in the ports tree (which is based on
> 6.0): threaded GC is now supported. However, this work has uncovered
> either a rare race condition in the 4.X pthread implementation (also
> seen on a current 5.0 system) or a bad assumption in the GC signal
> code (abstracted below). Either way, the result seen is an undetected
> deadlock. With the following new assertion, I can at least force the
> condition to be detectable in many cases where it would have locked up.
>
> Two questions come to mind: Is there any condition under which my new
> assumption should not be true? Is there any obvious mistake that a
> threaded application can make (perhaps related to its signal use) that
> could cause the new assumption to ever be violated?
>
> I have also seen what I thought was a less important issue, but I now
> see that it is probably related. After reviewing the FreeBSD uthread
> source code, the issue appears to be a race between the pthread_exit()
> code running in one thread and the pthread_join() code running in
> another thread in conjunction with a sigsuspend() call occurring on a
> signal handler of that second thread. Under some conditions, an
> errant EINTR would be returned to the pthread_join() caller instead of
> the exit code from the terminated thread. Under other timing
> conditions, you get the deadlock spotted with the above new assertion.
Try the following patch; this is to -current, you'll have to massage
it a bit for -stable (Hint: s/curthread/_thread_run/ in -stable).
--
Dan Eischen
Index: uthread/pthread_private.h
===================================================================
RCS file: /opt/d/CVS/src/lib/libc_r/uthread/pthread_private.h,v
retrieving revision 1.63
diff -u -r1.63 pthread_private.h
--- uthread/pthread_private.h 26 Oct 2001 21:19:22 -0000 1.63
+++ uthread/pthread_private.h 10 Nov 2001 14:35:48 -0000
@@ -601,6 +601,11 @@
/* XXX - What about thread->timeout and/or thread->error? */
};
+struct join_status {
+ struct pthread *thread;
+ int ret;
+ int error;
+};
/*
* Normally thread contexts are stored as jmp_bufs via
_setjmp()/_longjmp(),
@@ -757,8 +762,12 @@
*/
int error;
- /* Pointer to a thread that is waiting to join (NULL if no joiner). */
- struct pthread *joiner;
+ /*
+ * The joiner is the thread that is joining to this thread. The
+ * join status keeps track of a join operation to another thread.
+ */
+ struct pthread *joiner;
+ struct join_status join_status;
/*
* The current thread can belong to only one scheduling queue at
Index: uthread/uthread_exit.c
===================================================================
RCS file: /opt/d/CVS/src/lib/libc_r/uthread/uthread_exit.c,v
retrieving revision 1.23
diff -u -r1.23 uthread_exit.c
--- uthread/uthread_exit.c 20 May 2001 23:08:32 -0000 1.23
+++ uthread/uthread_exit.c 10 Nov 2001 14:36:50 -0000
@@ -220,8 +220,9 @@
}
/* Set the return value for the joining thread: */
- pthread->ret = curthread->ret;
- pthread->error = 0;
+ pthread->join_status.ret = curthread->ret;
+ pthread->join_status.error = 0;
+ pthread->join_status.thread = NULL;
/* Make this thread collectable by the garbage collector. */
PTHREAD_ASSERT(((curthread->attr.flags & PTHREAD_DETACHED) ==
Index: uthread/uthread_join.c
===================================================================
RCS file: /opt/d/CVS/src/lib/libc_r/uthread/uthread_join.c,v
retrieving revision 1.19
diff -u -r1.19 uthread_join.c
--- uthread/uthread_join.c 16 Aug 2001 06:31:32 -0000 1.19
+++ uthread/uthread_join.c 10 Nov 2001 14:36:14 -0000
@@ -122,18 +122,20 @@
pthread->joiner = curthread;
/* Keep track of which thread we're joining to: */
- curthread->data.thread = pthread;
+ curthread->join_status.thread = pthread;
- /* Schedule the next thread: */
- _thread_kern_sched_state(PS_JOIN, __FILE__, __LINE__);
+ while (curthread->join_status.thread == pthread) {
+ /* Schedule the next thread: */
+ _thread_kern_sched_state(PS_JOIN, __FILE__, __LINE__);
+ }
/*
* The thread return value and error are set by the thread we're
* joining to when it exits or detaches:
*/
- ret = curthread->error;
+ ret = curthread->join_status.error;
if ((ret == 0) && (thread_return != NULL))
- *thread_return = curthread->ret;
+ *thread_return = curthread->join_status.ret;
} else {
/*
* The thread exited (is dead) without being detached, and no
Index: uthread/uthread_sig.c
===================================================================
RCS file: /opt/d/CVS/src/lib/libc_r/uthread/uthread_sig.c,v
retrieving revision 1.38
diff -u -r1.38 uthread_sig.c
--- uthread/uthread_sig.c 29 Jun 2001 17:09:07 -0000 1.38
+++ uthread/uthread_sig.c 10 Nov 2001 14:28:09 -0000
@@ -671,7 +671,6 @@
* signal handler to run:
*/
case PS_COND_WAIT:
- case PS_JOIN:
case PS_MUTEX_WAIT:
/*
* Remove the thread from the wait queue. It will
@@ -679,6 +678,17 @@
* handlers have been invoked.
*/
PTHREAD_WAITQ_REMOVE(pthread);
+ break;
+
+ case PS_JOIN:
+ /*
+ * Remove the thread from the wait queue. It will
+ * be added back to the wait queue once all signal
+ * handlers have been invoked.
+ */
+ PTHREAD_WAITQ_REMOVE(pthread);
+ /* Make the thread runnable: */
+ PTHREAD_SET_STATE(pthread, PS_RUNNING);
break;
/*
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.SUN.3.91.1011110095953.20896A-100000>
