From owner-freebsd-current Sat Nov 10 7:10:33 2001 Delivered-To: freebsd-current@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id D604B37B41F; Sat, 10 Nov 2001 07:10:20 -0800 (PST) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.12.1/8.12.1) id fAAF9GvQ022537; Sat, 10 Nov 2001 10:09:16 -0500 (EST) Date: Sat, 10 Nov 2001 10:09:14 -0500 (EST) From: Daniel Eischen To: rittle@labs.mot.com Cc: freebsd-hackers@FreeBSD.ORG, current@FreeBSD.ORG Subject: Re: Report on FreeBSD 4.4 pthread implementation verses boehm-gc In-Reply-To: <200111090250.fA92o7h55180@latour.rsch.comm.mot.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-current@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG [ Followups to -current ] On Thu, 8 Nov 2001, Loren James Rittle wrote: > Hello all, > > I have ported the most recent version of boehm-gc (6.1-alpha) to > FreeBSD/i386 under the auspice of the gcc project (it will be in Hans' > 6.1 release and it is on the gcc mainline). I got one notable thing > fully configured beyond what is in the ports tree (which is based on > 6.0): threaded GC is now supported. However, this work has uncovered > either a rare race condition in the 4.X pthread implementation (also > seen on a current 5.0 system) or a bad assumption in the GC signal > code (abstracted below). Either way, the result seen is an undetected > deadlock. With the following new assertion, I can at least force the > condition to be detectable in many cases where it would have locked up. > > Two questions come to mind: Is there any condition under which my new > assumption should not be true? Is there any obvious mistake that a > threaded application can make (perhaps related to its signal use) that > could cause the new assumption to ever be violated? > > I have also seen what I thought was a less important issue, but I now > see that it is probably related. After reviewing the FreeBSD uthread > source code, the issue appears to be a race between the pthread_exit() > code running in one thread and the pthread_join() code running in > another thread in conjunction with a sigsuspend() call occurring on a > signal handler of that second thread. Under some conditions, an > errant EINTR would be returned to the pthread_join() caller instead of > the exit code from the terminated thread. Under other timing > conditions, you get the deadlock spotted with the above new assertion. Try the following patch; this is to -current, you'll have to massage it a bit for -stable (Hint: s/curthread/_thread_run/ in -stable). -- Dan Eischen Index: uthread/pthread_private.h =================================================================== RCS file: /opt/d/CVS/src/lib/libc_r/uthread/pthread_private.h,v retrieving revision 1.63 diff -u -r1.63 pthread_private.h --- uthread/pthread_private.h 26 Oct 2001 21:19:22 -0000 1.63 +++ uthread/pthread_private.h 10 Nov 2001 14:35:48 -0000 @@ -601,6 +601,11 @@ /* XXX - What about thread->timeout and/or thread->error? */ }; +struct join_status { + struct pthread *thread; + int ret; + int error; +}; /* * Normally thread contexts are stored as jmp_bufs via _setjmp()/_longjmp(), @@ -757,8 +762,12 @@ */ int error; - /* Pointer to a thread that is waiting to join (NULL if no joiner). */ - struct pthread *joiner; + /* + * The joiner is the thread that is joining to this thread. The + * join status keeps track of a join operation to another thread. + */ + struct pthread *joiner; + struct join_status join_status; /* * The current thread can belong to only one scheduling queue at Index: uthread/uthread_exit.c =================================================================== RCS file: /opt/d/CVS/src/lib/libc_r/uthread/uthread_exit.c,v retrieving revision 1.23 diff -u -r1.23 uthread_exit.c --- uthread/uthread_exit.c 20 May 2001 23:08:32 -0000 1.23 +++ uthread/uthread_exit.c 10 Nov 2001 14:36:50 -0000 @@ -220,8 +220,9 @@ } /* Set the return value for the joining thread: */ - pthread->ret = curthread->ret; - pthread->error = 0; + pthread->join_status.ret = curthread->ret; + pthread->join_status.error = 0; + pthread->join_status.thread = NULL; /* Make this thread collectable by the garbage collector. */ PTHREAD_ASSERT(((curthread->attr.flags & PTHREAD_DETACHED) == Index: uthread/uthread_join.c =================================================================== RCS file: /opt/d/CVS/src/lib/libc_r/uthread/uthread_join.c,v retrieving revision 1.19 diff -u -r1.19 uthread_join.c --- uthread/uthread_join.c 16 Aug 2001 06:31:32 -0000 1.19 +++ uthread/uthread_join.c 10 Nov 2001 14:36:14 -0000 @@ -122,18 +122,20 @@ pthread->joiner = curthread; /* Keep track of which thread we're joining to: */ - curthread->data.thread = pthread; + curthread->join_status.thread = pthread; - /* Schedule the next thread: */ - _thread_kern_sched_state(PS_JOIN, __FILE__, __LINE__); + while (curthread->join_status.thread == pthread) { + /* Schedule the next thread: */ + _thread_kern_sched_state(PS_JOIN, __FILE__, __LINE__); + } /* * The thread return value and error are set by the thread we're * joining to when it exits or detaches: */ - ret = curthread->error; + ret = curthread->join_status.error; if ((ret == 0) && (thread_return != NULL)) - *thread_return = curthread->ret; + *thread_return = curthread->join_status.ret; } else { /* * The thread exited (is dead) without being detached, and no Index: uthread/uthread_sig.c =================================================================== RCS file: /opt/d/CVS/src/lib/libc_r/uthread/uthread_sig.c,v retrieving revision 1.38 diff -u -r1.38 uthread_sig.c --- uthread/uthread_sig.c 29 Jun 2001 17:09:07 -0000 1.38 +++ uthread/uthread_sig.c 10 Nov 2001 14:28:09 -0000 @@ -671,7 +671,6 @@ * signal handler to run: */ case PS_COND_WAIT: - case PS_JOIN: case PS_MUTEX_WAIT: /* * Remove the thread from the wait queue. It will @@ -679,6 +678,17 @@ * handlers have been invoked. */ PTHREAD_WAITQ_REMOVE(pthread); + break; + + case PS_JOIN: + /* + * Remove the thread from the wait queue. It will + * be added back to the wait queue once all signal + * handlers have been invoked. + */ + PTHREAD_WAITQ_REMOVE(pthread); + /* Make the thread runnable: */ + PTHREAD_SET_STATE(pthread, PS_RUNNING); break; /* To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message