From owner-freebsd-threads@FreeBSD.ORG Mon Oct 13 11:06:58 2008
Return-Path:
Delivered-To: freebsd-threads@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4D39B1065698 for ; Mon, 13 Oct 2008 11:06:58 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 301B38FC1C for ; Mon, 13 Oct 2008 11:06:58 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id m9DB6wjf029594 for ; Mon, 13 Oct 2008 11:06:58 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id m9DB6vAn029590 for freebsd-threads@FreeBSD.org; Mon, 13 Oct 2008 11:06:57 GMT (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 13 Oct 2008 11:06:57 GMT
Message-Id: <200810131106.m9DB6vAn029590@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster
To: freebsd-threads@FreeBSD.org
Cc:
Subject: Current problem reports assigned to freebsd-threads@FreeBSD.org
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 13 Oct 2008 11:06:58 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD
users. These represent problem reports covering all versions
including experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o threa/127225 threads    bug in lib/libthr/thread/thr_init.c
o threa/126950 threads    [patch] rtld(1): rtld malloc is thread-unsafe
o kern/126128  threads    [patch] pthread_condattr_getpshared is broken
o threa/122923 threads    'nice' does not prevent background process from steali
o threa/121336 threads    lang/neko threading ok on UP, broken on SMP (FreeBSD 7
o threa/118715 threads    kse problem
o threa/116668 threads    can no longer use jdk15 with libthr on -stable SMP
o threa/116181 threads    /dev/io-related io access permissions are not propagat
o threa/115211 threads    pthread_atfork misbehaves in initial thread
o threa/110636 threads    [request] gdb(1): using gdb with multi thread applicat
o threa/110306 threads    apache 2.0 segmentation violation when calling gethost
o threa/103975 threads    Implicit loading/unloading of libpthread.so may crash
o threa/101323 threads    fork(2) in threaded programs broken.
s threa/100815 threads    FBSD 5.5 broke nanosleep in libc_r
s threa/94467  threads    send(), sendto() and sendmsg() are not correct in libc
s threa/84483  threads    problems with devel/nspr and -lc_r on 4.x
o threa/83914  threads    [libc] popen() doesn't work in static threaded program
o threa/80992  threads    abort() sometimes not caught by gdb depending on threa
o threa/80435  threads    panic on high loads
o threa/79887  threads    [patch] freopen() isn't thread-safe
o threa/79683  threads    svctcp_create() fails if multiple threads call at the
s threa/76694  threads    fork cause hang in dup()/close() function in child (-l
s threa/76690  threads    fork hang in child for -lc_r
o threa/75374  threads    pthread_kill() ignores SA_SIGINFO flag
o threa/75273  threads    FBSD 5.3 libpthread (KSE) bug
o threa/72953  threads    fork() unblocks blocked signals w/o PTHREAD_SCOPE_SYST
o threa/70975  threads    [sysvipc] unexpected and unreliable behaviour when usi
s threa/69020  threads    pthreads library leaks _gc_mutex
s threa/49087  threads    Signals lost in programs linked with libc_r
s threa/48856  threads    Setting SIGCHLD to SIG_IGN still leaves zombies under
s threa/40671  threads    pthread_cancel doesn't remove thread from condition qu
s threa/39922  threads    [threads] [patch] Threaded applications executed with
s threa/37676  threads    libc_r: msgsnd(), msgrcv(), pread(), pwrite() need wra
s threa/34536  threads    accept() blocks other threads
s kern/32295   threads    [libc_r] [patch] pthread(3) dont dequeue signals
s threa/30464  threads    pthread mutex attributes -- pshared
s threa/24632  threads    libc_r delicate deviation from libc in handling SIGCHL
s threa/24472  threads    libc_r does not honor SO_SNDTIMEO/SO_RCVTIMEO socket o

38 problems total.

From owner-freebsd-threads@FreeBSD.ORG Thu Oct 16 01:44:40 2008
Return-Path:
Delivered-To: threads@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9B398106568C; Thu, 16 Oct 2008 01:44:40 +0000 (UTC) (envelope-from davidxu@freebsd.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 861918FC17; Thu, 16 Oct 2008 01:44:40 +0000 (UTC) (envelope-from davidxu@freebsd.org)
Received: from apple.my.domain (root@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id m9G1icjh041026; Thu, 16 Oct 2008 01:44:39 GMT (envelope-from davidxu@freebsd.org)
Message-ID: <48F69CF4.9050905@freebsd.org>
Date: Thu, 16 Oct 2008 09:46:28 +0800
From: David Xu
User-Agent: Thunderbird 2.0.0.9 (X11/20080612)
MIME-Version: 1.0
To: Daniel Eischen
References: <18668.10465.699531.162573@gromit.timing.com> <20081008045447.GY36572@elvis.mu.org>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: threads@freebsd.org, John Hein
Subject: Re: pthread_cleanup_push & pthread_cleanup_pop usage
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id:
 Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 16 Oct 2008 01:44:40 -0000

Daniel Eischen wrote:
> On Tue, 7 Oct 2008, Alfred Perlstein wrote:
>
>> * John Hein [081007 21:45] wrote:
>>> In June pthread_cleanup_push & pthread_cleanup_pop were changed to
>>> macros that look like so...
>>
>> Hey John, I found the same problem when working on QNX a while back,
>> however that is really how it's supposed to be set up.
>>
>> I would suggest the following construct to fix the problem,
>> make your own per-thread stack of destructors that are callable
>> as functions and not macros.
>>
>> It's not too hard to do.
>>
>> Just use a pthread_key and pthread_once thingy to write a library
>> to do it, shouldn't take more than a hundred lines of code.
>>
>> FWIW, OS X and QNX have the same set of macros, not sure about
>> other OSes.
>
> Solaris as well.
>
> Just conditionally undef them before you use them.
>
>   #ifdef pthread_cleanup_push
>   #undef pthread_cleanup_push
>   #endif
>   #ifdef pthread_cleanup_pop
>   #undef pthread_cleanup_pop
>   #endif
>
> The library versions are still there (they have to be in order
> to be callable from non-C/C++ languages).
>

One possible solution is to define a C++ class in pthread.h:

#ifdef __cplusplus

class __pthread_cleanup_obj {
    void (*__f)(void *);
    void *__a;
    int __execute;
public:
    __pthread_cleanup_obj(void (*__cleanup_routine)(void *), void *__arg)
    {
        __f = __cleanup_routine;
        __a = __arg;
        __execute = 0;
    }

    /* The handler runs only if pthread_cleanup_pop() asked for it. */
    ~__pthread_cleanup_obj()
    {
        if (__execute)
            __f(__a);
    }

    void __set_execute(int __e)
    {
        __execute = __e;
    }
};

#define pthread_cleanup_push(f, a) {            \
    __pthread_cleanup_obj __cleanup(f, a);      \
    {

#define pthread_cleanup_pop(e)                  \
    __cleanup.__set_execute(e);                 \
    }                                           \
}

#endif

but because there is no specification for C++ and threading, it
is unknown which behavior is desired.
David Xu

From owner-freebsd-threads@FreeBSD.ORG Fri Oct 17 16:50:02 2008
Return-Path:
Delivered-To: freebsd-threads@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 78DDE10656AB for ; Fri, 17 Oct 2008 16:50:02 +0000 (UTC) (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 521B98FC0C for ; Fri, 17 Oct 2008 16:50:02 +0000 (UTC) (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id m9HGo2ia024161 for ; Fri, 17 Oct 2008 16:50:02 GMT (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id m9HGo2ao024160; Fri, 17 Oct 2008 16:50:02 GMT (envelope-from gnats)
Resent-Date: Fri, 17 Oct 2008 16:50:02 GMT
Resent-Message-Id: <200810171650.m9HGo2ao024160@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-threads@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Kurt Miller
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3EB0D106568F for ; Fri, 17 Oct 2008 16:41:00 +0000 (UTC) (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (www.freebsd.org [IPv6:2001:4f8:fff6::21]) by mx1.freebsd.org (Postfix) with ESMTP id 2A19B8FC23 for ; Fri, 17 Oct 2008 16:41:00 +0000 (UTC) (envelope-from nobody@FreeBSD.org)
Received: from www.freebsd.org (localhost [127.0.0.1]) by www.freebsd.org (8.14.3/8.14.3) with ESMTP id m9HGexgD090894 for ; Fri, 17 Oct 2008 16:40:59 GMT (envelope-from nobody@www.freebsd.org)
Received: (from nobody@localhost) by www.freebsd.org (8.14.3/8.14.3/Submit) id m9HGexJ1090893; Fri, 17 Oct 2008 16:40:59 GMT (envelope-from nobody)
Message-Id: <200810171640.m9HGexJ1090893@www.freebsd.org>
Date: Fri, 17 Oct 2008 16:40:59 GMT
From: Kurt Miller
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Cc:
Subject: threads/128180: pthread_cond_broadcast() lost wakup
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 17 Oct 2008 16:50:02 -0000

>Number:         128180
>Category:       threads
>Synopsis:       pthread_cond_broadcast() lost wakup
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-threads
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Oct 17 16:50:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Kurt Miller
>Release:        6.3-RELEASE
>Organization:   Intricate Software
>Environment:
FreeBSD fbsd-amd64-63.intricatesoftware.com 6.3-RELEASE FreeBSD 6.3-RELEASE #0: Wed Jan 16 01:43:02 UTC 2008 root@palmer.cse.buffalo.edu:/usr/obj/usr/src/sys/SMP amd64
>Description:
I've been investigating a deadlock in the jvm that occurs with the
concurrent mark sweep garbage collector. The cause appears to be due
to the kernel failing to wake up all threads waiting on a condition
variable.

I have written a test program that mimics the jvm's underlying
pattern. It reproduces the deadlock quickly and exhibits the same
problem. The general idea is that one thread sends a broadcast to a
group of worker threads. The worker threads perform some tasks,
coordinate their completion and broadcast on the same condition
variable they are done. The design is a bit heavy on the use of the
one condition variable, however it does appear to be valid if not
ideal.

The deadlock occurs with the following system setup:

  6.3-RELEASE
  SMP amd64 kernel
  libthr
  2 or more cores

I have not yet checked other releases or setups.

The test program outputs periodic printf's indicating
progress is being made.
When it stops the process is deadlocked. The lost wakeup can be
confirmed by inspecting the saved_waiters local var in main(). Each
time the deadlock occurs I see that saved_waiters is 8 which tells me
all eight worker threads were waiting on the condition variable when
the broadcast was sent. Then switch to the thread that is still
waiting on the condition variable, and you can see that the
last_cycle local var is one behind the cycles global var which
indicates it didn't receive the last wakeup.
>How-To-Repeat:
/*
 * The header names were stripped from the archived copy of this
 * report; the ones listed here are those the code below requires.
 */
#include <sys/types.h>
#include <sys/time.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

pthread_mutex_t group_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t group_cond_var = PTHREAD_COND_INITIALIZER;

volatile int tickets;
volatile int waiters;
volatile int finished;
int term_count;
volatile unsigned long cycles;

void *thread_main(void *thread_num);

#define NTHREADS 8
#define NYIELDS 1000

/* static so that the function also links under C99 inline rules */
static inline void
atomicinc(volatile int *val)
{
    __asm__ __volatile__ ("lock addl $1,(%0)" : : "r" (val) : "cc", "memory");
}

int
main(int argc, char *argv[])
{
    long t_num;
    pthread_t tid[NTHREADS];
    volatile int saved_waiters;

    /* startup threads */
    for (t_num = 0; t_num < NTHREADS; t_num++) {
        pthread_create(&tid[t_num], NULL, thread_main, (void *)t_num);
    }

    for (;;) {
        /* monitor progress on stdout */
        if (cycles % 5000 == 0)
            printf("cycles %lu\n", cycles);

        /* broadcast to workers to work */
        pthread_mutex_lock(&group_mutex);
        cycles++;
        term_count = 0;
        finished = 0;
        tickets = NTHREADS;
        saved_waiters = waiters;
        pthread_cond_broadcast(&group_cond_var);
        pthread_mutex_unlock(&group_mutex);

        /* wait for workers to finish */
        pthread_mutex_lock(&group_mutex);
        while (finished != NTHREADS)
            pthread_cond_wait(&group_cond_var, &group_mutex);
        pthread_mutex_unlock(&group_mutex);
    }

    return 0;
}

void *
thread_main(void *thread_num)
{
    unsigned long yield_count = 0;
    unsigned long sleep_count = 0;
    u_int32_t i, busy_loop = arc4random() & 0x7FFF;
    u_int32_t dummy = busy_loop;
    pthread_cond_t sleep_cond_var;
    pthread_mutex_t sleep_mutex;
    struct timeval tmptime;
    struct timeval delay = {0, 1};
    struct timespec waketime;
    volatile unsigned long last_cycle;

    pthread_mutex_init(&sleep_mutex, NULL);
    pthread_cond_init(&sleep_cond_var, NULL);

    for (;;) {
        /* wait for a ticket from the main thread */
        pthread_mutex_lock(&group_mutex);
        waiters++;
        while (tickets == 0)
            pthread_cond_wait(&group_cond_var, &group_mutex);
        waiters--;
        tickets--;
        last_cycle = cycles;
        pthread_mutex_unlock(&group_mutex);

        /* do something busy */
        for (i = 0; i < busy_loop; i++)
            dummy *= i;

        /* sync termination */
        atomicinc(&term_count);
        for (;;) {
            if (term_count == NTHREADS)
                break;
            if (yield_count < NYIELDS) {
                yield_count++;
                sched_yield();
            } else {
                yield_count = 0;
                sleep_count++;
                // 1.6 uses pthread_cond_timedwait for sleeping
                gettimeofday(&tmptime, NULL);
                timeradd(&tmptime, &delay, &tmptime);
                waketime.tv_sec = tmptime.tv_sec;
                waketime.tv_nsec = tmptime.tv_usec * 1000;
                pthread_mutex_lock(&sleep_mutex);
                pthread_cond_timedwait(&sleep_cond_var, &sleep_mutex, &waketime);
                pthread_mutex_unlock(&sleep_mutex);
            }
        }

        /* ok all terminated now let everyone know */
        pthread_mutex_lock(&group_mutex);
        finished++;
        pthread_cond_broadcast(&group_cond_var);
        pthread_mutex_unlock(&group_mutex);
    }

    return NULL;
}
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:

From owner-freebsd-threads@FreeBSD.ORG Fri Oct 17 20:00:13 2008
Return-Path:
Delivered-To: freebsd-threads@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1580F106568B for ; Fri, 17 Oct 2008 20:00:13 +0000 (UTC) (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id DEBEF8FC24 for ; Fri, 17 Oct 2008 20:00:12 +0000 (UTC) (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id m9HK0Cpl039356 for ; Fri, 17 Oct 2008 20:00:12 GMT (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost) by
 freefall.freebsd.org (8.14.3/8.14.3/Submit) id m9HK0CV1039353; Fri, 17 Oct 2008 20:00:12 GMT (envelope-from gnats)
Date: Fri, 17 Oct 2008 20:00:12 GMT
Message-Id: <200810172000.m9HK0CV1039353@freefall.freebsd.org>
To: freebsd-threads@FreeBSD.org
From: Kurt Miller
Cc:
Subject: Re: threads/128180: pthread_cond_broadcast() lost wakup
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Kurt Miller
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 17 Oct 2008 20:00:13 -0000

The following reply was made to PR threads/128180; it has been noted by GNATS.

From: Kurt Miller
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: threads/128180: pthread_cond_broadcast() lost wakup
Date: Fri, 17 Oct 2008 15:56:57 -0400

 I've reproduced this on the following setup:

   6.3-RELEASE
   SMP i386 kernel
   libthr
   2 cores

 However, the lost wakeup is on the main thread. The last
 broadcast wakeup from the worker threads gets lost and the
 process deadlocks.

 So far the test program has not provoked the lost wakeup
 using libpthread on 6.3. Initial 7.0 (amd64 libthr) testing
 has not provoked the issue yet either.
From owner-freebsd-threads@FreeBSD.ORG Fri Oct 17 23:45:00 2008
Return-Path:
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 71ABC106568C for ; Fri, 17 Oct 2008 23:45:00 +0000 (UTC) (envelope-from deischen@freebsd.org)
Received: from mail.netplex.net (mail.netplex.net [204.213.176.10]) by mx1.freebsd.org (Postfix) with ESMTP id 2F1C18FC15 for ; Fri, 17 Oct 2008 23:44:59 +0000 (UTC) (envelope-from deischen@freebsd.org)
Received: from sea.ntplx.net (sea.ntplx.net [204.213.176.11]) by mail.netplex.net (8.14.3/8.14.3/NETPLEX) with ESMTP id m9HNiwvY011483; Fri, 17 Oct 2008 19:44:58 -0400 (EDT)
X-Virus-Scanned: by AMaViS and Clam AntiVirus (mail.netplex.net)
X-Greylist: Message whitelisted by DRAC access database, not delayed by milter-greylist-4.0 (mail.netplex.net [204.213.176.10]); Fri, 17 Oct 2008 19:44:58 -0400 (EDT)
Date: Fri, 17 Oct 2008 19:44:58 -0400 (EDT)
From: Daniel Eischen
X-X-Sender: eischen@sea.ntplx.net
To: Kurt Miller
In-Reply-To: <200810171640.m9HGexJ1090893@www.freebsd.org>
Message-ID:
References: <200810171640.m9HGexJ1090893@www.freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-gnats-submit@freebsd.org, freebsd-threads@freebsd.org
Subject: Re: threads/128180: pthread_cond_broadcast() lost wakup
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Daniel Eischen
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 17 Oct 2008 23:45:00 -0000

On Fri, 17 Oct 2008, Kurt Miller wrote:

> The test program outputs periodic printf's indicating
> progress is being made. When it stops the process is
> deadlocked. The lost wakeup can be confirmed by inspecting
> the saved_waiters local var in main(). Each time the
> deadlock occurs I see that saved_waiters is 8 which tells
> me all eight worker threads were waiting on the condition
> variable when the broadcast was sent. Then switch to the
> thread that is still waiting on the condition variable,
> and you can see that the last_cycle local var is one behind
> the cycles global var which indicates it didn't receive the
> last wakeup.

The test program doesn't look correct to me. It seems possible
for only a few of the threads (as little as 2) to do all the
work. Thread 1 can start doing work, then wait for a broadcast.
Thread 2 can start doing his work, then broadcast waking thread 1.

I think you need separate condition variables, one to wake up
the main thread when the last worker goes to sleep/finishes,
and one to wake up the workers.

-- 
DE

From owner-freebsd-threads@FreeBSD.ORG Sat Oct 18 00:00:10 2008
Return-Path:
Delivered-To: freebsd-threads@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A2FE9106568D for ; Sat, 18 Oct 2008 00:00:09 +0000 (UTC) (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 841EF8FC13 for ; Sat, 18 Oct 2008 00:00:09 +0000 (UTC) (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id m9I009Sb058527 for ; Sat, 18 Oct 2008 00:00:09 GMT (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id m9I009T8058526; Sat, 18 Oct 2008 00:00:09 GMT (envelope-from gnats)
Date: Sat, 18 Oct 2008 00:00:09 GMT
Message-Id: <200810180000.m9I009T8058526@freefall.freebsd.org>
To: freebsd-threads@FreeBSD.org
From: Daniel Eischen
Cc:
Subject: Re: threads/128180: pthread_cond_broadcast() lost wakup
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version:
 2.1.5
Precedence: list
Reply-To: Daniel Eischen
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 18 Oct 2008 00:00:10 -0000

The following reply was made to PR threads/128180; it has been noted by GNATS.

From: Daniel Eischen
To: Kurt Miller
Cc: freebsd-gnats-submit@freebsd.org, freebsd-threads@freebsd.org
Subject: Re: threads/128180: pthread_cond_broadcast() lost wakup
Date: Fri, 17 Oct 2008 19:44:58 -0400 (EDT)

 On Fri, 17 Oct 2008, Kurt Miller wrote:

 > The test program outputs periodic printf's indicating
 > progress is being made. When it stops the process is
 > deadlocked. The lost wakeup can be confirmed by inspecting
 > the saved_waiters local var in main(). Each time the
 > deadlock occurs I see that saved_waiters is 8 which tells
 > me all eight worker threads were waiting on the condition
 > variable when the broadcast was sent. Then switch to the
 > thread that is still waiting on the condition variable,
 > and you can see that the last_cycle local var is one behind
 > the cycles global var which indicates it didn't receive the
 > last wakeup.

 The test program doesn't look correct to me. It seems possible
 for only a few of the threads (as little as 2) to do all the
 work. Thread 1 can start doing work, then wait for a broadcast.
 Thread 2 can start doing his work, then broadcast waking thread 1.

 I think you need separate condition variables, one to wake up
 the main thread when the last worker goes to sleep/finishes,
 and one to wake up the workers.
 -- 
 DE

From owner-freebsd-threads@FreeBSD.ORG Sat Oct 18 03:00:18 2008
Return-Path:
Delivered-To: freebsd-threads@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 105FB1065690 for ; Sat, 18 Oct 2008 03:00:18 +0000 (UTC) (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id F27928FC13 for ; Sat, 18 Oct 2008 03:00:17 +0000 (UTC) (envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id m9I30HZm073893 for ; Sat, 18 Oct 2008 03:00:17 GMT (envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id m9I30HqG073892; Sat, 18 Oct 2008 03:00:17 GMT (envelope-from gnats)
Date: Sat, 18 Oct 2008 03:00:17 GMT
Message-Id: <200810180300.m9I30HqG073892@freefall.freebsd.org>
To: freebsd-threads@FreeBSD.org
From: Kurt Miller
Cc:
Subject: Re: threads/128180: pthread_cond_broadcast() lost wakup
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: Kurt Miller
List-Id: Threading on FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 18 Oct 2008 03:00:18 -0000

The following reply was made to PR threads/128180; it has been noted by GNATS.

From: Kurt Miller
To: freebsd-gnats-submit@freebsd.org
Cc:
Subject: Re: threads/128180: pthread_cond_broadcast() lost wakup
Date: Fri, 17 Oct 2008 22:54:11 -0400

 Hi Daniel,

 Thanks for the review of the test program.

 On Friday 17 October 2008 7:44:58 pm Daniel Eischen wrote:
 > On Fri, 17 Oct 2008, Kurt Miller wrote:
 >
 > > The test program outputs periodic printf's indicating
 > > progress is being made. When it stops the process is
 > > deadlocked. The lost wakeup can be confirmed by inspecting
 > > the saved_waiters local var in main(). Each time the
 > > deadlock occurs I see that saved_waiters is 8 which tells
 > > me all eight worker threads were waiting on the condition
 > > variable when the broadcast was sent. Then switch to the
 > > thread that is still waiting on the condition variable,
 > > and you can see that the last_cycle local var is one behind
 > > the cycles global var which indicates it didn't receive the
 > > last wakeup.
 >
 > The test program doesn't look correct to me. It seems possible
 > for only a few of the threads (as little as 2) to do all the
 > work. Thread 1 can start doing work, then wait for a broadcast.
 > Thread 2 can start doing his work, then broadcast waking thread 1.

 I didn't fully describe why the design is the way it is. I
 understand some of the reasons why it was designed like this,
 but to fully understand it I would need to study the concurrent
 mark sweep garbage collector far more. I can explain a bit more
 of what I do understand.

 The controlling thread in the jvm corresponds to the primordial
 thread in my test program. In the jvm the controlling thread is
 not in a loop. It just kicks off the worker threads and waits for
 them to complete, then returns back to the calling function.

 The jvm will create one worker thread per cpu, each of which waits
 around for the controlling thread to kick it off. The garbage
 collection work is divided amongst them. The reason why my test
 program has 8 worker threads is because the problem was first
 reported to me on a dual quad-core amd64 system. My test systems
 are just dual core.

 > I think you need separate condition variables, one to wake up
 > the main thread when the last worker goes to sleep/finishes,
 > and one to wake up the workers.

 Indeed. In my first attempts to reproduce the lost wakeup problem
 I wrote the test program with a separate condition variable for
 letting the main thread know when the last worker finished.
 However, that didn't reproduce the deadlock the jdk was
 experiencing. Only when I fully mimicked the underlying design of
 the jdk did the test program reproduce the deadlock. Note that
 the jdk is written in C++, and the abstraction it provides makes
 for some pretty ugly code when translated into plain C.

 I could make adjustments to the jvm code to introduce the second
 condition variable and incorporate that in future releases of the
 jdk. The problem is that the binary release of the jdk, Diablo,
 can't be changed without a new formal release process being
 followed.

 While the test program and the jdk's use of condition variables
 may not be ideal and is somewhat unexpected, I do believe it is
 valid. It does work on Solaris, Linux and Windows without losing
 wakeups. With the 6.4 release coming soon, it would be great if
 the lost wakeup problem (which is rather serious) could be looked
 at and fixed before 6.4 is released.

 Regards,
 -Kurt