Date: Mon, 18 Dec 2006 18:40:21 GMT
From: Peter Edwards <peadar@freebsd.org>
To: freebsd-threads@FreeBSD.org
Subject: Re: threads/74180: KSE problem. Applications those riched maximum possible threads at a time, would hang on threads join. look at detailed description !
Message-ID: <200612181840.kBIIeL5s044845@freefall.freebsd.org>

The following reply was made to PR threads/74180; it has been noted by GNATS.

From: Peter Edwards <peadar@freebsd.org>
To: bug-followup@FreeBSD.org, acs@swamp.homeunix.org
Cc:
Subject: Re: threads/74180: KSE problem. Applications those riched maximum possible threads at a time, would hang on threads join. look at detailed description !
Date: Mon, 18 Dec 2006 18:03:34 +0000

There are some bugs in the posted sample that will indeed cause it to hang unpredictably.

For condition variables, you need to test some condition before sleeping, and the condition needs to be protected by the mutex you release as you go to sleep (this is where they get their name from).

For example, in the case posted, after you start, say, 2000 threads, the main thread may reach the pthread_cond_broadcast() before some subset of those 2000 reach pthread_cond_wait(). The broadcast only wakes up those threads that are _currently_ waiting on the condvar, so threads that reach the pthread_cond_wait() after that will hang indefinitely.

So, before going to sleep, you need to test whether the main thread has already hit the pthread_cond_broadcast(), e.g.:

> static bool done = false;
> ...
> pthread_mutex_lock(&lock);
> while (!done)
>     pthread_cond_wait(&WakeThemUp, &lock);
> pthread_mutex_unlock(&lock);
>
> ...
>
> done = true;
> pthread_cond_broadcast(&WakeThemUp);

Note the "while (cond)" rather than "if (cond)" around the cond_wait: pthread_cond_wait is allowed to return spuriously, so the condition must be re-tested after every wakeup.

This still leaves a race condition between the assignment of the "done" sentinel and the waking of the condition (i.e., "done" may be assigned by the waker thread between the waiter thread testing "done" and going to sleep, in which case the wakeup is lost). Generally, you need to hold the mutex while you change the condition that the other threads are waiting on and signal/broadcast the condvar, so you really need:

> pthread_mutex_lock(&lock);
> done = true;
> pthread_cond_broadcast(&WakeThemUp);
> pthread_mutex_unlock(&lock);

Essentially, condition variables - in conjunction with a mutex - give you the ability to have two threads communicate via some external condition (in this case, just the value of "done"). The CV gives a consumer the ability to atomically test that condition and go to sleep if it's false, and gives a producer the ability to atomically change the value of the condition and wake up the consumer.

I'm not entirely sure why the program only works the first time it's invoked, but it's likely that the main thread does a lot of work in the kernel on the first iteration, while the resources allocated are available more readily (as they are recycled) for successive invocations of the test. This would cause the main thread to lag behind the threads it created for the first invocation, but race ahead afterwards. Note: I'm not saying this _is_ the case, but it's plausible, and serves to indicate why things might not always happen the same way.
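
Putting the fragments above together, a complete version of the pattern might look something like the following. This is only a sketch, not the code from the PR: "lock", "WakeThemUp" and "done" are the names used above, while run_waiters(), the "woken" counter and the MAX_THREADS cap are made up for illustration. It uses pthread_cond_broadcast() because there may be many waiters.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t WakeThemUp = PTHREAD_COND_INITIALIZER;
static bool done = false;  /* the condition itself; protected by "lock" */
static int woken = 0;      /* hypothetical counter; also protected by "lock" */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!done)          /* re-test after every wakeup: it may be spurious */
        pthread_cond_wait(&WakeThemUp, &lock);
    woken++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/*
 * Hypothetical helper: start up to "nthreads" waiters, release them all,
 * and return how many got past the wait.
 */
int run_waiters(int nthreads)
{
    enum { MAX_THREADS = 64 };  /* arbitrary cap for the sketch */
    pthread_t tids[MAX_THREADS];
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    /* Reset the shared state under the mutex so the helper can be reused. */
    pthread_mutex_lock(&lock);
    done = false;
    woken = 0;
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, waiter, NULL) != 0)
            break;          /* out of resources: carry on with what we have */
        started++;
    }

    /*
     * Change the condition and wake the waiters while holding the mutex.
     * A thread that has not reached pthread_cond_wait() yet will see
     * done == true and never go to sleep at all.
     */
    pthread_mutex_lock(&lock);
    done = true;
    pthread_cond_broadcast(&WakeThemUp);
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    /* Every waiter has been joined, so reading "woken" unlocked is safe. */
    assert(woken == started);
    return woken;
}
```

Called from main() as, say, run_waiters(8), this should return once all eight threads have been joined, regardless of whether the main thread reaches the broadcast before or after the waiters reach pthread_cond_wait().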