Date: Sat, 18 Oct 2008 00:00:09 GMT
Message-Id: <200810180000.m9I009T8058526@freefall.freebsd.org>
To: freebsd-threads@FreeBSD.org
From: Daniel Eischen
Subject: Re: threads/128180: pthread_cond_broadcast() lost wakup

The following reply was made to PR threads/128180; it has been noted by GNATS.

From: Daniel Eischen
To: Kurt Miller
Cc: freebsd-gnats-submit@freebsd.org, freebsd-threads@freebsd.org
Subject: Re: threads/128180: pthread_cond_broadcast() lost wakup
Date: Fri, 17 Oct 2008 19:44:58 -0400 (EDT)

On Fri, 17 Oct 2008, Kurt Miller wrote:

> The test program outputs periodic printf's indicating
> progress is being made. When it stops the process is
> deadlocked. The lost wakeup can be confirmed by inspecting
> the saved_waiters local var in main().
> Each time the
> deadlock occurs I see that saved_waiters is 8 which tells
> me all eight worker threads were waiting on the condition
> variable when the broadcast was sent. Then switch to the
> thread that is still waiting on the condition variable,
> and you can see that the last_cycle local var is one behind
> the cycles global var which indicates it didn't receive the
> last wakeup.

The test program doesn't look correct to me. It seems possible
for only a few of the threads (as few as two) to do all the work.
Thread 1 can start doing work, then wait for a broadcast. Thread 2
can then start doing its work and broadcast, waking thread 1.

I think you need separate condition variables: one to wake up the
main thread when the last worker goes to sleep/finishes, and one
to wake up the workers.

-- 
DE
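
The two-condition-variable arrangement suggested above could be sketched
roughly as follows. This is only an illustration under stated assumptions,
not the PR's test program and not a proposed fix for it: every name here
(struct pool, work_cv, idle_cv, run_cycles, and so on) is hypothetical, the
"work" is a counter increment done under the lock, and return values from
the pthread calls are left unchecked for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* All names here are hypothetical; none come from the PR's test program. */
struct pool {
    pthread_mutex_t lock;
    pthread_cond_t  work_cv;  /* main -> workers: a new cycle has started */
    pthread_cond_t  idle_cv;  /* last worker -> main: all workers are idle */
    int  nworkers;
    int  idle;                /* workers that have finished the current cycle */
    long cycle;               /* generation counter, bumped only by main */
    int  stop;
    long work_done;           /* total units of work, for checking */
};

static void *worker(void *arg)
{
    struct pool *p = arg;
    long last_cycle = 0;

    pthread_mutex_lock(&p->lock);
    for (;;) {
        /* Report in; the last worker to go idle wakes the main thread. */
        if (++p->idle == p->nworkers)
            pthread_cond_signal(&p->idle_cv);
        /* Predicate loop: only a new cycle or a stop request ends the wait. */
        while (p->cycle == last_cycle && !p->stop)
            pthread_cond_wait(&p->work_cv, &p->lock);
        if (p->stop)
            break;
        last_cycle = p->cycle;
        p->work_done++;       /* stand-in for real work, kept under the lock */
    }
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/* Wait (with the lock held) until every worker has reported idle. */
static void wait_all_idle(struct pool *p)
{
    while (p->idle != p->nworkers)
        pthread_cond_wait(&p->idle_cv, &p->lock);
}

/* Run ncycles rounds on nworkers threads; returns total units of work. */
long run_cycles(int nworkers, int ncycles)
{
    struct pool p = { .nworkers = nworkers };
    pthread_t *tids;
    int i;

    assert(nworkers > 0 && ncycles >= 0);
    tids = malloc(sizeof(*tids) * (size_t)nworkers);
    assert(tids != NULL);
    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.work_cv, NULL);
    pthread_cond_init(&p.idle_cv, NULL);
    for (i = 0; i < nworkers; i++)
        pthread_create(&tids[i], NULL, worker, &p);

    pthread_mutex_lock(&p.lock);
    for (i = 0; i < ncycles; i++) {
        wait_all_idle(&p);
        p.idle = 0;           /* reset the count before waking anyone */
        p.cycle++;
        pthread_cond_broadcast(&p.work_cv);
    }
    wait_all_idle(&p);        /* let the final cycle finish */
    p.stop = 1;
    pthread_cond_broadcast(&p.work_cv);
    pthread_mutex_unlock(&p.lock);

    for (i = 0; i < nworkers; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&p.idle_cv);
    pthread_cond_destroy(&p.work_cv);
    pthread_mutex_destroy(&p.lock);
    free(tids);
    return p.work_done;
}
```

Because the main thread starts a new cycle only after every worker has
reported idle, and each worker waits on a cycle-number predicate rather
than on the bare broadcast, every worker does exactly one unit of work per
cycle in this sketch; a call such as run_cycles(8, 1000) returns 8000.
Build with something like "cc -pthread".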