Date: Fri, 17 Oct 2008 19:44:58 -0400 (EDT)
From: Daniel Eischen
To: Kurt Miller
Cc: freebsd-gnats-submit@freebsd.org, freebsd-threads@freebsd.org
Subject: Re: threads/128180: pthread_cond_broadcast() lost wakup

On Fri, 17 Oct 2008, Kurt Miller wrote:

> The test program outputs periodic printf's indicating
> progress is being made. When it stops the process is
> deadlocked. The lost wakeup can be confirmed by inspecting
> the saved_waiters local var in main().
> Each time the deadlock occurs I see that saved_waiters is 8 which tells
> me all eight worker threads were waiting on the condition
> variable when the broadcast was sent. Then switch to the
> thread that is still waiting on the condition variable,
> and you can see that the last_cycle local var is one behind
> the cycles global var which indicates it didn't receive the
> last wakeup.

The test program doesn't look correct to me. It seems possible
for only a few of the threads (as few as 2) to do all the work.
Thread 1 can start doing work, then wait for a broadcast. Thread 2
can start doing his work, then broadcast, waking thread 1. Because
the workers and the main thread appear to share a single condition
variable, a broadcast from one worker also wakes the other workers.

I think you need separate condition variables: one to wake up the
main thread when the last worker goes to sleep/finishes, and one to
wake up the workers.

-- 
DE
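For illustration, here is a minimal sketch of the two-condition-variable arrangement suggested above, assuming a cycle-based worker pool similar to what the test program appears to do. All names (`struct pool`, `run_cycles`, `main_cv`, `work_cv`, `idle`, `cycle`) are hypothetical and not taken from Kurt's program. Workers signal `main_cv` only when the last one goes idle; the main thread broadcasts `work_cv` only to start the next cycle, so neither side consumes wakeups meant for the other.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical shared state for this sketch. */
struct pool {
    pthread_mutex_t lock;
    pthread_cond_t  main_cv;   /* main waits here until every worker is idle */
    pthread_cond_t  work_cv;   /* workers wait here for the next cycle */
    int nworkers;
    int idle;                  /* workers finished with the current cycle */
    unsigned long cycle;       /* advanced by main to start a new cycle */
    int done;                  /* set by main to make workers exit */
    unsigned long work_done;   /* total units of (simulated) work */
};

static void *worker(void *arg)
{
    struct pool *p = arg;
    unsigned long my_cycle = 0;

    pthread_mutex_lock(&p->lock);
    for (;;) {
        p->work_done++;                       /* simulated work for this cycle */

        /* Only the last worker to go idle wakes main, on main's own condvar. */
        if (++p->idle == p->nworkers)
            pthread_cond_signal(&p->main_cv);

        /* Predicate loop: tolerates spurious wakeups. */
        while (p->cycle == my_cycle && !p->done)
            pthread_cond_wait(&p->work_cv, &p->lock);
        if (p->done)
            break;
        /* Main never advances twice without every worker going idle. */
        assert(p->cycle == my_cycle + 1);
        my_cycle = p->cycle;
    }
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/* Runs ncycles cycles with nworkers threads and returns the total work
   performed (nworkers * ncycles), or 0 on bad arguments. */
unsigned long run_cycles(int nworkers, int ncycles)
{
    struct pool p = { .nworkers = nworkers };
    pthread_t *tids;
    unsigned long total;

    if (nworkers < 1 || ncycles < 1)
        return 0;
    tids = malloc(sizeof *tids * (size_t)nworkers);
    if (tids == NULL)
        return 0;
    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.main_cv, NULL);
    pthread_cond_init(&p.work_cv, NULL);

    for (int i = 0; i < nworkers; i++)
        if (pthread_create(&tids[i], NULL, worker, &p) != 0)
            abort();                          /* sketch: no partial-start recovery */

    pthread_mutex_lock(&p.lock);
    for (int i = 0; i < ncycles; i++) {
        while (p.idle < p.nworkers)           /* wait for the last worker */
            pthread_cond_wait(&p.main_cv, &p.lock);
        p.idle = 0;
        if (i == ncycles - 1)
            p.done = 1;                       /* final cycle finished */
        else
            p.cycle++;                        /* start the next cycle */
        pthread_cond_broadcast(&p.work_cv);
    }
    pthread_mutex_unlock(&p.lock);

    for (int i = 0; i < nworkers; i++)
        pthread_join(tids[i], NULL);
    total = p.work_done;

    pthread_cond_destroy(&p.work_cv);
    pthread_cond_destroy(&p.main_cv);
    pthread_mutex_destroy(&p.lock);
    free(tids);
    return total;
}
```

Called from a `main()` built with `-pthread`, `run_cycles(8, 1000)` should return 8000 (8 workers times 1000 cycles). Because every waiter re-checks its predicate in a `while` loop, a signal sent before the main thread starts waiting, or a spurious wakeup, does not cause a hang.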