From owner-freebsd-current@FreeBSD.ORG Thu Sep 18 21:38:07 2008 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DBE38106564A for ; Thu, 18 Sep 2008 21:38:07 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from server.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 5C0BE8FC18 for ; Thu, 18 Sep 2008 21:38:07 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from localhost.corp.yahoo.com (john@localhost [IPv6:::1]) (authenticated bits=0) by server.baldwin.cx (8.14.2/8.14.2) with ESMTP id m8ILbngg020803; Thu, 18 Sep 2008 17:38:01 -0400 (EDT) (envelope-from jhb@freebsd.org) From: John Baldwin To: freebsd-current@freebsd.org Date: Thu, 18 Sep 2008 17:35:29 -0400 User-Agent: KMail/1.9.7 References: <200809180631.47071.naylor.b.david@gmail.com> In-Reply-To: <200809180631.47071.naylor.b.david@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200809181735.29504.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (server.baldwin.cx [IPv6:::1]); Thu, 18 Sep 2008 17:38:01 -0400 (EDT) X-Virus-Scanned: ClamAV 0.93.1/8282/Thu Sep 18 14:21:01 2008 on server.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-2.5 required=4.2 tests=AWL,BAYES_00,NO_RELAYS autolearn=ham version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx Cc: David Naylor Subject: Re: FreeBSD deadlock (with fork?) X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 18 Sep 2008 21:38:07 -0000 On Thursday 18 September 2008 12:31:42 am David Naylor wrote: > Hi, > > I have a program that spawns a lot of subprocesses (with pipes open) from > multiple threads. The problem is the program often deadlocks, but not > consistently. Sometimes the program can run over 5 times to competition > without incidence and yet othertimes it locks within a few seconds. > > However if I limit the thread count to 1 the problem does not appear to be > present. > > Here are the logs from gdb: > (gdb) info thread > 5 Thread 7021c0 (LWP 100203) 0x00000008009a2e8c in _umtx_op_err () > at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37 > 4 Thread a28480 (LWP 100174) 0x00000008009a2e8c in _umtx_op_err () > at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37 > 3 Thread a61d80 (LWP 100175) 0x00000008009a2e8c in _umtx_op_err () > at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37 > 2 Thread a61bc0 (LWP 100176) 0x00000008009a2e8c in _umtx_op_err () > at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37 > * 1 Thread a61840 (LWP 100177) 0x00000008009a2e8c in _umtx_op_err () > at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37 > > > (gdb) bt > #0 0x00000008009a2e8c in _umtx_op_err () > at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37 > #1 0x00000008009a1331 in cond_wait_common (cond=Variable "cond" is not > available. This is not waiting on a lock, this is a pthread_condvar_wait() of some sort. > (gdb) thr 2 > [Switching to thread 2 (Thread a61bc0 (LWP 100176))]#0 0x00000008009a2e8c in > _umtx_op_err () > at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37 > 37 RSYSCALL_ERR(_umtx_op) > (gdb) bt > #0 0x00000008009a2e8c in _umtx_op_err () > at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37 > #1 0x00000008009a1331 in cond_wait_common (cond=Variable "cond" is not > available. Simiarly here. I don't think you have a deadlock. I think you have a bug where you are missing a pthread_condvar_signal() or broadcast or some such. Or maybe you aren't holding the mutex when doing the signal or broadcast. -- John Baldwin