From: Julian Elischer
Date: Thu, 18 Sep 2008 09:23:22 -0700
To: David Naylor
Cc: freebsd-current@freebsd.org
Subject: Re: FreeBSD deadlock (with fork?)
Message-ID: <48D2807A.1050106@elischer.org>
In-Reply-To: <200809180631.47071.naylor.b.david@gmail.com>
References: <200809180631.47071.naylor.b.david@gmail.com>
List-Id: Discussions about the use of FreeBSD-current

David Naylor wrote:
> Hi,
>
> I have a program that spawns a lot of subprocesses (with pipes open) from
> multiple threads. The problem is the program often deadlocks, but not
> consistently. Sometimes the program can run over 5 times to competition
> without incidence and yet othertimes it locks within a few seconds.

You sent this to -current. Is it in -current?
(We fixed something like this some months back in current and 7.)

Do your post-fork processes do an exec? According to the spec they should.
If any of your threads other than the one that did the fork holds any
mutex at the time of the fork, then your process will hang. If they hold a
umtx (between processes) then everything will hang.

>
> However if I limit the thread count to 1 the problem does not appear to be
> present.
>
> Here are the logs from gdb:
> (gdb) info thread
> 5 Thread 7021c0 (LWP 100203) 0x00000008009a2e8c in _umtx_op_err ()
>     at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
> 4 Thread a28480 (LWP 100174) 0x00000008009a2e8c in _umtx_op_err ()
>     at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
> 3 Thread a61d80 (LWP 100175) 0x00000008009a2e8c in _umtx_op_err ()
>     at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
> 2 Thread a61bc0 (LWP 100176) 0x00000008009a2e8c in _umtx_op_err ()
>     at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
> * 1 Thread a61840 (LWP 100177) 0x00000008009a2e8c in _umtx_op_err ()
>     at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
>
>
> (gdb) bt
> #0 0x00000008009a2e8c in _umtx_op_err ()
>     at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
> #1 0x00000008009a1331 in cond_wait_common (cond=Variable "cond" is not
> available.
> ) at /usr/src/lib/libthr/thread/thr_cond.c:204
> #2 0x00000000004c0573 in PyThread_acquire_lock (lock=0x70a760, waitflag=1) at
> thread_pthread.h:452
> #3 0x00000000004c38b0 in lock_PyThread_acquire_lock (self=0x83f258,
>     args=Variable "args" is not available.
> ) at ./Modules/threadmodule.c:46
> #4 0x00000000004939e3 in PyEval_EvalFrameEx (f=0xa53920,
>     throwflag=Variable "throwflag" is not available.
> ) at Python/ceval.c:3679
> #5 0x00000000004939a8 in PyEval_EvalFrameEx (f=0xb57420,
>     throwflag=Variable "throwflag" is not available.
> ) at Python/ceval.c:3765
> #6 0x0000000000494ac1 in PyEval_EvalCodeEx (co=0x9aab70,
>     globals=Variable "globals" is not available.
> ) at Python/ceval.c:2942
> #7 0x00000000004eb6bc in function_call (func=0x9d8938, arg=0x9d9990,
>     kw=0xa24da0)
>     at Objects/funcobject.c:524
> #8 0x0000000000417add in PyObject_Call (func=0x9d8938, arg=0x9d9990,
>     kw=0xa24da0)
>     at Objects/abstract.c:2487
> #9 0x000000000048f730 in PyEval_EvalFrameEx (f=0xb48e20,
>     throwflag=Variable "throwflag" is not available.
> ) at Python/ceval.c:3978
> #10 0x00000000004939a8 in PyEval_EvalFrameEx (f=0xb57720,
>     throwflag=Variable "throwflag" is not available.
> ) at Python/ceval.c:3765
> #11 0x00000000004939a8 in PyEval_EvalFrameEx (f=0xb49020,
>     throwflag=Variable "throwflag" is not available.
> ) at Python/ceval.c:3765
> #12 0x0000000000494ac1 in PyEval_EvalCodeEx (co=0x947cd8,
>     globals=Variable "globals" is not available.
> ) at Python/ceval.c:2942
> #13 0x00000000004eb5bd in function_call (func=0x9a5320, arg=0x9d9950, kw=0x0)
>     at Objects/funcobject.c:524
> #14 0x0000000000417add in PyObject_Call (func=0x9a5320, arg=0x9d9950, kw=0x0)
>     at Objects/abstract.c:2487
> #15 0x000000000041f12f in instancemethod_call (func=0x9a5320, arg=0x9d9950,
>     kw=0x0)
>     at Objects/classobject.c:2558
> #16 0x0000000000417add in PyObject_Call (func=0x93b410, arg=0x718050, kw=0x0)
>     at Objects/abstract.c:2487
> #17 0x000000000048d4c6 in PyEval_CallObjectWithKeywords (func=0x93b410,
>     arg=0x718050, kw=0x0)
>     at Python/ceval.c:3548
> #18 0x00000000004c3cbd in t_bootstrap (boot_raw=0x70ae60)
>     at ./Modules/threadmodule.c:425
> #19 0x000000080099ac11 in thread_start (curthread=0xa61840)
>     at /usr/src/lib/libthr/thread/thr_create.c:288
> #20 0x0000000000000000 in ?? ()
> Error accessing memory address 0x7fffff5fc000: Bad address.
>
>
> (gdb) thr 2
> [Switching to thread 2 (Thread a61bc0 (LWP 100176))]#0 0x00000008009a2e8c in
> _umtx_op_err ()
>     at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
> 37 RSYSCALL_ERR(_umtx_op)
> (gdb) bt
> #0 0x00000008009a2e8c in _umtx_op_err ()
>     at /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
> #1 0x00000008009a1331 in cond_wait_common (cond=Variable "cond" is not
> available.
> ) at /usr/src/lib/libthr/thread/thr_cond.c:204
> #2 0x00000000004c0573 in PyThread_acquire_lock (lock=0x70a600, waitflag=1) at
> thread_pthread.h:452
> #3 0x00000000004c38b0 in lock_PyThread_acquire_lock (self=0x83f168,
>     args=Variable "args" is not available.
> ) at ./Modules/threadmodule.c:46
> #4 0x00000000004939e3 in PyEval_EvalFrameEx (f=0xb56520,
>     throwflag=Variable "throwflag" is not available.
> ) at Python/ceval.c:3679
> #5 0x0000000000494ac1 in PyEval_EvalCodeEx (co=0x8d5dc8,
>     globals=Variable "globals" is not available.
> ) at Python/ceval.c:2942
> #6 0x0000000000492c88 in PyEval_EvalFrameEx (f=0xb2de20,
>     throwflag=Variable "throwflag" is not available.
> ) at Python/ceval.c:3775
> #7 0x0000000000494ac1 in PyEval_EvalCodeEx (co=0x8d5288,
>     globals=Variable "globals" is not available.
> ) at Python/ceval.c:2942
> #8 0x00000000004eb5bd in function_call (func=0x9a8b18, arg=0xc524d0, kw=0x0)
>     at Objects/funcobject.c:524
> #9 0x0000000000417add in PyObject_Call (func=0x9a8b18, arg=0xc524d0, kw=0x0)
>     at Objects/abstract.c:2487
> #10 0x000000000041f12f in instancemethod_call (func=0x9a8b18, arg=0xc524d0,
>     kw=0x0)
>     at Objects/classobject.c:2558
> #11 0x0000000000417add in PyObject_Call (func=0x93b370, arg=0x10331d0, kw=0x0)
>     at Objects/abstract.c:2487
> #12 0x0000000000464158 in slot_tp_init (self=Variable "self" is not available.
> ) at Objects/typeobject.c:5532
> #13 0x0000000000461561 in type_call (type=0x7d9420, args=0x10331d0, kwds=0x0)
>     at Objects/typeobject.c:700
> #14 0x0000000000417add in PyObject_Call (func=0x7d9420, arg=0x10331d0, kw=0x0)
>     at Objects/abstract.c:2487
> #15 0x00000000004906f9 in PyEval_EvalFrameEx (f=0xb3a120,
>     throwflag=Variable "throwflag" is not available.
> ) at Python/ceval.c:3890
> #16 0x0000000000494ac1 in PyEval_EvalCodeEx (co=0x8d5eb8,
>     globals=Variable "globals" is not available.
> ) at Python/ceval.c:2942
> #17 0x0000000000492c88 in PyEval_EvalFrameEx (f=0xa47a20,
>     throwflag=Variable "throwflag" is not available.
> ) at Python/ceval.c:3775
> #18 0x0000000000494ac1 in PyEval_EvalCodeEx (co=0x8d5cd8,
>     globals=Variable "globals" is not available.
> ) at Python/ceval.c:2942
> #19 0x0000000000492c88 in PyEval_EvalFrameEx (f=0xb3a420,
>     throwflag=Variable "throwflag" is not available.
> ) at Python/ceval.c:3775
> #20 0x0000000000494ac1 in PyEval_EvalCodeEx (co=0x9aabe8,
>     globals=Variable "globals" is not available.
> ) at Python/ceval.c:2942
> #21 0x0000000000492c88 in PyEval_EvalFrameEx (f=0xb3a720,
>     throwflag=Variable "throwflag" is not available.
> ) at Python/ceval.c:3775
> #22 0x0000000000494ac1 in PyEval_EvalCodeEx (co=0x9aab70,
>     globals=Variable "globals" is not available.
> ) at Python/ceval.c:2942
> #23 0x00000000004eb6bc in function_call (func=0x9d8938, arg=0x9d98d0,
>     kw=0xa5a660)
>     at Objects/funcobject.c:524
> #24 0x0000000000417add in PyObject_Call (func=0x9d8938, arg=0x9d98d0,
>     kw=0xa5a660)
>     at Objects/abstract.c:2487
> #25 0x000000000048f730 in PyEval_EvalFrameEx (f=0xb2e020,
>     throwflag=Variable "throwflag" is not available.
> ) at Python/ceval.c:3978
> ---Type <return> to continue, or q <return> to quit---
> #26 0x00000000004939a8 in PyEval_EvalFrameEx (f=0xb3aa20,
>     throwflag=Variable "throwflag" is not available.
> ) at Python/ceval.c:3765
> #27 0x00000000004939a8 in PyEval_EvalFrameEx (f=0xb2e220,
>     throwflag=Variable "throwflag" is not available.
> ) at Python/ceval.c:3765
> #28 0x0000000000494ac1 in PyEval_EvalCodeEx (co=0x947cd8,
>     globals=Variable "globals" is not available.
> ) at Python/ceval.c:2942
> #29 0x00000000004eb5bd in function_call (func=0x9a5320, arg=0x9d9890, kw=0x0)
>     at Objects/funcobject.c:524
> #30 0x0000000000417add in PyObject_Call (func=0x9a5320, arg=0x9d9890, kw=0x0)
>     at Objects/abstract.c:2487
> #31 0x000000000041f12f in instancemethod_call (func=0x9a5320, arg=0x9d9890,
>     kw=0x0)
>     at Objects/classobject.c:2558
> #32 0x0000000000417add in PyObject_Call (func=0x93b320, arg=0x718050, kw=0x0)
>     at Objects/abstract.c:2487
> #33 0x000000000048d4c6 in PyEval_CallObjectWithKeywords (func=0x93b320,
>     arg=0x718050, kw=0x0)
>     at Python/ceval.c:3548
> #34 0x00000000004c3cbd in t_bootstrap (boot_raw=0x70aee0)
>     at ./Modules/threadmodule.c:425
> #35 0x000000080099ac11 in thread_start (curthread=0xa61bc0)
>     at /usr/src/lib/libthr/thread/thr_create.c:288
> #36 0x0000000000000000 in ?? ()
> Error accessing memory address 0x7fffff7fd000: Bad address.
>
> Apologies about the long backtraces [and thus the long message].
>
> Thanks
>
> David
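
To illustrate the fork-then-exec point made in the reply above, here is a minimal sketch, not taken from David's program. It assumes a modern Python 3, where the subprocess module is understood to perform the fork and the exec together in C, so the child never runs interpreter code while holding a copy of a lock owned by some other thread. The names run_child and spawn_from_threads are hypothetical, chosen only for this example.

```python
import subprocess
import sys
import threading


def run_child(index, results):
    """Spawn one child process and record its output (hypothetical helper)."""
    # subprocess.run creates the child and immediately replaces it with the
    # requested program. The child does not continue running this script, so
    # it never waits on a mutex that another parent thread held at fork time.
    proc = subprocess.run(
        [sys.executable, "-c", f"print({index} * 2)"],
        stdout=subprocess.PIPE,  # pipe open to the child, as in the report
        check=True,
        text=True,
    )
    results[index] = int(proc.stdout.strip())


def spawn_from_threads(count):
    """Start `count` threads, each spawning one subprocess; return outputs."""
    results = {}
    threads = [
        threading.Thread(target=run_child, args=(i, results))
        for i in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(spawn_from_threads(4))
```

The pattern to avoid is calling os.fork() from one thread and then carrying on with ordinary Python work in the child (taking locks, importing modules, logging) instead of calling an exec function straight away.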