From owner-freebsd-hackers@FreeBSD.ORG Tue Feb 7 15:50:12 2012 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 293CE1065673 for ; Tue, 7 Feb 2012 15:50:12 +0000 (UTC) (envelope-from janm@transactionware.com) Received: from midgard.transactionware.com (mail2.transactionware.com [203.14.245.36]) by mx1.freebsd.org (Postfix) with SMTP id 9113A8FC16 for ; Tue, 7 Feb 2012 15:50:07 +0000 (UTC) Received: (qmail 33712 invoked by uid 907); 7 Feb 2012 15:50:05 -0000 Received: from midgard.transactionware.com (HELO [127.0.0.1]) (192.168.1.55) by midgard.transactionware.com (qpsmtpd/0.82) with ESMTP; Wed, 08 Feb 2012 02:50:05 +1100 Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Jan Mikkelsen In-Reply-To: Date: Wed, 8 Feb 2012 02:50:04 +1100 Content-Transfer-Encoding: quoted-printable Message-Id: References: <201201110806.30620.jhb@freebsd.org> <5D37298B-9D68-4F0F-8AAB-E8F2DBB9D9C3@transactionware.com> To: Attilio Rao X-Mailer: Apple Mail (2.1084) X-Mailman-Approved-At: Tue, 07 Feb 2012 16:31:40 +0000 Cc: Garrett Cooper , freebsd-hackers@freebsd.org, Ivan Voras , Xin LI , Jan Mikkelsen , davidxu@freebsd.org Subject: Re: sem(4) lockup in python? X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Feb 2012 15:50:12 -0000 On 06/02/2012, at 3:49 AM, Attilio Rao wrote: > 2012/2/5 Ivan Voras : >> On 5 February 2012 11:44, Garrett Cooper wrote: >>=20 >>>=20 >>> 'make MAKE_JOBS_NUMBER=3D1' is the workground used right now.. >>=20 >> David Xu suggested that it is a bug in Python - it doesn't set >> process-shared attribute when it calls sem_init(), but i've tried >> patching it (replacing the port patchfile file the one I've attached) >> and I still get the hang. >=20 > Guys, > it would be valuable if you do the following: > 1) recompile your kernel with INVARIANTS, WITNESS and without = WITNESS_SKIPSPIN > 2a) If you have a serial console, please run the DDB stuff through it > (go to point 3) > 2b) If you don't have a serial console please run the DDB stuff in > textdump (go to point 3) > 3) Collect the following informations: > - show allpcpu > - show alllocks > - ps > - alltrace > 3a) If you had the serial console (thus not textdump) please collect > the coredump with: call doadump > 4) reset your machine >=20 > You will end up with the textdump or coredump + all the serial logs > necessary to debug this. > If you cannot reproduce your issue with WITNESS enabled, please remove > from your kernel config and avoid to call 'show alllocks' when in DDB. > But try to leave INVARIANTS on. >=20 > Hope this helps, > Attilio This has just happened again, this time with MAKE_JOBS_NUMBER=3D1, so = that workaround didn't work. I don't have INVARIANTS or WITNESS compiled in, but I did fire up kgdb = to poke around. The stack traces look identical. I don't know what to = expect in these structures. If there's anything useful I can dig out = here, please let me know. However: A parent and child process both blocked waiting on semaphores = smells like an user level bug to me. Jan. (kgdb) proc 24969 [Switching to thread 648 (Thread 101022)]#0 sched_switch = (td=3D0xfffffe003de43000, newtd=3D0xfffffe000b501000, flags=3DVariable = "flags" is not available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/sched_ule.c:18= 54 1854 cpuid =3D PCPU_GET(cpuid); (kgdb) where #0 sched_switch (td=3D0xfffffe003de43000, newtd=3D0xfffffe000b501000, = flags=3DVariable "flags" is not available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/sched_ule.c:18= 54 #1 0xffffffff8083af24 in mi_switch (flags=3D260, newtd=3D0x0) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:4= 48 #2 0xffffffff80872644 in sleepq_catch_signals = (wchan=3D0xfffffe0015fca800, pri=3D0) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueu= e.c:425 #3 0xffffffff80872fb6 in sleepq_wait_sig (wchan=3DVariable "wchan" is = not available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueu= e.c:631 #4 0xffffffff8083b599 in _sleep (ident=3D0xfffffe0015fca800, = lock=3D0xffffffff81114860, priority=3DVariable "priority" is not = available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:2= 32 #5 0xffffffff8084ac69 in do_sem_wait (td=3DVariable "td" is not = available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:51= 3 #6 0xffffffff8084ad61 in __umtx_op_sem_wait (td=3D0xfffffe003de43000, = uap=3D0xffffff8693d85bc0) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:32= 05 #7 0xffffffff80b17de0 in amd64_syscall (td=3D0xfffffe003de43000, = traced=3D0) at subr_syscall.c:131 #8 0xffffffff80b03517 in Xfast_syscall () at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/amd64/amd64/excepti= on.S:387 #9 0x00000008010277fc in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) proc 24970 [Switching to thread 665 (Thread 100553)]#0 sched_switch = (td=3D0xfffffe02f7240460, newtd=3D0xfffffe000b501460, flags=3DVariable = "flags" is not available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/sched_ule.c:18= 54 1854 cpuid =3D PCPU_GET(cpuid); (kgdb) where #0 sched_switch (td=3D0xfffffe02f7240460, newtd=3D0xfffffe000b501460, = flags=3DVariable "flags" is not available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/sched_ule.c:18= 54 #1 0xffffffff8083af24 in mi_switch (flags=3D260, newtd=3D0x0) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:4= 48 #2 0xffffffff80872644 in sleepq_catch_signals = (wchan=3D0xfffffe0015fd7380, pri=3D0) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueu= e.c:425 #3 0xffffffff80872fb6 in sleepq_wait_sig (wchan=3DVariable "wchan" is = not available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueu= e.c:631 #4 0xffffffff8083b599 in _sleep (ident=3D0xfffffe0015fd7380, = lock=3D0xffffffff811145e0, priority=3DVariable "priority" is not = available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:2= 32 #5 0xffffffff8084ac69 in do_sem_wait (td=3DVariable "td" is not = available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:51= 3 #6 0xffffffff8084ad61 in __umtx_op_sem_wait (td=3D0xfffffe02f7240460, = uap=3D0xffffff8694b04bc0) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:32= 05 #7 0xffffffff80b17de0 in amd64_syscall (td=3D0xfffffe02f7240460, = traced=3D0) at subr_syscall.c:131 #8 0xffffffff80b03517 in Xfast_syscall () at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/amd64/amd64/excepti= on.S:387 #9 0x00000008010277fc in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) up #1 0xffffffff8083af24 in mi_switch (flags=3D260, newtd=3D0x0) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:4= 48 448 sched_switch(td, newtd, flags); (kgdb) up #2 0xffffffff80872644 in sleepq_catch_signals = (wchan=3D0xfffffe0015fd7380, pri=3D0) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueu= e.c:425 425 sleepq_switch(wchan, pri); (kgdb) up #3 0xffffffff80872fb6 in sleepq_wait_sig (wchan=3DVariable "wchan" is = not available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueu= e.c:631 631 rcatch =3D sleepq_catch_signals(wchan, pri); (kgdb) up #4 0xffffffff8083b599 in _sleep (ident=3D0xfffffe0015fd7380, = lock=3D0xffffffff811145e0, priority=3DVariable "priority" is not = available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:2= 32 232 rval =3D sleepq_wait_sig(ident, pri); (kgdb) up #5 0xffffffff8084ac69 in do_sem_wait (td=3DVariable "td" is not = available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:51= 3 warning: Source file is more recent than executable. 513 error =3D msleep(uq, &uc->uc_lock, PCATCH, wmesg, timo); (kgdb) p *uq $1 =3D {uq_link =3D {tqe_next =3D 0x0, tqe_prev =3D 0xfffffe0015fd2080}, = uq_key =3D {hash =3D 186, type =3D 2, shared =3D 0, info =3D {shared =3D = {object =3D 0xfffffe00162b0310, offset =3D 34380812388}, private =3D { vs =3D 0xfffffe00162b0310, addr =3D 34380812388}, both =3D {a =3D = 0xfffffe00162b0310, b =3D 34380812388}}}, uq_flags =3D 1, uq_thread =3D = 0xfffffe02f7240460, uq_pi_blocked =3D 0x0, uq_lockq =3D { tqe_next =3D 0x0, tqe_prev =3D 0x0}, uq_pi_contested =3D {tqh_first = =3D 0x0, tqh_last =3D 0xfffffe0015fd73d8}, uq_inherited_pri =3D 255 '?', = uq_spare_queue =3D 0x0, uq_cur_queue =3D 0xfffffe0015fd2080} (kgdb) proc 24969 [Switching to thread 648 (Thread 101022)]#0 sched_switch = (td=3D0xfffffe003de43000, newtd=3D0xfffffe000b501000, flags=3DVariable = "flags" is not available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/sched_ule.c:18= 54 1854 cpuid =3D PCPU_GET(cpuid); (kgdb) up #1 0xffffffff8083af24 in mi_switch (flags=3D260, newtd=3D0x0) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:4= 48 448 sched_switch(td, newtd, flags); (kgdb) up #2 0xffffffff80872644 in sleepq_catch_signals = (wchan=3D0xfffffe0015fca800, pri=3D0) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueu= e.c:425 425 sleepq_switch(wchan, pri); (kgdb) up #3 0xffffffff80872fb6 in sleepq_wait_sig (wchan=3DVariable "wchan" is = not available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueu= e.c:631 631 rcatch =3D sleepq_catch_signals(wchan, pri); (kgdb) up #4 0xffffffff8083b599 in _sleep (ident=3D0xfffffe0015fca800, = lock=3D0xffffffff81114860, priority=3DVariable "priority" is not = available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:2= 32 232 rval =3D sleepq_wait_sig(ident, pri); (kgdb) up #5 0xffffffff8084ac69 in do_sem_wait (td=3DVariable "td" is not = available. ) at = /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:51= 3 513 error =3D msleep(uq, &uc->uc_lock, PCATCH, wmesg, timo); (kgdb) p *uq $2 =3D {uq_link =3D {tqe_next =3D 0x0, tqe_prev =3D 0xfffffe04fc73c280}, = uq_key =3D {hash =3D 194, type =3D 2, shared =3D 0, info =3D {shared =3D = {object =3D 0xfffffe001628d188, offset =3D 34380814884}, private =3D { vs =3D 0xfffffe001628d188, addr =3D 34380814884}, both =3D {a =3D = 0xfffffe001628d188, b =3D 34380814884}}}, uq_flags =3D 1, uq_thread =3D = 0xfffffe003de43000, uq_pi_blocked =3D 0x0, uq_lockq =3D { tqe_next =3D 0x0, tqe_prev =3D 0x0}, uq_pi_contested =3D {tqh_first = =3D 0x0, tqh_last =3D 0xfffffe0015fca858}, uq_inherited_pri =3D 255 '?', = uq_spare_queue =3D 0x0, uq_cur_queue =3D 0xfffffe04fc73c280} (kgdb)=20