Date: Wed, 8 Feb 2012 02:50:04 +1100
From: Jan Mikkelsen <janm@transactionware.com>
To: Attilio Rao <attilio@freebsd.org>
Cc: Garrett Cooper <yanegomi@gmail.com>, freebsd-hackers@freebsd.org, Ivan Voras <ivoras@freebsd.org>, Xin LI <delphij@delphij.net>, Jan Mikkelsen <janm-freebsd-hackers@transactionware.com>, davidxu@freebsd.org
Subject: Re: sem(4) lockup in python?
Message-ID: <D5A00EE0-3671-4117-B5C4-891E0C65A20F@transactionware.com>
In-Reply-To: <CAJ-FndCMst_LYAAevbsMNTZ9TVyv5nHsyttXfYeNYOQN9JhA0A@mail.gmail.com>
References: <jejrbe$or8$1@dough.gmane.org> <201201110806.30620.jhb@freebsd.org> <CAF-QHFWFvYTPeM68Mk%2BOYVX--MNhKOJ2o1GF9ZOsBmtiC5fYFQ@mail.gmail.com> <CAGH67wRsek2-WY_ETW6QEER1r5dDXLXfDjbzpHMjtv059Y8cJw@mail.gmail.com> <5D37298B-9D68-4F0F-8AAB-E8F2DBB9D9C3@transactionware.com> <CAGH67wT3HuxPHUXeTib0qJNH%2BO5snn3Eiim1bfj8LewYoKdXdA@mail.gmail.com> <CAF-QHFVADLkduLH1AG_hSZeDtDVCC=FkqZxbxrsMY3Y3%2BsMZ8A@mail.gmail.com> <CAJ-FndCMst_LYAAevbsMNTZ9TVyv5nHsyttXfYeNYOQN9JhA0A@mail.gmail.com>
On 06/02/2012, at 3:49 AM, Attilio Rao wrote:
> 2012/2/5 Ivan Voras <ivoras@freebsd.org>:
>> On 5 February 2012 11:44, Garrett Cooper <yanegomi@gmail.com> wrote:
>>
>>>
>>> 'make MAKE_JOBS_NUMBER=1' is the workaround used right now.
>>
>> David Xu suggested that it is a bug in Python - it doesn't set the
>> process-shared attribute when it calls sem_init(), but I've tried
>> patching it (replacing the port patch file with the one I've attached)
>> and I still get the hang.
>
> Guys,
> it would be valuable if you do the following:
> 1) recompile your kernel with INVARIANTS, WITNESS and without WITNESS_SKIPSPIN
> 2a) If you have a serial console, please run the DDB stuff through it
> (go to point 3)
> 2b) If you don't have a serial console please run the DDB stuff in
> textdump (go to point 3)
> 3) Collect the following information:
> - show allpcpu
> - show alllocks
> - ps
> - alltrace
> 3a) If you had the serial console (thus not textdump) please collect
> the coredump with: call doadump
> 4) reset your machine
>
> You will end up with the textdump or coredump + all the serial logs
> necessary to debug this.
> If you cannot reproduce your issue with WITNESS enabled, please remove
> it from your kernel config and avoid calling 'show alllocks' when in DDB.
> But try to leave INVARIANTS on.
>
> Hope this helps,
> Attilio
This has just happened again, this time with MAKE_JOBS_NUMBER=1, so that workaround didn't work.
I don't have INVARIANTS or WITNESS compiled in, but I did fire up kgdb to poke around. The stack traces look identical. I don't know what to expect in these structures. If there's anything useful I can dig out here, please let me know.
However, a parent and child process both blocked waiting on semaphores smells like a user-level bug to me.
Jan.
(kgdb) proc 24969
[Switching to thread 648 (Thread 101022)]#0 sched_switch (td=0xfffffe003de43000, newtd=0xfffffe000b501000, flags=Variable "flags" is not available.
)
at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/sched_ule.c:1854
1854 cpuid = PCPU_GET(cpuid);
(kgdb) where
#0 sched_switch (td=0xfffffe003de43000, newtd=0xfffffe000b501000, flags=Variable "flags" is not available.
) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/sched_ule.c:1854
#1 0xffffffff8083af24 in mi_switch (flags=260, newtd=0x0) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:448
#2 0xffffffff80872644 in sleepq_catch_signals (wchan=0xfffffe0015fca800, pri=0) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueue.c:425
#3 0xffffffff80872fb6 in sleepq_wait_sig (wchan=Variable "wchan" is not available.
) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueue.c:631
#4 0xffffffff8083b599 in _sleep (ident=0xfffffe0015fca800, lock=0xffffffff81114860, priority=Variable "priority" is not available.
) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:232
#5 0xffffffff8084ac69 in do_sem_wait (td=Variable "td" is not available.
) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:513
#6 0xffffffff8084ad61 in __umtx_op_sem_wait (td=0xfffffe003de43000, uap=0xffffff8693d85bc0) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:3205
#7 0xffffffff80b17de0 in amd64_syscall (td=0xfffffe003de43000, traced=0) at subr_syscall.c:131
#8 0xffffffff80b03517 in Xfast_syscall () at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/amd64/amd64/exception.S:387
#9 0x00000008010277fc in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) proc 24970
[Switching to thread 665 (Thread 100553)]#0 sched_switch (td=0xfffffe02f7240460, newtd=0xfffffe000b501460, flags=Variable "flags" is not available.
)
at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/sched_ule.c:1854
1854 cpuid = PCPU_GET(cpuid);
(kgdb) where
#0 sched_switch (td=0xfffffe02f7240460, newtd=0xfffffe000b501460, flags=Variable "flags" is not available.
) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/sched_ule.c:1854
#1 0xffffffff8083af24 in mi_switch (flags=260, newtd=0x0) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:448
#2 0xffffffff80872644 in sleepq_catch_signals (wchan=0xfffffe0015fd7380, pri=0) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueue.c:425
#3 0xffffffff80872fb6 in sleepq_wait_sig (wchan=Variable "wchan" is not available.
) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueue.c:631
#4 0xffffffff8083b599 in _sleep (ident=0xfffffe0015fd7380, lock=0xffffffff811145e0, priority=Variable "priority" is not available.
) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:232
#5 0xffffffff8084ac69 in do_sem_wait (td=Variable "td" is not available.
) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:513
#6 0xffffffff8084ad61 in __umtx_op_sem_wait (td=0xfffffe02f7240460, uap=0xffffff8694b04bc0) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:3205
#7 0xffffffff80b17de0 in amd64_syscall (td=0xfffffe02f7240460, traced=0) at subr_syscall.c:131
#8 0xffffffff80b03517 in Xfast_syscall () at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/amd64/amd64/exception.S:387
#9 0x00000008010277fc in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) up
#1 0xffffffff8083af24 in mi_switch (flags=260, newtd=0x0) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:448
448 sched_switch(td, newtd, flags);
(kgdb) up
#2 0xffffffff80872644 in sleepq_catch_signals (wchan=0xfffffe0015fd7380, pri=0) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueue.c:425
425 sleepq_switch(wchan, pri);
(kgdb) up
#3 0xffffffff80872fb6 in sleepq_wait_sig (wchan=Variable "wchan" is not available.
) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueue.c:631
631 rcatch = sleepq_catch_signals(wchan, pri);
(kgdb) up
#4 0xffffffff8083b599 in _sleep (ident=0xfffffe0015fd7380, lock=0xffffffff811145e0, priority=Variable "priority" is not available.
) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:232
232 rval = sleepq_wait_sig(ident, pri);
(kgdb) up
#5 0xffffffff8084ac69 in do_sem_wait (td=Variable "td" is not available.
) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:513
warning: Source file is more recent than executable.
513 error = msleep(uq, &uc->uc_lock, PCATCH, wmesg, timo);
(kgdb) p *uq
$1 = {uq_link = {tqe_next = 0x0, tqe_prev = 0xfffffe0015fd2080}, uq_key = {hash = 186, type = 2, shared = 0, info = {shared = {object = 0xfffffe00162b0310, offset = 34380812388}, private = {
vs = 0xfffffe00162b0310, addr = 34380812388}, both = {a = 0xfffffe00162b0310, b = 34380812388}}}, uq_flags = 1, uq_thread = 0xfffffe02f7240460, uq_pi_blocked = 0x0, uq_lockq = {
tqe_next = 0x0, tqe_prev = 0x0}, uq_pi_contested = {tqh_first = 0x0, tqh_last = 0xfffffe0015fd73d8}, uq_inherited_pri = 255 '?', uq_spare_queue = 0x0, uq_cur_queue = 0xfffffe0015fd2080}
(kgdb) proc 24969
[Switching to thread 648 (Thread 101022)]#0 sched_switch (td=0xfffffe003de43000, newtd=0xfffffe000b501000, flags=Variable "flags" is not available.
)
at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/sched_ule.c:1854
1854 cpuid = PCPU_GET(cpuid);
(kgdb) up
#1 0xffffffff8083af24 in mi_switch (flags=260, newtd=0x0) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:448
448 sched_switch(td, newtd, flags);
(kgdb) up
#2 0xffffffff80872644 in sleepq_catch_signals (wchan=0xfffffe0015fca800, pri=0) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueue.c:425
425 sleepq_switch(wchan, pri);
(kgdb) up
#3 0xffffffff80872fb6 in sleepq_wait_sig (wchan=Variable "wchan" is not available.
) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueue.c:631
631 rcatch = sleepq_catch_signals(wchan, pri);
(kgdb) up
#4 0xffffffff8083b599 in _sleep (ident=0xfffffe0015fca800, lock=0xffffffff81114860, priority=Variable "priority" is not available.
) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:232
232 rval = sleepq_wait_sig(ident, pri);
(kgdb) up
#5 0xffffffff8084ac69 in do_sem_wait (td=Variable "td" is not available.
) at /home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:513
513 error = msleep(uq, &uc->uc_lock, PCATCH, wmesg, timo);
(kgdb) p *uq
$2 = {uq_link = {tqe_next = 0x0, tqe_prev = 0xfffffe04fc73c280}, uq_key = {hash = 194, type = 2, shared = 0, info = {shared = {object = 0xfffffe001628d188, offset = 34380814884}, private = {
vs = 0xfffffe001628d188, addr = 34380814884}, both = {a = 0xfffffe001628d188, b = 34380814884}}}, uq_flags = 1, uq_thread = 0xfffffe003de43000, uq_pi_blocked = 0x0, uq_lockq = {
tqe_next = 0x0, tqe_prev = 0x0}, uq_pi_contested = {tqh_first = 0x0, tqh_last = 0xfffffe0015fca858}, uq_inherited_pri = 255 '?', uq_spare_queue = 0x0, uq_cur_queue = 0xfffffe04fc73c280}
(kgdb)
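A note for anyone comparing the two `p *uq` dumps above: kgdb prints the user address of each wait key in decimal, which makes the two waits hard to compare by eye. The following is only a rough sketch, not part of the original session. The helper name `parse_umtx_key`, the trimmed dump strings, and the 4 KiB page size are assumptions made for illustration. It extracts the `vs` pointer and `addr` from each dump and prints them in hex, without drawing any conclusion about the root cause.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB base pages on amd64

# Excerpts of the two `p *uq` dumps, trimmed to the uq_key part.
DUMP_24970 = ("uq_key = {hash = 186, type = 2, shared = 0, info = {shared = "
              "{object = 0xfffffe00162b0310, offset = 34380812388}, private = "
              "{vs = 0xfffffe00162b0310, addr = 34380812388}")
DUMP_24969 = ("uq_key = {hash = 194, type = 2, shared = 0, info = {shared = "
              "{object = 0xfffffe001628d188, offset = 34380814884}, private = "
              "{vs = 0xfffffe001628d188, addr = 34380814884}")


def parse_umtx_key(dump):
    """Hypothetical helper: pull shared/vs/addr out of a kgdb uq_key dump."""
    shared = re.search(r"shared = (\d+), info", dump)
    vs = re.search(r"vs = (0x[0-9a-f]+)", dump)
    addr = re.search(r"addr = (\d+)", dump)
    if not (shared and vs and addr):
        raise ValueError("not a recognizable uq_key dump")
    return {
        "shared": int(shared.group(1)),
        "vs": int(vs.group(1), 16),
        "addr": int(addr.group(1)),
    }


if __name__ == "__main__":
    for pid, dump in (("24970", DUMP_24970), ("24969", DUMP_24969)):
        key = parse_umtx_key(dump)
        print(pid, "shared =", key["shared"],
              "vs =", hex(key["vs"]),
              "addr =", hex(key["addr"]),
              "page =", hex(key["addr"] // PAGE_SIZE))
```

Read this way, the two waits have different `vs` pointers and different user addresses that fall in the same page number, with `shared = 0` in both keys. What that implies for the hang is left open here.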
