Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 8 Feb 2012 02:50:04 +1100
From:      Jan Mikkelsen <janm@transactionware.com>
To:        Attilio Rao <attilio@freebsd.org>
Cc:        Garrett Cooper <yanegomi@gmail.com>, freebsd-hackers@freebsd.org, Ivan Voras <ivoras@freebsd.org>, Xin LI <delphij@delphij.net>, Jan Mikkelsen <janm-freebsd-hackers@transactionware.com>, davidxu@freebsd.org
Subject:   Re: sem(4) lockup in python?
Message-ID:  <D5A00EE0-3671-4117-B5C4-891E0C65A20F@transactionware.com>
In-Reply-To: <CAJ-FndCMst_LYAAevbsMNTZ9TVyv5nHsyttXfYeNYOQN9JhA0A@mail.gmail.com>
References:  <jejrbe$or8$1@dough.gmane.org> <201201110806.30620.jhb@freebsd.org> <CAF-QHFWFvYTPeM68Mk%2BOYVX--MNhKOJ2o1GF9ZOsBmtiC5fYFQ@mail.gmail.com> <CAGH67wRsek2-WY_ETW6QEER1r5dDXLXfDjbzpHMjtv059Y8cJw@mail.gmail.com> <5D37298B-9D68-4F0F-8AAB-E8F2DBB9D9C3@transactionware.com> <CAGH67wT3HuxPHUXeTib0qJNH%2BO5snn3Eiim1bfj8LewYoKdXdA@mail.gmail.com> <CAF-QHFVADLkduLH1AG_hSZeDtDVCC=FkqZxbxrsMY3Y3%2BsMZ8A@mail.gmail.com> <CAJ-FndCMst_LYAAevbsMNTZ9TVyv5nHsyttXfYeNYOQN9JhA0A@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help

On 06/02/2012, at 3:49 AM, Attilio Rao wrote:

> 2012/2/5 Ivan Voras <ivoras@freebsd.org>:
>> On 5 February 2012 11:44, Garrett Cooper <yanegomi@gmail.com> wrote:
>>=20
>>>=20
>>>    'make MAKE_JOBS_NUMBER=3D1' is the workground used right now..
>>=20
>> David Xu suggested that it is a bug in Python - it doesn't set
>> process-shared attribute when it calls sem_init(), but i've tried
>> patching it (replacing the port patchfile file the one I've attached)
>> and I still get the hang.
>=20
> Guys,
> it would be valuable if you do the following:
> 1) recompile your kernel with INVARIANTS, WITNESS and without =
WITNESS_SKIPSPIN
> 2a) If you have a serial console, please run the DDB stuff through it
> (go to point 3)
> 2b) If you don't have a serial console please run the DDB stuff in
> textdump (go to point 3)
> 3) Collect the following informations:
> - show allpcpu
> - show alllocks
> - ps
> - alltrace
> 3a) If you had the serial console (thus not textdump) please collect
> the coredump with: call doadump
> 4) reset your machine
>=20
> You will end up with the textdump or coredump + all the serial logs
> necessary to debug this.
> If you cannot reproduce your issue with WITNESS enabled, please remove
> from your kernel config and avoid to call 'show alllocks' when in DDB.
> But try to leave INVARIANTS on.
>=20
> Hope this helps,
> Attilio


This has just happened again, this time with MAKE_JOBS_NUMBER=3D1, so =
that workaround didn't work.

I don't have INVARIANTS or WITNESS compiled in, but I did fire up kgdb =
to poke around. The stack traces look identical. I don't know what to =
expect in these structures. If there's anything useful I can dig out =
here, please let me know.

However: A parent and child process both blocked waiting on semaphores =
smells like an user level bug to me.

Jan.



(kgdb) proc 24969
[Switching to thread 648 (Thread 101022)]#0  sched_switch =
(td=3D0xfffffe003de43000, newtd=3D0xfffffe000b501000, flags=3DVariable =
"flags" is not available.
)
    at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/sched_ule.c:18=
54
1854			cpuid =3D PCPU_GET(cpuid);
(kgdb) where
#0  sched_switch (td=3D0xfffffe003de43000, newtd=3D0xfffffe000b501000, =
flags=3DVariable "flags" is not available.
) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/sched_ule.c:18=
54
#1  0xffffffff8083af24 in mi_switch (flags=3D260, newtd=3D0x0) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:4=
48
#2  0xffffffff80872644 in sleepq_catch_signals =
(wchan=3D0xfffffe0015fca800, pri=3D0) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueu=
e.c:425
#3  0xffffffff80872fb6 in sleepq_wait_sig (wchan=3DVariable "wchan" is =
not available.
) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueu=
e.c:631
#4  0xffffffff8083b599 in _sleep (ident=3D0xfffffe0015fca800, =
lock=3D0xffffffff81114860, priority=3DVariable "priority" is not =
available.
) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:2=
32
#5  0xffffffff8084ac69 in do_sem_wait (td=3DVariable "td" is not =
available.
) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:51=
3
#6  0xffffffff8084ad61 in __umtx_op_sem_wait (td=3D0xfffffe003de43000, =
uap=3D0xffffff8693d85bc0) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:32=
05
#7  0xffffffff80b17de0 in amd64_syscall (td=3D0xfffffe003de43000, =
traced=3D0) at subr_syscall.c:131
#8  0xffffffff80b03517 in Xfast_syscall () at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/amd64/amd64/excepti=
on.S:387
#9  0x00000008010277fc in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) proc 24970
[Switching to thread 665 (Thread 100553)]#0  sched_switch =
(td=3D0xfffffe02f7240460, newtd=3D0xfffffe000b501460, flags=3DVariable =
"flags" is not available.
)
    at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/sched_ule.c:18=
54
1854			cpuid =3D PCPU_GET(cpuid);
(kgdb) where
#0  sched_switch (td=3D0xfffffe02f7240460, newtd=3D0xfffffe000b501460, =
flags=3DVariable "flags" is not available.
) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/sched_ule.c:18=
54
#1  0xffffffff8083af24 in mi_switch (flags=3D260, newtd=3D0x0) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:4=
48
#2  0xffffffff80872644 in sleepq_catch_signals =
(wchan=3D0xfffffe0015fd7380, pri=3D0) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueu=
e.c:425
#3  0xffffffff80872fb6 in sleepq_wait_sig (wchan=3DVariable "wchan" is =
not available.
) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueu=
e.c:631
#4  0xffffffff8083b599 in _sleep (ident=3D0xfffffe0015fd7380, =
lock=3D0xffffffff811145e0, priority=3DVariable "priority" is not =
available.
) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:2=
32
#5  0xffffffff8084ac69 in do_sem_wait (td=3DVariable "td" is not =
available.
) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:51=
3
#6  0xffffffff8084ad61 in __umtx_op_sem_wait (td=3D0xfffffe02f7240460, =
uap=3D0xffffff8694b04bc0) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:32=
05
#7  0xffffffff80b17de0 in amd64_syscall (td=3D0xfffffe02f7240460, =
traced=3D0) at subr_syscall.c:131
#8  0xffffffff80b03517 in Xfast_syscall () at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/amd64/amd64/excepti=
on.S:387
#9  0x00000008010277fc in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) up
#1  0xffffffff8083af24 in mi_switch (flags=3D260, newtd=3D0x0) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:4=
48
448		sched_switch(td, newtd, flags);
(kgdb) up
#2  0xffffffff80872644 in sleepq_catch_signals =
(wchan=3D0xfffffe0015fd7380, pri=3D0) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueu=
e.c:425
425			sleepq_switch(wchan, pri);
(kgdb) up
#3  0xffffffff80872fb6 in sleepq_wait_sig (wchan=3DVariable "wchan" is =
not available.
) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueu=
e.c:631
631		rcatch =3D sleepq_catch_signals(wchan, pri);
(kgdb) up
#4  0xffffffff8083b599 in _sleep (ident=3D0xfffffe0015fd7380, =
lock=3D0xffffffff811145e0, priority=3DVariable "priority" is not =
available.
) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:2=
32
232			rval =3D sleepq_wait_sig(ident, pri);
(kgdb) up
#5  0xffffffff8084ac69 in do_sem_wait (td=3DVariable "td" is not =
available.
) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:51=
3
warning: Source file is more recent than executable.

513		error =3D msleep(uq, &uc->uc_lock, PCATCH, wmesg, timo);
(kgdb) p *uq
$1 =3D {uq_link =3D {tqe_next =3D 0x0, tqe_prev =3D 0xfffffe0015fd2080}, =
uq_key =3D {hash =3D 186, type =3D 2, shared =3D 0, info =3D {shared =3D =
{object =3D 0xfffffe00162b0310, offset =3D 34380812388}, private =3D {
        vs =3D 0xfffffe00162b0310, addr =3D 34380812388}, both =3D {a =3D =
0xfffffe00162b0310, b =3D 34380812388}}}, uq_flags =3D 1, uq_thread =3D =
0xfffffe02f7240460, uq_pi_blocked =3D 0x0, uq_lockq =3D {
    tqe_next =3D 0x0, tqe_prev =3D 0x0}, uq_pi_contested =3D {tqh_first =
=3D 0x0, tqh_last =3D 0xfffffe0015fd73d8}, uq_inherited_pri =3D 255 '?', =
uq_spare_queue =3D 0x0, uq_cur_queue =3D 0xfffffe0015fd2080}
(kgdb) proc 24969
[Switching to thread 648 (Thread 101022)]#0  sched_switch =
(td=3D0xfffffe003de43000, newtd=3D0xfffffe000b501000, flags=3DVariable =
"flags" is not available.
)
    at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/sched_ule.c:18=
54
1854			cpuid =3D PCPU_GET(cpuid);
(kgdb) up
#1  0xffffffff8083af24 in mi_switch (flags=3D260, newtd=3D0x0) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:4=
48
448		sched_switch(td, newtd, flags);
(kgdb) up
#2  0xffffffff80872644 in sleepq_catch_signals =
(wchan=3D0xfffffe0015fca800, pri=3D0) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueu=
e.c:425
425			sleepq_switch(wchan, pri);
(kgdb) up
#3  0xffffffff80872fb6 in sleepq_wait_sig (wchan=3DVariable "wchan" is =
not available.
) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/subr_sleepqueu=
e.c:631
631		rcatch =3D sleepq_catch_signals(wchan, pri);
(kgdb) up
#4  0xffffffff8083b599 in _sleep (ident=3D0xfffffe0015fca800, =
lock=3D0xffffffff81114860, priority=3DVariable "priority" is not =
available.
) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_synch.c:2=
32
232			rval =3D sleepq_wait_sig(ident, pri);
(kgdb) up
#5  0xffffffff8084ac69 in do_sem_wait (td=3DVariable "td" is not =
available.
) at =
/home/janm/p4/freebsd-image-std-2011.1/FreeBSD/src/sys/kern/kern_umtx.c:51=
3
513		error =3D msleep(uq, &uc->uc_lock, PCATCH, wmesg, timo);
(kgdb) p *uq
$2 =3D {uq_link =3D {tqe_next =3D 0x0, tqe_prev =3D 0xfffffe04fc73c280}, =
uq_key =3D {hash =3D 194, type =3D 2, shared =3D 0, info =3D {shared =3D =
{object =3D 0xfffffe001628d188, offset =3D 34380814884}, private =3D {
        vs =3D 0xfffffe001628d188, addr =3D 34380814884}, both =3D {a =3D =
0xfffffe001628d188, b =3D 34380814884}}}, uq_flags =3D 1, uq_thread =3D =
0xfffffe003de43000, uq_pi_blocked =3D 0x0, uq_lockq =3D {
    tqe_next =3D 0x0, tqe_prev =3D 0x0}, uq_pi_contested =3D {tqh_first =
=3D 0x0, tqh_last =3D 0xfffffe0015fca858}, uq_inherited_pri =3D 255 '?', =
uq_spare_queue =3D 0x0, uq_cur_queue =3D 0xfffffe04fc73c280}
(kgdb)=20




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?D5A00EE0-3671-4117-B5C4-891E0C65A20F>