Date:      Wed, 11 Jan 2012 08:51:16 -0800
From:      Adrian Chadd <adrian@freebsd.org>
To:        Garrett Cooper <yanegomi@gmail.com>
Cc:        freebsd-hackers@freebsd.org, Xin LI <delphij@delphij.net>, Ivan Voras <ivoras@freebsd.org>, davidxu@freebsd.org
Subject:   Re: sem(4) lockup in python?
Message-ID:  <CAJ-VmonLVKH8RuYM8RSKfydQNGZxPg2k3YLTOvPxgF3xcgHzwg@mail.gmail.com>
In-Reply-To: <CAGH67wRsek2-WY_ETW6QEER1r5dDXLXfDjbzpHMjtv059Y8cJw@mail.gmail.com>
References:  <jejrbe$or8$1@dough.gmane.org> <201201110806.30620.jhb@freebsd.org> <CAF-QHFWFvYTPeM68Mk%2BOYVX--MNhKOJ2o1GF9ZOsBmtiC5fYFQ@mail.gmail.com> <CAGH67wRsek2-WY_ETW6QEER1r5dDXLXfDjbzpHMjtv059Y8cJw@mail.gmail.com>

... yes, enable WITNESS already and see if you can find LORs. :)

(Sheesh, that's what it's for! :)


Adrian

On 11 January 2012 08:47, Garrett Cooper <yanegomi@gmail.com> wrote:
> On Wed, Jan 11, 2012 at 6:33 AM, Ivan Voras <ivoras@freebsd.org> wrote:
>> On 11 January 2012 14:06, John Baldwin <jhb@freebsd.org> wrote:
>>> On Wednesday, January 11, 2012 6:21:18 am Ivan Voras wrote:
>>>> The lang/python27 port can optionally be built with the support for
>>>> POSIX semaphores - i.e. sem(4). This option is labeled as experimental
>>>> so it may be that the code is simply incorrect. I've tried it and get
>>>> frequent hangs with the python process in the "usem" state. The kernel
>>>> stack is as follows and looks reasonable:
>>>>
>>>> # procstat -kk 19008
>>>>   PID    TID COMM             TDNAME           KSTACK
>>>> 19008 101605 python           -                mi_switch+0x174
>>>> sleepq_catch_signals+0x2f4 sleepq_wait_sig+0x16 _sleep+0x269
>>>> do_sem_wait+0xa19 __umtx_op_sem_wait+0x51 amd64_syscall+0x450
>>>> Xfast_syscall+0xf7
>>>>
>>>> The process doesn't react to SIGINT or SIGTERM but fortunately reacts to
>>>> SIGKILL.
>>>>
>>>> This could be an error in Python code but OTOH this code is not
>>>> FreeBSD-specific so it's unlikely.
>>>
>>> This is using the new umtx-based semaphore code that David Xu wrote. He is
>>> probably the best person to ask (cc'd).
>>>
>>
>> Ok, I've encountered the problem repeatedly while building databases/tdb:
>> it uses Python in the build process (but maybe it needs something else in
>> parallel to provoke the problem).
>
> Glad to see that iXsystems isn't the only one ([1] -- please add a "me
> too" to the PR). The problem is that we do FreeNAS nightlies and they
> frequently get stuck building tdb (10%~20% of the time) and it sticks
> when doing interactive builds as well. The issue appears to be
> exacerbated when we have more builds running in parallel on the same
> machine. I've also run into the same issue compiling talloc because it
> uses the same waf infrastructure as tdb, which was designed to "speed
> things up by forcing builds to be parallelized" (It builds
> kern.smp.ncpus jobs instead of -j 1). Furthermore, it seems to occur
> regardless of whether WITH_SEM is enabled in python (build.ix's copy
> of python doesn't have it enabled, but streetfighter.ix, my system
> bayonetta, etc. do).
>
> I haven't actually enabled WITNESS or the deadlock resolver and
> checked for LORs / deadlocks, but that might be an alternate avenue to
> pursue in debugging the issue; my gut is that the issue exists within
> the code that handles the subprocessing stuff and/or the GIL stuff in
> the python interpreter and that the race condition between a command
> actually finishing and not is relatively small (in most cases) and in
> most cases python's code wins and continues on as usual. It could also
> be some non-threadsafe code trying to run in parallel touching things
> that it shouldn't in the python interpreter. It would also be
> interesting to see what python3k brings to the table, but using that
> would be introducing some extra unknowns into the equation.
>
> It can be reproduced by running continuous builds of talloc or tdb.
>
> Thanks!
> -Garrett
>
> 1. http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/163489
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"


