Date:      Fri, 24 Aug 2018 21:53:28 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Gleb Popov <6yearold@gmail.com>
Cc:        freebsd-hackers <freebsd-hackers@freebsd.org>
Subject:   Re: Strange hang when calling signal()
Message-ID:  <20180824185328.GE2340@kib.kiev.ua>
In-Reply-To: <CALH631mtb1Z-4v4%2BUuzCx0tX0Zt5LfoQR9%2Biags-nL7MRUhGLA@mail.gmail.com>
References:  <CALH631mtb1Z-4v4%2BUuzCx0tX0Zt5LfoQR9%2Biags-nL7MRUhGLA@mail.gmail.com>

On Fri, Aug 24, 2018 at 05:53:28PM +0300, Gleb Popov wrote:
> I'm debugging a Qt test app that hangs when launching a QProcess.
> 
> The parent does the following:
> 
> QProcess p;
> ...
> p.start();
> p.waitForStarted(-1); // wait indefinitely
> 
> Under the hood starting the QProcess involves creating a pair of pipes and
> forking:
> 
> qt_create_pipe(childStartedPipe);
> ...
> pid_t childPid;
> ::forkfd(FFD_CLOEXEC, &childPid);
> 
> and waiting for it to be started is just ppoll()'ing on the pipe
> 
> pollfd pfd = qt_make_pollfd(childStartedPipe[0], POLLIN);
> if (qt_poll_msecs(&pfd, 1, msecs) == 0) {
> ...
> 
> On the child side the code looks like
> 
> ::signal(SIGPIPE, SIG_DFL);
> ...
> qt_safe_close(childStartedPipe[0]);
> ...
> qt_safe_execv(argv[0], argv);
> 
> So, the problem is that after forking the parent process hangs on polling
> and child process hangs inside signal call; Here is the backtrace:
> 
> #0  _umtx_op_err () at
> /usr/src/lib/libthr/arch/amd64/amd64/_umtx_op_err.S:37
> #1  0x0000000802bd9571 in __thr_rwlock_rdlock (rwlock=0x802bf3200,
> flags=<optimized out>, tsp=<optimized out>) at
> /usr/src/lib/libthr/thread/thr_umtx.c:307
> #2  0x0000000802be24c0 in _thr_rwlock_rdlock (flags=0, tsp=0x0,
> rwlock=<optimized out>) at /usr/src/lib/libthr/thread/thr_umtx.h:232
> #3  _thr_rtld_rlock_acquire (lock=0x802bf3200) at
> /usr/src/lib/libthr/thread/thr_rtld.c:125
> #4  0x000000080024e63b in rlock_acquire (lock=0x80025f0a0 <rtld_locks>,
> lockstate=0x7fffffffc678) at /usr/src/libexec/rtld-elf/rtld_lock.c:208
> #5  0x00000008002472dd in _rtld_bind (obj=0x80027b000, reloff=4416) at
> /usr/src/libexec/rtld-elf/rtld.c:788
> #6  0x000000080024404d in _rtld_bind_start () at
> /usr/src/libexec/rtld-elf/amd64/rtld_start.S:121
> #7  0x0000000803d31a76 in QProcessPrivate::execChild (this=0x81a9716c0,
> workingDir=0x0, argv=0x81fde5760, envp=0x0) at io/qprocess_unix.cpp:537
> 
> Any idea what causes signal() to not return? I haven't extracted a minimal
> repro yet, wanted to ask for any clues first.
The immediate, and not that useful, answer is that the child process is trying
to acquire the rtld bind lock, and the lock is busy.

Some time ago there was a constant stream of bugs where a
multithreaded process forked and the child then tried to use services that
require some of the C runtime's internal locks.  It did not help that
POSIX allows most of this breakage.  Since the state of the parent
process is usually indeterminate at the time of fork, another thread
might have grabbed some of those locks (and left internal structures
inconsistent), and that state is inherited by the child.  Then there is
nobody in the child to correct the damage (restore consistency and unlock).
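
To make the hazard concrete, here is a minimal sketch in plain pthreads.  It
is not taken from libc or libthr; demo_lock, holder() and
child_inherits_busy_lock() are hypothetical names for illustration.  A helper
thread holds an ordinary application mutex while another thread forks, and
the child uses trylock instead of lock so that it reports the busy lock
rather than hanging.  (Strictly, the child of a multithreaded process should
only call async-signal-safe functions; the trylock is there purely to
observe the inherited state.)

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_taken;    /* posted once the helper owns demo_lock */
static sem_t release_lock;  /* posted when the helper may unlock */

/* Helper thread: take the lock and keep it until told to let go. */
static void *holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    sem_post(&lock_taken);
    sem_wait(&release_lock);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns 1 if the child found demo_lock busy, 0 if free, -1 on error. */
int child_inherits_busy_lock(void)
{
    pthread_t t;
    int status = 0;
    pid_t pid;

    if (sem_init(&lock_taken, 0, 0) != 0 || sem_init(&release_lock, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, holder, NULL) != 0)
        return -1;
    sem_wait(&lock_taken);          /* helper now owns demo_lock */

    pid = fork();
    if (pid == 0) {
        /* Only the forking thread exists here; the owner of demo_lock
         * was not copied, so nobody in the child can ever unlock it. */
        _exit(pthread_mutex_trylock(&demo_lock) == EBUSY ? 1 : 0);
    }

    sem_post(&release_lock);        /* let the helper finish in the parent */
    pthread_join(t, NULL);
    if (pid < 0 || waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A blocking pthread_mutex_lock() in the child in place of the trylock would
wait forever, which is the same shape as a child stuck on a runtime-internal
lock.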

Since then, we started taking most of those locks in the parent around fork(2);
all the code is in lib/libthr/thread/thr_fork.c.  In particular, we lock rtld
and malloc, and disable cancellation around fork.  So if your program used
fork(2) but still ended up with a broken rtld, that is worrying.
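
The same idea can be sketched at application level with pthread_atfork(3).
This is only an analogue of the approach, not the code in thr_fork.c, and
app_lock, worker() and child_gets_free_lock() are hypothetical names.  The
prepare handler takes the lock before the fork, so the fork waits until no
other thread holds it, and both the parent and the child release it
afterwards.

```c
#include <pthread.h>
#include <semaphore.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t app_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_once_t handlers_once = PTHREAD_ONCE_INIT;
static sem_t worker_has_lock;   /* posted once the worker owns app_lock */

/* Prepare handler: the forking thread takes the lock before the fork. */
static void before_fork(void)       { pthread_mutex_lock(&app_lock); }
/* Parent and child handlers: the forking thread (or its copy) unlocks. */
static void after_fork_parent(void) { pthread_mutex_unlock(&app_lock); }
static void after_fork_child(void)  { pthread_mutex_unlock(&app_lock); }

static void install_handlers(void)
{
    pthread_atfork(before_fork, after_fork_parent, after_fork_child);
}

/* Worker thread: hold the lock briefly, as a stand-in for real work. */
static void *worker(void *arg)
{
    struct timespec ts = { 0, 100 * 1000 * 1000 };   /* 100 ms */

    (void)arg;
    pthread_mutex_lock(&app_lock);
    sem_post(&worker_has_lock);
    nanosleep(&ts, NULL);
    pthread_mutex_unlock(&app_lock);
    return NULL;
}

/* Returns 1 if the child could take app_lock, 0 if not, -1 on error. */
int child_gets_free_lock(void)
{
    pthread_t t;
    int status = 0;
    pid_t pid;

    /* Register the handlers only once, however often this is called. */
    pthread_once(&handlers_once, install_handlers);
    if (sem_init(&worker_has_lock, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    sem_wait(&worker_has_lock);     /* worker owns app_lock right now */

    pid = fork();   /* before_fork() blocks here until the worker unlocks */
    if (pid == 0)
        _exit(pthread_mutex_trylock(&app_lock) == 0 ? 1 : 0);

    pthread_join(t, NULL);
    if (pid < 0 || waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Handlers like these cover only the locks the application knows about and
only calls that go through fork(); the rtld and malloc locks are internal to
the runtime, which is why that part has to live in libthr itself.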

On the other hand, we do not do that for vfork(2).  So yes, a minimal
reproduction case using the bare libc/libthr API (i.e. without Qt) would be
the first step to diagnose and possibly fix this.

> 
> The code in question is here:
> https://github.com/qt/qtbase/blob/5.11/src/corelib/io/qprocess_unix.cpp
> Relevant functions are QProcessPrivate::startProcess(),
> QProcessPrivate::execChild(), QProcessPrivate::waitForStarted().



