From owner-freebsd-hackers@FreeBSD.ORG Fri Nov 22 13:36:09 2013 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 057ED2AC; Fri, 22 Nov 2013 13:36:09 +0000 (UTC) Received: from mx1.stack.nl (unknown [IPv6:2001:610:1108:5012::107]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 89B0324E8; Fri, 22 Nov 2013 13:36:08 +0000 (UTC) Received: from turtle.stack.nl (turtle.stack.nl [IPv6:2001:610:1108:5010::132]) by mx1.stack.nl (Postfix) with ESMTP id 060391203C8; Fri, 22 Nov 2013 14:35:54 +0100 (CET) Received: by turtle.stack.nl (Postfix, from userid 1677) id D2507CB4E; Fri, 22 Nov 2013 14:35:53 +0100 (CET) Date: Fri, 22 Nov 2013 14:35:53 +0100 From: Jilles Tjoelker To: Konstantin Belousov Subject: Re: Problem with signal 0 being delivered to SIGUSR1 handler Message-ID: <20131122133553.GA28457@stack.nl> References: <528DFEE6.6020504@gmail.com> <20131121211546.GQ59496@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20131121211546.GQ59496@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-hackers@freebsd.org, threads@freebsd.org, Vitaly Magerya , davidxu@freebsd.org X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.16 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 22 Nov 2013 13:36:09 -0000 On Thu, Nov 21, 2013 at 11:15:46PM +0200, Konstantin Belousov wrote: > On Thu, Nov 21, 2013 at 02:39:02PM +0200, Vitaly Magerya wrote: > > Hi, folks. I'm investigating a test case failure that devel/boehm-gc > > has on recent FreeBSD releases. The problem is that a signal > > handler registered for SIGUSR1 is sometimes called with signum=0, > > which should not be possible under any conditions. > > Here's a simple test case that demonstrates this behavior: > > /* Compile with 'c99 -o example example.c -pthread' > > */ > > #include > > #include > > #include > > #include > > > > void signal_handler(int signum, siginfo_t *si, void *context) { > > if (signum != SIGUSR1) { > > printf("bad signal, signum=%d\n", signum); > > exit(1); > > } > > } > > > > void *thread_func(void *arg) { > > return arg; > > } > > > > int main(void) { > > struct sigaction sa = { 0 }; > > sa.sa_flags = SA_SIGINFO; > > sa.sa_sigaction = signal_handler; > > if (sigfillset(&sa.sa_mask) != 0) abort(); > > if (sigaction(SIGUSR1, &sa, NULL) != 0) abort(); > > for (int i = 0; i < 10000; i++) { > > pthread_t t; > > pthread_create(&t, NULL, thread_func, NULL); > > pthread_kill(t, SIGUSR1); > Side note. pthread_kill(3) call behaviour is undefined if pthread_create(3) > in the line before failed. > > > } > > return 0; > > } > > Under FreeBSD 9.2-RELEASE amd64 I pretty consistently get > > "signum=0" from this program, but you may need to run it a few > > times or increase the number of iterations to see the same. > > Interestingly enough, I don't see this behavior under 9.0-RELEASE. This is because the bug was introduced with AVX support. (It also occurs on systems without AVX.) > > So, any ideas what the problem here is? > It happens when libthr deferred signal handling path is taken for signal > delivery and for some reason the code inside the deferred path called > into rtld for symbol binding. Than, rtld lock is locked, some code in > rtld is executed, and rtld lock is unlocked. Unlock causes _thr_ast() > run, which results in the nested check_deferred_signal() execution. > The check_deferred_signal() clearks si_signo, so on return the same > signal is delivered one more time, but is advertized as signo zero. > The _thr_rtld_init() approach of doing dummy calls does not really work, > since it is not practically possible to enumerate the symbols needed > during signal delivery. > My first attempt to fix this was to increment curthread->critical_count > around the calls to check_* functions in the _thr_ast(), but it causes > reverse problem of losing _thr_ast() runs on unlock. > I ended up with the flag to indicate that deferred delivery is running, > so check_deferred_signal() should avoid doing anything. A delicate > moment is that user signal handler is allowed to modify the passed > machine context to result the return from the signal handler to cause > arbitrary jump, or just do longjmp(). For this case, I also clear the > flag in thr_sighandler(), since kernel signal delivery means that nested > delivery code should not run right now. This analysis suggests an easier approach: just move the check for deferred_siginfo.si_signo == 0 downward. If __fillcontextx2 or sysarch need to be looked up by rtld, the resulting _thr_ast() will invoke the signal handler and the original call to check_deferred_signal() will do nothing. This patch fixes the problem for me on stable/9 and head. Index: lib/libthr/thread/thr_sig.c =================================================================== --- lib/libthr/thread/thr_sig.c (revision 258178) +++ lib/libthr/thread/thr_sig.c (working copy) @@ -326,12 +326,12 @@ check_deferred_signal(struct pthread *curthread) uc_len = __getcontextx_size(); uc = alloca(uc_len); getcontext(uc); - if (curthread->deferred_siginfo.si_signo == 0) - return; __fillcontextx2((char *)uc); act = curthread->deferred_sigact; uc->uc_sigmask = curthread->deferred_sigmask; memcpy(&info, &curthread->deferred_siginfo, sizeof(siginfo_t)); + if (curthread->deferred_siginfo.si_signo == 0) + return; /* remove signal */ curthread->deferred_siginfo.si_signo = 0; handle_signal(&act, info.si_signo, &info, uc); -- Jilles Tjoelker