From owner-freebsd-hackers@FreeBSD.ORG  Fri Nov 22 13:36:09 2013
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 057ED2AC;
 Fri, 22 Nov 2013 13:36:09 +0000 (UTC)
Received: from mx1.stack.nl (unknown [IPv6:2001:610:1108:5012::107])
 (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 89B0324E8;
 Fri, 22 Nov 2013 13:36:08 +0000 (UTC)
Received: from turtle.stack.nl (turtle.stack.nl [IPv6:2001:610:1108:5010::132])
 by mx1.stack.nl (Postfix) with ESMTP id 060391203C8;
 Fri, 22 Nov 2013 14:35:54 +0100 (CET)
Received: by turtle.stack.nl (Postfix, from userid 1677)
 id D2507CB4E; Fri, 22 Nov 2013 14:35:53 +0100 (CET)
Date: Fri, 22 Nov 2013 14:35:53 +0100
From: Jilles Tjoelker <jilles@stack.nl>
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: Problem with signal 0 being delivered to SIGUSR1 handler
Message-ID: <20131122133553.GA28457@stack.nl>
References: <528DFEE6.6020504@gmail.com> <20131121211546.GQ59496@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20131121211546.GQ59496@kib.kiev.ua>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-hackers@freebsd.org, threads@freebsd.org,
 Vitaly Magerya <vmagerya@gmail.com>, davidxu@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.16
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 22 Nov 2013 13:36:09 -0000

On Thu, Nov 21, 2013 at 11:15:46PM +0200, Konstantin Belousov wrote:
> On Thu, Nov 21, 2013 at 02:39:02PM +0200, Vitaly Magerya wrote:
> > Hi, folks. I'm investigating a test case failure that devel/boehm-gc
> > has on recent FreeBSD releases. The problem is that a signal
> > handler registered for SIGUSR1 is sometimes called with signum=0,
> > which should not be possible under any conditions.

> > Here's a simple test case that demonstrates this behavior:

> > /* Compile with 'c99 -o example example.c -pthread'
> >  */
> > #include <pthread.h>
> > #include <signal.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > 
> > void signal_handler(int signum, siginfo_t *si, void *context) {
> >     if (signum != SIGUSR1) {
> >         printf("bad signal, signum=%d\n", signum);
> >         exit(1);
> >     }
> > }
> > 
> > void *thread_func(void *arg) {
> >     return arg;
> > }
> > 
> > int main(void) {
> >     struct sigaction sa = { 0 };
> >     sa.sa_flags = SA_SIGINFO;
> >     sa.sa_sigaction = signal_handler;
> >     if (sigfillset(&sa.sa_mask) != 0) abort();
> >     if (sigaction(SIGUSR1, &sa, NULL) != 0) abort();
> >     for (int i = 0; i < 10000; i++) {
> >         pthread_t t;
> >         pthread_create(&t, NULL, thread_func, NULL);
> >         pthread_kill(t, SIGUSR1);
> Side note.  pthread_kill(3) call behaviour is undefined if pthread_create(3)
> in the line before failed.
> 
> >     }
> >     return 0;
> > }

> > Under FreeBSD 9.2-RELEASE amd64 I pretty consistently get
> > "signum=0" from this program, but you may need to run it a few
> > times or increase the number of iterations to see the same.

> > Interestingly enough, I don't see this behavior under 9.0-RELEASE.

This is because the bug was introduced with AVX support. (It also occurs
on systems without AVX.)

> > So, any ideas what the problem here is?

> It happens when libthr deferred signal handling path is taken for signal
> delivery and for some reason the code inside the deferred path called
> into rtld for symbol binding. Than, rtld lock is locked, some code in
> rtld is executed, and rtld lock is unlocked. Unlock causes _thr_ast()
> run, which results in the nested check_deferred_signal() execution.
> The check_deferred_signal() clearks si_signo, so on return the same
> signal is delivered one more time, but is advertized as signo zero.

> The _thr_rtld_init() approach of doing dummy calls does not really work,
> since it is not practically possible to enumerate the symbols needed
> during signal delivery.

> My first attempt to fix this was to increment curthread->critical_count
> around the calls to check_* functions in the _thr_ast(), but it causes
> reverse problem of losing _thr_ast() runs on unlock.

> I ended up with the flag to indicate that deferred delivery is running,
> so check_deferred_signal() should avoid doing anything. A delicate
> moment is that user signal handler is allowed to modify the passed
> machine context to result the return from the signal handler to cause
> arbitrary jump, or just do longjmp(). For this case, I also clear the
> flag in thr_sighandler(), since kernel signal delivery means that nested
> delivery code should not run right now.

This analysis suggests an easier approach: just move the check for
deferred_siginfo.si_signo == 0 downward. If __fillcontextx2 or sysarch
need to be looked up by rtld, the resulting _thr_ast() will invoke the
signal handler and the original call to check_deferred_signal() will do
nothing.

This patch fixes the problem for me on stable/9 and head.

Index: lib/libthr/thread/thr_sig.c
===================================================================
--- lib/libthr/thread/thr_sig.c	(revision 258178)
+++ lib/libthr/thread/thr_sig.c	(working copy)
@@ -326,12 +326,12 @@ check_deferred_signal(struct pthread *curthread)
 	uc_len = __getcontextx_size();
 	uc = alloca(uc_len);
 	getcontext(uc);
-	if (curthread->deferred_siginfo.si_signo == 0)
-		return;
 	__fillcontextx2((char *)uc);
 	act = curthread->deferred_sigact;
 	uc->uc_sigmask = curthread->deferred_sigmask;
 	memcpy(&info, &curthread->deferred_siginfo, sizeof(siginfo_t));
+	if (curthread->deferred_siginfo.si_signo == 0)
+		return;
 	/* remove signal */
 	curthread->deferred_siginfo.si_signo = 0;
 	handle_signal(&act, info.si_signo, &info, uc);

-- 
Jilles Tjoelker