From: Mikolaj Golub
Date: Wed, 23 Apr 2014 23:01:36 +0300
To: freebsd-hackers@freebsd.org
Subject: valgrind on amd64 crashes when delivering signal for threaded application
Message-ID: <20140423200135.GA6009@gmail.com>
Cc: Stanislav Sedov

I am observing an issue with valgrind on amd64 CURRENT or 10: it
crashes the application when delivering an asynchronous signal if the
application is linked with libthr.

This simple test illustrates the issue.

#include <signal.h>
#include <unistd.h>

static void
dummy_sighandler(int sig)
{
    /* EMPTY */
}

int
main()
{
    int c = 10;

    if (signal(SIGINT, dummy_sighandler) == SIG_ERR)
        return (1);
    sleep(100);
    return (0);
}

Building with -lpthread, running under valgrind and pressing Ctrl-C
makes it crash:

kopusha:~/freebsd/valgrind/test_sa% valgrind --trace-signals=yes ./test_sa
==55627== Memcheck, a memory error detector
==55627== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==55627== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==55627== Command: ./test_sa
==55627==
--55627-- Max kernel-supported signal is 128
--55627-- sync signal handler: signal=11, si_code=1, EIP=0x23822, eip=0x4012e99a8, from kernel
--55627-- SIGSEGV: si_code=1 faultaddr=0x7feffef08 tid=1 ESP=0x7feffef08 seg=0x7fe001000-0x7feffefff
--55627-- -> extended stack base to 0x7feffe000
--55627-- do_setmask: tid = 1 how = 1 (SIG_BLOCK), newset = 0x22C4F8 (fffffffffffffffffffffffffffff107)
--55627--   oldset=0x7FEFFFC60 00000000000000000000000000000000
--55627-- do_setmask: tid = 1 how = 3 (SIG_SETMASK), newset = 0x22C50C (00000000000000000000000000000000)
--55627-- do_setmask: tid = 1 how = 1 (SIG_BLOCK), newset = 0x22C4F8 (fffffffffffffffffffffffffffff107)
--55627--   oldset=0x7FEFFF7F0 00000000000000000000000000000000
--55627-- do_setmask: tid = 1 how = 3 (SIG_SETMASK), newset = 0x22C50C (00000000000000000000000000000000)
--55627-- do_setmask: tid = 1 how = 1 (SIG_BLOCK), newset = 0x22C4F8 (fffffffffffffffffffffffffffff107)
--55627--   oldset=0x7FEFFF7F0 00000000000000000000000000000000
--55627-- do_setmask: tid = 1 how = 3 (SIG_SETMASK), newset = 0x22C50C (00000000000000000000000000000000)
--55627-- do_setmask: tid = 1 how = 1 (SIG_BLOCK), newset = 0x22C4F8 (fffffffffffffffffffffffffffff107)
--55627--   oldset=0x7FEFFF7F0 00000000000000000000000000000000
--55627-- do_setmask: tid = 1 how = 3 (SIG_SETMASK), newset = 0x22C50C (00000000000000000000000000000000)
--55627-- sys_sigaction: sigNo 32, new 0x7fefff7b8, old 0x0, new flags 0x40
--55627-- do_setmask: tid = 1 how = 2 (SIG_UNBLOCK), newset = 0x7FEFFF7C4 (00000000000000000000000080000000)
--55627-- do_setmask: tid = 1 how = 1 (SIG_BLOCK), newset = 0x1220D18 (ffffffffffffffffffffffffffffffff)
--55627--   oldset=0x1C00128 00000000000000000000000000000000
--55627-- do_setmask: tid = 1 how = 3 (SIG_SETMASK), newset = 0x1C00128 (00000000000000000000000000000000)
--55627-- do_setmask: tid = 1 how = 3 (SIG_SETMASK), newset = 0x1220D18 (ffffffffffffffffffffffffffffffff)
--55627--   oldset=0x7FF0005B0 00000000000000000000000000000000
--55627-- sys_sigaction: sigNo 2, new 0x7ff000600, old 0x7ff0005e0, new flags 0x42
--55627-- do_setmask: tid = 1 how = 3 (SIG_SETMASK), newset = 0x7FF0005B0 (00000000000000000000000000000000)
^C--55627-- async signal handler: signal=2, tid=1, si_code=65542
--55627-- interrupted_syscall: tid=1, ip=0x380ca816, restart=True, sres.isErr=True, sres.val=4
--55627-- completed, but uncommitted: committing
--55627-- delivering signal 2 (SIGINT):65542 to thread 1
--55627-- push_signal_frame (thread 1): signal 2
==55627==    at 0x1541A4A: nanosleep (nanosleep.S:3)
==55627==    by 0x1492B29: sleep (sleep.c:58)
==55627==    by 0x1217C12: sleep (thr_syscalls.c:614)
==55627==    by 0x4007D7: main (test_sa.c:19)
--55627-- sys_sigaction: sigNo 11, new 0x4012c3e78, old 0x0, new flags 0x0
--55627-- delivering signal 11 (SIGSEGV):128 to thread 1
--55627-- delivering 11 (code 128) to default handler; action: terminate+core
==55627==
==55627== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==55627==  General Protection Fault
==55627==    at 0x1219F3C: ??? (thr_sig.c:162)
==55627==    by 0x380529C7: ??? (m_trampoline.S:713)
==55627==    by 0x1217C12: sleep (thr_syscalls.c:614)
==55627==    by 0x4007D7: main (test_sa.c:19)
==55627==
==55627== HEAP SUMMARY:
==55627==     in use at exit: 1,080 bytes in 2 blocks
==55627==   total heap usage: 2 allocs, 0 frees, 1,080 bytes allocated
==55627==
==55627== LEAK SUMMARY:
==55627==    definitely lost: 0 bytes in 0 blocks
==55627==    indirectly lost: 0 bytes in 0 blocks
==55627==      possibly lost: 0 bytes in 0 blocks
==55627==    still reachable: 1,080 bytes in 2 blocks
==55627==         suppressed: 0 bytes in 0 blocks
==55627== Rerun with --leak-check=full to see details of leaked memory
==55627==
==55627== For counts of detected and suppressed errors, rerun with: -v
==55627== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
zsh: segmentation fault  valgrind --trace-signals=yes ./test_sa

I tracked it to r249423 (import of clang 3.3), after which the
compiler optimizes this statement in the signal handler wrapper from
thr_sig.c:

static void
thr_sighandler(int sig, siginfo_t *info, void *_ucp)
{
    ...
    struct sigaction act;
    ...
    act = _thr_sigact[sig-1].sigact;

into a sequence of movups/movaps instructions:

0x000000000000dc2f <+79>: movups (%r14,%r15,1),%xmm0
0x000000000000dc34 <+84>: movups 0x10(%r14,%r15,1),%xmm1
0x000000000000dc3a <+90>: movaps %xmm1,-0x40(%rbp)
0x000000000000dc3e <+94>: movaps %xmm0,-0x50(%rbp)

I got lost in the details of valgrind's signal handling, but
apparently the frame for thr_sighandler() is misaligned when running
under valgrind, and as a result the movaps operand (the destination,
i.e. the local variable act) is not aligned on a 16-byte boundary.

The problem can be worked around either by compiling thr_sig.c without
optimization or by replacing the assignment with bcopy().
Also, changing the alignment of the sigframe that valgrind pushes on
the stack when delivering a signal, so that it sits 8 bytes below a
16-byte boundary, fixes the issue:

--- coregrind/m_sigframe/sigframe-amd64-freebsd.c.orig 2014-04-23 22:39:45.000000000 +0300
+++ coregrind/m_sigframe/sigframe-amd64-freebsd.c 2014-04-23 22:40:23.000000000 +0300
@@ -250,7 +250,7 @@ static Addr build_sigframe(ThreadState *
    UWord err;
 
    rsp -= sizeof(*frame);
-   rsp = VG_ROUNDDN(rsp, 16);
+   rsp = VG_ROUNDDN(rsp, 16) - 8;
    frame = (struct sigframe *)rsp;
 
    if (!extend(tst, rsp, sizeof(*frame)))

Unfortunately, I have a poor understanding of valgrind internals and
of what exactly goes on when it delivers a signal to the process, so I
failed to find a proper fix.

-- 
Mikolaj Golub