Date: Wed, 23 Apr 2014 23:28:25 -0400 (EDT)
From: Benjamin Kaduk
To: Mikolaj Golub
Cc: Stanislav Sedov, freebsd-hackers@freebsd.org
Subject: Re: valgrind on amd64 crashes when delivering signal for threaded application
In-Reply-To: <20140423200135.GA6009@gmail.com>
References: <20140423200135.GA6009@gmail.com>
List-Id: Technical Discussions relating to FreeBSD

On Wed, 23 Apr 2014, Mikolaj Golub wrote:

> I am observing an issue with valgrind on amd64 CURRENT or 10, when it

I cannot remember whether we changed the stack alignment on one or both
of i386 and amd64 when we switched to clang; I think we did, but am
having trouble finding it in the archives. Though, I think it would have
been to match what clang does by default on linux, which would not
really help explain the weird behavior from valgrind.

> I tracked it to r249423 (import of clang 3.3), which optimizes
> this statement in the signal handler wrapper from thr_sig.c:
>
> static void
> thr_sighandler(int sig, siginfo_t *info, void *_ucp)
> {
>         ...
>         struct sigaction act;
>         ...
>         act = _thr_sigact[sig-1].sigact;
>
> into a sequence of movups/movaps instructions:
>
>    0x000000000000dc2f <+79>:    movups (%r14,%r15,1),%xmm0
>    0x000000000000dc34 <+84>:    movups 0x10(%r14,%r15,1),%xmm1
>    0x000000000000dc3a <+90>:    movaps %xmm1,-0x40(%rbp)
>    0x000000000000dc3e <+94>:    movaps %xmm0,-0x50(%rbp)
>
> I got lost in the valgrind signal handling details, but apparently the
> frame for thr_sighandler() is misaligned when running under valgrind,
> and as a result the movaps operand (the destination, the act local
> variable) is not aligned on a 16-byte boundary.
>
> The problem may be worked around either by compiling thr_sig.c without
> optimization or by replacing the assignment with bcopy().
>
> Also, changing the alignment of the sigframe that valgrind pushes on
> the stack when delivering a signal to 8 bytes fixes the issue:
>
> --- coregrind/m_sigframe/sigframe-amd64-freebsd.c.orig 2014-04-23 22:39:45.000000000 +0300
> +++ coregrind/m_sigframe/sigframe-amd64-freebsd.c      2014-04-23 22:40:23.000000000 +0300
> @@ -250,7 +250,7 @@ static Addr build_sigframe(ThreadState *
>         UWord err;
>
>         rsp -= sizeof(*frame);
> -       rsp = VG_ROUNDDN(rsp, 16);
> +       rsp = VG_ROUNDDN(rsp, 16) - 8;

I would expect that the fact that this patch fixes the observed crash
means that valgrind has a bug when setting up the stack for the signal
handler. I had to work around an apparently similar bug in the built-in
lightweight thread implementation in net/openafs by forcing
-mstackrealign to be used for its compilation (because analyzing the
lightweight threads implementation when upstream is trying to switch to
pthreads is not worth the effort).

I guess here the thing to try would be compiling libthr with
-mstackrealign, not that that is a reasonable thing to do in head.
Perhaps the valgrind upstream should be asked about the details of the
stack creation?

-Ben

>         frame = (struct sigframe *)rsp;
>
>         if (!extend(tst, rsp, sizeof(*frame)))
>
> Unfortunately, I have poor understanding of valgrind internals and
> what is going on exactly when it delivers a signal to the process, so
> failed to find a proper fix.
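
For anyone following the arithmetic, here is a minimal sketch of why the
"- 8" in the quoted patch would make a difference. The amd64 System V ABI
has %rsp + 8 be a multiple of 16 at function entry (the call instruction
has just pushed the return address), and the compiler relies on that when
it emits movaps to -0x50(%rbp). The sketch assumes the handler is entered
with %rsp pointing at the start of the frame and that thr_sighandler()
uses the usual "push %rbp; mov %rsp,%rbp" prologue; neither is confirmed
from the valgrind or libthr sources here. ROUNDDN and all function names
are hypothetical stand-ins, not valgrind's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for VG_ROUNDDN: round p down to a multiple of a
 * (a must be a power of two). */
#define ROUNDDN(p, a) ((uint64_t)(p) & ~((uint64_t)(a) - 1))

/* Address of the 'act' local from the quoted disassembly, -0x50(%rbp).
 * Assumption: standard prologue, so %rbp = entry %rsp - 8. */
static uint64_t act_address(uint64_t entry_rsp)
{
    uint64_t rbp = entry_rsp - 8;   /* after "push %rbp; mov %rsp,%rbp" */
    return rbp - 0x50;
}

/* movaps faults unless its memory operand is 16-byte aligned. */
static bool movaps_ok(uint64_t addr)
{
    return (addr % 16) == 0;
}

/* Handler entry %rsp as the unpatched code computes it. */
static uint64_t entry_rsp_original(uint64_t rsp, uint64_t frame_size)
{
    return ROUNDDN(rsp - frame_size, 16);
}

/* Handler entry %rsp with the "- 8" from the quoted patch. */
static uint64_t entry_rsp_patched(uint64_t rsp, uint64_t frame_size)
{
    return ROUNDDN(rsp - frame_size, 16) - 8;
}

/* Both values below are arbitrary examples, not taken from a real run. */
static void demo_alignment(void)
{
    uint64_t rsp = 0x7fffffffe468ULL;
    uint64_t frame_size = 0x340;

    /* 16-aligned entry %rsp puts 'act' at an address that is 8 mod 16. */
    assert(!movaps_ok(act_address(entry_rsp_original(rsp, frame_size))));
    /* Entry %rsp that is 8 mod 16 puts 'act' on a 16-byte boundary. */
    assert(movaps_ok(act_address(entry_rsp_patched(rsp, frame_size))));
}
```

Under those assumptions, the extra 8 bytes play the role of the return
address push that a normal call would have done, which is consistent
with the patch making the crash go away.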