Date: Wed, 24 Nov 2004 01:02:41 -0600
From: Craig Boston <craig@tobuj.gank.org>
To: freebsd-hackers@freebsd.org
Cc: freebsd-threads@freebsd.org
Subject: SSE vs. stack alignment vs. pthread
Message-ID: <200411240102.42269.craig@tobuj.gank.org>
First of all, I'd like to apologize for cross-posting to -hackers and -threads. I'm not sure yet if this is an application bug, a gcc bug, or a pthreads bug, so here goes...

I'm currently working on the audacity port. It's up to 1.2.3, but I want to get a problem I've observed with 1.2.2 resolved to make sure that it doesn't crop up later or affect other software...

Long story short, audacity is a threaded program. A straight compile of 1.2.2 results in a 100% reproducible bus error that happens on multiple Pentium-4 machines (5.3-STABLE). It always happens at this instruction:

    0x081807c4: movaps %xmm0,0xffffff68(%ebp)

Now, at that time ebp is 0xbfadc6c0, so ebp+0xffffff68 (a displacement of -0x98, or -152 decimal) is 0xbfadc628. Oops, that's only 8-byte aligned, not 16-byte aligned like SSE wants. The offsets vary slightly depending on the compile flags, etc., but the result is always the same -- SIGBUS.

My first suspicion was a compiler bug. Audacity doesn't inline any SSE code itself -- the movaps is being generated by gcc as part of the pentium4 optimizations. There are two factors that are a little suspicious, though.

1) When I switch out libpthread for libc_r, the crash goes away. Unfortunately, the gdb in 5.3 seems to have forgotten how to debug libc_r based programs, so I can't really tell what is different in that case. I just get "Cannot find thread 2: Thread ID=1, generic error".

2) Some searching turned up several similar problems on Linux and NetBSD. The NetBSD post here [http://mail-index.netbsd.org/port-amd64/2004/02/27/0001.html] indicates that it may be related to stack alignment in the thread library. I'm not sure if the ABI requirement discussed there is NetBSD and/or amd64 specific, though.

HOWEVER -- I inserted some debugging printfs into libpthread to test this theory. The stack it allocates for that thread is located at 0xbfaad000, which is not only 16-byte aligned but page aligned... So I'm reluctant to blame libpthread, as it seems to be doing everything right and even going the extra mile.
I honestly don't know whether gcc is expecting the alignment to compensate for the return address push or the function prologue, or if it's just losing track of where the stack should be somewhere. I may be over-analyzing the problem at that point :)

Another factor to consider is that nobody has reported similar problems in other software... I've been trying to create a simple test case; however, it's proving quite difficult to coax gcc into generating SSE code on its own where I want it.

It's of course possible that Audacity itself is doing something weird to cause it, but I haven't been able to find anything suspicious or low-level enough to affect the stack alignment. It could just be a heisenbug, and libc_r is different enough to mask the problem.

Any and all suggestions from threads/compiler gurus would be very much appreciated. I'm about ready to throw in the towel and just force "-mno-sse -mno-sse2" compiler flags in the makefile...

Thanks,
Craig