Date: Mon, 9 Apr 2012 10:26:10 +1000 (EST)
From: Bruce Evans
To: David Schultz
Cc: svn-src-head@FreeBSD.ORG, Tijl Coosemans, src-committers@FreeBSD.ORG,
    svn-src-all@FreeBSD.ORG, Bruce Evans
Subject: Re: svn commit: r232275 - in head/sys: amd64/include i386/include
    pc98/include x86/include
Message-ID: <20120409084336.A1308@besplex.bde.org>
In-Reply-To: <20120407165729.GA2737@zim.MIT.EDU>
References: <201202282217.q1SMHrIk094780@svn.freebsd.org>
    <201203012347.32984.tijl@freebsd.org>
    <20120302132403.P929@besplex.bde.org>
    <201203022231.43186.tijl@freebsd.org>
    <20120407165729.GA2737@zim.MIT.EDU>

On Sat, 7 Apr 2012, David Schultz wrote:

> On Fri, Mar 02, 2012, Tijl Coosemans wrote:

Hmm, old news. I think I already replied, but now notice some more details.

>> Thanks, that was quite informative.
>> C11 does say something about the FP env and signals now though:
>>
>> ``When the processing of the abstract machine is interrupted by receipt
>> of a signal, the values of objects that are neither lock-free atomic
>> objects nor of type volatile sig_atomic_t are unspecified, as is the
>> state of the floating-point environment. The value of any object
>> modified by the handler that is neither a lock-free atomic object nor
>> of type volatile sig_atomic_t becomes indeterminate when the handler
>> exits, as does the state of the floating-point environment if it is
>> modified by the handler and not restored to its original state.''

This apparently allows signal handlers to be called with the FP env in an
undefined state (as in FreeBSD-4). But this is a large change relative to
C99, since C99 says nothing about the floating point state for signal
handlers, and its abstract machine requires FP expressions like
"auto double four = 2.0 + 2.0;" to work. Does "unspecified" include
"undefined", or does the requirement for the abstract machine to not give
undefined behaviour take precedence over the allowance for the FP env to
be anything?

>> This means a signal handler must not rely on the state of the FP env.
>> It may install its own FP env if needed (e.g. FE_DFL_ENV), but then it
>> must restore the original before returning. This allows for the
>> rounding mode to be silently modified for integer conversions for
>> instance.
>>
>> If longjmp is not supposed to change the FP env then, when called from
>> a signal handler, either the signal handler must install a proper FP
>> env before calling longjmp or a proper FP env must be installed after
>> the target setjmp call. Otherwise the FP env is unspecified.
>
> There are two reasonable ways to handle the floating point control
> word. FreeBSD treats it as a register, resetting it on signal
> handler entry and restoring it on longjmp or signal handler
> return.
> Virtually every other OS (e.g., Linux, NetBSD, Solaris)
> treats it as global state, leaving it up to the signal handler to
> preserve it as needed.

I checked what Linux-2.6.10 actually does. It does nothing as drastic as
passing the interrupted FP environment to signal handlers. It just
provides a clean FP env for signal handlers, like FreeBSD-5+ signal
handlers do, except more cleanly for FP SIGFPE on x86:

FreeBSD-[1-4] SIGFPE handling:
    save exception flags in memory
    clear exception flags in i387
    call handler with this unclean state

FreeBSD-[5-10] SIGFPE handling:
    convert exception flags to a signal code. Lose details in translation.
      Forget to merge the SSE flags when doing this. So the signal code
      cannot be trusted (AFAIR, it also doesn't distinguish between an
      i387 and an SSE exception. Better yet, npxtrap() doesn't
      distinguish, so it blindly translates for i387 when the exception
      was for SSE).
    clear exception flags in i387. Do this even if the exception was for
      SSE. Forget to do anything with the SSE flags.
    call handler with a different, completely clean state

Linux-2.6.10 SIGFPE handling:
    (not sure if it has a signal code)
    don't clear exception flags in i387
    call handler with a different, completely clean state

The result is that if the signal handler just returns, then:
- under FreeBSD, iff the SIGFPE was for the i387, then the fault doesn't
  repeat
- under Linux, and under FreeBSD iff the SIGFPE was for SSE, then the
  fault does repeat
- under FreeBSD, for both cases the i387 exception flags are broken
  (lost), but the SSE exception flags work (are preserved).

Of course, returning from a SIGFPE handler gives undefined behaviour.
The behaviour above (not just the differences between systems) causes the
following problems:
- if the signal handler just returns, nothing good happens for the SIGFPE
  case (except for integer SIGFPE)
- if the signal handler wants to fix up the FP env before returning, then
  it has very large portability problems even for fixing the exception
  flags in the above 3 classes of behaviour. But a fixup is usually
  essential if the handler is for FP SIGFPE.
- if the signal handler longjmp()s, then it gets the following behaviour:
  - under FreeBSD, it gets the control word restored to that at the time
    of the setjmp() (modulo some bugs in some versions for SSE); similarly
    for the exception flags, except the bugs are now features (it's best
    not to touch the exception flags)
  - under Linux-2.6.10, it gets a clean control and status word from the
    signal handler's FP env (unless the signal handler has uncleaned them).

> Both approaches have their merits. FreeBSD's approach provides
> better semantics. Library functions, round-to-integer on most
> CPUs, and other things may temporarily change the rounding mode.
> Most programmers don't think about that, but on Linux, if an async
> signal were delivered at the wrong time and did a longjmp, the
> rounding mode would be in an unexpected state. Most programmers

Its state will be clean, i.e., FE_TONEAREST. This is OK for fixing up
temporary changes to it, but bad if it was set to another mode using
fesetround(). The setting may have been either before or after the
setjmp(). I think C99 wants it to be the setting of the most recent
fesetround(), but FreeBSD restores the setting to the most recent one
before the setjmp().

> don't think about that; even a program that never changes the
> rounding mode explicitly could wind up in round-to-zero mode after
> jumping out of a signal handler.

That would only happen in Linux after an explicit fesetround() to
FE_TOWARDZERO in the signal handler.
> The main advantage of the alternative approach is that it avoids
> the overhead of saving and restoring the floating point control
> word. Many programs don't even use floating point, and the
> efficiency is important for programs that use longjmp frequently,
> e.g., to implement exceptions.
>
> Either way, note the importance of being consistent: If the FP env
> gets clobbered automatically on entry to a signal handler, then
> longjmp must restore what the application had before. Personally,
> I'm not opposed to changing both signal handlers and longjmp to
> match what the rest of the world does, but this isn't just about
> the mxcsr, as suggested previously.

The rest of the world is already perfectly inconsistent, since it
clobbers the env for signal handlers, and I don't see it changing now
that C11 encourages the reverse.

I think the overhead is unimportant. fnstcw in setjmp() takes 4 cycles
(latency) on AthlonXP. fldcw in longjmp() takes 11. Hopefully this is in
parallel so it takes less than 1 cycle each (throughput). (But I never
got anywhere trying to hide the latency of fxsave/fxrstor.) Some other
arches have hundreds if not thousands of general registers to save where
i386 has only 11, so a few more cycles for FP would be even more in the
noise.

Bruce