Date: Sun, 12 Oct 1997 12:23:13 +1000 (EST)
From: Andrew Reilly <reilly@zeta.org.au>
To: tlambert@primenet.com
Cc: reilly@zeta.org.au, freebsd-hackers@FreeBSD.ORG
Subject: Re: Floating point exceptions
Message-ID: <199710120223.MAA00970@gurney.reilly.home>
In-Reply-To: <199710110715.AAA17411@usr04.primenet.com>

On 11 Oct, Terry Lambert wrote:
>> > Fix: Correct the code to not generate exceptions
>>
>> This is just plain rude.  There are a bunch of exceptions that are

Sorry about that.  I was out of line.  In my defence, it was over
32 degrees C in my office yesterday afternoon.  Scorcher.

> Actually, I find this strange.  It kind of assumes that all hardware has
> equivalent precision.  I can guarantee you that code that works fine
> on UniCOS will have problems on an Intel-based PC if it expects 128 bit
> precision.  8-(.

Well, in this case the code was written with the deliberate
understanding that the precision would vary between implementations.
Continual scaling ensures that we are getting the maximum dynamic
range where it counts, and differences in round-off characteristics
result in recognition accuracy variations of a small fraction of a
percent across the architectures that it has been run on so far.

I take your point, though.  It would be easy for this to be a real
error, and it would be good to know about it.  In this case, the
correct fix was to ignore the exception, because that was the
original intent of the maths.  What caught me off guard was that
FreeBSD was the first of about six platforms that signalled this
particular exception.  The DSP platforms saturate or underflow to
zero, and the other Unix platforms must have had this exception
masked by default.

> To me, this was not a rude response.  An exception where an exception
> was not an intended result of the calculation is an exception that is
> not masked, to my mind, and a useful indicator that all is not right with
> the code.  It was certainly not intended as a rude remark.

I think that it is my problem that I take exception to some of the
IEEE floating point semantics: perhaps it is a good thing to try to
make a countable, auto-scaling (within a limited range) numeric
representation behave more like the set of reals in some cases.
I prefer to think of floats as scaled integers, and I get caught out
with some of the modern twists.

> I think it's better to get an error than to get non-obviously erroneous
> results (the alternative).  But I am a physics geek at heart, so maybe
> I am biased toward useful answers and ugly exceptions vs. useless answers
> and no exceptions... depends on your idea of "useless", I suppose...

Most of my audio DSP work takes place on fixed-point processors,
where the notion of "full scale" and the associated noise floor are
ever present.  I expect that if I multiply two small, non-zero
numbers together the result will sometimes be zero.  To me, this is
not a useless, or even a wrong result, in the context of a known
dynamic range.

>> > fpsetmask( 0);
>>
>> Is there
>> a pointer to fpsetmask in any other manual page?
>
> To be honest, I knew the general name of the function off the top of
> my head (I do a lot of event simulation), and I used man -k setmask to
> find the specific name.  But it is referenced in floatingpoint.h by
> prototype... and that header is referenced by most of the FP functions.

Given that FreeBSD's behaviour is different from other systems in
this regard, perhaps this warrants a pointer in the handbook or FAQ?

>> > Worst fix: signal( SIGFPE, SIG_IGN);
>>
>> Very bad fix, because when I tried it, it just didn't work.  I assume
>> that the trap handler does not correctly restore the floating point
>> state.  The program ran to completion, but IEEE error values
>> of some sort propagated from the exception point and ruined the results.
>
> The point of the handler is to localise the errors.  Mask those which are
> intentional, and fix those that aren't on a case-by-case basis.  I had a
> number of precision fixes to 21 year old FORTRAN code that resulted from
> getting exceptions thrown like this.

This is a good strategy.  I just didn't know the mechanism for the
masking when the error occurred.
I do think that it is unfortunate that ignoring the SIGFPE, as
described above, does /not/ have the same effect as masking the
exception.

> An interesting application; you'll note it falls into my "signal processing"
> bucket which I designated as a bad thing to need to fix because of the
> need for repeatability...

I'm not sure what you mean by this comment.  Certainly there are
audio DSP applications where you would hope for complete
repeatability, but speech recognition with HMMs is a stochastic
process, and rounding errors in the calculations are not
significantly different from noise in the input signal.

How about this for an example of non-repeatability:  One of the
first ports of this code was to a DSP card that used AT&T (now
Lucent) DSP32C processors.  The recogniser ran as a background
(non/soft real time) process, while the signal was buffered in real
time, in response to the frame interrupt.  The DSP32C has 40-bit
floating point accumulators (8 guard bits on the mantissa) and
32-bit memory, and no mechanism to save or restore those guard bits
in the interrupt service routine...  Talk about noise injection.
We couldn't even get the same answer on consecutive runs on test
files!  Nevertheless, this did not affect the measured performance
of the recogniser by more than a few tenths of a percent.

> Look to the calculation immediately before the compare.

That would be it.  The previous instruction stored s as a 32-bit
float, which would generate an underflow exception if not masked.
I guess that if the '87 did not have extended precision floating
point registers, then the exception would have occurred some time
earlier, when the over-precision result was generated.

[description of lazy reporting of FP exceptions, and implications
for SMP]

>> > Continuing from SIGFPE handlers is much harder than masking FP exceptions,
>> > at least on i386's.
>>
>> Yes.  I tried doing a signal(SIGFPE,SIG_IGN) at the top of
>> main, but that just made it produce totally incorrect results.
>
> The FPU registers are not saved (or restored) by signal handlers, which
> are not expected to execute FPU instructions.  If you will look at the
> man page, there is actually a *lot* of calls which are not "safe" to use
> from a signal handler, according to POSIX.

Which man page?  I just looked at kill(2), signal(3) and
sigaction(2), and did not see a reference to this, although I do
not doubt that such restrictions exist.  Where in SIG_IGN are
floating point instructions used?  If there are none, why doesn't
it work (i.e., why is the floating point state changed)?

On the subject of saving registers on context switches, are there
really so many Unix applications that do no floating point at all
that it is worth differentiating them?  Is it a characteristic of
the Intel processors that you can set them to trap on the use of
_any_ floating point instruction?

-- 
Andrew

"The steady state of disks is full."
				-- Ken Thompson