From owner-freebsd-amd64@FreeBSD.ORG Wed Jul 18 00:32:33 2012 Return-Path: Delivered-To: freebsd-amd64@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C6D671065675 for ; Wed, 18 Jul 2012 00:32:33 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au [211.29.132.188]) by mx1.freebsd.org (Postfix) with ESMTP id 47DBC8FC0A for ; Wed, 18 Jul 2012 00:32:33 +0000 (UTC) Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au (c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246]) by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q6I0WNrK018861 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 18 Jul 2012 10:32:30 +1000 Date: Wed, 18 Jul 2012 10:32:23 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Mark Linimon In-Reply-To: <201207172150.q6HLoFvB096742@freefall.freebsd.org> Message-ID: <20120718103210.E834@besplex.bde.org> References: <201207172150.q6HLoFvB096742@freefall.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-amd64@freebsd.org Subject: Re: amd64/169927: siginfo, si_code for fpe errors when error occurs using the SSE math processor X-BeenThere: freebsd-amd64@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Porting FreeBSD to the AMD64 platform List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Jul 2012 00:32:33 -0000 On Tue, 17 Jul 2012, Mark Linimon wrote: > The following reply was made to PR amd64/169927; it has been noted by GNATS. I'm replying to Mark's forwarding of this, but gnats isn't in the Cc list so it will not be noted by gnats, again. > On Wed, Jul 18, 2012 at 02:03:58AM +1000, Bruce Evans wrote: > > Apart from doing the bogus fnclex for T_XMMFLT and the delayed effect of > > i387 status bits, merging or not merging the statuses makes little > > difference, since if a status bit is set and is not masked according > > to its control word, then it will generate a trap soon if it didn't > > genearate the current one. > > The trap number is available for SA_SIGINFO type of handlers with > si_trapno member of siginfo_t. I think this is final argument to have > separate fputrap_{x87,sse} functions. OK. Also, you can tell instruction that faulted, provided there are no spurious faults. I think spurious faults can only occur for IRQ13 exception handling which was never supported for amd64 and is no longer supported for i386. My old test program that is broken by not clobbering the status was mainly for finding and fixing such spurious faults. I never owned an i387 or i486SX and had to break i486 exception handling to test IRQ13. > For amd64, SSE hardware is FPU, so I do not see much wrong with the name. No, FPU is a only part of SSE hardware (or separate from it, with separate interfaces from it to x87 and xmm registers, and these registers possibly layered under futher extensions. T_XMMFLT The names are sort of backwards: - T_XMMFLT: /* SIMD floating-point exception */ If the name is correct, then this can occur for non-FPU things (most xmm instructions are not for the FPU). But more likely the comment is correct, and this can only occur for FPU things from SIMD instructions. - fputrap_x87: This is not an x87 trap under fpu, but an x87-fpu trap under npx. - fputrap_sse: This is not an sse trap under fpu, but one of - an xmm-possibly-non-fpu trap under npx (if T_XMMFLT is correct) - a simd-fpu trap under npx (if the comment is correct) I checked a few xmm instructions in old amd64 manuals and could only find T_XMMFLT for FPU ones. It's spelled #XF or OSXMMEXCPT in the manuals -- CPU manufacturers also have problems naming things according to their layering when the layering is subject to extension. Other relevant exceptions are all common to x87/simd-fpu or cpu. For the maxpd instruction they are: - Invalid opcode, #UD: this mainly happens when simd is not supported or is turned off and an attempt was made to execute an simd instruction. It also happens when there is an #XF but simd is turned off (I don't see how #XF can happen while simd is turened off). - Device not available, #NM: we use a common handler - Stack, #AS - General protection, #GO - Page fault, #FP - SIMD Floating-Point Exception, #XF: I copied these descriptions verbatim. Another CPU manufacturer bug here is that this description is capitalized but none of the others is.) - IE and DE sub-exceptions (no others can occur for MAXPD). > I changed fputrap_sse() according to your suggestion. It's still too large for me :-). > diff --git a/sys/amd64/amd64/fpu.c b/sys/amd64/amd64/fpu.c > index a7812b7..356b3ac 100644 > --- a/sys/amd64/amd64/fpu.c > +++ b/sys/amd64/amd64/fpu.c > @@ -514,11 +513,15 @@ static char fpetable[128] = { > }; > > /* > - * Preserve the FP status word, clear FP exceptions, then generate a SIGFPE. > + * Preserve the FP status word, clear FP exceptions for x87, then > + * generate a SIGFPE. This is out of date about preserving the status word. > + * > + * Clearing exceptions was necessary mainly to avoid IRQ13 bugs and is > + * engraved in our i386 ABI. We now depend on longjmp() restoring a > + * usable state. Restoring the state or examining it might fail if we > + * didn't clear exceptions. > * > - * Clearing exceptions is necessary mainly to avoid IRQ13 bugs. We now > - * depend on longjmp() restoring a usable state. Restoring the state > - * or examining it might fail if we didn't clear exceptions. This should be deleted in a separate commit. I think the comment is more out of date than before: before: - this is amd64, and never supported IRQ13 - this is amd64, and longjmp() always restored a usable state - we don't depend on longjmp() restoring a usable state now, on either amd64 or i386, except for BSD-43 and FreeBSD-4 signal handlers. amd64 never supported either of these, except in the i386 compat layer. Your changes are too large and complete :-). But they are not large and complete enough to make the fnclex clobbering depend on the signal handler. After: - this is amd64, so the i386 ABI is irrelevant except in the compat layer. However, since amd64 has always done the fnclex clobbering, it is really part of the x87 ABI. > + * For SSE exceptions, the exceptions are not cleared. > * > * The error code chosen will be one of the FPE_... macros. It will be > * sent as the second argument to old BSD-style signal handlers and as > @@ -531,8 +534,9 @@ static char fpetable[128] = { > * solution for signals other than SIGFPE. > */ > int > -fputrap() > +fputrap_x87(void) > ... Good cleanups in this function. > +int > +fputrap_sse(void) > +{ > + u_int mxcsr; > + > + critical_enter(); > + > + /* > + * Coomparing with the x87 #MF handler, we do not clear > + * exceptions from the mxcsr. > + */ Spelling error. This is already covered in too much detail in the big comment before the functions. Don't repeat it here. > + if (PCPU_GET(fpcurthread) != curthread) > + mxcsr = curthread->td_pcb->pcb_save->sv_env.en_mxcsr; > + else > + stmxcsr(&mxcsr); > + > + critical_exit(); > + Style bug (extra blank line). The others are necessary, but become unnecessary after removing the comment. > + return (fpetable[(mxcsr & (mxcsr >> 16)) & 0x3f]); > +} > + > /* > * Implement device not available (DNA) exception > * Cleaning this up gives code that is small enogh for me: int fputrap_sse(void) { u_int mxcsr; critical_enter(); if (PCPU_GET(fpcurthread) != curthread) mxcsr = curthread->td_pcb->pcb_save->sv_env.en_mxcsr; else stmxcsr(&mxcsr); critical_exit(); return (fpetable[(mxcsr & (mxcsr >> 16)) & 0x3f]); } I forgot to replace your patch by the newer one before replying. Just noticed another style/efficiency bug: this should use curpcb, not curthread->td_pcb. I don't like the curpcb global, but as long as it exists, it should be used. Every time I looked at removing this global, it seemed better to keep it. C code can easily use curthread->td_pcb instead of curpcb, and this is faster if curthread is cached. But CURPCB is easier to use in asm. Similarly in fputrap_x87(). Not similarly for i386. i386 still uses the GET_FPU_CW/SW() macros, and these take a thread arg (which is named badly as `thread' instead of the normal td) and can't use curpcb. The cleanup here goes with your removal of these macros. I now quote your newer patch for the fputrap_x87() cleanup: > @@ -531,8 +534,9 @@ static char fpetable[128] = { > * solution for signals other than SIGFPE. > */ > int > -fputrap() > +fputrap_x87(void) > { > + struct savefpu *pcb_save; No need for this before or after. Let the compiler cache it. Just use curpcb now. > u_short control, status; > > critical_enter(); > @@ -543,19 +547,40 @@ fputrap() > * wherever they are. > */ > if (PCPU_GET(fpcurthread) != curthread) { > - control = GET_FPU_CW(curthread); > - status = GET_FPU_SW(curthread); > + pcb_save = curthread->td_pcb->pcb_save; > + control = pcb_save->sv_env.en_cw; > + status = pcb_save->sv_env.en_sw; Change to: control = curpcb->pcb_save->sv_env.en_cw; status = curpcb->pcb_save->sv_env.en_sw; ... i386 used curpcb near here in FreeBSD-3, but was changed to use a macro in FreeBSD-4. However, the macro still used curpcb, since it was a special one that was only used in npx_intr() and thus didn't need to start with a general td arg. Bruce