From owner-freebsd-amd64@FreeBSD.ORG Tue Jul 17 15:14:33 2012 Return-Path: Delivered-To: freebsd-amd64@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 37F171065672 for ; Tue, 17 Jul 2012 15:14:33 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail05.syd.optusnet.com.au (mail05.syd.optusnet.com.au [211.29.132.186]) by mx1.freebsd.org (Postfix) with ESMTP id AE3AD8FC15 for ; Tue, 17 Jul 2012 15:14:32 +0000 (UTC) Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au (c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246]) by mail05.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q6HFETGI027987 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 18 Jul 2012 01:14:31 +1000 Date: Wed, 18 Jul 2012 01:14:29 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov In-Reply-To: <201207171350.q6HDoAJS033797@freefall.freebsd.org> Message-ID: <20120717235622.C7417@besplex.bde.org> References: <201207171350.q6HDoAJS033797@freefall.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-amd64@freebsd.org Subject: Re: amd64/169927: siginfo, si_code for fpe errors when error occurs using the SSE math processor X-BeenThere: freebsd-amd64@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Porting FreeBSD to the AMD64 platform List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 Jul 2012 15:14:33 -0000 On Tue, 17 Jul 2012, Konstantin Belousov wrote: > It was on my TODO list for long time. Lets handle amd64 first, both for > native and compat32. > > I think the following should be somewhat better variant. I do leave > the fnclex there for x87. Too large for me. Although the exceptions are different, the exception handlers are essentially the same. Remove the fnclex too. Then the exception handlers will be even more similar. > diff --git a/sys/amd64/amd64/fpu.c b/sys/amd64/amd64/fpu.c > index a7812b7..34cf8d4 100644 > --- a/sys/amd64/amd64/fpu.c > +++ b/sys/amd64/amd64/fpu.c > ... > @@ -113,9 +115,6 @@ void xsave(char *addr, uint64_t mask); > #define start_emulating() load_cr0(rcr0() | CR0_TS) > #define stop_emulating() clts() > =20 > -#define GET_FPU_CW(thread) ((thread)->td_pcb->pcb_save->sv_env.en_cw) > -#define GET_FPU_SW(thread) ((thread)->td_pcb->pcb_save->sv_env.en_sw) > - > CTASSERT(sizeof(struct savefpu) =3D=3D 512); > CTASSERT(sizeof(struct xstate_hdr) =3D=3D 64); > CTASSERT(sizeof(struct savefpu_ymm) =3D=3D 832); Good to remove this. > @@ -514,11 +513,15 @@ static char fpetable[128] =3D { > }; > =20 > /* > - * Preserve the FP status word, clear FP exceptions, then generate a SIGFP= > E. > + * Preserve the FP status word, clear FP exceptions for x87, then > + * generate a SIGFPE. > + * > + * Clearing exceptions was necessary mainly to avoid IRQ13 bugs and is > + * engraved in our i386 ABI. We now depend on longjmp() restoring a > + * usable state. Restoring the state or examining it might fail if we > + * didn't clear exceptions. Hrmph. Someone already broken the ABI and enabled the trap in longjmp() by removing the fninit in longjmp(). Examining the state won't trap, since it is a no-wait instruction (fnstsw or fnstenv). Clearing the state in the kernel doesn't even fix longjmp(), since longjmp() will still trap if it is called before the kernel trap clobbers the status: fldz fld1 fdiv %st,%st(1) # exception; trap pending call longjmp # traps on the non-control fldcw I think this has to be written in asm to cause a problem. Similar C code: foo = one / zero; longjmp(); must translate to something like: fldz fld1 fdiv %st,%st(1) # exception; trap pending fstpl foo # ABI and/or C abstract machine requires # this to be put this away before function # call # trap now; bogus fnclex in kernel call longjmp # no trap later The ABI problem only affects FreeBSD-4-compat signal handlers. FreeBSD-5+ signal handlers get a clean FP state, so they won't trap in longjmp(). The fnclex was a hack to let signal handlers run with n unclean FP state without faulting again. The undefined behaviour that results when a SIGFPE handler returns is not part of an ABI. It seems to be necessary to use fnstenv; copy cw; fldenv to fix longjmp(). I used fninit because it is faster and the ABI and or the C abstract machine used to not support the exception status. > * > - * Clearing exceptions is necessary mainly to avoid IRQ13 bugs. We now > - * depend on longjmp() restoring a usable state. Restoring the state > - * or examining it might fail if we didn't clear exceptions. > + * For SSE exceptions, the exceptions are not cleared. > * > * The error code chosen will be one of the FPE_... macros. It will be > * sent as the second argument to old BSD-style signal handlers and as > @@ -531,8 +534,9 @@ static char fpetable[128] =3D { > * solution for signals other than SIGFPE. > */ > int > -fputrap() > +fputrap_x87(void) > { > + struct savefpu *pcb_save; > u_short control, status; > =20 > critical_enter(); > @@ -543,19 +547,43 @@ fputrap() > * wherever they are. > */ > if (PCPU_GET(fpcurthread) !=3D curthread) { > - control =3D GET_FPU_CW(curthread); > - status =3D GET_FPU_SW(curthread); > + pcb_save =3D curthread->td_pcb->pcb_save; > + control =3D pcb_save->sv_env.en_cw; > + status =3D pcb_save->sv_env.en_sw; > } else { > fnstcw(&control); > fnstsw(&status); > + fnclex(); > } > =20 > - if (PCPU_GET(fpcurthread) =3D=3D curthread) > - fnclex(); > critical_exit(); > return (fpetable[status & ((~control & 0x3f) | 0x40)]); > } Hmm, the separate trap handlers are more needed, yet more broken, than I first thought. I first thouught that it was necessary to merge the statuses like the patch in the PR does. After all, fenv(3) can do no better. The control words had better be the same (after masking with 0x3f), or the signal code will be nonsense. fenv(3) depends on this too. The separate trap handlers allow use to use the control word that matches the status word and the signal code to be only for the hardware that generated the trap. However, this is worse than useless, since the signal code doesn't encode which hardware generated the trap. > =20 > +int > +fputrap_sse(void) The 'fpu' name is more bogus than usual. This is not an fpu trap, but an sse trap. There is a related problem with the trap name. It is T_XMMFLT. But this trap can presumably also be caused by YMM and newer extensions (AVX?). > +{ > + u_int mxcsr; > + u_short control, status; > + > + critical_enter(); > + > + /* > + * Coomparing with the x87 #MF handler, we do not clear > + * exceptions from the mxcsr. > + */ > + if (PCPU_GET(fpcurthread) !=3D curthread) > + mxcsr =3D curthread->td_pcb->pcb_save->sv_env.en_mxcsr; > + else > + stmxcsr(&mxcsr); > + > + critical_exit(); > + > + status =3D mxcsr & 0x3f; > + control =3D (mxcsr >> 16) & 0x3f; > + return (fpetable[status & (~control | 0x40)]); The 0x40 bit doesn't exist in the mxcsr status and ORing it in here has no effect. Replace the last 3 lines by: return (fpetable[(status & (control >> 16)) & 0x3f]; > +} > + > /* > * Implement device not available (DNA) exception > * > diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c > index 75e15e0..57d1cc2 100644 > --- a/sys/amd64/amd64/trap.c > +++ b/sys/amd64/amd64/trap.c > @@ -328,7 +328,7 @@ trap(struct trapframe *frame) > break; > =20 > case T_ARITHTRAP: /* arithmetic trap */ > - ucode =3D fputrap(); > + ucode =3D fputrap_x87(); > if (ucode =3D=3D -1) > goto userout; > i =3D SIGFPE; > @@ -442,7 +442,9 @@ trap(struct trapframe *frame) > break; > =20 > case T_XMMFLT: /* SIMD floating-point exception */ > - ucode =3D 0; /* XXX */ > + ucode =3D fputrap_sse(); > + if (ucode =3D=3D -1) > + goto userout; The T_FOO code could be encoded here, but I think there is no space to spare, and any extension would further complicate the decoding. Not many SIGFPE handlers understand the current encoding. > i =3D SIGFPE; > break; > } So I still want a single kernel exception handle that merges the statuses. Bruce