From owner-freebsd-amd64@FreeBSD.ORG Wed Jul 18 03:52:32 2012 Return-Path: Delivered-To: freebsd-amd64@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DAA741065670 for ; Wed, 18 Jul 2012 03:52:32 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 2D1FB8FC0C for ; Wed, 18 Jul 2012 03:52:30 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q6I3qXId066519; Wed, 18 Jul 2012 06:52:33 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q6I3qKeq027862; Wed, 18 Jul 2012 06:52:20 +0300 (EEST) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q6I3qK88027861; Wed, 18 Jul 2012 06:52:20 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 18 Jul 2012 06:52:20 +0300 From: Konstantin Belousov To: Bruce Evans Message-ID: <20120718035220.GO2676@deviant.kiev.zoral.com.ua> References: <201207172150.q6HLoFvB096742@freefall.freebsd.org> <20120718103210.E834@besplex.bde.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Thv7PGoFpDPJ7Oar" Content-Disposition: inline In-Reply-To: <20120718103210.E834@besplex.bde.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: Mark Linimon , freebsd-amd64@freebsd.org Subject: Re: amd64/169927: siginfo, si_code for fpe errors when error occurs using the SSE math processor X-BeenThere: freebsd-amd64@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Porting FreeBSD to the AMD64 platform List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 Jul 2012 03:52:33 -0000 --Thv7PGoFpDPJ7Oar Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jul 18, 2012 at 10:32:23AM +1000, Bruce Evans wrote: > On Tue, 17 Jul 2012, Mark Linimon wrote: >=20 > >The following reply was made to PR amd64/169927; it has been noted by=20 > >GNATS. >=20 > I'm replying to Mark's forwarding of this, but gnats isn't in the Cc list > so it will not be noted by gnats, again. >=20 > >On Wed, Jul 18, 2012 at 02:03:58AM +1000, Bruce Evans wrote: > >> Apart from doing the bogus fnclex for T_XMMFLT and the delayed effect = of > >> i387 status bits, merging or not merging the statuses makes little > >> difference, since if a status bit is set and is not masked according > >> to its control word, then it will generate a trap soon if it didn't > >> genearate the current one. > > > >The trap number is available for SA_SIGINFO type of handlers with > >si_trapno member of siginfo_t. I think this is final argument to have > >separate fputrap_{x87,sse} functions. >=20 > OK. Also, you can tell instruction that faulted, provided there are > no spurious faults. I think spurious faults can only occur for IRQ13 > exception handling which was never supported for amd64 and is no longer > supported for i386. My old test program that is broken by not clobbering > the status was mainly for finding and fixing such spurious faults. > I never owned an i387 or i486SX and had to break i486 exception handling > to test IRQ13. 486SX+487 never used irq13 as well, since 487 was the 'full' DX package, which turned off SX socket. =2E.. > >+ > >+ /* > >+ * Coomparing with the x87 #MF handler, we do not clear > >+ * exceptions from the mxcsr. > >+ */ >=20 > Spelling error. >=20 > This is already covered in too much detail in the big comment before > the functions. Don't repeat it here. Removed. >=20 >=20 > >+ if (PCPU_GET(fpcurthread) !=3D curthread) > >+ mxcsr =3D curthread->td_pcb->pcb_save->sv_env.en_mxcsr; > >+ else > >+ stmxcsr(&mxcsr); > >+ > >+ critical_exit(); > >+ >=20 > Style bug (extra blank line). The others are necessary, but become > unnecessary after removing the comment. >=20 > >+ return (fpetable[(mxcsr & (mxcsr >> 16)) & 0x3f]); > >+} > >+ > > /* > > * Implement device not available (DNA) exception > > * >=20 > Cleaning this up gives code that is small enogh for me: >=20 > int > fputrap_sse(void) > { > u_int mxcsr; >=20 > critical_enter(); > if (PCPU_GET(fpcurthread) !=3D curthread) > mxcsr =3D curthread->td_pcb->pcb_save->sv_env.en_mxcsr; > else > stmxcsr(&mxcsr); > critical_exit(); > return (fpetable[(mxcsr & (mxcsr >> 16)) & 0x3f]); > } Ok. >=20 > I forgot to replace your patch by the newer one before replying. >=20 > Just noticed another style/efficiency bug: this should use curpcb, > not curthread->td_pcb. I don't like the curpcb global, but as > long as it exists, it should be used. Every time I looked at > removing this global, it seemed better to keep it. C code can > easily use curthread->td_pcb instead of curpcb, and this is faster > if curthread is cached. But CURPCB is easier to use in asm. I changed to use curpcb (its use is indeed mainly for asm context switch code). I do want to use pcb_save though, see below. >=20 > Similarly in fputrap_x87(). >=20 > Not similarly for i386. i386 still uses the GET_FPU_CW/SW() macros, > and these take a thread arg (which is named badly as `thread' instead > of the normal td) and can't use curpcb. The cleanup here goes with > your removal of these macros. >=20 > I now quote your newer patch for the fputrap_x87() cleanup: >=20 > > @@ -531,8 +534,9 @@ static char fpetable[128] =3D { > > * solution for signals other than SIGFPE. > > */ > > int > > -fputrap() > > +fputrap_x87(void) > > { > > + struct savefpu *pcb_save; >=20 > No need for this before or after. Let the compiler cache it. Just use > curpcb now. The curpcb access shall be spelled as PCPU_GET(curpcb). Please note that compiler is not allowed to cache the accesses, since there is __asm __volatile expression to indirect through %gs-prefixed read. >=20 > > u_short control, status; > > > > critical_enter(); > > @@ -543,19 +547,40 @@ fputrap() > > * wherever they are. > > */ > > if (PCPU_GET(fpcurthread) !=3D curthread) { > > - control =3D GET_FPU_CW(curthread); > > - status =3D GET_FPU_SW(curthread); > > + pcb_save =3D curthread->td_pcb->pcb_save; > > + control =3D pcb_save->sv_env.en_cw; > > + status =3D pcb_save->sv_env.en_sw; >=20 > Change to: >=20 > control =3D curpcb->pcb_save->sv_env.en_cw; > status =3D curpcb->pcb_save->sv_env.en_sw; >=20 > ... >=20 > i386 used curpcb near here in FreeBSD-3, but was changed to use a macro > in FreeBSD-4. However, the macro still used curpcb, since it was a > special one that was only used in npx_intr() and thus didn't need to > start with a general td arg. diff --git a/sys/amd64/amd64/fpu.c b/sys/amd64/amd64/fpu.c index a7812b7..f88aba7 100644 --- a/sys/amd64/amd64/fpu.c +++ b/sys/amd64/amd64/fpu.c @@ -73,6 +73,7 @@ __FBSDID("$FreeBSD$"); #define fxrstor(addr) __asm __volatile("fxrstor %0" : : "m" (*(addr))) #define fxsave(addr) __asm __volatile("fxsave %0" : "=3Dm" (*(addr))) #define ldmxcsr(csr) __asm __volatile("ldmxcsr %0" : : "m" (csr)) +#define stmxcsr(addr) __asm __volatile("stmxcsr %0" : : "m" (*(addr))) =20 static __inline void xrstor(char *addr, uint64_t mask) @@ -105,6 +106,7 @@ void fnstsw(caddr_t addr); void fxsave(caddr_t addr); void fxrstor(caddr_t addr); void ldmxcsr(u_int csr); +void stmxcsr(u_int csr); void xrstor(char *addr, uint64_t mask); void xsave(char *addr, uint64_t mask); =20 @@ -113,9 +115,6 @@ void xsave(char *addr, uint64_t mask); #define start_emulating() load_cr0(rcr0() | CR0_TS) #define stop_emulating() clts() =20 -#define GET_FPU_CW(thread) ((thread)->td_pcb->pcb_save->sv_env.en_cw) -#define GET_FPU_SW(thread) ((thread)->td_pcb->pcb_save->sv_env.en_sw) - CTASSERT(sizeof(struct savefpu) =3D=3D 512); CTASSERT(sizeof(struct xstate_hdr) =3D=3D 64); CTASSERT(sizeof(struct savefpu_ymm) =3D=3D 832); @@ -514,11 +513,15 @@ static char fpetable[128] =3D { }; =20 /* - * Preserve the FP status word, clear FP exceptions, then generate a SIGFP= E. + * Preserve the FP status word, clear FP exceptions for x87, then + * generate a SIGFPE. + * + * Clearing exceptions was necessary mainly to avoid IRQ13 bugs and is + * engraved in our i386 ABI. We now depend on longjmp() restoring a + * usable state. Restoring the state or examining it might fail if we + * didn't clear exceptions. * - * Clearing exceptions is necessary mainly to avoid IRQ13 bugs. We now - * depend on longjmp() restoring a usable state. Restoring the state - * or examining it might fail if we didn't clear exceptions. + * For SSE exceptions, the exceptions are not cleared. * * The error code chosen will be one of the FPE_... macros. It will be * sent as the second argument to old BSD-style signal handlers and as @@ -531,8 +534,9 @@ static char fpetable[128] =3D { * solution for signals other than SIGFPE. */ int -fputrap() +fputrap_x87(void) { + struct savefpu *pcb_save; u_short control, status; =20 critical_enter(); @@ -543,19 +547,33 @@ fputrap() * wherever they are. */ if (PCPU_GET(fpcurthread) !=3D curthread) { - control =3D GET_FPU_CW(curthread); - status =3D GET_FPU_SW(curthread); + pcb_save =3D PCPU_GET(curpcb)->pcb_save; + control =3D pcb_save->sv_env.en_cw; + status =3D pcb_save->sv_env.en_sw; } else { fnstcw(&control); fnstsw(&status); + fnclex(); } =20 - if (PCPU_GET(fpcurthread) =3D=3D curthread) - fnclex(); critical_exit(); return (fpetable[status & ((~control & 0x3f) | 0x40)]); } =20 +int +fputrap_sse(void) +{ + u_int mxcsr; + + critical_enter(); + if (PCPU_GET(fpcurthread) !=3D curthread) + mxcsr =3D PCPU_GET(curpcb)->pcb_save->sv_env.en_mxcsr; + else + stmxcsr(&mxcsr); + critical_exit(); + return (fpetable[(mxcsr & (~mxcsr >> 7)) & 0x3f]); +} + /* * Implement device not available (DNA) exception * diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c index 75e15e0..57d1cc2 100644 --- a/sys/amd64/amd64/trap.c +++ b/sys/amd64/amd64/trap.c @@ -328,7 +328,7 @@ trap(struct trapframe *frame) break; =20 case T_ARITHTRAP: /* arithmetic trap */ - ucode =3D fputrap(); + ucode =3D fputrap_x87(); if (ucode =3D=3D -1) goto userout; i =3D SIGFPE; @@ -442,7 +442,9 @@ trap(struct trapframe *frame) break; =20 case T_XMMFLT: /* SIMD floating-point exception */ - ucode =3D 0; /* XXX */ + ucode =3D fputrap_sse(); + if (ucode =3D=3D -1) + goto userout; i =3D SIGFPE; break; } diff --git a/sys/amd64/include/fpu.h b/sys/amd64/include/fpu.h index 98a016b..7d0f0ea 100644 --- a/sys/amd64/include/fpu.h +++ b/sys/amd64/include/fpu.h @@ -62,7 +62,8 @@ int fpusetregs(struct thread *td, struct savefpu *addr, char *xfpustate, size_t xfpustate_size); int fpusetxstate(struct thread *td, char *xfpustate, size_t xfpustate_size); -int fputrap(void); +int fputrap_sse(void); +int fputrap_x87(void); void fpuuserinited(struct thread *td); struct fpu_kern_ctx *fpu_kern_alloc_ctx(u_int flags); void fpu_kern_free_ctx(struct fpu_kern_ctx *ctx); --Thv7PGoFpDPJ7Oar Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAlAGMvQACgkQC3+MBN1Mb4jANACfabMhOJxDnqO0u9dDdYGB13cS FwAAoLLoJI8BmEAqeFdED3836xOMSXef =NSzJ -----END PGP SIGNATURE----- --Thv7PGoFpDPJ7Oar--