Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 18 Jul 2012 06:52:20 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Mark Linimon <linimon@lonesome.com>, freebsd-amd64@freebsd.org
Subject:   Re: amd64/169927: siginfo, si_code for fpe errors when error occurs using the SSE math processor
Message-ID:  <20120718035220.GO2676@deviant.kiev.zoral.com.ua>
In-Reply-To: <20120718103210.E834@besplex.bde.org>
References:  <201207172150.q6HLoFvB096742@freefall.freebsd.org> <20120718103210.E834@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help

--Thv7PGoFpDPJ7Oar
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Jul 18, 2012 at 10:32:23AM +1000, Bruce Evans wrote:
> On Tue, 17 Jul 2012, Mark Linimon wrote:
>=20
> >The following reply was made to PR amd64/169927; it has been noted by=20
> >GNATS.
>=20
> I'm replying to Mark's forwarding of this, but gnats isn't in the Cc list
> so it will not be noted by gnats, again.
>=20
> >On Wed, Jul 18, 2012 at 02:03:58AM +1000, Bruce Evans wrote:
> >> Apart from doing the bogus fnclex for T_XMMFLT and the delayed effect =
of
> >> i387 status bits, merging or not merging the statuses makes little
> >> difference, since if a status bit is set and is not masked according
> >> to its control word, then it will generate a trap soon if it didn't
> >> genearate the current one.
> >
> >The trap number is available for SA_SIGINFO type of handlers with
> >si_trapno member of siginfo_t. I think this is final argument to have
> >separate fputrap_{x87,sse} functions.
>=20
> OK.  Also, you can tell instruction that faulted, provided there are
> no spurious faults.  I think spurious faults can only occur for IRQ13
> exception handling which was never supported for amd64 and is no longer
> supported for i386.  My old test program that is broken by not clobbering
> the status was mainly for finding and fixing such spurious faults.
> I never owned an i387 or i486SX and had to break i486 exception handling
> to test IRQ13.
486SX+487 never used irq13 as well, since 487 was the 'full' DX package,
which turned off SX socket.

=2E..
> >+
> >+	/*
> >+	 * Coomparing with the x87 #MF handler, we do not clear
> >+	 * exceptions from the mxcsr.
> >+	 */
>=20
> Spelling error.
>=20
> This is already covered in too much detail in the big comment before
> the functions.  Don't repeat it here.
Removed.

>=20
>=20
> >+	if (PCPU_GET(fpcurthread) !=3D curthread)
> >+		mxcsr =3D curthread->td_pcb->pcb_save->sv_env.en_mxcsr;
> >+	else
> >+		stmxcsr(&mxcsr);
> >+
> >+	critical_exit();
> >+
>=20
> Style bug (extra blank line).  The others are necessary, but become
> unnecessary after removing the comment.
>=20
> >+	return (fpetable[(mxcsr & (mxcsr >> 16)) & 0x3f]);
> >+}
> >+
> > /*
> >  * Implement device not available (DNA) exception
> >  *
>=20
> Cleaning this up gives code that is small enogh for me:
>=20
> int
> fputrap_sse(void)
> {
>   	u_int mxcsr;
>=20
>   	critical_enter();
>   	if (PCPU_GET(fpcurthread) !=3D curthread)
>   		mxcsr =3D curthread->td_pcb->pcb_save->sv_env.en_mxcsr;
>   	else
>   		stmxcsr(&mxcsr);
>   	critical_exit();
>   	return (fpetable[(mxcsr & (mxcsr >> 16)) & 0x3f]);
> }
Ok.

>=20
> I forgot to replace your patch by the newer one before replying.
>=20
> Just noticed another style/efficiency bug: this should use curpcb,
> not curthread->td_pcb.  I don't like the curpcb global, but as
> long as it exists, it should be used.  Every time I looked at
> removing this global, it seemed better to keep it.  C code can
> easily use curthread->td_pcb instead of curpcb, and this is faster
> if curthread is cached.  But CURPCB is easier to use in asm.
I changed to use curpcb (its use is indeed mainly for asm context switch
code). I do want to use pcb_save though, see below.

>=20
> Similarly in fputrap_x87().
>=20
> Not similarly for i386.  i386 still uses the GET_FPU_CW/SW() macros,
> and these take a thread arg (which is named badly as `thread' instead
> of the normal td) and can't use curpcb.  The cleanup here goes with
> your removal of these macros.
>=20
> I now quote your newer patch for the fputrap_x87() cleanup:
>=20
> > @@ -531,8 +534,9 @@ static char fpetable[128] =3D {
> >   * solution for signals other than SIGFPE.
> >   */
> >  int
> > -fputrap()
> > +fputrap_x87(void)
> >  {
> > +	struct savefpu *pcb_save;
>=20
> No need for this before or after.  Let the compiler cache it.  Just use
> curpcb now.
The curpcb access shall be spelled as PCPU_GET(curpcb). Please note that
compiler is not allowed to cache the accesses, since there is __asm
__volatile expression to indirect through %gs-prefixed read.

>=20
> >  	u_short control, status;
> >
> >  	critical_enter();
> > @@ -543,19 +547,40 @@ fputrap()
> >  	 * wherever they are.
> >  	 */
> >  	if (PCPU_GET(fpcurthread) !=3D curthread) {
> > -		control =3D GET_FPU_CW(curthread);
> > -		status =3D GET_FPU_SW(curthread);
> > +		pcb_save =3D curthread->td_pcb->pcb_save;
> > +		control =3D pcb_save->sv_env.en_cw;
> > +		status =3D pcb_save->sv_env.en_sw;
>=20
> Change to:
>=20
>   		control =3D curpcb->pcb_save->sv_env.en_cw;
>   		status =3D curpcb->pcb_save->sv_env.en_sw;
>=20
> ...
>=20
> i386 used curpcb near here in FreeBSD-3, but was changed to use a macro
> in FreeBSD-4.  However, the macro still used curpcb, since it was a
> special one that was only used in npx_intr() and thus didn't need to
> start with a general td arg.


diff --git a/sys/amd64/amd64/fpu.c b/sys/amd64/amd64/fpu.c
index a7812b7..f88aba7 100644
--- a/sys/amd64/amd64/fpu.c
+++ b/sys/amd64/amd64/fpu.c
@@ -73,6 +73,7 @@ __FBSDID("$FreeBSD$");
 #define	fxrstor(addr)		__asm __volatile("fxrstor %0" : : "m" (*(addr)))
 #define	fxsave(addr)		__asm __volatile("fxsave %0" : "=3Dm" (*(addr)))
 #define	ldmxcsr(csr)		__asm __volatile("ldmxcsr %0" : : "m" (csr))
+#define	stmxcsr(addr)		__asm __volatile("stmxcsr %0" : : "m" (*(addr)))
=20
 static __inline void
 xrstor(char *addr, uint64_t mask)
@@ -105,6 +106,7 @@ void	fnstsw(caddr_t addr);
 void	fxsave(caddr_t addr);
 void	fxrstor(caddr_t addr);
 void	ldmxcsr(u_int csr);
+void	stmxcsr(u_int csr);
 void	xrstor(char *addr, uint64_t mask);
 void	xsave(char *addr, uint64_t mask);
=20
@@ -113,9 +115,6 @@ void	xsave(char *addr, uint64_t mask);
 #define	start_emulating()	load_cr0(rcr0() | CR0_TS)
 #define	stop_emulating()	clts()
=20
-#define GET_FPU_CW(thread) ((thread)->td_pcb->pcb_save->sv_env.en_cw)
-#define GET_FPU_SW(thread) ((thread)->td_pcb->pcb_save->sv_env.en_sw)
-
 CTASSERT(sizeof(struct savefpu) =3D=3D 512);
 CTASSERT(sizeof(struct xstate_hdr) =3D=3D 64);
 CTASSERT(sizeof(struct savefpu_ymm) =3D=3D 832);
@@ -514,11 +513,15 @@ static char fpetable[128] =3D {
 };
=20
 /*
- * Preserve the FP status word, clear FP exceptions, then generate a SIGFP=
E.
+ * Preserve the FP status word, clear FP exceptions for x87, then
+ * generate a SIGFPE.
+ *
+ * Clearing exceptions was necessary mainly to avoid IRQ13 bugs and is
+ * engraved in our i386 ABI.  We now depend on longjmp() restoring a
+ * usable state.  Restoring the state or examining it might fail if we
+ * didn't clear exceptions.
  *
- * Clearing exceptions is necessary mainly to avoid IRQ13 bugs.  We now
- * depend on longjmp() restoring a usable state.  Restoring the state
- * or examining it might fail if we didn't clear exceptions.
+ * For SSE exceptions, the exceptions are not cleared.
  *
  * The error code chosen will be one of the FPE_... macros. It will be
  * sent as the second argument to old BSD-style signal handlers and as
@@ -531,8 +534,9 @@ static char fpetable[128] =3D {
  * solution for signals other than SIGFPE.
  */
 int
-fputrap()
+fputrap_x87(void)
 {
+	struct savefpu *pcb_save;
 	u_short control, status;
=20
 	critical_enter();
@@ -543,19 +547,33 @@ fputrap()
 	 * wherever they are.
 	 */
 	if (PCPU_GET(fpcurthread) !=3D curthread) {
-		control =3D GET_FPU_CW(curthread);
-		status =3D GET_FPU_SW(curthread);
+		pcb_save =3D PCPU_GET(curpcb)->pcb_save;
+		control =3D pcb_save->sv_env.en_cw;
+		status =3D pcb_save->sv_env.en_sw;
 	} else {
 		fnstcw(&control);
 		fnstsw(&status);
+		fnclex();
 	}
=20
-	if (PCPU_GET(fpcurthread) =3D=3D curthread)
-		fnclex();
 	critical_exit();
 	return (fpetable[status & ((~control & 0x3f) | 0x40)]);
 }
=20
+int
+fputrap_sse(void)
+{
+	u_int mxcsr;
+
+	critical_enter();
+	if (PCPU_GET(fpcurthread) !=3D curthread)
+		mxcsr =3D PCPU_GET(curpcb)->pcb_save->sv_env.en_mxcsr;
+	else
+		stmxcsr(&mxcsr);
+	critical_exit();
+	return (fpetable[(mxcsr & (~mxcsr >> 7)) & 0x3f]);
+}
+
 /*
  * Implement device not available (DNA) exception
  *
diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c
index 75e15e0..57d1cc2 100644
--- a/sys/amd64/amd64/trap.c
+++ b/sys/amd64/amd64/trap.c
@@ -328,7 +328,7 @@ trap(struct trapframe *frame)
 			break;
=20
 		case T_ARITHTRAP:	/* arithmetic trap */
-			ucode =3D fputrap();
+			ucode =3D fputrap_x87();
 			if (ucode =3D=3D -1)
 				goto userout;
 			i =3D SIGFPE;
@@ -442,7 +442,9 @@ trap(struct trapframe *frame)
 			break;
=20
 		case T_XMMFLT:		/* SIMD floating-point exception */
-			ucode =3D 0; /* XXX */
+			ucode =3D fputrap_sse();
+			if (ucode =3D=3D -1)
+				goto userout;
 			i =3D SIGFPE;
 			break;
 		}
diff --git a/sys/amd64/include/fpu.h b/sys/amd64/include/fpu.h
index 98a016b..7d0f0ea 100644
--- a/sys/amd64/include/fpu.h
+++ b/sys/amd64/include/fpu.h
@@ -62,7 +62,8 @@ int	fpusetregs(struct thread *td, struct savefpu *addr,
 	    char *xfpustate, size_t xfpustate_size);
 int	fpusetxstate(struct thread *td, char *xfpustate,
 	    size_t xfpustate_size);
-int	fputrap(void);
+int	fputrap_sse(void);
+int	fputrap_x87(void);
 void	fpuuserinited(struct thread *td);
 struct fpu_kern_ctx *fpu_kern_alloc_ctx(u_int flags);
 void	fpu_kern_free_ctx(struct fpu_kern_ctx *ctx);

--Thv7PGoFpDPJ7Oar
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (FreeBSD)

iEYEARECAAYFAlAGMvQACgkQC3+MBN1Mb4jANACfabMhOJxDnqO0u9dDdYGB13cS
FwAAoLLoJI8BmEAqeFdED3836xOMSXef
=NSzJ
-----END PGP SIGNATURE-----

--Thv7PGoFpDPJ7Oar--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120718035220.GO2676>