Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 18 Jul 2012 01:14:29 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-amd64@freebsd.org
Subject:   Re: amd64/169927: siginfo, si_code for fpe errors when error occurs using the SSE math processor
Message-ID:  <20120717235622.C7417@besplex.bde.org>
In-Reply-To: <201207171350.q6HDoAJS033797@freefall.freebsd.org>
References:  <201207171350.q6HDoAJS033797@freefall.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 17 Jul 2012, Konstantin Belousov wrote:

> It was on my TODO list for long time. Lets handle amd64 first, both for
> native and compat32.
>
> I think the following should be somewhat better variant. I do leave
> the fnclex there for x87.

Too large for me.  Although the exceptions are different, the exception
handlers are essentially the same.

Remove the fnclex too.  Then the exception handlers will be even
more similar.

> diff --git a/sys/amd64/amd64/fpu.c b/sys/amd64/amd64/fpu.c
> index a7812b7..34cf8d4 100644
> --- a/sys/amd64/amd64/fpu.c
> +++ b/sys/amd64/amd64/fpu.c
> ...
> @@ -113,9 +115,6 @@ void	xsave(char *addr, uint64_t mask);
>  #define	start_emulating()	load_cr0(rcr0() | CR0_TS)
>  #define	stop_emulating()	clts()
> =20
> -#define GET_FPU_CW(thread) ((thread)->td_pcb->pcb_save->sv_env.en_cw)
> -#define GET_FPU_SW(thread) ((thread)->td_pcb->pcb_save->sv_env.en_sw)
> -
>  CTASSERT(sizeof(struct savefpu) =3D=3D 512);
>  CTASSERT(sizeof(struct xstate_hdr) =3D=3D 64);
>  CTASSERT(sizeof(struct savefpu_ymm) =3D=3D 832);

Good to remove this.

> @@ -514,11 +513,15 @@ static char fpetable[128] =3D {
>  };
> =20
>  /*
> - * Preserve the FP status word, clear FP exceptions, then generate a SIGFP=
> E.
> + * Preserve the FP status word, clear FP exceptions for x87, then
> + * generate a SIGFPE.
> + *
> + * Clearing exceptions was necessary mainly to avoid IRQ13 bugs and is
> + * engraved in our i386 ABI.  We now depend on longjmp() restoring a
> + * usable state.  Restoring the state or examining it might fail if we
> + * didn't clear exceptions.

Hrmph.  Someone already broken the ABI and enabled the trap in longjmp()
by removing the fninit in longjmp().

Examining the state won't trap, since it is a no-wait instruction (fnstsw
or fnstenv).

Clearing the state in the kernel doesn't even fix longjmp(), since
longjmp() will still trap if it is called before the kernel trap
clobbers the status:

 	fldz
 	fld1
 	fdiv	%st,%st(1)	# exception; trap pending
 	call	longjmp		# traps on the non-control fldcw

I think this has to be written in asm to cause a problem.  Similar C code:

 	foo = one / zero;
 	longjmp();

must translate to something like:

 	fldz
 	fld1
 	fdiv	%st,%st(1)	# exception; trap pending
 	fstpl	foo		# ABI and/or C abstract machine requires
 				# this to be put this away before function
 				# call
 				# trap now; bogus fnclex in kernel
 	call	longjmp		# no trap later

The ABI problem only affects FreeBSD-4-compat signal handlers.
FreeBSD-5+ signal handlers get a clean FP state, so they won't trap in
longjmp().  The fnclex was a hack to let signal handlers run with
n unclean FP state without faulting again.  The undefined behaviour
that results when a SIGFPE handler returns is not part of an ABI.

It seems to be necessary to use fnstenv; copy cw; fldenv to fix longjmp().
I used fninit because it is faster and the ABI and or the C abstract
machine used to not support the exception status.

>   *
> - * Clearing exceptions is necessary mainly to avoid IRQ13 bugs.  We now
> - * depend on longjmp() restoring a usable state.  Restoring the state
> - * or examining it might fail if we didn't clear exceptions.
> + * For SSE exceptions, the exceptions are not cleared.
>   *
>   * The error code chosen will be one of the FPE_... macros. It will be
>   * sent as the second argument to old BSD-style signal handlers and as
> @@ -531,8 +534,9 @@ static char fpetable[128] =3D {
>   * solution for signals other than SIGFPE.
>   */
>  int
> -fputrap()
> +fputrap_x87(void)
>  {
> +	struct savefpu *pcb_save;
>  	u_short control, status;
> =20
>  	critical_enter();
> @@ -543,19 +547,43 @@ fputrap()
>  	 * wherever they are.
>  	 */
>  	if (PCPU_GET(fpcurthread) !=3D curthread) {
> -		control =3D GET_FPU_CW(curthread);
> -		status =3D GET_FPU_SW(curthread);
> +		pcb_save =3D curthread->td_pcb->pcb_save;
> +		control =3D pcb_save->sv_env.en_cw;
> +		status =3D pcb_save->sv_env.en_sw;
>  	} else {
>  		fnstcw(&control);
>  		fnstsw(&status);
> +		fnclex();
>  	}
> =20
> -	if (PCPU_GET(fpcurthread) =3D=3D curthread)
> -		fnclex();
>  	critical_exit();
>  	return (fpetable[status & ((~control & 0x3f) | 0x40)]);
>  }

Hmm, the separate trap handlers are more needed, yet more broken, than
I first thought.  I first thouught that it was necessary to merge the
statuses like the patch in the PR does.  After all, fenv(3) can do no
better.  The control words had better be the same (after masking with
0x3f), or the signal code will be nonsense.  fenv(3) depends on this
too.  The separate trap handlers allow use to use the control word
that matches the status word and the signal code to be only for the
hardware that generated the trap.  However, this is worse than useless,
since the signal code doesn't encode which hardware generated the trap.

> =20
> +int
> +fputrap_sse(void)

The 'fpu' name is more bogus than usual.  This is not an fpu trap, but
an sse trap.

There is a related problem with the trap name.  It is T_XMMFLT.  But
this trap can presumably also be caused by YMM and newer extensions
(AVX?).

> +{
> +	u_int mxcsr;
> +	u_short control, status;
> +
> +	critical_enter();
> +
> +	/*
> +	 * Coomparing with the x87 #MF handler, we do not clear
> +	 * exceptions from the mxcsr.
> +	 */
> +	if (PCPU_GET(fpcurthread) !=3D curthread)
> +		mxcsr =3D curthread->td_pcb->pcb_save->sv_env.en_mxcsr;
> +	else
> +		stmxcsr(&mxcsr);
> +
> +	critical_exit();
> +
> +	status =3D mxcsr & 0x3f;
> +	control =3D (mxcsr >> 16) & 0x3f;
> +	return (fpetable[status & (~control | 0x40)]);

The 0x40 bit doesn't exist in the mxcsr status and ORing it in here has
no effect.  Replace the last 3 lines by:

 	return (fpetable[(status & (control >> 16)) & 0x3f];

> +}
> +
>  /*
>   * Implement device not available (DNA) exception
>   *
> diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c
> index 75e15e0..57d1cc2 100644
> --- a/sys/amd64/amd64/trap.c
> +++ b/sys/amd64/amd64/trap.c
> @@ -328,7 +328,7 @@ trap(struct trapframe *frame)
>  			break;
> =20
>  		case T_ARITHTRAP:	/* arithmetic trap */
> -			ucode =3D fputrap();
> +			ucode =3D fputrap_x87();
>  			if (ucode =3D=3D -1)
>  				goto userout;
>  			i =3D SIGFPE;
> @@ -442,7 +442,9 @@ trap(struct trapframe *frame)
>  			break;
> =20
>  		case T_XMMFLT:		/* SIMD floating-point exception */
> -			ucode =3D 0; /* XXX */
> +			ucode =3D fputrap_sse();
> +			if (ucode =3D=3D -1)
> +				goto userout;

The T_FOO code could be encoded here, but I think there is no space
to spare, and any extension would further complicate the decoding.
Not many SIGFPE handlers understand the current encoding.

>  			i =3D SIGFPE;
>  			break;
>  		}

So I still want a single kernel exception handle that merges the statuses.

Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20120717235622.C7417>