From owner-freebsd-amd64@FreeBSD.ORG  Wed Jul 18 00:32:33 2012
Return-Path: <owner-freebsd-amd64@FreeBSD.ORG>
Delivered-To: freebsd-amd64@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C6D671065675
	for <freebsd-amd64@freebsd.org>; Wed, 18 Jul 2012 00:32:33 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au
	[211.29.132.188])
	by mx1.freebsd.org (Postfix) with ESMTP id 47DBC8FC0A
	for <freebsd-amd64@freebsd.org>; Wed, 18 Jul 2012 00:32:33 +0000 (UTC)
Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au
	(c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246])
	by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	q6I0WNrK018861
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 18 Jul 2012 10:32:30 +1000
Date: Wed, 18 Jul 2012 10:32:23 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Mark Linimon <linimon@lonesome.com>
In-Reply-To: <201207172150.q6HLoFvB096742@freefall.freebsd.org>
Message-ID: <20120718103210.E834@besplex.bde.org>
References: <201207172150.q6HLoFvB096742@freefall.freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-amd64@freebsd.org
Subject: Re: amd64/169927: siginfo, si_code for fpe errors when error occurs
 using the SSE math processor
X-BeenThere: freebsd-amd64@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Porting FreeBSD to the AMD64 platform <freebsd-amd64.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-amd64>,
	<mailto:freebsd-amd64-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-amd64>
List-Post: <mailto:freebsd-amd64@freebsd.org>
List-Help: <mailto:freebsd-amd64-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-amd64>,
	<mailto:freebsd-amd64-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 18 Jul 2012 00:32:33 -0000

On Tue, 17 Jul 2012, Mark Linimon wrote:

> The following reply was made to PR amd64/169927; it has been noted by GNATS.

I'm replying to Mark's forwarding of this, but gnats isn't in the Cc list
so it will not be noted by gnats, again.

> On Wed, Jul 18, 2012 at 02:03:58AM +1000, Bruce Evans wrote:
> > Apart from doing the bogus fnclex for T_XMMFLT and the delayed effect of
> > i387 status bits, merging or not merging the statuses makes little
> > difference, since if a status bit is set and is not masked according
> > to its control word, then it will generate a trap soon if it didn't
> > genearate the current one.
>
> The trap number is available for SA_SIGINFO type of handlers with
> si_trapno member of siginfo_t. I think this is final argument to have
> separate fputrap_{x87,sse} functions.

OK.  Also, you can tell instruction that faulted, provided there are
no spurious faults.  I think spurious faults can only occur for IRQ13
exception handling which was never supported for amd64 and is no longer
supported for i386.  My old test program that is broken by not clobbering
the status was mainly for finding and fixing such spurious faults.
I never owned an i387 or i486SX and had to break i486 exception handling
to test IRQ13.

> For amd64, SSE hardware is FPU, so I do not see much wrong with the name.

No, FPU is a only part of SSE hardware (or separate from it, with separate
interfaces from it to x87 and xmm registers, and these registers possibly
layered under futher extensions.  T_XMMFLT


The names are sort of backwards:
- T_XMMFLT: /* SIMD floating-point exception */
     If the name is correct, then this can occur for non-FPU things (most
     xmm instructions are not for the FPU).  But more likely the comment
     is correct, and this can only occur for FPU things from SIMD instructions.
- fputrap_x87:
     This is not an x87 trap under fpu, but an x87-fpu trap under npx.
- fputrap_sse:
     This is not an sse trap under fpu, but one of
     - an xmm-possibly-non-fpu trap under npx (if T_XMMFLT is correct)
     - a simd-fpu trap under npx (if the comment is correct)

I checked a few xmm instructions in old amd64 manuals and could only find
T_XMMFLT for FPU ones.  It's spelled #XF or OSXMMEXCPT in the manuals --
CPU manufacturers also have problems naming things according to their
layering when the layering is subject to extension.  Other relevant
exceptions are all common to x87/simd-fpu or cpu.  For the maxpd
instruction they are:

- Invalid opcode, #UD:
       this mainly happens when simd is not supported or is turned off
       and an attempt was made to execute an simd instruction.  It also
       happens when there is an #XF but simd is turned off (I don't see
       how #XF can happen while simd is turened off).
- Device not available, #NM:
       we use a common handler
- Stack, #AS
- General protection, #GO
- Page fault, #FP
- SIMD Floating-Point Exception, #XF:
       I copied these descriptions verbatim.  Another CPU
       manufacturer bug here is that this description is capitalized
       but none of the others is.)
     - IE and DE sub-exceptions (no others can occur for MAXPD).

> I changed fputrap_sse() according to your suggestion.

It's still too large for me :-).

> diff --git a/sys/amd64/amd64/fpu.c b/sys/amd64/amd64/fpu.c
> index a7812b7..356b3ac 100644
> --- a/sys/amd64/amd64/fpu.c
> +++ b/sys/amd64/amd64/fpu.c
> @@ -514,11 +513,15 @@ static char fpetable[128] = {
>  };
>
>  /*
> - * Preserve the FP status word, clear FP exceptions, then generate a SIGFPE.
> + * Preserve the FP status word, clear FP exceptions for x87, then
> + * generate a SIGFPE.

This is out of date about preserving the status word.

> + *
> + * Clearing exceptions was necessary mainly to avoid IRQ13 bugs and is
> + * engraved in our i386 ABI.  We now depend on longjmp() restoring a
> + * usable state.  Restoring the state or examining it might fail if we
> + * didn't clear exceptions.
>   *
> - * Clearing exceptions is necessary mainly to avoid IRQ13 bugs.  We now
> - * depend on longjmp() restoring a usable state.  Restoring the state
> - * or examining it might fail if we didn't clear exceptions.

This should be deleted in a separate commit.

I think the comment is more out of date than before: before:
- this is amd64, and never supported IRQ13
- this is amd64, and longjmp() always restored a usable state
- we don't depend on longjmp() restoring a usable state now, on either
     amd64 or i386, except for BSD-43  and FreeBSD-4 signal handlers.
     amd64 never supported either of these, except in the i386 compat
     layer. Your changes are too large and complete :-).  But they are
     not large and complete enough to make the fnclex clobbering depend
     on the signal handler.
After:
- this is amd64, so the i386 ABI is irrelevant except in the compat layer.
     However, since amd64 has always done the fnclex clobbering, it is really
     part of the x87 ABI.

> + * For SSE exceptions, the exceptions are not cleared.
>   *
>   * The error code chosen will be one of the FPE_... macros. It will be
>   * sent as the second argument to old BSD-style signal handlers and as
> @@ -531,8 +534,9 @@ static char fpetable[128] = {
>   * solution for signals other than SIGFPE.
>   */
>  int
> -fputrap()
> +fputrap_x87(void)
> ...

Good cleanups in this function.

> +int
> +fputrap_sse(void)
> +{
> +	u_int mxcsr;
> +
> +	critical_enter();
> +
> +	/*
> +	 * Coomparing with the x87 #MF handler, we do not clear
> +	 * exceptions from the mxcsr.
> +	 */

Spelling error.

This is already covered in too much detail in the big comment before
the functions.  Don't repeat it here.


> +	if (PCPU_GET(fpcurthread) != curthread)
> +		mxcsr = curthread->td_pcb->pcb_save->sv_env.en_mxcsr;
> +	else
> +		stmxcsr(&mxcsr);
> +
> +	critical_exit();
> +

Style bug (extra blank line).  The others are necessary, but become
unnecessary after removing the comment.

> +	return (fpetable[(mxcsr & (mxcsr >> 16)) & 0x3f]);
> +}
> +
>  /*
>   * Implement device not available (DNA) exception
>   *

Cleaning this up gives code that is small enogh for me:

int
fputrap_sse(void)
{
   	u_int mxcsr;

   	critical_enter();
   	if (PCPU_GET(fpcurthread) != curthread)
   		mxcsr = curthread->td_pcb->pcb_save->sv_env.en_mxcsr;
   	else
   		stmxcsr(&mxcsr);
   	critical_exit();
   	return (fpetable[(mxcsr & (mxcsr >> 16)) & 0x3f]);
}

I forgot to replace your patch by the newer one before replying.

Just noticed another style/efficiency bug: this should use curpcb,
not curthread->td_pcb.  I don't like the curpcb global, but as
long as it exists, it should be used.  Every time I looked at
removing this global, it seemed better to keep it.  C code can
easily use curthread->td_pcb instead of curpcb, and this is faster
if curthread is cached.  But CURPCB is easier to use in asm.

Similarly in fputrap_x87().

Not similarly for i386.  i386 still uses the GET_FPU_CW/SW() macros,
and these take a thread arg (which is named badly as `thread' instead
of the normal td) and can't use curpcb.  The cleanup here goes with
your removal of these macros.

I now quote your newer patch for the fputrap_x87() cleanup:

>  @@ -531,8 +534,9 @@ static char fpetable[128] = {
>    * solution for signals other than SIGFPE.
>    */
>   int
>  -fputrap()
>  +fputrap_x87(void)
>   {
>  +	struct savefpu *pcb_save;

No need for this before or after.  Let the compiler cache it.  Just use
curpcb now.

>   	u_short control, status;
> 
>   	critical_enter();
>  @@ -543,19 +547,40 @@ fputrap()
>   	 * wherever they are.
>   	 */
>   	if (PCPU_GET(fpcurthread) != curthread) {
>  -		control = GET_FPU_CW(curthread);
>  -		status = GET_FPU_SW(curthread);
>  +		pcb_save = curthread->td_pcb->pcb_save;
>  +		control = pcb_save->sv_env.en_cw;
>  +		status = pcb_save->sv_env.en_sw;

Change to:

   		control = curpcb->pcb_save->sv_env.en_cw;
   		status = curpcb->pcb_save->sv_env.en_sw;

...

i386 used curpcb near here in FreeBSD-3, but was changed to use a macro
in FreeBSD-4.  However, the macro still used curpcb, since it was a
special one that was only used in npx_intr() and thus didn't need to
start with a general td arg.

Bruce