Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 10 Jun 2006 01:30:22 GMT
From:      Bruce Evans <bde@zeta.org.au>
To:        freebsd-bugs@FreeBSD.org
Subject:   Re: kern/98460 : [kernel] [patch] fpu_clean_state() cannot be disabled for not AMD processors, those are not vulnerable to FreeBSD-SA-06:14.fpu
Message-ID:  <200606100130.k5A1UMes082482@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help
The following reply was made to PR kern/98460; it has been noted by GNATS.

From: Bruce Evans <bde@zeta.org.au>
To: Rostislav Krasny <rosti.bsd@gmail.com>
Cc: freebsd-gnats-submit@freebsd.org
Subject: Re: kern/98460 : [kernel] [patch] fpu_clean_state() cannot be disabled
 for not AMD processors, those are not vulnerable to FreeBSD-SA-06:14.fpu
Date: Sat, 10 Jun 2006 11:26:20 +1000 (EST)

 On Fri, 9 Jun 2006, Rostislav Krasny wrote:
 
 > On Wed, 7 Jun 2006 12:09:10 +1000 (EST)
 > Bruce Evans <bde@zeta.org.au> wrote:
 >
 >>> And then you want to call the fpu_clean_state() function conditionally,
 >>> like in following example?
 >>>
 >>> if (cpu_fxsr & CPU_FXSR_NEEDCLEAN)
 >>>        fpu_clean_state();
 >>
 >> Not quite like that.  In my version there is no function call -- the code
 >> is excecuted in the one place where it is needed, so there is no function
 >> call overhead or possible branch prediction oferhead for the function call.
 >
 > Could you please explain in more detail how that can be done?
 
 Just do it.  The easiest way is define the new function as inline.
 This just works because the function is defined before it is used.
 
 My version uses open-coded inlining.  This works because in -current
 the function is called twice, but only one of the calls is non-bogus,
 so open-coded inlining of the calls needed doesn't result in multiple
 copies of the code.  In the amd64 version, there are 2 explicit calls,
 with a bogus one in npx_getregs() that just cleans the state already
 owned by the thread.  In the i386 version, there is one call a low-level
 function that is called twice.  (If we really cared about efficiency,
 then we might inline this function too, but it it would be even better
 handle the differences between the fxsr and non-fxsr cases at a higher
 level.)
 
 Here is my version using open-coded inlining.
 
 % Index: npx.c
 % ===================================================================
 % RCS file: /home/ncvs/src/sys/i386/isa/npx.c,v
 % retrieving revision 1.152
 % diff -u -1 -r1.152 npx.c
 % --- npx.c	19 Jun 2004 22:24:16 -0000	1.152
 % +++ npx.c	22 Apr 2006 11:58:31 -0000
 % @@ -173,2 +176,3 @@
 % 
 % +static	float			npx_cleandata;
 %  static	union savefpu		npx_cleanstate;
 
 This also fixes a type mismatch and style bugs in the new variable.
 
 % @@ -796,13 +848,34 @@
 %  	PCPU_SET(fpcurthread, curthread);
 % -	pcb = PCPU_GET(curpcb);
 % 
 % +#ifdef CPU_ENABLE_SSE
 % +	/*
 % +	 * In the fxsr case, do a dummy load to set the last-instruction
 % +	 * pointers and opcode to constant (kernel) values in case
 % +	 * fpurstor() doesn't set them.  Certain AMD CPUs are too lazy
 % +	 * about saving and restoring them, so on these CPUs we have broken
 % +	 * context switching and would have a security hole if we didn't
 % +	 * force a setting here.  First, clear any pending exceptions to
 % +	 * ensure that the load doesn't trap (userland may have left us
 % +	 * with an unmasked pending exception since fxsave doesn't do an
 % +	 * implicit fninit).  The load itself will cause an exception if
 % +	 * userland has left us with a full stack; we let this happen
 % +	 * since it is harmless except for being much slower in the rare
 % +	 * case that it happens.
 % +	 */
 % +	if (cpu_fxsr /* && cpu_fxcsw_broken */) {
 % +		fnstsw(&status);
 % +		if (status & 0x80)
 % +			fnclex();
 % +		__asm("flds %0" : : "m" (npx_cleandata));
 % +	}
 % +#endif /* CPU_ENABLE_SSE */
 % +
 % +	pcb = PCPU_GET(curpcb);
 
 The inline function provides a better place to attach verbose comments.
 It has verbose comments that emphasize different details than the
 verbose comment above.  I would prefer to have less-verbose comments.
 
 The code in the above is identical with that in the function except it
 does the cpu_fxsr check more directly and it doesn't do the ffree (see
 the comment).
 
 Note that the fixup can probably be done much more optimally and/or
 simply by doing it in a less time-critical place than here.  The
 relevant CPUs have out-of-order exceution with the FPUs independent of
 the integer ALUs.  It might not matter that fnclex is slow or that the
 fixup takes any cycles at all, since it runs almost entirely on the
 FPUs and the FPUs can proceed independently.  Unfortunately, placing
 the fixup here almost certainly gives a bottleneck in the FPU instruction
 scheduling.  Userland has just tried to execute an FPU instruction,
 and cannot proceeed with at least that instruction until we have
 executed the fnstsw/[fnclex/]fld/fxrstor sequence here, and the sequence
 must be executed sequentially too.  Userland may be able to proceed
 with integer instructions but it is likely to stall on a dependency
 on the FPU instruction since the FPU instruction takes a long time for
 the trap here, and any addition to the time here is unlikely to be
 hidden by parallelism.  Placing the fixup after the fxsave on context
 switches would probably be more efficient.  On context switches, the
 kernel runs (or can be arranged to run) for a long time doing only
 integer operations after doing the floating point part of the switch,
 so an fxsave/fnstsw/[fnclex/]fld sequence can run in parallel.
 
 Parallel execution also affects the issue of whether FPU state for the
 switched-to thread should be loaded at context switch time instead of
 setting TS and only loading it (in the trap handler just below here) if
 they are used.  A direct load might be free since it can run in parallel.
 The trap handling certainly isn't free since it can't run in parallel.
 
 Fixes for an indirectly-related nearby bug (already fixed in the amd64
 version) that happen to be in the same patch:
 
 %  	if ((pcb->pcb_flags & PCB_NPXINITDONE) == 0) {
 %  		/*
 % -		 * This is the first time this thread has used the FPU or
 % -		 * the PCB doesn't contain a clean FPU state.  Explicitly
 % -		 * initialize the FPU and load the default control word.
 % +		 * This is the first time this thread has used the FPU, or
 % +		 * the PCB doesn't contain a clean FPU state.  Load the
 % +		 * initial state (XXX reword more).
 %  		 */
 % -		fninit();
 % -		control = __INITIAL_NPXCW__;
 % -		fldcw(&control);
 % +		fpurstor(&npx_cleanstate);
 %  		pcb->pcb_flags |= PCB_NPXINITDONE;
 
 Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200606100130.k5A1UMes082482>