Date: Sun, 28 May 2006 23:58:21 +1000 (EST) From: Bruce Evans <bde@zeta.org.au> To: David Xu <davidxu@freebsd.org> Cc: cvs-src@freebsd.org, src-committers@freebsd.org, cvs-all@freebsd.org Subject: Re: cvs commit: src/sys/i386/isa npx.c Message-ID: <20060528223012.H2592@epsplex.bde.org> In-Reply-To: <200605282006.26364.davidxu@freebsd.org> References: <200605280440.k4S4ej96064322@repoman.freebsd.org> <200605281741.52752.davidxu@freebsd.org> <20060528201544.L20679@delplex.bde.org> <200605282006.26364.davidxu@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Sun, 28 May 2006, David Xu wrote: > On Sunday 28 May 2006 19:15, Bruce Evans wrote: > >> Starting with a clean FP state like the old code does is safest, but POSIX >> explicitly requires copying the environment, and bugs in the new code >> result in half of the most dangerous part of the environment (the SSE >> half of the exception flags) being copied anyway. >> >> Pending exceptions are not the only problem here. Pending exceptions >> are an i387 thing. Most exceptions are non-pending ones for IEEE inexact. > > how about following patch ? I want one that removes about 30 lines, not one that adds about 20 lines. > Index: npx.c > =================================================================== > RCS file: /home/ncvs/src/sys/i386/isa/npx.c,v > retrieving revision 1.168 > diff -u -u -r1.168 npx.c > --- npx.c 28 May 2006 04:40:45 -0000 1.168 > +++ npx.c 28 May 2006 11:40:58 -0000 > @@ -99,6 +99,7 @@ > #define fxrstor(addr) __asm("fxrstor %0" : : "m" (*(addr))) > #define fxsave(addr) __asm __volatile("fxsave %0" : "=m" (*(addr))) > #define ldmxcsr(__csr) __asm __volatile("ldmxcsr %0" : : "m" (__csr)) > +#define stmxcsr(addr) __asm __volatile("stmxcsr %0" : "=m" (*(addr))) > #endif > #define start_emulating() __asm("smsw %%ax; orb %0,%%al; lmsw %%ax" \ > : : "n" (CR0_TS) : "ax") > @@ -950,7 +951,8 @@ > { > union savefpu *state; > u_int32_t mxcsr; > - u_int32_t cw; > + u_int16_t cw; > + register_t s; > > if (!(td->td_pcb->pcb_flags & PCB_NPXINITDONE)) { > newtd->td_pcb->pcb_flags &= ~PCB_NPXINITDONE; > @@ -958,21 +960,40 @@ > } > > state = &newtd->td_pcb->pcb_save; > - /* get control word */ > - if (npxgetregs(td, state)) > - return; > - if (cpu_fxsr) { > - mxcsr = state->sv_xmm.sv_env.en_mxcsr; > - cw = state->sv_xmm.sv_env.en_cw; > + s = intr_disable(); > + if (curthread == PCPU_GET(fpcurthread)) { > +#ifdef CPU_ENABLE_SSE > + if (cpu_fxsr) { > + stmxcsr(&mxcsr); > + fnstcw(&cw); > + } > + else > +#endif > + { > + mxcsr = 0; > + fnstcw(&cw); > + } > } else { > - cw = state->sv_87.sv_env.en_cw; > - mxcsr = 0; > +#ifdef CPU_ENABLE_SSE > + if (cpu_fxsr) { > + mxcsr = td->td_pcb->pcb_save.sv_xmm.sv_env.en_mxcsr; > + cw = td->td_pcb->pcb_save.sv_87.sv_env.en_cw; > + } > + else > +#endif > + { > + mxcsr = 0; > + cw = td->td_pcb->pcb_save.sv_87.sv_env.en_cw; > + } > } > + intr_restore(s); Save the parent's state using npxsave() as in cpu_fork(). This is much cleaner and and only slightly less efficient. I still think the entire environment in that state needs to be copied. > bcopy(&npx_cleanstate, state, sizeof(*state)); > +#ifdef CPU_ENABLE_SSE > if (cpu_fxsr) { > state->sv_xmm.sv_env.en_cw = cw; > - state->sv_xmm.sv_env.en_mxcsr = mxcsr; > + state->sv_xmm.sv_env.en_mxcsr = mxcsr & ~0x3F; OK if it is correct to not copy the environment. BTW, your other patch that masks the top 16 bits in mxcsr needs to be extended to sometimes mask 0x40 (DAZ). The amd64 arch manual says to mask with mxcsr_mask which is in the fxsave image. Cleared bits in mxcsr_mask indicate reserved bits in a future-proof way. Unfortunately, this isn't past-proof so the following complications are needed: if the CPU stores 0 in mxcsr_mask, it means that mxcsr_mask is not supported; old CPUs (at least amd64 ones) that don't support mxcsr_mask also don't support DAZ, so the correct mask for them is 0xFFBF. These complications are implemented in Linux's i386 so I suppose they are no-ops at worst for non-amd64 CPUs. Linux stores the mask in a global variable. I think this breaks the asymmetric SMP case more than necessary. FreeBSD's amd64 uses mxcsr_mask without fixing up the case where it is 0. So does Linux-2.6.10's x86_64 (it Initializes the global variable but doesn't use it). Behaviour of mxcsr and DAZ on some CPUs: AXP2600 (Barton): mxcsr_mask = 0; attempting to set DAZ traps P3-800 (freefall): same as AXP2600 A64 (Id0xf48 stepping 8): mxcsr_mask = 0xFFFF; setting DAZ works > } else > +#endif vm_machdep.c is missing a DEV_NPX ifdef around the call here. Of course, lots of things would break if the option to not use npx were exercised. CPU_ENABLE_SSE is a silly option too. > state->sv_87.sv_env.en_cw = cw; > newtd->td_pcb->pcb_flags |= PCB_NPXINITDONE; > } > Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20060528223012.H2592>