Date: Tue, 22 Feb 2005 15:18:10 -0500
From: David Schultz <das@FreeBSD.ORG>
To: Nate Lawson <nate@root.org>
Cc: cvs-all@FreeBSD.ORG
Subject: Re: cvs commit: src/lib/msun/i387 Makefile.inc e_atan2.S e_atan2f.S s_atan.S
Message-ID: <20050222201810.GA37791@VARK.MIT.EDU>
In-Reply-To: <421B81E4.6080909@root.org>
References: <200502211604.j1LG4NNx037623@repoman.freebsd.org> <421B24E2.7050800@portaone.com> <20050222135251.GB29054@VARK.MIT.EDU> <421B81E4.6080909@root.org>

On Tue, Feb 22, 2005, Nate Lawson wrote:
> David Schultz wrote:
> >By the way, here are some other results for the Pentium 4, all
> >without SSE.  SSE makes things a bit worse, probably because the
> >x87 and SSE registers are shared, and the Pentium 4 imposes a
> >large penalty for switching between the two sets.
>
> I don't believe this is correct.  MMX and x87 use the same register
> context (hence emms), however the XMM registers (SSE*) are separate.
> It's possible gcc is generating MMX instructions though with your SSE
> command line switch.

Yep, you're right, I was thinking of the MMX register set.  I
compared the code generated by gcc with and without SSE/SSE2, and
found that the only thing it uses SSE2 for is converting from
floating point->integer and back (e.g. CVTTSD2SI instead of i387
control word frobbing and FISTL).  There was also one place where
gcc just got confused and juggled around a bunch of registers on
the i387 stack, but I don't think that accounts for the difference.
I wonder if CVTTSD2SI and friends are slower than an
OR/MOV/FLDCW/FISTL/FLDCW sequence on the Pentium 4 for some
bizarre reason, or if I missed something else significant while
scanning the diff.
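
The conversion cost wondered about above could be isolated with a small C harness along the following lines. This is only a sketch, not the benchmark used in this thread: the function names (trunc_to_int, sum_truncated, time_conversions) are hypothetical, and the instruction sequence gcc actually picks for the cast depends on the compiler version and on flags such as -msse2 and -mfpmath=sse, so the generated assembly should be checked before comparing timings.

```c
#include <stddef.h>
#include <time.h>

/* Truncating double -> int conversion, the operation discussed above.
 * Assumption: when SSE2 is enabled, gcc may lower this cast to CVTTSD2SI;
 * for plain i387 it needs to switch the control word to round-toward-zero,
 * store with FISTL, and restore the control word afterwards. */
static int trunc_to_int(double x)
{
    return (int)x;
}

/* Sum the truncated values of n inputs so the conversions are actually used. */
static long sum_truncated(const double *in, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += trunc_to_int(in[i]);
    return sum;
}

/* Hypothetical timing helper: runs `reps` passes over the input and returns
 * the CPU time spent in seconds. The accumulated sum is written to *out;
 * the volatile accumulator keeps the loop from being optimized away. */
static double time_conversions(const double *in, size_t n, int reps, long *out)
{
    volatile long sink = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        sink += sum_truncated(in, n);
    clock_t end = clock();
    *out = sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Building the same file twice, once with and once without the SSE2 options, and comparing the times reported by time_conversions on a large array would show whether the conversion alone accounts for the difference.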