Date: Tue, 22 Feb 2005 08:52:51 -0500
From: David Schultz
To: Maxim Sobolev
Cc: cvs-src@FreeBSD.ORG, src-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject: Re: cvs commit: src/lib/msun/i387 Makefile.inc e_atan2.S e_atan2f.S s_atan.S
Message-ID: <20050222135251.GB29054@VARK.MIT.EDU>
In-Reply-To: <421B24E2.7050800@portaone.com>
References: <200502211604.j1LG4NNx037623@repoman.freebsd.org> <421B24E2.7050800@portaone.com>

On Tue, Feb 22, 2005, Maxim Sobolev wrote:
> David Schultz wrote:
> >das 2005-02-21 16:04:23 UTC
> >
> > FreeBSD src repository
> >
> > Modified files:
> > lib/msun/i387 Makefile.inc
> > Removed files:
> > lib/msun/i387 e_atan2.S e_atan2f.S s_atan.S
> > Log:
> > Remove the i387 versions of atan(), atan2(), and atan2f().
> > They are slower than the MI routines on modern hardware,
> > except for degenerate cases such as the Pentium 4.
>
> Well, it is probably worth noting that 70-80% of machines running
> FreeBSD today fall into that degenerate case. How much slower is MI
> vs MD on p4?

Here are the timings for inputs -8, -7, ..., 7, from bde's test
program (see PR 67469). The results for !Pentium4 are from bde,
although I tested on a Pentium 3 as well and didn't transcribe the
results. `asmatan' is the assembly routine, and `fdlatan' is the
MI routine:

to.486dx2-66
asmatan: nsec per call: 5518 5522 5527 5530 5474 5473 5674 5440 5433 5703 5625 5628 5554 5545 5554 5557
fdlatan: nsec per call: 8128 8126 8127 8132 7990 8352 8910 7667 7557 8723 8272 7929 7913 7926 7915 7921

to.cel366
asmatan: nsec per call: 444 444 444 444 444 444 444 424 424 444 444 444 444 444 444 444
fdlatan: nsec per call: 370 370 370 370 370 382 397 323 323 397 382 370 370 370 370 370

to.k6-233
asmatan: nsec per call: 827 827 827 827 827 827 857 838 833 853 823 823 823 823 823 823
fdlatan: nsec per call: 771 771 771 771 772 801 834 712 707 826 793 763 763 763 763 763

to.p3-800
asmatan: nsec per call: 209 209 205 209 209 209 209 200 200 209 209 209 209 209 209 209
fdlatan: nsec per call: 175 175 175 176 176 181 179 150 149 178 174 172 171 171 172 172

to.axpb-2223
asmatan: nsec per call: 87 87 87 87 87 87 87 78 78 87 87 87 87 87 87 87
fdlatan: nsec per call: 65 65 65 65 65 66 68 51 51 68 66 65 65 65 65 65

asmatan: nsec per call: 68 68 68 68 68 68 69 69 69 68 68 68 68 68 68 68
fdlatan: nsec per call: 71 66 66 66 66 66 65 51 51 65 66 66 66 66 66 66

The results show that the FPATAN instruction (as with most x87 ops)
is pretty slow for anything more modern than a 486. The Pentium 4
was an exception in my original tests, but upon fixing a bug, I
found that the software version of atan() is faster than the FPATAN
instruction there, too. ;-) The bug was that bde's test was telling
the compiler to schedule instructions for an Athlon.
Note that Intel has a continuing trend of making the x87 slower in
favor of higher clock speeds and better SSE performance, so in the
future, the x87 transcendental instructions are likely to only get
worse relative to the software functions.

By the way, here are some other results for the Pentium 4, all
without SSE. SSE makes things a bit worse, probably because the
x87 and SSE registers are shared, and the Pentium 4 imposes a large
penalty for switching between the two sets.

icc:
asmatan: nsec per call: 77 77 77 77 79 77 77 78 78 77 77 77 77 77 77 77
fdlatan: nsec per call: 62 62 62 62 62 63 65 54 55 66 64 62 62 62 62 62

gcc -march=i486:
asmatan: nsec per call: 69 69 69 69 69 69 69 70 70 70 72 69 69 69 69 69
fdlatan: nsec per call: 54 54 54 54 54 56 59 49 48 57 55 52 52 52 52 52

gcc -march=pentium4:
asmatan: nsec per call: 68 68 68 68 68 68 69 69 69 68 68 68 68 68 68 68
fdlatan: nsec per call: 71 66 66 66 66 66 65 51 51 65 66 66 66 66 66 66

gcc -march=athlon-xp:
asmatan: nsec per call: 68 68 68 68 68 68 68 69 69 68 68 68 68 68 68 68
fdlatan: nsec per call: 92 92 93 94 92 95 97 71 71 97 95 92 92 92 92 93

It's funny that gcc generates worse code for a Pentium 4 when told
to schedule instructions for a Pentium 4 than when told to schedule
for a 486, and in the latter case, it beats icc. I ran some general
purpose tests with gcc 3.0 or 3.1 a while ago, and I seem to recall
that telling gcc that I had a 486 worked best for my Pentium 3, and
telling it I had a Pentium worked best for my Pentium 4.
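
For anyone who wants to collect numbers in the same "nsec per call" layout, here is a rough sketch of a per-input timing loop. It is not bde's test program from PR 67469. The names nsec_per_call, baseline, and print_row, and the ncalls parameter, are made up for illustration, and it assumes a POSIX system that provides clock_gettime() with CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 199309L	/* for clock_gettime() under strict -std= modes */

#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: mean cost in nanoseconds of one call to f(x). */
static double
nsec_per_call(double (*f)(double), double x, long ncalls)
{
	struct timespec t0, t1;
	volatile double arg = x;	/* reloaded each iteration so the call is not hoisted */
	volatile double sink = 0.0;	/* consumes the result so the call is not discarded */
	long i;

	assert(ncalls > 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < ncalls; i++)
		sink += f(arg);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
	    (double)(t1.tv_nsec - t0.tv_nsec)) / (double)ncalls);
}

/* Do-nothing function, useful for estimating loop and call overhead. */
static double
baseline(double x)
{
	return (x);
}

/* Print one row of timings for the inputs -8, -7, ..., 7. */
static void
print_row(const char *name, double (*f)(double), long ncalls)
{
	int x;

	printf("%s: nsec per call:", name);
	for (x = -8; x <= 7; x++)
		printf(" %.0f", nsec_per_call(f, (double)x, ncalls));
	printf("\n");
}
```

Calling print_row("fdlatan", atan, 1000000) from main() (with <math.h> included and -lm on the link line) gives a row comparable in shape to the ones above, and print_row("baseline", baseline, 1000000) gives an estimate of the loop overhead included in it. As the results above suggest, the -march setting used to build the test can matter as much as the routine being timed, so it is worth recording alongside the numbers.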